Offensive Engineering #3: Securing Agentic AI Against Data Leakage
Mahesh Goyal on agentic AI governance, cryptographic identity, and why existing security architectures were not built for this
Verizon published the 2026 Data Breach Investigations Report on May 20, analyzing 31,000 security incidents and 22,000 confirmed breaches, nearly double last year’s count. For the first time in the report’s nineteen-year history, vulnerability exploitation has overtaken stolen credentials as the leading initial access vector, accounting for 31% of breaches, with AI accelerating the window from months to hours.
The report flags what most enterprise security programs have not yet structurally addressed: employee use of shadow AI tripled to 45% of the workforce in a single year, and the data type most commonly submitted to unauthorized external AI models was source code. Verizon also explicitly called out service and machine accounts as the identity class to watch, stating that those will likely be the ones leveraged in an agentic AI future. A compromised service account used by an agent is not a single-user breach but an autonomous actor with persistent access and a blast radius that existing IAM models were not designed to contain.
That is precisely the governance gap Mahesh Kumar Goyal, senior data and AI engineer at Google, walks through in today’s issue.
THIS WEEK’S GEO-POLITICAL NARRATIVE
Moonlight Maze Explains Autonomous AI Dangers Today
Between 1996 and 1998, attackers linked to Russian intelligence exfiltrated classified data from US government networks for two years undetected, because authorized access and systematic exfiltration looked identical to the monitoring infrastructure in place. Ungoverned autonomous agents produce the same structural blind spot in enterprise environments today.
Narrative Link: Moonlight Maze Explains Autonomous AI Dangers Today — InfoSec Relations
The Insider View
Featuring Mahesh Kumar Goyal, Senior Data and AI Engineer at Google.
The shift from passive AI to autonomous agents changes the security problem in a way that most enterprise security architectures have not yet caught up with. A traditional application is predictable and stateless - it receives an input, executes a defined set of operations, and produces an output, and that sequence is the same every time. An autonomous agent does something fundamentally different: it browses the web, executes API calls, writes and runs code, maintains memory across sessions, and makes decisions about what to do next with minimal human intervention at each step. That combination of non-determinism, persistent memory, and broad tool access creates a threat surface that conventional governance frameworks were not designed to handle, and the gap between what existing security tools can see and what agents are actually doing is where the risk accumulates.
Mahesh Goyal, a senior data and AI engineer at Google specializing in advanced agentic AI systems and responsible AI architecture, spoke with Offensive Engineering to walk through what that risk surface actually looks like, where enterprise security teams are consistently getting the governance model wrong, and what the mandatory controls are before any agentic system should move into production.
Agents Are Not Smarter Scripts
The most consequential misconception Goyal encounters in enterprise environments is the assumption that an autonomous agent is essentially a more capable version of an existing automated script, and that the monitoring and observability tooling already in place is sufficient to secure it. It is not, and the reason comes down to what agents do that scripts do not: they move data across multiple systems without human intervention, they maintain state between interactions through persistent memory, and they interact with external tools and APIs in ways that are not fully predictable at the time of deployment.
“The threat of data leakage in autonomous AI agents is significantly larger than in traditional applications,” Goyal explains. “Unlike traditional applications, which are predictable and stateless, agents are non-deterministic, possess memory, interact with various tools, and move data across multiple systems without human intervention, making them difficult to secure.” The practical implication of that non-determinism is that an agent can function correctly for thousands of iterations before encountering a condition that causes it to behave in a way nobody anticipated, and by the time that condition surfaces, the agent may have already moved sensitive data somewhere it should not have gone.
The tools that enterprise security teams rely on - endpoint detection and response, identity and access management, data loss prevention - were built for human-driven or static services, where behavior is predictable enough that anomalies stand out. An agent operating across multiple systems and APIs does not produce the kind of consistent behavioral baseline that anomaly detection depends on, which means the signals those tools are looking for are not the signals an agent compromise would generate.
The Governance Gap That Agentic AI Exposes
Conventional access control frameworks assume that the thing being controlled has a defined, stable identity with a predictable set of permissions attached to it. Role-based access control, column-level access control, and least-privilege implementations all work on that assumption. Agents break it because they blur the boundary between code and data - an agent does not just execute instructions, it can interpret and act on content it retrieves from external sources, which means the boundary between what the agent is authorized to do and what it can be made to do is not enforced by its role assignment alone.
Goyal points to a structural problem that compounds this: most organizations keep their AI strategy separate from their broader data governance strategy, which means the agentic systems being built by AI and data engineering teams are not subject to the governance oversight that applies to everything else. “Companies often keep their AI strategy separate from their broader data governance strategy, and these new systems lack the established oversight required to manage dynamic interactions between agents and external APIs,” he explains. The result is agentic systems entering production with no centralized inventory, no defined ownership, and no security review process that accounts for the specific risks they carry.
Prompt Injection Is the SQL Injection of Agentic AI
SQL injection worked because applications trusted user-supplied input and passed it directly to a database query without validating whether the input was data or instruction. Prompt injection works on the same principle, applied to agents: because an agent processes the content it retrieves from external sources as part of its reasoning context, a malicious prompt embedded in a web page, a document, or an API response can influence what the agent does next.
The memory dimension makes this significantly more dangerous than a stateless injection attack. “Because agents maintain long-term memory, a single malicious prompt can poison the agent’s history or output,” Goyal explains, and because that memory persists across sessions, a successful injection does not just affect one interaction - it contaminates the agent’s context in a way that can influence subsequent behavior until the memory is explicitly cleared or the contamination is detected.
Goyal argues that organizations need specialized architectural controls to address this, specifically sandboxing to isolate agent execution environments, provenance verification to validate the sources of content the agent is processing, and agent gateways that can filter prompts before they reach the agent’s reasoning context - controls that have no direct equivalent in the security tooling built for traditional application architectures.
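The gateway and provenance checks described above can be pictured with a minimal Python sketch. Everything in it is hypothetical: the TRUSTED_SOURCES allowlist, the SUSPICIOUS_PATTERNS list, and the screen_content function are illustrative names, not part of any product Goyal mentions, and pattern matching alone is not a sufficient defense against prompt injection. The sketch only shows the shape of the control: retrieved content is checked against a source allowlist, screened, and labeled as untrusted data before it can reach the agent’s reasoning context.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the agent may read from (provenance check).
TRUSTED_SOURCES = {"docs.example.com", "wiki.example.com"}

# Illustrative patterns only; real injection attempts are far more varied.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal .*(secret|password|api key)", re.IGNORECASE),
]


@dataclass
class GatewayDecision:
    allowed: bool
    reason: str
    content: str = ""


def screen_content(source_url: str, text: str) -> GatewayDecision:
    """Decide whether retrieved content may enter the agent's context."""
    host = urlparse(source_url).hostname or ""
    if host not in TRUSTED_SOURCES:
        return GatewayDecision(False, f"untrusted source: {host or 'unknown'}")
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(text):
            return GatewayDecision(False, "possible prompt injection")
    # Wrap the text so downstream prompts can treat it as data, not instructions.
    wrapped = f"<untrusted_data source={host!r}>\n{text}\n</untrusted_data>"
    return GatewayDecision(True, "ok", wrapped)
```

A caller would pass only the content field of an allowed decision to the model and log the reason for every rejection, so that blocked sources and suspected injections remain visible to the security team.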
Managing Agent Memory Like a Database
Agent memory is not a log or a cache in the conventional sense - it is an active input to the agent’s decision-making process, which means it carries the same security requirements as any other data store that influences system behavior. Goyal’s framing is precise: agent memory should be managed like a database, with encryption at rest, version control to track what has been written to it and when, and strict scoping of read and write access so that an agent can only interact with the memory partitions its task actually requires.
The session isolation requirement adds another dimension that most current implementations are not accounting for. Without explicit context isolation between user sessions, an agent serving multiple users can carry context from one session into another, creating a data leakage path that does not require an external attacker to exploit — it is a design condition that produces the leak on its own. Goyal recommends making agent memory ephemeral where possible and using context summarization techniques to prevent contamination from accumulating across sessions.
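As a rough illustration of the scoping, versioning, and session isolation described in this section, the Python sketch below partitions memory by session and keeps an append-only history per key. The SessionMemory class and its method names are assumptions made for this example, and it deliberately leaves out encryption at rest, which in practice would come from the underlying datastore or a vetted cryptography library.

```python
import time
from collections import defaultdict


class SessionMemory:
    """In-memory store partitioned by session, with an append-only history."""

    def __init__(self):
        # session_id -> key -> list of (version, timestamp, value)
        self._store = defaultdict(lambda: defaultdict(list))

    def write(self, session_id: str, key: str, value: str) -> int:
        """Append a new version of `key` for this session and return its number."""
        history = self._store[session_id][key]
        version = len(history) + 1
        history.append((version, time.time(), value))
        return version

    def read(self, session_id: str, key: str):
        """Return the latest value, visible only within the caller's session."""
        history = self._store.get(session_id, {}).get(key)
        return history[-1][2] if history else None

    def versions(self, session_id: str, key: str):
        """Return (version, value) pairs so writes can be audited later."""
        history = self._store.get(session_id, {}).get(key, [])
        return [(version, value) for version, _, value in history]

    def end_session(self, session_id: str) -> None:
        """Ephemeral memory: drop everything when the session closes."""
        self._store.pop(session_id, None)
```

Because every read and write takes a session identifier, there is no call that returns another session’s context, which is the property that closes the cross-session leakage path described above.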
What Mandatory Governance Actually Requires
Before any agentic system reaches production, Goyal argues there are governance requirements that are not optional and not addressable after the fact. The first is a centralized inventory of every agentic system in the organization’s environment, including Model Context Protocol servers and retrieval-augmented generation agents, with defined ownership, data classification, and documented tool access for each one. Without that inventory, an organization cannot meet the second requirement Goyal identifies as mandatory: assessing the blast radius of a compromise by explicitly estimating how far an exploit of a given agent could reach across connected systems and data stores before deploying it.
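One simple way to make the inventory and blast-radius requirements concrete is to treat documented tool access as a graph and compute what is reachable from each agent. The Python sketch below assumes that model; the AgentRecord fields, the agent and system names, and the system_links mapping are all invented for illustration, and a real estimate would also need to account for permissions, data sensitivity, and network controls.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    """One row in a hypothetical agent inventory."""
    name: str
    owner: str
    data_classification: str
    tools: list = field(default_factory=list)


def build_access_graph(inventory, system_links):
    """Combine agent tool access with known system-to-system connections."""
    graph = {name: list(targets) for name, targets in system_links.items()}
    for record in inventory:
        graph.setdefault(record.name, []).extend(record.tools)
    return graph


def blast_radius(agent_name, graph):
    """Breadth-first search: every system reachable from the given agent."""
    reached = set()
    queue = deque([agent_name])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor != agent_name and neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return reached


# Hypothetical inventory and system connections.
inventory = [
    AgentRecord("support-agent", "cx-team", "confidential", ["ticket-api", "crm-db"]),
    AgentRecord("reporting-agent", "data-team", "internal", ["warehouse"]),
]
system_links = {"ticket-api": ["email-service"], "crm-db": ["billing-db"]}
graph = build_access_graph(inventory, system_links)
```

In this toy data, a compromise of support-agent reaches four systems, including billing-db, which the agent never touches directly. That kind of indirect reach is what the estimate is meant to surface before deployment.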
The third requirement is cryptographic identity. Every agent needs a unique cryptographic identity with its own dedicated key pair, and those keys must not be shared between agents. “Assess the blast radius of an exploit and implement unique cryptographic identities for every agent to ensure that keys are not shared,” Goyal explains. The reason this matters operationally is that shared keys make it impossible to attribute actions to specific agents after the fact and impossible to revoke access for one agent without affecting everything else using the same credential.
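The attribution and revocation argument can be shown with a small Python sketch. One substitution needs to be stated plainly: the Python standard library has no asymmetric signature primitive, so this example uses a separate HMAC secret per agent as a stand-in for the dedicated key pair Goyal describes. A production system would use asymmetric keys, typically held in a key management service. The AgentIdentityRegistry class and the sign function are hypothetical names.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> str:
    """Sign a message with one agent's own key (HMAC-SHA256 stand-in)."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class AgentIdentityRegistry:
    """Tracks one secret per agent so actions can be attributed and revoked."""

    def __init__(self):
        self._keys = {}  # agent_id -> secret key bytes

    def register(self, agent_id: str) -> bytes:
        """Issue a fresh key for a new agent; keys are never reused or shared."""
        if agent_id in self._keys:
            raise ValueError(f"{agent_id} is already registered")
        key = secrets.token_bytes(32)
        self._keys[agent_id] = key
        return key

    def revoke(self, agent_id: str) -> None:
        """Remove one agent's key without touching any other agent."""
        self._keys.pop(agent_id, None)

    def verify(self, agent_id: str, message: bytes, signature: str) -> bool:
        """Check that `message` was signed with the key issued to `agent_id`."""
        key = self._keys.get(agent_id)
        if key is None:
            return False
        expected = sign(key, message)
        return hmac.compare_digest(expected.encode(), signature.encode())
```

Because each signature verifies against exactly one registered key, an action can be tied to the agent that produced it, and revoking that agent leaves every other agent’s credential valid.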
Zero Trust Applied at the Agent Layer
Zero Trust as a security principle means that no action is implicitly trusted because of where it originates - every action must be authenticated, authorized, and verified regardless of whether it comes from outside or inside the network boundary. Applying that principle to agentic AI means extending it to agent-to-agent communication, which most current implementations treat as implicitly trusted once the orchestrating agent has been authenticated.
Goyal argues that this is a critical gap: if an orchestrating agent’s memory has been poisoned through a prompt injection, the downstream agents it communicates with will receive and act on that poisoned context unless they independently verify what they are receiving. “Even in agent-to-agent communication, authentication is necessary to prevent downstream agents from being impacted by poisoned memory,” he explains. The practical implementation he recommends moves away from hard-coded API keys, which are routinely exposed accidentally in log files and test cases, toward short-lived tokens and Mutual TLS for authentication, which limit the window of exposure if a credential is compromised.
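To illustrate why short-lived tokens narrow the exposure window compared with a hard-coded API key, the Python sketch below issues a signed token that carries its own expiry and rejects it once that time has passed. The token format, the issue_token and verify_token functions, and the five-minute default lifetime are assumptions made for this example rather than any specific standard, and mutual TLS is not shown.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(secret: bytes, agent_id: str, ttl_seconds: int = 300, now=None) -> str:
    """Create a signed token naming the agent and an expiry timestamp."""
    now = time.time() if now is None else now
    payload = json.dumps({"sub": agent_id, "exp": now + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    signature = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"


def verify_token(secret: bytes, token: str, now=None):
    """Return the agent id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        body, signature = token.rsplit(".", 1)
    except ValueError:
        return None  # Not in the expected "body.signature" form.
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None  # Tampered with, or signed with a different secret.
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] <= now:
        return None  # Expired: a leaked token stops working on its own.
    return claims["sub"]
```

A downstream agent would call verify_token on every request from the orchestrator instead of trusting the connection, so a token that leaks into a log file is useful to an attacker only until it expires.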
Human Oversight Remains a Non-Negotiable Control
The efficiency argument for autonomous agents rests partly on reducing the number of human decision points in a workflow, but Goyal draws a clear line between the decision points that can be automated and the ones that cannot. For high-risk tasks, such as financial calculations, actions with irreversible consequences, and decisions that affect sensitive data at scale, human-in-the-loop approval is not an optional safeguard that can be traded off against operational speed. It is a mandatory architectural control that prevents the kind of cascading error an autonomous agent can produce before any monitoring system has time to flag it.
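A human-in-the-loop control of this kind can be sketched as a gate that refuses to run high-risk actions without an explicit approval. In the Python example below, the HIGH_RISK_ACTIONS set, the ApprovalRequired exception, and the execute_action function are hypothetical, and the approver callback stands in for whatever review workflow an organization actually uses.

```python
from typing import Callable, Optional

# Hypothetical list of actions that must never run unattended.
HIGH_RISK_ACTIONS = {"transfer_funds", "delete_records", "bulk_export"}


class ApprovalRequired(Exception):
    """Raised when a high-risk action has no human approval."""


def execute_action(
    action: str,
    run: Callable[[], str],
    approver: Optional[Callable[[str], bool]] = None,
) -> str:
    """Run `run()` directly for low-risk actions; gate high-risk ones on approval."""
    if action in HIGH_RISK_ACTIONS:
        # Fail closed: no approver, or a rejection, blocks the action.
        if approver is None or not approver(action):
            raise ApprovalRequired(f"{action} needs human approval")
    return run()
```

The gate fails closed, so a missing or unreachable approver blocks the action instead of letting it through, which matches the framing of approval as an architectural control and not a convenience.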
Looking at the next three to five years, Goyal sees runtime monitoring and rigorous protection of sensitive data as the two capabilities that will determine whether enterprise adoption of agentic AI is sustainable or produces a series of high-profile failures that force organizations to rebuild systems they deployed without adequate governance.
The organizational dimension of that challenge is significant: many organizations discover the need for governance only after an agentic system has already reached production, at which point the cost of re-evaluating the entire codebase and workflow is substantially higher than building the governance model from the start. Treating security as a foundational element of agentic AI deployment, rather than a late-stage review, is the condition under which the rest of the architecture holds.
You can also watch the full live session here.
THIS WEEK’S PERSON OF INTEREST
George Kurtz - CEO and Founder, CrowdStrike
George Kurtz has been at the center of two of the most consequential software deployment failures in cybersecurity history - as CTO of McAfee in 2010, when a faulty antivirus update deleted a critical Windows system file across millions of enterprise machines, and as CEO of CrowdStrike in 2024, when a defective Falcon sensor update crashed 8.5 million Windows systems globally - both originating from an update that bypassed the validation controls designed to catch it before it shipped.
Kurtz frequently warns organizations about the dangers of ungoverned automated systems and advocates for rigorous security governance. The security community continues to note the obvious tension: one of the industry’s loudest voices on the necessity of strict deployment controls is the same executive behind two of the largest uncontrolled software deployment disasters on record.
SECURITY BRIEFS
A look at recent critical vulnerabilities where autonomous agents turned prompt injections into system-level breaches.
Semantic Kernel: Prompt to Shell
Two critical CVEs (CVE-2026-26030, CVE-2026-25592) in Microsoft’s Semantic Kernel allowed a single crafted prompt to achieve host-level remote code execution — no exploit chain required, just an agent doing its job. Patched in SDK v1.71.0.
Source: Microsoft Security
Comment and Control: AI Agents Leaking CI Secrets
A CVSS 9.4 Critical prompt injection attack across Claude Code, Gemini CLI, and GitHub Copilot exploited pull_request_target workflows to steal credentials through comment fields, with no vendor publishing injection resistance metrics for their agent runtimes.
Source: VentureBeat
CrewAI: Four CVEs, One Exploit Chain
Four CVEs in CrewAI’s default configurations allow prompt injection to chain into RCE, SSRF, and arbitrary file reads within the same sequence — a compound attack path that component-level security reviews would not surface.
Source: Carnegie Mellon University
Azure SRE Agent: Unauthenticated Access to Live Commands
CVE-2026-32173 (CVSS 8.6) exposed live Azure SRE Agent command streams to any Entra ID account holder through an unauthenticated WebSocket endpoint, a direct result of deploying a privileged agent before its access controls were fully scoped.
Source: CSO
MemoryTrap: One Injection, Many Sessions Poisoned
A vulnerability in Claude Code’s memory system allows a single malicious input to contaminate the agent’s persistent memory and propagate across multiple user sessions, exposing the cost of deploying agent memory without strict scoping and session isolation.
Source: Help Net Security
Thank you for reading this issue of Offensive Engineering on Securing Agentic AI Against Data Leakage featuring Mahesh Kumar Goyal.
Stay Curious, Stay Secure!
Data Practitioner
Technical Contributor, Offensive Engineering — InfoSec Relations





