Offensive Engineering #1: Agents with Offensive Capability
Albert Ziegler on autonomous security agents, offensive architecture, and the governance gap
On April 17, 2026, Palo Alto Networks Unit 42 confirmed that Iran-aligned threat groups had maintained continuous attack operations throughout a 47-day domestic internet blackout by routing traffic through Starlink and other satellite providers, according to a threat brief published the same day. The groups did not pause; they adapted their infrastructure and kept probing. The organizations on the receiving end of those operations had no equivalent continuous coverage running against their own attack surfaces.
That gap is not unique to conflict zones. Somewhere on the internet right now, an automated system is scanning an internet-facing asset that its service owner last tested months ago. That is how the attacking side has operated for years. What has changed, with the surge of AI integration into everyday workflows, is the capability behind the probe. Today's AI-driven offensive tooling does not merely match signatures against known patterns. It reasons about what it finds, adapts in real time, and chains discoveries into exploit sequences that no existing signature would catch. And it does this at a scale no security team can replicate.
Until now, the security side has had no equivalent answer. This issue, featuring Albert Ziegler, Head of AI at XBOW and former Principal Researcher at GitHub, examines what it looks like when autonomous agents are deployed at scale on the defensive side, with as many as five thousand coordinated agents running a single penetration test, and with architecture and governance built for the reality that the attacker is already operating this way.
Before the feature, a short historical note on how the attacker's tempo advantage began. And do check out this week's person of interest.
Closing the Container Security Gap
A live interview with Docker Captain Advait Patel on why engineering teams scan containers and still ship vulnerable images into production.
THIS WEEK’S HISTORICAL NARRATIVE
How Automated Offense Got Its Head Start in November 1988
On the night of November 2, 1988, Cornell graduate student Robert Tappan Morris released a self-replicating piece of code onto the early internet. The worm exploited three Unix vulnerabilities simultaneously and spread across the network faster than administrators could track it. Within hours it had compromised between six and ten thousand machines. Administrators spent three days on manual response, coordinating over phone lines, with no automated tool to match what was running against them.
Narrative Link: The Morris Worm Gave Attackers an Offensive Start — InfoSec Relations
The Insider View
Featuring Albert Ziegler, Head of AI at XBOW and former Principal Researcher at GitHub.
The Architecture of Autonomous Penetration Testing Agents
Enterprise security teams have for years accepted that penetration testing is non-negotiable, and that doing it well requires human expertise that is expensive, scarce, and impossible to deploy at the frequency an escalating attack surface realistically requires. The standard model that emerged from that constraint is periodic testing: a qualified team comes in, works through a defined scope, produces a report, and the organization uses what it finds to close the known gaps before the next scheduled assessment. The interval between assessments, often six months to a year for most assets and longer for anything deemed less critical, is not a security decision. It is a capacity decision driven by the availability of human experts, who can only be in one place doing one thing at a time.
The attacking side of that equation does not operate under the same constraint, and it never did. Automated scanning tools have been probing public-facing assets continuously for as long as those assets have been online. But the current shift is qualitatively different from automated scanning. AI-powered offensive tooling does not just probe attack surfaces. It can, at scale, reason about the environment it encounters, adapt to what it finds, and chain discoveries into exploit sequences that a signature-based scanner would never identify, because no signature exists for the specific combination of conditions being exploited. And it can do all of this at a speed that makes the gap between how often security practitioners test and how often attackers exploit wider than it has ever been.
Scale Changes Everything
Albert Ziegler has worked across AI systems and enterprise-scale engineering, first building GitHub Copilot and GitHub Advanced Security as a Principal Researcher at GitHub, and now as Head of AI at XBOW, an autonomous penetration testing company whose assessments can involve five thousand agents working in coordination across a single customer's attack surface. That number is worth sitting with for a moment: five thousand agents, coordinating across an assessment in real time, each one scoped to a specific task, each one reporting into a system that decides what to do with what they find.
The scale shift matters for a reason that goes beyond the obvious efficiency argument. “It used to be that you put a website online and pretty soon you got your first automated scan,” Ziegler explains. “Well, this pretty soon becomes shorter and shorter, as attackers can also leverage scale and they only need to be successful once.” The asymmetry has always existed in theory.
AI-powered tooling on the offensive side has made it operational at a scale that changes the calculus for every asset a security analyst manages, not just the crown jewels. Assets that were previously tested infrequently because they seemed less critical are now being probed by automated systems that have no cost constraint on which targets they include. If a security practitioner can only afford thorough testing for their most valuable assets, and an attacker can afford to probe everything continuously, then the less critical assets become the path of least resistance. Continuous AI-driven penetration testing is not a luxury upgrade on the standard model but an appropriate defensive response to what the offensive side is already doing.
What an Assessment Actually Looks Like
The architecture of an XBOW assessment reflects a design philosophy that runs counter to the intuition that more capable agents should be given broader mandates. Ziegler’s team has found the opposite to be true in practice. The individual tasks that each agent is responsible for are deliberately small and tightly scoped, and the reasoning behind that constraint is architectural rather than conservative.
“It is much easier to design any agentic system where the individual tasks that one agent fulfills are small and validatable,” Ziegler explains. “So instead of sending one agent to go on a moonshot mission where they have a much too high quiver of arrows and a large chance to be lost, our aim is to have several small hops where the agents can be much more targeted, only get the tools that they actually need, and can have their results immediately verified.” The principle is the same one that makes microservices more maintainable than monoliths. Smaller scope means clearer failure modes, faster debugging, and results that can be verified before they influence the rest of the system.
A typical assessment begins with a small number of agents mapping the attack surface of the target asset. The customer, having manually verified ownership of the asset, directs them to explore what the attack surface looks like and make preliminary notes about which parts warrant closer attention. That initial mapping feeds into a more specialized layer of work. One agent logs into the application and maintains an authenticated session, providing that session as a shared service to the other agents that need authenticated access, rather than having each agent manage its own authentication state independently. This service architecture prevents the coordination overhead that would otherwise accumulate when multiple agents individually try to maintain valid sessions against a target that may rate-limit or lock accounts on repeated authentication attempts.
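The shared-session idea can be sketched in a few lines. This is an illustrative sketch, not XBOW's implementation: `login` stands in for whatever authentication flow the target application actually uses, and the class names are invented for the example.

```python
import threading

class SharedSessionService:
    """One agent authenticates; the other agents borrow the session.

    Hypothetical sketch: `login` is a placeholder for the target
    application's real authentication flow.
    """

    def __init__(self, login):
        self._login = login              # callable returning auth headers
        self._lock = threading.Lock()
        self._headers = None
        self.login_count = 0             # how many real logins occurred

    def auth_headers(self):
        # Authenticate lazily, exactly once; every borrowing agent reuses
        # the same credentials, so the target sees a single session instead
        # of one login attempt per agent (which could trip rate limits or
        # lock the account).
        with self._lock:
            if self._headers is None:
                self._headers = self._login()
                self.login_count += 1
            return dict(self._headers)

def fake_login():
    return {"Authorization": "Bearer demo-token"}

service = SharedSessionService(fake_login)
headers = [service.auth_headers() for _ in range(5)]  # five borrowing agents
```

However many agents request access, the login happens once, which is exactly the property that keeps the target's account-lockout logic out of the picture.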
The attack agents themselves are given a specific methodology and a specific place on the attack surface to work against. An agent assigned to look for cross-site scripting vulnerabilities, for example, is not exploring the application broadly. It is applying a defined methodology to a defined target, iterating through variations over ten to forty attempts, and refining its approach based on the responses it receives from the application, whether through direct HTTP responses or through a headless browser that can observe what the page actually renders. The iteration loop is tight and observable, which makes it possible to monitor what the agent is doing and stop it if something looks wrong before it goes further.
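The tight, observable iteration loop described above might look like the following. Everything here is a hedged stand-in: `send` abstracts the HTTP request or headless-browser step, and the simulated endpoint exists only to make the sketch runnable.

```python
def probe_xss(send, payloads, max_attempts=40):
    """Apply one methodology to one injection point, iterating through
    payload variations and checking each rendered response.

    `send` stands in for the HTTP request or headless-browser step and
    returns the response body as the page would render it.
    """
    for attempt, payload in enumerate(payloads[:max_attempts], start=1):
        body = send(payload)
        if payload in body:              # reflected without escaping
            return {"payload": payload, "attempts": attempt}
    return None                          # loop ends observably; nothing found

# Simulated endpoint that escapes '<' except for one overlooked variant.
def fake_endpoint(payload):
    if payload.startswith("<svg"):
        return "<div>" + payload + "</div>"                      # reflected as-is
    return "<div>" + payload.replace("<", "&lt;") + "</div>"

finding = probe_xss(
    fake_endpoint,
    [
        "<script>alert(1)</script>",
        "<img src=x onerror=alert(1)>",
        "<svg onload=alert(1)>",
    ],
)
```

The bounded attempt count and the single return path are what make the loop easy to monitor and easy to stop.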
The Validation Layer That Prevents False Findings
The step that separates a finding from a false positive in XBOW’s architecture is not the attack agent’s confidence in what it found. It is a separate validation agent whose sole job is to verify that the finding is real before it enters the system as a confirmed vulnerability. Ziegler describes the handoff with the specificity that makes it clear why this step is not optional in a system operating at this scale.
When an attack agent believes it has found something, it passes the finding to a validation agent along with the evidence it has been told to gather for that type of exploit. For a cross-site scripting finding, the attack agent would tell the validator that navigating to a specific URL in a browser would cause a pop-up to appear containing a specific string. The validator then checks this automatically, independent of the attack agent's assessment. If the pop-up appears, the finding proceeds to the next stage. If it does not, the finding is discarded rather than carried forward as a probable vulnerability that a human reviewer needs to assess.
“Agents often believe they have found something because the models are trained to please,” Ziegler explains. That observation deserves more attention than it typically receives in the broader conversation about deploying AI agents in high-stakes environments. A model that is rewarded for task completion will tend toward confidence about whether it has completed the task, and in a security context that means an agent that has not found a vulnerability will still tend toward interpreting ambiguous signals as evidence that it has. The validation layer is the architectural response to that behavioral property. It removes the agent’s assessment from the chain of custody for a finding and replaces it with an independent check that cannot be influenced by the agent’s confidence in its own work.
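The handoff can be sketched as a check that ignores the attack agent's self-assessment entirely. The function, field names, and `fetch` interface below are illustrative assumptions, not XBOW's actual API; the unique marker string is the standard trick for making the evidence unambiguous.

```python
def validate_finding(finding, fetch):
    """Independently re-drive the claimed reproduction.

    The validator never consults the attack agent's confidence; it only
    checks whether the promised evidence string actually appears when the
    claimed URL is rendered.
    """
    body = fetch(finding["url"])
    return finding["evidence"] in body

# Hypothetical finding handed over by an attack agent.
claimed = {
    "url": "https://app.example/search?q=<svg onload=alert('marker-7731')>",
    "evidence": "marker-7731",           # unique string the pop-up must contain
}

def fake_browser(url):
    # Simulated headless-browser render: the injected marker fires.
    return "<html>alert fired: marker-7731</html>"

confirmed = validate_finding(claimed, fake_browser)
```

If the marker never appears, the finding is dropped on the floor, no matter how confident the attack agent was when it filed it.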
Findings that pass validation go through additional administrative checks for replicability and minimal reproduction steps before they are assessed for severity and delivered to the customer. The chain from initial probe to customer report is fully automated and fully verified at each handoff, which is what makes it possible to run at the scale that human-driven pen testing cannot approach.
Governance When the Agent Has Offensive Capability
The governance question for autonomous penetration testing systems is more demanding than the governance question for most other agentic applications because the capability being governed is inherently dangerous when misdirected. An agent that can find and demonstrate exploits in a target system can also, if its controls fail, find and demonstrate exploits in systems it was not authorized to test, or cause harm to the system it was authorized to test in ways that were not part of the assessment scope.
Ziegler’s framework for addressing this organizes around the same motive, method, and opportunity triad he has developed from his experience running assessments and thinking carefully about what it means to give AI systems offensive capabilities in controlled environments. All three conditions have to be addressed because eliminating only two leaves a residual failure path that a bad combination of circumstances can eventually activate.
Motive is addressed by designing the agent’s goals in a way that excludes harmful outcomes from the valid solution space. In the XSS example, the agent’s goal is to prove that a cross-site scripting vulnerability exists by making a pop-up appear in a controlled test. The goal is not to change application state, exfiltrate data, or cause any modification to the target that persists after the assessment. An agent pursuing that goal faithfully cannot cause the kind of harm that a broader mandate would enable, because the goal itself does not point toward it.
Method is addressed by giving agents only the tools their specific task requires. An agent assigned to probe for XSS vulnerabilities does not have the tools to modify server configurations, access databases, or reach endpoints that are outside the scope of its assigned methodology. The tool set is the boundary, and the boundary is set at the task level rather than the agent level.
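Task-level tool scoping can be expressed as a constraint enforced at construction time rather than a policy the agent is trusted to follow. The tool names and registry below are invented for the sketch.

```python
class ScopedAgent:
    """An agent that can only ever see the tools its task requires.

    Illustrative sketch: scope is fixed when the agent is built, not
    negotiated by the agent at run time.
    """

    def __init__(self, allowed, registry):
        missing = allowed - registry.keys()
        if missing:
            raise ValueError(f"unknown tools: {missing}")
        # Only the allowed subset is ever bound to the agent.
        self._tools = {name: registry[name] for name in allowed}

    def use(self, name, *args):
        if name not in self._tools:
            raise PermissionError(f"tool {name!r} is outside this task's scope")
        return self._tools[name](*args)

# Hypothetical tool registry for the whole platform.
REGISTRY = {
    "http_get": lambda url: f"GET {url}",
    "render_page": lambda url: f"render {url}",
    "modify_config": lambda path: f"write {path}",  # never granted to XSS tasks
}

# An XSS probe task gets only the two tools its methodology needs.
xss_agent = ScopedAgent({"http_get", "render_page"}, REGISTRY)
```

The boundary holds even if the agent "decides" it wants a broader tool: the capability simply is not in its hands.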
Opportunity is addressed through real-time controls that operate on every action before it executes. “Every action that an agent takes will be checked by a safety model, as well as checked for certain keywords that could indicate the intent to harm, as well as controlled via egress,” Ziegler explains. Network egress is restricted to the customer’s verified assets, and even within those assets, sensitive endpoints that the customer has flagged as out of scope are excluded from what the agents can reach. The opportunity controls do not rely on the agent making the right decision about what to do with the access it has. They enforce the boundary regardless of what the agent decides.
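Two of the three per-action controls Ziegler names, the keyword screen and the egress allowlist, can be sketched as a gate every action passes through before executing. The hostnames, paths, and keyword list are placeholders; a real deployment would also call a safety model alongside these checks.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"app.customer.example"}      # customer's verified assets
OUT_OF_SCOPE_PATHS = {"/admin/purge"}         # endpoints the customer flagged
HARM_KEYWORDS = ("drop table", "rm -rf", "shutdown")

def action_permitted(url, payload):
    """Gate an agent action before it executes.

    Illustrative stand-ins for the controls described in the text; the
    safety-model check would sit alongside these in a real system.
    """
    text = payload.lower()
    if any(keyword in text for keyword in HARM_KEYWORDS):
        return False                          # keyword screen: intent to harm
    parts = urlparse(url)
    if parts.hostname not in ALLOWED_HOSTS:
        return False                          # egress restricted to scope
    if parts.path in OUT_OF_SCOPE_PATHS:
        return False                          # customer-flagged endpoint
    return True
```

The point of the structure is that the boundary is enforced on every action regardless of what the agent decides, which is exactly the property the text describes.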
What Changes When Governance Is Built for Agents
Ziegler draws a line between the governance model he built at GitHub for Copilot and the model he has had to build at XBOW for autonomous agents. The contrast clarifies why most organizations currently deploying coding and testing agents under governance frameworks designed for advisory AI will eventually encounter the gap rather than avoid it.
GitHub Copilot, at least in the version Ziegler worked on, was an AI-first product where a human was always in the loop and the primary risk was a wrong suggestion. Wrong suggestions are annoying and occasionally costly, but they do not cause harm in the way that agent actions can cause harm. The human reviewer was the safety layer, and the governance architecture could be relatively simple because the most dangerous thing the system could do was provide a misleading recommendation that a human then chose to act on.
Autonomous agents face a categorically different risk profile. “The risk is not that their suggestions are annoying but that their actions are harmful,” Ziegler argues, “and that needs truly new ways of addressing them that cannot just be written off as there is a human in the loop and they will stop anything problematic.” The human in the loop did enormous safety work in the advisory model, and that work has to be redistributed into the architecture itself when the loop operates at a speed and scale that human oversight cannot follow in real time. The governance controls are not a feature added on top of the system. They are the system’s safety guarantee, and they have to be designed with the same rigor and the same adversarial mindset that goes into designing the offensive capabilities they are containing.
The organizations building autonomous security testing capabilities today are navigating a design space with no established playbook and limited precedent. What XBOW's architecture demonstrates is that the constraints (the small scoped tasks, the validated handoffs, the independent verification layer, and the motive-method-opportunity governance triad) are not limitations on what autonomous agents can accomplish. They are the conditions under which autonomous agents can accomplish something reliably, at a scale that changes what continuous security coverage actually means in practice.
THIS WEEK’S PERSON OF INTEREST
George Hotz — Hacker, iPhone jailbreaking pioneer

George Hotz jailbroke the iPhone at seventeen, reverse-engineered the PlayStation 3, hunted zero-days at Google’s Project Zero, and built comma.ai’s open-source autonomous driving software to 100 million miles driven.
This month he challenged Anthropic and OpenAI directly, threatening to release a zero-day a day until a major new model dropped. His argument was simple: software vulnerabilities are not scarce because they are hard to find. They are scarce because finding them legally is restricted. Reports of AI-assisted exploit discovery costing $20,000 in compute drew a dismissal from Hotz, who said he would do it for less.
That argument matters here because it names a tension the rest of this issue circles. When autonomous agents can find and chain exploits at scale, the question of whether AI labs are raising legitimate dual-use concerns or using safety language to shape regulation becomes a live policy debate. Hotz has the track record to make that challenge credible.
SECURITY BRIEFS
State-linked intrusions, supply chain compromises, and AI-assisted attacks from recent weeks.
Iran-Linked Handala Wipes Stryker Devices
Handala used Stryker’s own Microsoft Intune console to remotely wipe devices across 61 countries on March 11, disrupting manufacturing and shipping operations globally.
North Korea Poisons Axios npm Package
Sapphire Sleet published two malicious Axios versions on March 31, deploying a remote access trojan across an estimated 600,000 installs before removal three hours later.
Iran-Aligned Groups Surge After Operation Epic Fury
Following the February 28 US-Israel strikes, Unit 42 tracked roughly 60 Iran-aligned groups operating under a newly formed Electronic Operations Room, maintaining tempo via Starlink during Iran’s 47-day internet blackout.
AI Tool Hits 600 FortiGate Devices Across 55 Countries
CyberStrikeAI ran fully automated attacks against Fortinet infrastructure between January and February 2026 with no human operator directing individual steps.
Booking.com Breach Exposes Customer Reservation Data
Booking.com notified customers on April 12 that their names, addresses, phone numbers, and booking details had been compromised. The attack vector remains undisclosed.
Thank you for reading the first issue of Offensive Engineering, the newsletter from InfoSec Relations. We will be back next week.
Business Analyst & Researcher
Technical Contributor for Offensive Engineering (by infosecrelations.com)