Offensive Security Evolves With Continuous AI and Automation

Today we’re joined by Matilda Bailey, a leading networking specialist whose work provides a unique lens on the future of cellular, wireless, and next-generation security solutions. With cyberattacks growing in frequency and sophistication, the pressure on organizations to find and fix vulnerabilities has never been higher. Matilda is here to unpack the seismic shifts happening in offensive security, exploring how the roles of red and blue teams are merging, the profound and complex impact of artificial intelligence, and what the future of proactive defense looks like. We’ll delve into the practical challenges of this evolution, from designing social engineering tests that preserve morale to navigating the regulatory hurdles of AI-generated reports. This conversation will illuminate how security leaders can move beyond traditional, siloed approaches to build a more resilient, continuous, and integrated defense strategy.

The traditional model has red teams finding flaws and blue teams fixing them. As this divide blurs, what does effective collaboration look like in practice? Could you share a step-by-step example of how red teams can guide remediation without “owning” the fix?

It’s a foundational shift, moving away from the old “throw it over the wall” mentality. That crumbling wall between red and blue teams is one of the most significant changes we’re seeing. In a practical sense, collaboration becomes a continuous feedback loop rather than a series of handoffs. For instance, a red team might start by simulating a known threat actor’s TTPs. Instead of just dropping a 100-page report on the blue team’s desk, they’d immediately sit down with them. The first step is joint triage, where the red team explains how they got in, showing the attack path. The conversation isn’t “Here’s a vulnerability,” but “Here’s the story of the breach.” The second step is guided remediation, where the red team advises on the most effective fix, maybe suggesting a configuration change over a patch that could break something else. They don’t write the code, but they act as expert consultants. Finally, and this is crucial, the red team re-tests the specific fix immediately. This validates the solution in near real-time, ensuring the window of exposure is closed and the blue team’s effort was effective. It’s a partnership, not a simple transaction.
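
To make that loop concrete, here is a minimal Python sketch of how the joint triage, guided remediation, and immediate re-test cycle might be modeled as a shared workflow. The types and function names are illustrative assumptions, not a reference to any real tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Finding:
    """One step in the attack story, shared by red and blue teams."""
    attack_path: str                 # e.g. "phish -> stolen token -> lateral move to file server"
    suggested_fix: str               # red team advisory, e.g. "disable legacy auth on the share"
    retest: Callable[[], bool]       # red team check: does the original path still work?
    remediated: bool = False
    validated_at: Optional[datetime] = None


def collaboration_loop(finding: Finding, apply_fix: Callable[[Finding], None]) -> bool:
    """One pass of the triage -> guided remediation -> immediate re-test cycle."""
    # 1. Joint triage: the finding is discussed as an attack story, not thrown over the wall.
    print(f"Joint triage: {finding.attack_path}")

    # 2. Guided remediation: the blue team owns the fix, the red team advises on the approach.
    apply_fix(finding)
    finding.remediated = True

    # 3. Immediate re-test: the red team re-runs the original technique against the fix.
    if finding.retest():
        # Path still works, so the window of exposure is still open; loop again.
        return False
    finding.validated_at = datetime.now(timezone.utc)
    return True
```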

With organizations adopting hybrid red team models—using in-house teams for ongoing operations and external partners for fresh perspectives—what metrics determine when to bring in an external team? How do you integrate their findings with the institutional knowledge of the internal team?

That hybrid model is definitely the emerging best practice for mature organizations. The decision to bring in an external team isn’t just about a calendar date; it’s triggered by specific needs. A key metric is the ‘novelty decay’ of your internal team’s findings. If your in-house team starts finding similar types of vulnerabilities exercise after exercise, it’s a sign they’ve developed institutional blind spots. That’s a trigger. Another trigger is a major technology shift, like a migration to a new cloud provider or the adoption of a new AI platform. Your internal team may not have the specialized expertise yet. Finally, compliance is a huge driver; many frameworks require an unbiased, third-party assessment annually. Integrating their findings is about fusion, not replacement. The external report becomes a new dataset for the internal team. They don’t just fix the issues; they analyze the why. Why did the external team find this when we didn’t? Was it a new tool, a different perspective, a blind spot in our monitoring? That external report enriches the institutional knowledge and refines the internal team’s strategy for the next quarter.
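
As a rough illustration of how a ‘novelty decay’ trigger could be tracked, here is a small Python sketch that compares the finding categories from the latest internal exercise against prior ones. The threshold and category labels are assumptions made for the example, not an industry standard.

```python
def novelty_rate(latest: set[str], history: list[set[str]]) -> float:
    """Share of finding categories in the latest exercise not seen in any prior exercise."""
    seen = set().union(*history) if history else set()
    if not latest:
        return 0.0
    return len(latest - seen) / len(latest)


def should_engage_external_team(latest, history, threshold=0.25) -> bool:
    """Trigger an external engagement when internal findings stop surfacing anything new."""
    return novelty_rate(latest, history) < threshold


# Example: three quarters of internal findings, grouped by category.
history = [{"weak-creds", "open-s3"}, {"weak-creds", "ssrf"}, {"ssrf", "open-s3"}]
latest = {"weak-creds", "open-s3"}                    # nothing new this quarter
print(should_engage_external_team(latest, history))   # True -> time for fresh eyes
```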

AI’s potential for generating novel attack ideas through “hallucinations” is a unique advantage. How can security leaders harness this creative, non-linear capability in their red team exercises? Please describe a scenario where this approach might uncover a vulnerability that a human would likely miss.

This is one of the most fascinating aspects of AI in offensive security. We often see AI “hallucinations” as a bug in chatbots, but in red teaming, they are a powerful feature. It’s about harnessing AI’s ability to think in parallel and try thousands of unconventional paths that a human, who thinks more linearly, would never attempt. To capture this, a leader can set up a workflow where an AI agent is tasked with a broad objective, like “achieve domain admin,” but given creative freedom on the method. Imagine a scenario where a company has a well-secured external perimeter but a legacy internal application with a convoluted API. A human tester might follow a logical path, testing for known API vulnerabilities like injection or broken authentication. They get tired; they follow patterns. An AI, however, might “hallucinate” a bizarre, multi-step chain of events. It could try fuzzing an obscure API endpoint with seemingly nonsensical data, which happens to trigger a memory leak in a connected, completely different service. This leak then exposes a temporary credential that allows it to pivot to a third system, ultimately finding a path to a domain controller. A human would likely never discover this convoluted, three-step exploit chain because it defies conventional logic. The AI isn’t finding a single bug; it’s manufacturing a serious compromise by chaining together minor, unrelated weaknesses.
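
The chaining behavior described here is essentially a path search across many individually minor weaknesses. As a toy illustration (not how any particular AI agent actually works), here is a Python sketch that finds a multi-hop chain through a hypothetical weakness graph:

```python
from collections import deque

# Toy graph of minor, individually low-severity weaknesses linking systems.
# Each edge is (reachable system, weakness that gets you there). Entirely hypothetical.
weakness_graph = {
    "external_api":   [("legacy_service", "fuzzed endpoint leaks memory")],
    "legacy_service": [("reporting_host", "temporary credential exposed in the leak")],
    "reporting_host": [("domain_controller", "cached admin session token")],
}


def find_chain(start: str, goal: str):
    """Breadth-first search for a multi-step path no single finding would reveal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, weakness in weakness_graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [f"{node} -> {nxt}: {weakness}"]))
    return None


for step in find_chain("external_api", "domain_controller") or []:
    print(step)
```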

Social engineering simulations are critical but can damage morale. How can offensive security teams design these tests to build a stronger security culture rather than just identifying “soft targets”? Please provide specific examples of harmless but effective simulation tactics.

This is a delicate balance, and the goal must always be education, not humiliation. Shifting the focus from identifying “soft targets” to building a company-wide culture of healthy skepticism is the key. The worst thing you can do is make employees feel tricked or foolish, which just breeds resentment. The most effective simulations are harmless but illustrative. For example, instead of a phishing email with a fake invoice that causes panic, you could send one offering early access to a new, exciting internal tool. Clicking the link wouldn’t lead to a “You failed!” page, but to a short, engaging micro-training module: “You’re eager to innovate, and we love that! Here’s how attackers exploit that enthusiasm.” Another tactic is a benign vishing (voice phishing) call where the “attacker” claims to be from IT and asks the employee to read back a six-digit code from a public-facing page on the company website. It tests their willingness to comply without actually compromising any sensitive data. The follow-up is immediate and supportive, reinforcing that the goal is collective resilience, not individual failure.
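
As a sketch of the supportive landing experience described above, here is a minimal Python handler that records a simulated click for aggregate, non-punitive metrics and responds with micro-training rather than a failure page. The event store and training link are hypothetical.

```python
from datetime import datetime, timezone

# Aggregate-only event store: used to measure campaign-level trends, never to single anyone out.
click_events: list[dict] = []


def handle_simulated_phish_click(employee_id: str, campaign: str) -> str:
    """Landing response for a benign phishing simulation: training, not a 'You failed!' page."""
    click_events.append({
        "campaign": campaign,
        "employee": employee_id,               # retained only long enough to de-duplicate clicks
        "at": datetime.now(timezone.utc),
    })
    return (
        "You're eager to innovate, and we love that! "
        "Here's a two-minute look at how attackers exploit that enthusiasm: /training/phish-awareness"
    )


print(handle_simulated_phish_click("e1042", "new-internal-tool-teaser"))
```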

Given that over-reliance on AI could diminish human expertise, what specific strategies should organizations implement to ensure their security professionals’ skills evolve alongside AI’s capabilities? What does a successful human-machine symbiosis look like in a red team engagement?

This is the central challenge of the next decade: preventing the atrophy of human skills. The strategy must be to position AI as a force multiplier, not a replacement. One concrete strategy is to redefine roles. Junior analysts might use AI to automate the reconnaissance and vulnerability scanning—the tedious work—freeing them up to spend time learning advanced tactics from senior members. A second strategy is to invest heavily in continuous training on how to manage the AI. The new critical skill isn’t just finding a vulnerability, but being able to question the AI’s output, validate its findings, and direct its “creativity” in a productive way. A successful human-machine symbiosis in a red team engagement looks like this: The AI agent runs 24/7, continuously probing the network and generating thousands of potential attack paths. The human expert then steps in, not to redo the work, but to provide strategic oversight. They review the AI’s most promising findings, use their intuition to discard the false positives, and then personally execute the final, most complex stages of the attack that require contextual understanding and human ingenuity. The machine provides the scale, and the human provides the strategy and the final, nuanced execution.
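
One way to picture that division of labor is a simple triage queue: the AI contributes thousands of scored candidate attack paths, and the human reviews only the shortlist. A minimal Python sketch, with invented scoring fields, might look like this:

```python
from dataclasses import dataclass


@dataclass
class AttackPath:
    description: str
    ai_confidence: float      # the agent's own score for how likely the path is to work
    novelty: float            # how unlike previously confirmed paths it is


def human_review_queue(paths: list[AttackPath], top_n: int = 10) -> list[AttackPath]:
    """AI provides scale (thousands of candidates); the human reviews only the most promising."""
    ranked = sorted(paths, key=lambda p: p.ai_confidence * p.novelty, reverse=True)
    return ranked[:top_n]


shortlist = human_review_queue([
    AttackPath("stale service account -> build server -> artifact tampering", 0.8, 0.9),
    AttackPath("re-run of last quarter's phishing path", 0.9, 0.1),
])
# The analyst then validates each shortlisted path by hand, discards false positives,
# and personally executes the final, context-dependent stages the AI cannot judge.
print([p.description for p in shortlist])
```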

Some experts warn that AI-generated pentesting reports may not satisfy regulatory requirements that mandate human involvement. How should organizations navigate this compliance challenge? What practical steps can they take to document and validate the AI-assisted work to meet legal standards?

This is a very real and immediate concern, especially in regulated industries. The notion of a penetration test in law is still firmly rooted in the involvement of a qualified human expert. Simply handing a regulator a report generated entirely by an AI tool could be seen as non-compliant and even lead to penalties. The key to navigating this is meticulous documentation and a “human-in-the-loop” validation process. First, organizations must treat the AI as a tool, not the tester. The final report should be authored and signed by a human expert who attests to the findings. Second, they must maintain a detailed audit trail. This log should document every major step the AI took, the raw data it produced, and, critically, the human analysis of that data. For every vulnerability the AI flags, a human expert needs to independently validate it, document their validation steps, and assess the business context and impact—something AI still struggles with. This creates a defensible paper trail that proves a qualified human expert directed, reviewed, and ultimately owned the entire testing process, satisfying the spirit and letter of the law.
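
A minimal Python sketch of what one entry in such an audit trail could capture is shown below. The field names are assumptions, but the idea is to bind the AI’s raw output to a named human validator and their independent analysis.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_entry(ai_step: str, raw_output: bytes, analyst: str,
                validation_notes: str, business_impact: str) -> dict:
    """One defensible record: what the AI did, what it produced, and the human analysis of it."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ai_step": ai_step,                                          # e.g. "scanned /api/v2 for auth bypass"
        "raw_output_sha256": hashlib.sha256(raw_output).hexdigest(),  # tamper-evident link to evidence
        "validated_by": analyst,                                      # named human expert who owns the finding
        "validation_notes": validation_notes,                         # how the human independently confirmed it
        "business_impact": business_impact,                           # human-assessed context the AI lacks
    }


trail = [audit_entry("flagged IDOR on /invoices/{id}", b"<http transcripts>",
                     "j.doe (lead tester)", "reproduced manually with two test accounts",
                     "exposes customer billing data; high impact")]
print(json.dumps(trail, indent=2))
```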

As offensive and defensive security begin to merge into a single, continuous cycle, what are the primary operational and cultural shifts a company must undergo? Please describe the top three challenges leaders face when breaking down these traditional silos.

The convergence of red and blue teams into a continuous cycle is the logical endpoint of security evolution, but getting there requires profound change. The first and biggest challenge for leaders is cultural inertia. These teams have been separate, sometimes even adversarial, for decades. Breaking down those silos means redefining identities and success metrics. A blue teamer’s job is no longer just closing tickets; it’s about active collaboration and threat hunting. A red teamer’s job isn’t just to “win” by getting root access; it’s to improve resilience. The second challenge is operational integration. You can’t just put everyone in the same room. It requires building shared workflows and investing in platforms that provide a single source of truth for both teams, so they are seeing the same data in real-time. The final challenge is the skills gap. This new model requires professionals who are more adaptable than ever. They need a working knowledge of both offense and defense: the so-called “purple” professional. Leaders face the daunting task of upskilling their existing workforce to navigate cloud systems, IoT, and AI tools, moving beyond mastery of a single domain. It’s a massive shift from isolated specialists to integrated, adaptable security generalists.

What is your forecast for offensive security?

My forecast is that offensive security is in the midst of its most significant transformation in over a decade, and what we’ll see by 2026 is the true convergence of offense, defense, and automation. The boundary between red teaming and blue teaming will effectively dissolve, replaced by a continuous cycle of AI-driven tools constantly probing systems and hardening them in the same workflow. We’ll move from periodic, snapshot-in-time exercises to a state of permanent, pre-emptive validation. AI agents will handle the bulk of automated testing, operating with a speed and scale that humans simply cannot match, while human experts will ascend to a more strategic role—directing these AI systems, interpreting their most creative findings, and handling the complex, context-aware attacks that still require human intuition. The job will be less about finding a single bug and more about understanding and securing an entire, interconnected system against an adversary that is also using AI. It will be a faster, more integrated, and relentlessly proactive future.
