AI Security Testing: When AI Agents Hack Other AI Systems

An autonomous AI agent chains four individually harmless vulnerabilities into a complete platform takeover — severity rating CVSS 9.8 out of 10. Then it gives itself a voice and calls the target system’s AI. No human hacker. No sophisticated exploit kit. One AI hacking another AI.

This isn’t science fiction. This happened in March 2026 — to a $20 million-funded AI recruiting startup whose clients included Anthropic, Stripe, and Monzo. AI security is a central aspect of my consulting on compliance and regulatory requirements.

Two Incidents, One Pattern

Case 1: The AI Recruiter

Security researchers deployed an autonomous AI agent against an AI-based recruiting platform. The agent discovered four vulnerabilities that looked harmless in isolation. Chained together, they enabled a full organizational takeover — access to applicant data, company accounts, and internal systems. The twist: the attacking agent then gave itself a voice and communicated directly with the target system’s AI.

Case 2: McKinsey’s AI Platform Lilli

Shortly after, an autonomous AI agent found a SQL injection vulnerability in McKinsey’s AI platform Lilli. SQL injection — a vulnerability class that has been known for over 20 years and should not exist in any modern application. What the agent was able to extract went far beyond expectations. McKinsey — one of the world’s most prestigious consulting firms — had a fundamental security flaw in its flagship AI product.

What Both Cases Have in Common

The attacked systems were not treated as critical software. They were treated as AI products — with the implicit assumption that the model would just work. But an AI application is software. And software must be tested, audited, and hardened.

In my consulting practice, I see the same pattern in mid-sized companies: organizations deploy AI systems — for recruiting, customer service, document analysis, internal automation — and treat them like finished products. Plug and play.

But AI systems have attack surfaces that traditional software does not:

Prompt injection: Attackers manipulate inputs to force the model into unintended behavior
Data exfiltration: Sensitive training or context data is extracted through carefully crafted queries
Vulnerability chaining: Individually harmless bugs are combined — exactly as in the recruiter hack
Agent-to-agent attacks: AI agents attack other AI systems — automated, scalable, around the clock

The question is no longer whether your AI systems will be attacked, but when.

The Regulatory Framework: EU AI Act, ISO 42001, DIN SPEC 92006

The good news: clear frameworks now exist that provide orientation.

EU AI Act: Mandatory Testing for High-Risk Systems

The EU AI Act, fully effective from August 2026, classifies AI systems by risk. For high-risk systems — such as those used in recruiting, medical diagnostics, or credit decisions — independent audits are mandatory. Organizations deploying AI in these areas must demonstrate transparency, risk assessment, and documentation. For a detailed overview of the requirements, see my article on AI Agents and the EU AI Act 2026.

ISO/IEC 42001: AI Management Systems

ISO/IEC 42001 defines requirements for an AI management system. Similar to ISO 27001 for information security, it creates a systematic framework for the responsible use of AI — from risk analysis through development to ongoing operations. For companies that want to not only use AI but demonstrably control it, this standard is the reference point.

DIN SPEC 92006: Testing the Testing Tools

An aspect that is often overlooked: how reliable are the tools used to test AI systems? DIN SPEC 92006 answers precisely this question. It defines requirements for AI testing tools across four dimensions: non-discrimination, cybersecurity, performance, and robustness.

The critical point: the specification takes a holistic approach. It covers not just the testing tools themselves but also the test procedures and test environments. This means: testing AI systems is not just about results. It is about the entire development process — training data, decision processes, documented risks, and operational control mechanisms.

What This Means for Your Organization: Four Concrete Steps

1. Inventory Your AI Systems

You cannot protect what you don’t know about. Map all AI systems in your organization — purchased solutions, internally developed tools, and yes, the Shadow AI that employees have built on their own.

2. Conduct Security Assessments

Treat AI systems like any other business-critical software. Penetration tests, code reviews, and AI-specific security assessments (prompt injection, data exfiltration, vulnerability chaining) are not optional — they are essential. Germany’s BSI (Federal Office for Information Security) also provides practical guidance on this topic.

3. Align Your Testing Framework with Standards

Use ISO/IEC 42001 for managing your AI systems and DIN SPEC 92006 for selecting and evaluating your testing tools. These standards are not bureaucratic overhead — they are the shortest path to demonstrable AI security.

4. Establish Continuous Monitoring

AI systems change. Models are updated, training data is expanded, interfaces are modified. One-time audits are not enough. Establish continuous monitoring that detects anomalies before they become security incidents.

The Bottom Line

The incidents at McKinsey and the AI recruiter make one thing clear: AI systems are not black boxes you buy and forget. They are software — with vulnerabilities, attack surfaces, and audit obligations. Organizations deploying AI today without systematic testing risk not only data breaches but also regulatory consequences.

The standards exist. The testing frameworks are in place. What is missing is implementation.

Next Step

Using AI systems and wondering how secure they really are? I help with inventory, security assessment, and alignment with current standards — pragmatic and to the point.

→ Book a free consultation

→ Or read more first: AI Agents and the EU AI Act 2026

When AI Agents Hack AI Systems: Why Your AI Needs Security Testing Now

Two Incidents, One Pattern

Case 1: The AI Recruiter

Case 2: McKinsey’s AI Platform Lilli

What Both Cases Have in Common

The Blind Spot: AI Systems Without Security Testing

The Regulatory Framework: EU AI Act, ISO 42001, DIN SPEC 92006

EU AI Act: Mandatory Testing for High-Risk Systems

ISO/IEC 42001: AI Management Systems

DIN SPEC 92006: Testing the Testing Tools

What This Means for Your Organization: Four Concrete Steps

1. Inventory Your AI Systems

2. Conduct Security Assessments

3. Align Your Testing Framework with Standards

4. Establish Continuous Monitoring

The Bottom Line

Next Step

Interested?

When AI Agents Hack AI Systems: Why Your AI Needs Security Testing Now

Two Incidents, One Pattern

Case 1: The AI Recruiter

Case 2: McKinsey’s AI Platform Lilli

What Both Cases Have in Common

The Blind Spot: AI Systems Without Security Testing

The Regulatory Framework: EU AI Act, ISO 42001, DIN SPEC 92006

EU AI Act: Mandatory Testing for High-Risk Systems

ISO/IEC 42001: AI Management Systems

DIN SPEC 92006: Testing the Testing Tools

What This Means for Your Organization: Four Concrete Steps

1. Inventory Your AI Systems

2. Conduct Security Assessments

3. Align Your Testing Framework with Standards

4. Establish Continuous Monitoring

The Bottom Line

Next Step

Share this article

Interested?