
AI Supply Chain Attacks

When Your Trusted Tool Becomes the Weapon

When AI Attacks — Digital Content Series #2

"The model passed every test. The tests weren't designed for what was already inside it."

This scenario is grounded in the patterns the industry faces. The goal is simple: explain the problem and prepare you for what's already here.

AI Attack Use Case

It's a Monday morning. A mid-size financial services firm has just completed a six-month procurement process for an AI-powered document analysis platform. Security reviewed the vendor. Legal reviewed the contract. IT completed the integration. The platform came with certifications, SOC 2 reports, and a reference list of enterprise customers. Everything checked out.

The platform goes live. Within days, the operations team loves it. Analysts are processing loan documents in a fraction of the time. Accuracy metrics look clean. The deployment is considered a success.

What nobody knows: four months before procurement began, the AI model at the core of the platform was trained on a dataset that had been quietly manipulated. A threat actor — patient, deliberate, operating with nation-state-level resources — had introduced a subtle bias into the training data. Not a backdoor in the traditional sense. Not malware. A behavioral conditioning: when the model encounters a specific pattern in a financial document — a particular combination of formatting and terminology used in high-value transactions — it systematically underweights certain risk signals.

The model doesn't crash. It doesn't flag errors. It just makes slightly different decisions than it should. Decisions that, at volume and over time, redirect capital in ways that benefit the attacker.

The firm never detects it. The model continues to perform within acceptable accuracy thresholds. The manipulation lives inside the weights — invisible to every security control the firm has.

Six months later, an external audit flags anomalies in approval patterns. By then, the exposure runs into eight figures.

Three Perspectives

The Trusted Leader

"We did everything right. We just didn't know that 'everything right' didn't include looking inside the model."

"I signed off on this deployment personally. The vendor had enterprise customers I recognized. The security team gave it a green light. We had a data processing agreement, a penetration test report, and a reference call with a peer institution. I thought we had done our diligence. What I didn't understand — what nobody explained to me — is that our security review evaluated the platform's infrastructure, not the model's behavior. We checked whether the vendor's systems were secure. We never asked whether the model itself could have been compromised before it reached us. Those are completely different questions. And apparently, in 2026, you have to ask both."

The Defender

"I knew something was off. The numbers were within tolerance. That was the problem — a manipulated model designed to stay within tolerance is almost impossible to catch with standard monitoring."

"We caught it because an auditor with domain expertise got suspicious about an approval pattern — not because any of our security controls flagged it. The model was performing within the accuracy thresholds we'd set at deployment. It wasn't generating errors. It wasn't producing obviously wrong outputs. It was producing subtly wrong outputs, consistently, in a specific context. When I started investigating, I realized we had no way to inspect the training data. We had no model card from the vendor that disclosed data provenance. We had no behavioral baseline specific to high-value transaction scenarios — only aggregate accuracy metrics. The controls we had were designed to detect a model that was broken. This model wasn't broken. It was working exactly as the attacker intended."

The Attacker

"I didn't attack your company. I attacked the model three vendors upstream from you. You bought the result."

"The target was never the firm. The target was the training pipeline of a model that I knew would be widely deployed across financial services. I identified a data aggregator whose outputs fed into multiple AI vendors' training sets. I established a presence there eighteen months before any of the eventual targets made a purchasing decision. The manipulation was surgical — specific enough to have predictable effects, subtle enough to stay within any reasonable accuracy threshold. By the time the model reached your vendor, it had passed their internal evaluations. By the time it reached you, it had passed a procurement review. Nobody looks at the supply chain. They look at the product. The supply chain is where I live."

Assessment

Why It Succeeded

This attack succeeded because enterprise AI procurement evaluates the security of the vendor — not the integrity of the model. Those are fundamentally different threat surfaces, and most organizations have only built controls for one of them.

Traditional software supply chain security asks: was this code tampered with? AI supply chain security must ask: was this model's behavior shaped before it reached me? Three properties made this attack exceptionally difficult to detect:

It lived in the weights, not the code. Conventional security tools inspect code, configurations, and network behavior. They do not inspect the learned behavior patterns encoded in a model's parameters. A manipulated model is functionally indistinguishable from a clean model at the infrastructure layer.

It stayed within acceptable tolerances. The manipulation was calibrated to produce outputs that were slightly wrong, in a specific context, within the noise band of normal model variance. Standard accuracy metrics aggregate performance across all inputs — a targeted manipulation affecting a small subset of high-value inputs can be statistically invisible in aggregate performance data.
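To see how completely an aggregate metric can hide a targeted manipulation, here is a minimal sketch. The numbers are illustrative assumptions, not figures from the scenario:

```python
# Illustrative only: how a targeted error pattern hides inside aggregate accuracy.
# Every figure below is a hypothetical assumption, not a measurement from any real model.

total_decisions = 100_000        # all documents scored in a review period
targeted_subset = 400            # documents matching the attacker's trigger pattern
baseline_error_rate = 0.02       # normal error rate on untargeted documents
targeted_error_rate = 0.90       # manipulated error rate inside the trigger subset

untargeted = total_decisions - targeted_subset
errors = untargeted * baseline_error_rate + targeted_subset * targeted_error_rate
aggregate_accuracy = 1 - errors / total_decisions

print(f"Aggregate accuracy: {aggregate_accuracy:.2%}")   # ~97.65%, still "within tolerance"
print(f"Manipulated decisions inside the trigger subset: {targeted_subset * targeted_error_rate:.0f}")
```

In this sketch a clean model would sit near 98% aggregate accuracy; the manipulated one sits near 97.65%. No dashboard built on aggregate metrics treats that gap as an incident.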

The supply chain had no visibility layer. The firm knew who their vendor was. They did not know who trained the model, on what data, using what pipeline, with what provenance controls. The model arrived as a black box with certifications attached. The certifications attested to the vendor's security posture. Nobody attested to the model's behavioral integrity.

AI Supply Chain Attacks: Diamond Model

Who Bears Accountability

"Your vendor passed your security review. Your vendor's model may not have. Those are two different questions, and right now most organizations are only asking one of them."

The firm's procurement team bears none of it. They followed the standard process. That process was not designed for the AI supply chain threat surface.

The vendor bears primary accountability for the integrity of the model they delivered. A SOC 2 report attests to infrastructure security. It says nothing about training data provenance, model card completeness, or behavioral integrity testing.

The security and risk functions bear accountability for the evaluation gap. AI model procurement requires a different evaluation framework than software procurement. Behavioral testing, adversarial probing, and training data provenance review are not optional enhancements — they are baseline requirements.

The AI vendor ecosystem bears structural accountability. There are no mandatory disclosure standards for training data provenance in enterprise AI products. Model cards are voluntary. Behavioral integrity certifications do not exist.

The Multi-Agent Multiplier

A single compromised model is a contained problem. A poisoned model deployed inside a multi-agent architecture is a different category of threat entirely.

Modern AI deployments are increasingly agentic. A document analysis model doesn't operate in isolation — it feeds outputs to an orchestration layer that routes decisions to downstream agents. Each agent trusts the outputs of the agents upstream from it. None of them validate whether the model at the root of the chain has been compromised.

In the scenario above, the compromised model's outputs propagated through the entire agent architecture:

  • The CRM agent updated customer risk profiles. Those profiles persisted. Future decisions by human analysts were anchored to them.
  • The reporting agent included compromised outputs in weekly risk committee dashboards. Executives made portfolio decisions based on summaries of summaries of manipulated assessments.
  • The compliance agent logged the model's outputs as validated decisions. Audit trails now reflected manipulated logic as documented, compliant process.
  • The notification agent suppressed legitimate high-risk alerts. The escalation mechanism itself was neutralized.

The attacker didn't need to compromise five agents. They compromised one model — and the architecture did the rest.

The blast radius of a compromised model scales with the number of downstream agents that trust it, and so does the remediation: every decision those agents made and every persistent record they wrote during the exposure window traces back to a single upstream compromise.
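That exposure can be reasoned about as a reachability question over the agent architecture. A minimal sketch, using a hypothetical dependency map whose agent names mirror the scenario above rather than any real product:

```python
from collections import deque

# Hypothetical agent dependency map: which components consume whose outputs.
# Edges point downstream, from producer to consumer.
consumers = {
    "document_analysis_model": ["orchestrator"],
    "orchestrator": ["crm_agent", "reporting_agent", "compliance_agent", "notification_agent"],
    "crm_agent": ["analyst_workqueue"],
    "reporting_agent": ["risk_committee_dashboard"],
    "compliance_agent": ["audit_trail"],
    "notification_agent": [],
}

def blast_radius(compromised: str) -> set[str]:
    """Every component that consumes the compromised component's outputs, directly or transitively."""
    reached, queue = set(), deque([compromised])
    while queue:
        node = queue.popleft()
        for downstream in consumers.get(node, []):
            if downstream not in reached:
                reached.add(downstream)
                queue.append(downstream)
    return reached

# Every component in this set holds decisions or records that need review
# for the full exposure window.
print(blast_radius("document_analysis_model"))
```

The map itself is the control: if you cannot write it down, you cannot compute the radius.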

— Debrief —

CISO Debrief

What Does It Mean for Your Organization

"Your vendor passed your security review. Your vendor's model may not have. Those are two different questions, and right now most organizations are only asking one of them."

Let's be direct. If your organization has deployed any AI system that influences consequential decisions — approvals, risk scoring, access control, content moderation, financial analysis — and you have not evaluated the behavioral integrity of the underlying model, you have an uncharacterized risk in production. Not theoretical. Operational. Right now.

This is not a criticism of your vendors. Most enterprise AI vendors are operating in good faith. The problem is structural: the current AI procurement ecosystem has no standardized mechanism for attesting to model integrity. You are buying trust at the vendor layer when the risk lives at the model layer. That gap is yours to close, because the market has not closed it for you.

01

Your Directives

Extend your supply chain security program to cover AI models explicitly. Your software supply chain controls — SBOMs, code signing, vulnerability scanning — do not translate to AI models. Add a parallel track that addresses model provenance, training data disclosure, and behavioral validation.

Require model cards from every AI vendor. A model card discloses training data sources, evaluation methodology, known limitations, and intended use cases. Make them contractually required for any AI system making consequential decisions. A vendor who cannot disclose their model's provenance is a vendor whose model you cannot evaluate.
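What "contractually required" can look like in practice is a procurement gate that refuses to advance a submission with missing disclosures. A minimal sketch; the field names are assumptions for illustration, not an industry-standard schema:

```python
# Hypothetical model-card fields a procurement gate could insist on.
# The field names are illustrative, not an industry-standard schema.
REQUIRED_DISCLOSURES = [
    "training_data_sources",       # who supplied the data, under what controls
    "data_provenance_controls",    # how upstream tampering would be detected
    "evaluation_methodology",      # what was tested, against what baseline
    "known_limitations",
    "intended_use_cases",
    "post_training_modifications",
    "update_disclosure_process",   # how behavioral changes are communicated to customers
]

def review_model_card(card: dict) -> list[str]:
    """Return the disclosure gaps that block procurement sign-off."""
    return [field for field in REQUIRED_DISCLOSURES if not card.get(field)]

submission = {
    "intended_use_cases": "loan document analysis",
    "evaluation_methodology": "aggregate accuracy on a held-out set",
}
gaps = review_model_card(submission)
if gaps:
    print("Cannot evaluate this model. Missing disclosures:", ", ".join(gaps))
```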

Implement behavioral integrity testing before deployment. Run every AI system against a labeled test set designed to probe the specific contexts where manipulation would be most impactful. Document the baseline. Monitor against it in production. Drift from baseline is your signal.
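A minimal sketch of what that baseline can look like. Everything in it is a placeholder for your own environment: the probe set, the `score_document` stub standing in for the deployed model's scoring call, and the tolerance value:

```python
import statistics

# Hypothetical labeled probe set targeting the contexts where a manipulation
# would matter most (e.g., high-value transaction documents).
PROBES = [
    {"document": "high_value_wire_sample_01", "expected_risk": 0.82},
    {"document": "high_value_wire_sample_02", "expected_risk": 0.77},
    # ...extend with the scenarios your own risk team cares about
]

def score_document(document) -> float:
    """Placeholder: wire this to the deployed model's scoring call."""
    return 0.80  # stub value so the sketch runs standalone

def behavioral_profile() -> dict:
    """Deviation of the model's scores from the labeled expectations on the probe set."""
    deviations = [abs(score_document(p["document"]) - p["expected_risk"]) for p in PROBES]
    return {"mean_abs_dev": statistics.mean(deviations), "worst_case": max(deviations)}

# At deployment: record the profile as the documented baseline.
# In production: re-run on a schedule and compare against that baseline.
TOLERANCE = 0.05  # assumed threshold; agree on it with the business owner, not by default

baseline = behavioral_profile()   # captured and stored at sign-off
current = behavioral_profile()    # captured during routine monitoring
if current["mean_abs_dev"] - baseline["mean_abs_dev"] > TOLERANCE:
    print("Behavioral drift exceeds tolerance; treat it as an incident, not a tuning task.")
```

The design choice that matters is the probe set: it has to target the high-value contexts specifically, because that is exactly where aggregate metrics stop helping.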

Apply AI-specific third-party risk assessment. Add an AI-specific module to your TPRM process: training data sourcing and validation, post-training modifications, adversarial testing conducted, and process for disclosing model updates that affect behavior.

Map your AI deployment stack — all the way to the model. Every AI system in production should have a documented lineage that goes from the application layer to the training data. Where you cannot build that map, you have an uncharacterized supply chain risk.
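A minimal sketch of what that lineage record can look like, with explicit unknowns marking where the map ends. The structure and names are illustrative assumptions:

```python
# Hypothetical lineage record for one production AI system.
# The "unknown" entries are the point: each one marks uncharacterized supply chain risk.
LINEAGE = {
    "application": "loan_document_analysis_platform",
    "vendor": "example-vendor",
    "model": {
        "name": "doc-analysis-model",
        "version": "unknown",              # vendor does not disclose model versions
        "base_model": "unknown",
        "fine_tuning_data": "unknown",
        "upstream_data_aggregators": "unknown",
    },
}

def unmapped(entry: dict, path: str = "") -> list[str]:
    """List every point in the lineage where provenance is unknown."""
    gaps = []
    for key, value in entry.items():
        here = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            gaps.extend(unmapped(value, here))
        elif value == "unknown":
            gaps.append(here)
    return gaps

print("Uncharacterized supply chain risk at:", unmapped(LINEAGE))
```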

02

When There's More Than One Agent

The scenario above involves a single compromised model. Most enterprise deployments don't look like that anymore. A poisoned model deployed inside a multi-agent architecture is a different category of threat entirely — and it is the scenario most organizations are not yet thinking about.

In a modern agentic pipeline, the compromised document analysis model didn't just affect one decision. The CRM agent updated customer risk profiles based on the model's assessments — those profiles persisted. The reporting agent included the model's outputs in weekly risk committee dashboards. The compliance agent logged the model's outputs as validated decisions. The notification agent suppressed legitimate high-risk alerts. The attacker didn't need to compromise five agents. They compromised one model — the one that every other agent trusted — and the architecture did the rest.

The detection problem. Each agent's behavior looks normal — because it is normal. The manipulation was upstream, in the model feeding the chain. By the time the compromise was identified, the contaminated outputs had propagated through every downstream record.

The blast radius problem. The relevant question is not just "what does this model do?" It is "what agents consume this model's outputs, directly or transitively, and what consequential actions do those agents take?" If you cannot answer that question, you cannot calculate your supply chain exposure.

The remediation problem. When a compromised model is identified and replaced, the work does not stop there. Every decision downstream agents made during the exposure window must be reviewed. Every persistent record must be audited. The remediation timeline for a multi-agent supply chain compromise is weeks or months — proportional to how long the compromised model was in production.

The governance requirement. AI supply chain integrity is not a model-level problem — it is an architecture-level problem. The controls must be designed at the pipeline level, at every handoff where one agent's outputs become another agent's inputs.

03

Direct Your IR Team to

Build an AI model incident classification. An AI supply chain incident is not a data breach in the traditional sense. The harm is behavioral. Your incident classification framework needs a category for this, with its own evidence-collection, containment, and remediation procedures.

Develop model behavioral forensics capability. When an AI system is suspected of compromise, you need to answer: what outputs did it produce, in what contexts, over what time period, and how do those outputs compare to what a clean model would have produced? Build this capability before you need it.
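A minimal sketch of the shape of that capability, assuming you retain per-decision logs and can replay archived inputs through a clean reference model. The log format and function names are illustrative:

```python
import csv
from datetime import datetime

def reference_score(document_id: str) -> float:
    """Placeholder: re-score the archived input through a clean reference model."""
    return 0.65  # stub value so the sketch stands alone

def forensic_diff(log_path: str, exposure_start: datetime, threshold: float = 0.2):
    """Compare logged decisions from the suspect model against a clean replay.

    Assumes a hypothetical decision log with one row per consequential decision:
    timestamp, document_id, recorded_score.
    """
    divergent = []
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            if datetime.fromisoformat(row["timestamp"]) < exposure_start:
                continue
            delta = abs(float(row["recorded_score"]) - reference_score(row["document_id"]))
            if delta > threshold:
                divergent.append((row["timestamp"], row["document_id"], delta))
    return divergent  # the decisions a human reviews first
```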

Identify independent ground truth for your highest-risk AI deployments. For every AI system making consequential decisions, you need an independent validation layer that doesn't rely on the model's own outputs. This is your detection mechanism for subtle behavioral manipulation that stays within aggregate accuracy thresholds.
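One minimal form that layer can take is a shadow check: an independent, deliberately simple rule score computed on a sample of decisions, with sharp disagreements routed to a human. The rules, fields, and thresholds below are placeholders:

```python
import random

def model_risk_score(txn: dict) -> float:
    """Placeholder for the production model's risk score."""
    return 0.15  # stub value so the sketch stands alone

def independent_risk_check(txn: dict) -> float:
    """Deliberately simple rule-based score that shares nothing with the model."""
    score = 0.0
    if txn["amount"] > 1_000_000:
        score += 0.4
    if txn["counterparty_age_days"] < 90:
        score += 0.3
    if txn["manual_overrides"] > 0:
        score += 0.3
    return score

def shadow_validate(transactions, sample_rate=0.05, disagreement=0.4):
    """Escalate sampled decisions where the model and the independent check diverge sharply."""
    flagged = []
    for txn in transactions:
        if random.random() > sample_rate:
            continue
        if abs(model_risk_score(txn) - independent_risk_check(txn)) > disagreement:
            flagged.append(txn["id"])  # route to a human analyst, not back to the model
    return flagged
```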

Add AI vendor model updates to your change management process. Every model update should trigger a behavioral regression test before production deployment. If a model update changes outputs in your highest-risk scenarios, that is a change management event — not a routine update.

Plan for extended exposure windows. AI supply chain compromises are not fast incidents. Your IR planning needs to account for the possibility that an AI system has been producing compromised outputs for months — and that remediation requires reconstructing every decision that model influenced.

04

Five Questions for Your Next Executive Meeting

1. For every AI system in production, can you identify who trained the underlying model and on what data?

2. What is your current process for evaluating the behavioral integrity of an AI model before deployment — not just the security posture of the vendor?

3. If an AI system in production were producing subtly manipulated outputs today, what control would detect it?

4. Do your AI vendor contracts require disclosure of model updates, training data provenance, and behavioral change notifications?

5. Does your board understand that a compromised AI model can cause material harm without triggering any traditional security alert?

Technical Reference

Threat Category: AI Supply Chain / Model Integrity

Techniques: Training Data Poisoning · Model Behavioral Manipulation · Supply Chain Compromise via Upstream Data Provider · AI Procurement Gap Exploitation

OWASP LLM Top 10: LLM03:2025 — Supply Chain Vulnerabilities

OWASP LLM Top 10: LLM05:2025 — Improper Output Handling

OWASP LLM Top 10: LLM06:2025 — Excessive Agency

MITRE ATLAS: AML.T0019 — Publish Poisoned Datasets · AML.T0018 — Backdoor ML Model

owasp.org  ·  atlas.mitre.org  ·  NIST AI  ·  cisa.gov

"When AI Attacks" is a practitioner-grade security intelligence series written for CISOs, security leaders, and defenders navigating the AI threat landscape.

The scenarios described in this series are grounded in documented, publicly reported threat intelligence patterns. They do not reflect confidential information from any employer.