
Model Farming

Distillation is how you steal a model. Farming is how you steal a hundred.

One technique. Twelve targets. Automated pipelines running while you sleep.

Model Farming: The Infrastructure Behind AI Model Theft at Scale
When AI Attacks  —  Digital Content Series #4

"The operation isn't targeting your model. It's targeting the category."

Digital Content #3 covered model distillation — the technique an attacker uses to query a target API, capture input/output pairs, and train a surrogate that approximates the original model's behavior. That is a single operation against a single target.

Model farming is what happens when that technique gets a business model behind it.

The Operation

It's a Thursday afternoon — not at your organization, but at a private server cluster running across three cloud providers in different regions. Twelve enterprise APIs are being queried simultaneously. Six in financial services. Four in healthcare diagnostics. Two in legal document review. All publicly accessible with authenticated accounts. All billing by token, not by behavioral pattern.

Each target has its own farm node: a configured query generation pipeline seeded from public datasets, structured to maximize decision boundary coverage rather than simulate natural user behavior. Accounts rotate on a schedule calibrated to stay below each API's rate limit threshold. The extraction timeline runs six to eight weeks per target — not because it needs to, but because slower extraction leaves a smaller signature.

By the time your API usage report shows anything, the surrogate is already in training. By the time the surrogate is validated, it's already in the hands of someone who never paid for what it took you millions to build.

No one phished your team. No one broke through a firewall. No one touched your infrastructure. They queried your API — exactly the way it was designed to be queried — and left with your model.

Three Perspectives

The Trusted Leader

"Our API usage reports looked normal. High volume exists — integration partners, analytics platforms, enterprise clients who legitimately query at scale. I had no visibility into what was on the other side of those requests."

"We invested in perimeter security. The model sits behind authentication and rate limiting. We have SOC coverage. What we didn't have was semantic monitoring — the capability to distinguish a partner querying our API to build a product from an adversary querying our API to build a competitor. Nothing in the dashboard told me that twelve of our authenticated accounts were part of the same operation. The volume was distributed. Each account looked like a normal high-usage client.

"The signals were all there. They were just below every threshold we'd set, because our thresholds were designed to catch something that looked like an attack. This didn't look like an attack. It looked like twelve very active customers.

"The board will ask whether our proprietary model is still proprietary. When that question arrives, the honest answer — that we had no visibility into how our outputs were being used to train against us — is not sufficient."

The Defender

"The signals are in the logs. They've always been in the logs. We just weren't reading them at the right layer."

"Query logs contain the fingerprints of a farming operation. Volume, latency, and error rates are monitored. Query content and distributional patterns are not. The distinction matters: a farming operation querying to maximize surrogate coverage will exhibit specific signatures — systematic boundary exploration, edge case density far above what normal application usage produces, input diversity that doesn't correlate with the client's stated use case.

"A fraud detection model being queried with meticulously crafted synthetic transactions — not the messy organic queries of a real integration — is a signal. It requires semantic analysis to surface it. We weren't doing semantic analysis. Nobody had defined what anomalous query distribution even looked like for our model. Without a baseline, there is no alert.

"The tooling to catch this exists — watermarking schemes, output perturbation, canary responses that mark a model's outputs in ways that survive distillation and appear in the surrogate. The problem is deployment. Adding watermarking post-launch is an engineering investment with no internal champion until after an extraction event is confirmed. Nobody allocates for it in advance, because nobody believes it will happen to them."

The Attacker

"Rate limits are a logistics problem, not a barrier. The only meaningful detection risk is semantic — and nobody is doing semantic monitoring."

"We're not targeting one model. We're targeting the category. Twelve enterprise APIs in financial services. Six in healthcare. The economics are simple: your organization spent millions on training data, compute, fine-tuning, and evaluation. That investment is now queryable at inference prices. The surrogate doesn't replicate everything — it replicates enough. Enough to compete. Enough to resell. Enough to undermine your market position with a product that cost us a fraction of what you spent building the original.

"Volumetric monitoring tells you nothing about intent. We stay well below volume thresholds by distributing across accounts and slowing the extraction timeline. In environments where nobody is reviewing what we're asking — only how much we're asking — we can run indefinitely."

Technical Assessment

Farm Infrastructure Architecture

A model farm is purpose-built extraction infrastructure. Its components parallel the structure of other organized cybercrime operations — but the product is surrogate AI, not stolen credentials or ransomed data.

  • Query generation layer. Automated corpus management seeded from public datasets and prior outputs, using active learning strategies to maximize decision boundary coverage per query consumed. The goal is not random sampling — it is structured exploration of the model's input space.
  • Account rotation layer. Authenticated account pools distributed across cloud providers and identities, each maintained below detection thresholds. Account provisioning is automated; burned accounts are replaced without interrupting extraction continuity.
  • Collection and labeling layer. API responses captured with input/output pairing. Logit-level outputs — probability distributions rather than hard class labels — are preferred where available. They are 10–30× more query-efficient for surrogate training because they encode relative model confidence across the full output space.
  • Surrogate training layer. Automated training pipelines that produce and iteratively evaluate surrogate models against fidelity benchmarks. Ensemble surrogates — multiple smaller models combined — can approach the fidelity of a large target at lower per-query cost (a sketch of the underlying distillation objective follows this list).
  • Validation and delivery layer. Surrogate evaluation, packaging, and transfer. A surrogate achieving 85–92% accuracy parity is commercially viable for most downstream applications — fraud detection, underwriting, clinical triage, legal review classification.
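
The training objective at the heart of the surrogate layer is the standard knowledge-distillation loss from the published literature: minimize the KL divergence between the probability outputs captured from the target and the surrogate's own predictions. The minimal sketch below (PyTorch; function and argument names are illustrative) is included only to show why probability-level outputs are so much more query-efficient than hard labels — each response hands the operator a full distribution rather than a single class.

```python
# Standard Hinton-style distillation objective, shown for illustration only.
# surrogate_logits: raw scores from the surrogate being trained
# captured_probs:   probability vectors captured from the target API
import torch
import torch.nn.functional as F

def distillation_loss(surrogate_logits: torch.Tensor,
                      captured_probs: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL(captured || surrogate): every query contributes a full distribution,
    so far fewer queries are needed than with hard labels alone."""
    log_q = F.log_softmax(surrogate_logits / temperature, dim=-1)
    return F.kl_div(log_q, captured_probs, reduction="batchmean") * temperature ** 2
```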

The Diamond Model Applied to Farming Operations

The Diamond Model of intrusion analysis — adversary, capability, infrastructure, victim — applies directly to model farming. Its critical contribution is activity threading: connecting extraction events across organizations by shared signatures. A farming operation targeting twelve financial services APIs is not twelve separate incidents. It is one operation with twelve victims. Treating it as twelve separate incidents keeps each organization blind to the larger pattern.

Model Farming: Diamond Model

  • Adversary. Competitor, criminal organization, or nation-state actor. Motive: IP theft, competitive advantage, regulatory arbitrage, or surrogate resale. Farming requires sustained infrastructure and ML operational expertise — this is not a lone researcher.
  • Capability. Active learning query strategy across multiple target APIs simultaneously. Account rotation to stay below rate limit thresholds per node. Logit-level output capture. Surrogate trained via KL-divergence minimization, with membership inference layered on top and ensemble surrogates for robustness and fidelity.
  • Infrastructure. Authenticated API endpoints — the product itself is the attack surface; no breach required. Distributed cloud accounts across providers to evade per-account rate limits. Automated surrogate training pipeline. Infrastructure reuse across targets enables cross-organization attribution.
  • Victim. Not the perimeter, but the model's learned behavior — months of calibration, domain tuning, and regulatory alignment — replicated via the organization's own authenticated API. Victim clustering by industry vertical reveals adversary target logic.

Activity threading turns internal analysis into shareable threat intelligence — the kind that actually disrupts the operation rather than documenting it after the fact. A farm operation targeting your sector is targeting your peers. The intelligence is most valuable when it moves between organizations before the surrogate is deployed, not after.

Why Volumetric Detection Fails

Standard API security monitoring is built around volumetric signals: request rate, token consumption, error frequency, latency anomalies. None of these catch a well-run farming operation. The distinguishing features of farming queries are semantic and behavioral — not volumetric. A farming operation that distributes queries across accounts, slows the extraction timeline, and maintains organic-looking volume is invisible to every alert threshold in a standard SIEM configuration.

Signal | Volumetric Monitoring | Semantic Monitoring
High query volume from single account | Detects | Detects
Volume distributed across many accounts | Misses | Detects (pattern)
Systematic edge case exploration | Misses | Detects
Programmatic query formatting uniformity | Misses | Detects
Input diversity mismatch with stated use case | Misses | Detects
Progressive boundary-probing behavior | Misses | Detects
Absence of retry/error patterns from real integrations | Misses | Detects

Volume-based rate limiting sets the extraction timeline, not the extraction limit. A resourced adversary will stay below any per-account threshold by distributing and extending. The operation takes longer. The surrogate is still produced.

The Multi-Agent Multiplier

The farming infrastructure above assumes human-managed orchestration. Agentic AI removes the human from that loop. An AI agent with API access, a query generation prompt, and a collection task can execute a farming operation autonomously — adapting query strategy based on prior outputs, rotating accounts based on rate limit proximity, triggering training runs when corpus thresholds are met. The agent doesn't need to understand what a surrogate is. It needs a sufficiently detailed task specification, and it will execute indefinitely without fatigue or operational security errors.

The cost floor drops. You no longer need an ML team to operate the infrastructure — only to design the initial task. The scale ceiling rises. A single operator can manage farming operations across dozens of targets simultaneously, limited only by API budget and account provisioning capacity.

Most agentic AI governance frameworks address what agents can access inside the organization. The outbound adversarial case — agents as extraction infrastructure operating against external APIs — is not addressed. If your governance framework doesn't account for AI agents as adversarial actors against external systems, it is incomplete for the current threat environment.

— Debrief —

CISO Debrief

"You built something valuable enough that someone built infrastructure to steal it systematically. That is the situation. Now close the gap."

If your organization exposes a proprietary model via an API — for any purpose, to any client class — and you have not implemented semantic query monitoring or output marking, you have an uncharacterized extraction exposure in production. Not theoretical. Operational. Right now.

This is not a sophisticated adversary problem. The infrastructure required to run a farming operation is modular, increasingly accessible, and does not require ML expertise to operate once assembled. The only real barrier is finding an API with valuable outputs and no semantic detection layer. Many organizations are that API.

01

IR Directives

Implement semantic query monitoring. Volumetric monitoring does not detect farming. Query content distribution analysis — comparing incoming query diversity against expected application behavior — is the detection layer that matters. Define what normal query distribution looks like for your model. Build alerts for distributional anomaly. This requires investment; it does not exist in default API gateway configurations.
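
A minimal sketch of what that distributional layer can look like, assuming query text is already logged per account and a baseline corpus of known-legitimate integration traffic exists. The feature choice (character 3-gram hashing), the placeholder data, and the alert threshold are illustrative assumptions, not a finished detection product:

```python
# Distributional anomaly scoring: compare each account's query histogram against
# a baseline built from known-legitimate integration traffic (KL divergence).
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer

vectorizer = HashingVectorizer(analyzer="char", ngram_range=(3, 3),
                               n_features=4096, alternate_sign=False)

def distribution(queries):
    """Collapse a batch of queries into a smoothed, normalized feature histogram."""
    counts = np.asarray(vectorizer.transform(queries).sum(axis=0)).ravel() + 1e-9
    return counts / counts.sum()

def kl_divergence(p, q):
    """D_KL(p || q): how surprising an account's traffic looks under the baseline."""
    return float(np.sum(p * np.log(p / q)))

# Placeholder data -- replace with real query logs.
known_good_queries = ["check balance for account 1042", "list transactions since 2024-01-01"]
queries_by_account = {"acct-17": ["score txn amount 9999.99 country XX channel web",
                                  "score txn amount 0.01 country XX channel web"]}

baseline = distribution(known_good_queries)
scores = {acct: kl_divergence(distribution(qs), baseline)
          for acct, qs in queries_by_account.items()}
ALERT_THRESHOLD = 0.5          # calibrate on historical traffic, not a universal value
suspects = {a: s for a, s in scores.items() if s > ALERT_THRESHOLD}
```

The divergence measure matters less than the feature representation: embeddings or domain-specific features will separate extraction traffic from legitimate integrations far better than character n-grams, but the alerting pattern is the same.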

Audit logit-level output exposure. APIs returning probability distributions rather than hard labels are 10–30× more vulnerable to efficient extraction. Evaluate whether logit-level outputs are necessary for your clients' use cases. Where they are not, constrain response granularity. This single control meaningfully increases the query cost of surrogate training.
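
Where full probability vectors are not required, response granularity can be constrained at the API boundary. A minimal sketch, with hypothetical names and a rounding policy chosen only for illustration:

```python
# Coarsen model outputs before they leave the API: return only top-k classes
# with rounded scores, raising the query cost of surrogate training.
import numpy as np

def coarsen_response(probs: np.ndarray, top_k: int = 1, decimals: int = 1):
    """Full softmax vectors make distillation far more query-efficient;
    hard labels or rounded top-k scores push that efficiency back down."""
    order = np.argsort(probs)[::-1][:top_k]
    return [{"class": int(i), "score": round(float(probs[i]), decimals)} for i in order]

raw = np.array([0.62, 0.21, 0.09, 0.05, 0.03])    # full probability distribution
print(coarsen_response(raw))                       # [{'class': 0, 'score': 0.6}]
```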

Deploy output watermarking. Radioactive data techniques, output perturbation schemes, and canary response mechanisms mark a model's outputs in ways that survive distillation and are detectable in a surrogate. This is an attribution control — it enables post-hoc confirmation of an extraction event and provides standing for legal action.
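
A simplified illustration of the canary idea, in the spirit of DAWN-style dynamic watermarking: a small, key-selected fraction of queries receives deliberately marked outputs, so a surrogate trained on the API's responses inherits the mark. The key, rate, and perturbation rule below are illustrative assumptions, not a production scheme:

```python
# Key-selected canary responses: a deterministic subset of inputs gets a shifted
# label, which a surrogate trained on the outputs will learn and reproduce.
import hashlib

CANARY_RATE = 0.005          # fraction of queries that receive marked outputs
SECRET_KEY = b"rotate-me"    # server-side secret; needed later to verify a suspect model

def is_canary(query: str) -> bool:
    digest = hashlib.sha256(SECRET_KEY + query.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < CANARY_RATE

def respond(query: str, predicted_label: int, n_classes: int) -> int:
    if is_canary(query):
        # Deterministic, key-dependent label shift acts as the watermark.
        return (predicted_label + 1) % n_classes
    return predicted_label

# Verification idea: replay known canary inputs against a suspect surrogate and
# measure how often it reproduces the shifted labels versus chance.
```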

Enable activity threading across your industry sector. Farming operations target multiple organizations. Intelligence on shared adversary infrastructure — cloud ASNs, query formatting signatures, account provisioning patterns — is only actionable if it's shared. Engage with sector ISACs and AI security working groups now, before an event occurs.
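
"Shareable" can be as simple as a structured indicator record exchanged with sector peers or an ISAC. The record below is a sketch; the field names, values, and TLP marking are illustrative assumptions, not a formal exchange standard:

```python
# Example of a shareable extraction indicator: infrastructure and behavioral
# signatures only, no raw customer queries or proprietary model outputs.
extraction_indicator = {
    "indicator_type": "suspected-model-extraction",
    "observed_window": {"start": "2025-01-06", "end": "2025-02-21"},
    "infrastructure": {
        "source_asns": ["AS16509", "AS8075"],          # cloud-provider ASNs observed
        "account_provisioning_pattern": "burst of new accounts, same billing region",
    },
    "behavioral_signature": {
        "query_template_hash": "sha256:<digest of normalized query template>",
        "distinct_query_ratio": 0.97,
        "edge_case_density": "high vs. baseline",
    },
    "victim_vertical": "financial-services",
    "sharing": {"tlp": "TLP:AMBER"},
}
```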

Define extraction events in your IR playbook. Most playbooks do not include model extraction as a defined incident category. Without a definition, there is no trigger, no response team, no legal notification threshold, and no board escalation criteria. This is a documentation task. It should take days, not a project cycle.

02

Close the Governance Gap

Classify deployed models as protectable assets. Your data governance framework classifies records, PII, and documents. It almost certainly has no category for model behavior — the decision boundaries, calibrated reasoning, and domain-specific tuning embedded in a production model. Add one. Define ownership. Define what constitutes a breach of that asset class.

Assign a named owner for model security posture. When a model goes to production, the governance conversation should not end. Someone needs to own the ongoing security posture of that deployed model — not just the infrastructure it runs on, but what it reveals through interaction. If that role doesn't exist in your org chart, you have an accountability gap.

Update your breach definition. If your incident response and legal notification thresholds are defined around data records accessed or exfiltrated, a model extraction attack may not meet the trigger criteria — even if an attacker just replicated your most valuable proprietary system. Work with legal to establish what constitutes a reportable model IP incident before regulators define it for you.

Run a cross-functional accountability exercise. Put security, legal, and the AI product team in a room and ask: if our model were extracted through the API today, who owns the response? If there is hesitation, finger-pointing, or silence — that gap is your highest-priority governance finding.

03

Direct Your IR Team to

Build a model extraction incident classification. Define what constitutes an extraction event, what evidence is required to confirm it, who owns the response, and what the legal notification threshold is. A model extraction incident is not a data breach in the traditional sense — the harm is competitive and intellectual, not personal data exposure.

Develop query behavioral forensics capability. When a farming operation is suspected, you need to answer: what queries did this account submit, in what distribution, over what timeline, and how does that compare to a legitimate integration? Log the right data. Know what analysis you would run before you need to run it.
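
A minimal sketch of the per-account profiling that analysis implies, assuming query logs land in a DataFrame with account_id, timestamp (parsed as datetime), and query_text columns; the features and their interpretation are illustrative:

```python
# Per-account behavioral features to compare against known-good integrations.
import pandas as pd

def account_profile(logs: pd.DataFrame) -> pd.DataFrame:
    grouped = logs.groupby("account_id")
    profile = grouped.agg(
        total_queries=("query_text", "size"),
        distinct_ratio=("query_text", lambda q: q.nunique() / len(q)),
        active_days=("timestamp", lambda t: t.dt.normalize().nunique()),
    )
    # Real integrations repeat templates and cluster in business hours;
    # extraction traffic tends toward near-unique queries spread evenly in time.
    profile["queries_per_day"] = profile["total_queries"] / profile["active_days"]
    return profile.sort_values("distinct_ratio", ascending=False)

# Placeholder log rows -- replace with exported API gateway logs.
logs = pd.DataFrame({
    "account_id": ["acct-17", "acct-17", "acct-02"],
    "timestamp": pd.to_datetime(["2025-02-01 03:14", "2025-02-02 03:17", "2025-02-01 10:05"]),
    "query_text": ["score txn 9999.99", "score txn 0.01", "check balance 1042"],
})
print(account_profile(logs))
```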

Add API credential hygiene to your model security checklist. Active monitoring for account sharing, rotation anomalies, and identity consolidation across high-volume accounts reduces attacker operational continuity. Credential hygiene slows the operation; semantic detection stops it.

Map your highest-value model APIs explicitly. Which of your APIs expose outputs from proprietary models? Which have the highest commercial or regulatory value? Which have the weakest semantic monitoring coverage? Rank them. Start closing the gap from the top.

04

Five Questions for Your Next Executive Meeting

1. How would we know if our proprietary model was being systematically extracted right now — today?

2. Do we have semantic monitoring on our API outputs, or only volumetric monitoring? Who owns closing that gap, and by when?

3. Are our model outputs marked in a way that would allow us to identify a surrogate in a competitor's deployment?

4. If this is an industry-wide operation targeting multiple organizations, are we in any intelligence sharing arrangement that would surface it?

5. What is the legal and regulatory exposure if a surrogate trained on our outputs is deployed in a regulated context — and who in this organization owns that answer?

Technical Reference

Threat Category: Model Theft & IP Extraction at Scale

Techniques: Model Extraction · Query Distribution Attack · Surrogate Model Training · Account Rotation & Rate Limit Evasion · Logit-Level Output Exploitation · Output Watermark Evasion

OWASP LLM Top 10: LLM10:2025 — Unbounded Consumption (successor to LLM10:2023 — Model Theft)

OWASP LLM Top 10: LLM02:2025 — Sensitive Information Disclosure (logit-level output exposure)

MITRE ATLAS: AML.T0016 — Obtain Capabilities  ·  AML.T0040 — ML Model Inference API Access

Key Research: Knockoff Nets — Orekondy et al. (2019)  ·  DAWN Watermarking — Szyller et al.  ·  Radioactive Data — Sablayrolles et al. (2020)

Detection Tooling: KL-divergence for query distributional anomaly  ·  Active learning query pattern analysis  ·  Membership inference probing

Framework: Diamond Model of Intrusion Analysis — Caltagirone, Pendergast, Betz (2013)  ·  Activity threading for cross-organization extraction attribution

owasp.org  ·  atlas.mitre.org  ·  NIST AI  ·  Diamond Model

"When AI Attacks" is a practitioner-grade security intelligence series written for CISOs, security leaders, and defenders navigating the AI threat landscape.

The scenarios described in this series are grounded in documented, publicly reported threat intelligence patterns. They do not reflect confidential information from any employer.