Data Poisoning

What is Data Poisoning?

Data poisoning is an attack on machine learning systems in which an adversary manipulates the training data used to build or fine-tune an AI model, causing the model to learn incorrect patterns, exhibit biased behavior, or execute specific malicious actions when triggered. Data poisoning is classified as a training-time attack — distinct from inference-time attacks like prompt injection or LLM jailbreak — and represents a supply chain attack against the AI development pipeline itself. As organizations fine-tune foundation models on proprietary data and ingest external data sources into Retrieval-Augmented Generation (RAG) pipelines, the data poisoning attack surface has expanded significantly.

Description

Data poisoning attacks fall into two primary categories. Availability attacks degrade overall model performance by poisoning enough training data that the model cannot learn correct patterns — analogous to corrupting a database. Integrity attacks, more sophisticated and targeted, introduce specific malicious behaviors: a backdoor attack plants a hidden trigger in the training data such that the model behaves normally in all cases except when it encounters the trigger input, at which point it executes attacker-defined behavior. For example, a maliciously fine-tuned code generation model might produce functionally correct code in all cases — except when asked to implement authentication, where it silently inserts a backdoor. RAG poisoning is an emerging variant relevant to agentic AI and LLM deployments: malicious content injected into the knowledge base or document store queried by the AI contaminates the model's responses for users who retrieve that content. A documented case in 2024 demonstrated successful RAG poisoning of ChatGPT's browsing capabilities through poisoned web content — affecting all users who asked the model to retrieve information from the compromised source. Data poisoning risk compounds when organizations use third-party datasets, open-source model weights, or community-contributed fine-tuning data without integrity verification — connecting directly to software bill of materials practices for AI components.

Usage and Examples

An enterprise deploys a custom LLM trained on internal documentation to assist developers. A malicious insider contributes poisoned documentation during the training data collection phase, embedding subtle backdoors that cause the model to recommend insecure coding patterns under specific conditions — such as when asked to implement input validation for financial transaction processing. The model passes all standard quality evaluations because the poisoned behavior only activates on the specific trigger condition. Developers trust the model's output and ship vulnerable code to production. Detecting data poisoning requires integrity controls at the data collection stage (access logging, data provenance tracking), model behavior testing against adversarial probes during the evaluation phase, and runtime monitoring of model outputs for statistical anomalies. AI Bills of Materials (AI BOMs) that document training data sources, versions, and provenance are an important transparency mechanism for managing this risk.

How Does This Relate to Penetration Testing?

Data poisoning evaluation is a component of advanced AI penetration testing engagements, particularly for organizations that train or fine-tune their own models on internal data. Assessment focuses on the security of the training data pipeline — who can contribute training data, whether data provenance is tracked, whether model behavior is tested against backdoor trigger probes before deployment, and whether RAG knowledge bases have access controls and integrity monitoring. For organizations using third-party fine-tuned models, supply chain risk assessment evaluates the trustworthiness of model sources and the security practices of model providers. Evolve Security's AI Penetration Testing assessments evaluate the security of your AI training pipelines and RAG knowledge bases — identifying data poisoning risks before they compromise model behavior in production.

Previous term
No previous terms!
Next term
No next terms!