Data Poisoning

What is Data Poisoning?

Data poisoning is an attack on machine learning systems in which an adversary manipulates the training data used to build or fine-tune an AI model, causing the model to learn incorrect patterns, exhibit biased behavior, or execute specific malicious actions when triggered. Data poisoning is classified as a training-time attack — distinct from inference-time attacks like prompt injection or LLM jailbreak — and represents a supply chain attack against the AI development pipeline itself. As organizations fine-tune foundation models on proprietary data and ingest external data sources into Retrieval-Augmented Generation (RAG) pipelines, the data poisoning attack surface has expanded significantly.

Description

Data poisoning attacks fall into two primary categories. Availability attacks degrade overall model performance by poisoning enough training data that the model cannot learn correct patterns — analogous to corrupting a database. Integrity attacks, more sophisticated and targeted, introduce specific malicious behaviors: a backdoor attack plants a hidden trigger in the training data such that the model behaves normally in all cases except when it encounters the trigger input, at which point it executes attacker-defined behavior. For example, a maliciously fine-tuned code generation model might produce functionally correct code in all cases — except when asked to implement authentication, where it silently inserts a backdoor. RAG poisoning is an emerging variant relevant to agentic AI and LLM deployments: malicious content injected into the knowledge base or document store queried by the AI contaminates the model's responses for users who retrieve that content. A documented case in 2024 demonstrated successful RAG poisoning of ChatGPT's browsing capabilities through poisoned web content — affecting all users who asked the model to retrieve information from the compromised source. Data poisoning risk compounds when organizations use third-party datasets, open-source model weights, or community-contributed fine-tuning data without integrity verification — connecting directly to software bill of materials practices for AI components.

Usage and Examples

An enterprise deploys a custom LLM trained on internal documentation to assist developers. A malicious insider contributes poisoned documentation during the training data collection phase, embedding subtle backdoors that cause the model to recommend insecure coding patterns under specific conditions — such as when asked to implement input validation for financial transaction processing. The model passes all standard quality evaluations because the poisoned behavior only activates on the specific trigger condition. Developers trust the model's output and ship vulnerable code to production. Detecting data poisoning requires integrity controls at the data collection stage (access logging, data provenance tracking), model behavior testing against adversarial probes during the evaluation phase, and runtime monitoring of model outputs for statistical anomalies. AI Bills of Materials (AI BOMs) that document training data sources, versions, and provenance are an important transparency mechanism for managing this risk.

How Does This Relate to Penetration Testing?

Data poisoning evaluation is a component of advanced AI penetration testing engagements, particularly for organizations that train or fine-tune their own models on internal data. Assessment focuses on the security of the training data pipeline — who can contribute training data, whether data provenance is tracked, whether model behavior is tested against backdoor trigger probes before deployment, and whether RAG knowledge bases have access controls and integrity monitoring. For organizations using third-party fine-tuned models, supply chain risk assessment evaluates the trustworthiness of model sources and the security practices of model providers. Evolve Security's AI Penetration Testing assessments evaluate the security of your AI training pipelines and RAG knowledge bases — identifying data poisoning risks before they compromise model behavior in production.

Previous term

No previous terms!

Next term

No next terms!

Stay in the know. Subscribe today!

Data Poisoning

What is Data Poisoning?

Description

Usage and Examples

How Does This Relate to Penetration Testing?

Access control

Advanced Persistent Threat

Adversarial Machine Learning

Adversary-in-the-Middle (AiTM) Attack

Agentic AI Security

AI-Powered Social Engineering

AI Red Teaming

AI Security

Anthropic Fable (Claude Fable 5)

Anthropic Mythos (Claude Mythos Preview)

API Security

Application Penetration Testing

Assumed Breach

Attack Surface

Attack Surface Management (ASM)

Botnet

Broken Access Control

Business Email Compromise (BEC)

BYOD

CIS Controls

CIS RAM

Cloud computing

Cloud Security

Cloud Security Posture Management (CSPM)

COBIT

Command and Control (C2)

Container Escape

Continuous Threat Exposure Management (CTEM)

Credential Stuffing

Cryptocurrency

Cryptojacking

Cyber Attack

Cyber Maturity Model Certification (CMMC)

Cyber Resilience

Cyber Threat Intelligence

Darknet

Data Breach

Data Loss Prevention

Data Poisoning

DDoS Attack

Declaration of Conformity

Deepfake

Detection Engineering

DMZ

Encryption

Endpoint

Endpoint Detection and Response

Ethical Hacking Tools

Exposure Management

Firewall

Firmware Security

FISMA

Gap analysis

GDPR

Hacker

HIPAA

Hypervisor (VMM)

Identification

Identity Theft

Identity Threat Detection and Response (ITDR)

Incident Response

Infrastructure-as-a-Service (IaaS)

Initial Access Brokers

Insider Threat

Internal Penetration Testing

Intrusion detection system (IDS)

Intrusion Prevention System (IPS)

ISO 27001

Keyboard logger

Lateral Movement

LLM Jailbreak

Macro virus

Malicious Apps

Malware

Managed Detection and Response (MDR)