Curated dataset of GenAI & agentic-AI security incidents mapped to OWASP LLM Top 10, OWASP Agentic Top 10, NIST AI RMF, and MITRE ATLAS.
GenAI & Agentic AI Security Incidents
- Searchable site: https://emmanuelgjr.github.io/genai_agentic_incidents/
- Python: pip install genai-incidents
- Cite: see CITATION.cff
- Changelog: CHANGELOG.md
A single source of truth for GenAI and agentic AI security incidents, mapped to:
- OWASP Top 10 for LLM Applications (2025): LLM01-LLM10
- OWASP Agentic Top 10 (ASI): ASI01-ASI10
- NIST AI Risk Management Framework (AI 100-1): GOVERN/MAP/MEASURE/MANAGE subcategories
- MITRE ATLAS: tactics (AML.TA00xx) and techniques (AML.T00xx)
- (Companion) MAESTRO architectural layers (L1-L7)
The dataset is published as both a machine-readable JSON (data/incidents.json) and a human-readable Markdown index (INCIDENTS.md).
Layout
.
├── data/
│   ├── incidents.json              ← full single source of truth (use this)
│   ├── incidents.min.json          ← slim variant: id, title, taxonomy mappings, primary reference
│   └── legacy_consolidated.json    ← intermediate output from the legacy parser
├── schema/
│   └── incident.schema.json        ← JSON Schema for one incident
├── mappings/
│   ├── owasp_llm_top10_2025.json
│   ├── owasp_asi_top10.json
│   ├── nist_ai_rmf.json
│   ├── mitre_atlas.json
│   └── maestro_layers.json
├── legacy/                         ← original source files (preserved verbatim)
├── ingest/                         ← per-source aggregator outputs (CVE, AIID, ATLAS, etc.)
├── scripts/
│   ├── parse_existing.py           ← parse legacy/ → data/legacy_consolidated.json
│   ├── ingest_external.py          ← parse cloned source repos under ../_external/ → ingest/*.json
│   ├── scrape_aiid.py              ← fetch all AIID incident pages (OG metadata) → ingest/aiid_full.json
│   ├── ingest_airi_navigator.py    ← MIT FutureTech AI Risk Navigator CSV → ingest/airi_navigator_incidents.json
│   ├── ingest_aiaaic_sheet.py      ← AIAAIC Repository public Google Sheet → ingest/aiaaic_sheet_incidents.json
│   ├── ingest_oecd_aim.py          ← OECD AI Incidents Monitor (10k pages) → ingest/oecd_aim_full_incidents.json
│   ├── ingest_cve_nvd_expanded.py  ← pull AI-relevant CVEs from NVD/GHSA/OSV → ingest/cve_nvd_expanded.json
│   ├── merge_and_dedupe.py         ← merge legacy + ingest/* → data/incidents.json
│   ├── render_markdown.py          ← data/incidents.json → INCIDENTS.md
│   └── validate.py                 ← validate JSON against schema
├── INCIDENTS.md                    ← rendered index: unified table, newest-first
├── docs/incidents/<year>.md        ← per-year detail shards linked from INCIDENTS.md
├── tests/                          ← pytest suite for merge/render helpers
├── LICENSE                         ← MIT (covers code in scripts/)
├── LICENSE-DATA                    ← CC-BY-4.0 (covers the dataset under data/)
└── README.md
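The layout describes incidents.min.json as a slim variant carrying id, title, taxonomy mappings, and a primary reference. The sketch below shows one way such a projection could be derived from a full record. `to_slim` and the `primary_reference` key are hypothetical names, and treating the first item in `references` as the primary one is an assumption; the real slim file may be shaped differently.

```python
# Taxonomy fields named in the schema summary of this README.
TAXONOMY_FIELDS = (
    "owasp_llm", "owasp_asi", "nist_ai_rmf",
    "mitre_atlas", "mitre_atlas_tactics", "maestro_layers",
)


def to_slim(incident: dict) -> dict:
    """Project a full incident record onto a slim shape (hypothetical helper)."""
    slim = {"id": incident["id"], "title": incident["title"]}
    for field in TAXONOMY_FIELDS:
        slim[field] = incident.get(field, [])
    refs = incident.get("references", [])
    # Assumption: the first reference is the primary one.
    slim["primary_reference"] = refs[0]["url"] if refs else None
    return slim


# Made-up record used only to exercise the helper.
full = {
    "id": "INC-00001",
    "title": "Example incident",
    "description": "Long free-text description that the slim variant drops.",
    "owasp_llm": ["LLM01"],
    "references": [{"title": "Vendor advisory", "url": "https://example.com/advisory", "type": "vendor"}],
}
print(to_slim(full))
```

Fields such as description, impact, and mitigations are dropped, which is what keeps the slim file small enough for UI use.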
What counts as an incident?
Anything that is one or more of:
- A real-world exploitation, breach, or misuse involving GenAI or agentic AI systems.
- A publicly disclosed vulnerability (CVE or vendor advisory) affecting an AI/ML/LLM/agent stack.
- A research-demonstrated attack with a credible PoC and public write-up.
- A red-team finding released by a security researcher with sufficient detail to reproduce or replicate.
Each entry must have at least one verifiable external URL. Entries without sources are excluded.
This repository does not include broad fairness/bias-only AI harms unless they involve a security primitive (data exfiltration, integrity attack, account compromise, etc.).
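The source requirement above can be approximated with a structural check. This is a minimal sketch, not the project's validator: `has_verifiable_url` is a hypothetical helper that only confirms a reference carries a well-formed http(s) URL. It does not fetch the URL, so it cannot tell whether the link actually resolves.

```python
from urllib.parse import urlparse


def has_verifiable_url(incident: dict) -> bool:
    """True if at least one reference has a well-formed http(s) URL."""
    for ref in incident.get("references", []):
        parsed = urlparse(ref.get("url") or "")
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return True
    return False


# Made-up entries for illustration.
sourced = {"id": "INC-00001", "references": [{"url": "https://example.com/post"}]}
unsourced = {"id": "INC-00002", "references": []}

print(has_verifiable_url(sourced))
print(has_verifiable_url(unsourced))
```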
Schema (summary)
See schema/incident.schema.json for the canonical version.
{
  "id": "INC-00001",                    // stable 5-digit ID
  "source_ids": ["AIID-123", "CVE-2025-..."],
  "cve_ids": ["CVE-2025-..."],
  "cwe_ids": ["CWE-918"],
  "cvss_score": 9.8,
  "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
  "aiid_id": 1234,                      // canonical AIID numeric ID when applicable
  "title": "...",
  "date": "2025-09",
  "disclosure_date": "2025-10-02",      // separate from incident date when known
  "year": 2025,
  "category": "real-world | research | red-team | vulnerability-disclosure | threat-report | policy",
  "description": "...",
  "attack_vector": "prompt-injection | rce | supply-chain | data-exfiltration | ...",
  "affected": "vendor/product",
  "impact": "...",
  "severity": "Critical | High | Medium | Low | Info",
  "owasp_llm": ["LLM01", "LLM06"],
  "owasp_asi": ["ASI01", "ASI02"],
  "nist_ai_rmf": ["MEASURE-2.7", "MAP-3.5"],
  "mitre_atlas": ["AML.T0051", "AML.T0051.001"],
  "mitre_atlas_tactics": ["AML.TA0004"],
  "maestro_layers": [{"layer": "L3", "label": "Agent Frameworks & Tooling", "role": "origin"}],
  "mitigations": ["..."],
  "references": [
    {"title": "Vendor advisory", "url": "https://...", "type": "vendor"}
  ],
  "tags": ["mcp", "supply-chain"],
  "added": "2026-05-16",                // stable across re-runs
  "updated": "2026-05-16"               // only bumped when content actually changes
}
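Because each taxonomy field in the schema is a list of codes, simple aggregations need only the standard library. The sketch below tallies how often each code appears in a given field. `count_codes` is a hypothetical helper and the three records are made up for illustration; in practice the list would come from data/incidents.json.

```python
from collections import Counter


def count_codes(incidents, field):
    """Count how many incidents carry each code in a list-valued taxonomy field."""
    counts = Counter()
    for inc in incidents:
        counts.update(inc.get(field, []))
    return counts


# Made-up records; real ones would be loaded from data/incidents.json.
sample = [
    {"id": "INC-00001", "owasp_llm": ["LLM01", "LLM06"], "severity": "Critical"},
    {"id": "INC-00002", "owasp_llm": ["LLM01"], "severity": "High"},
    {"id": "INC-00003", "owasp_llm": [], "severity": "Low"},
]

print(count_codes(sample, "owasp_llm").most_common())
# -> [('LLM01', 2), ('LLM06', 1)]
```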
Using the dataset
As a Python library
pip install genai-incidents
from genai_incidents import query, by_cve, resolve_id
for inc in query(severity="Critical", attack_vector="prompt-injection", year=2026):
    print(inc["id"], "-", inc["title"])
print(by_cve("CVE-2026-21520")) # all incidents that list this CVE
print(resolve_id("INC-00139")) # follow merge history to the current canonical INC
As JSON
- Full: data/incidents.json
- Slim (for UIs): data/incidents.min.json
- Schema: schema/incident.schema.json
- ID deprecations: data/id_deprecations.json, for resolving citations of merged-away IDs
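When working from the raw JSON rather than the Python package, a citation of a merged-away ID has to be followed to its current record by hand. The sketch below assumes the deprecation data can be reduced to a flat mapping of old ID to replacement ID. That shape is an assumption; the actual layout of data/id_deprecations.json is not described here, and `resolve` is a hypothetical stand-in, not the library's `resolve_id`.

```python
def resolve(inc_id: str, deprecations: dict) -> str:
    """Follow old -> new links until reaching an ID that is not deprecated."""
    seen = set()
    while inc_id in deprecations:
        if inc_id in seen:
            raise ValueError(f"cycle in deprecation map at {inc_id}")
        seen.add(inc_id)
        inc_id = deprecations[inc_id]
    return inc_id


# Made-up merge history: INC-00139 was merged into INC-00042,
# which was later merged into INC-00007.
history = {"INC-00139": "INC-00042", "INC-00042": "INC-00007"}

print(resolve("INC-00139", history))  # -> INC-00007
print(resolve("INC-00007", history))  # -> INC-00007 (already canonical)
```

The cycle guard matters only if the mapping is malformed, but it turns a silent infinite loop into an explicit error.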
As a website
Filterable, searchable, deep-linkable table at https://emmanuelgjr.github.io/genai_agentic_incidents/.
Regenerating the dataset
pip install -r requirements.txt
make build # parse legacy, merge + dedupe, render, validate
make test # pytest tests/
make ingest-all # (heavy: refresh AIID/AIRI/AIAAIC/OECD AIM/NVD from network)
Or run the steps individually:
python scripts/parse_existing.py # legacy/ -> data/legacy_consolidated.json
python scripts/merge_and_dedupe.py # legacy + ingest/* -> data/incidents.json
python scripts/render_markdown.py # data/incidents.json -> INCIDENTS.md + docs/incidents/<year>.md
python scripts/validate.py # schema check
Dedupe keys (first hit wins): (a) matching cve_ids, (b) matching source_ids (with AIID-N-OECD canonicalised to AIID-N), (c) matching normalized reference URL, (d) fuzzy title match within ±1 year. After each merge the lookup indices are rebuilt so that transitive duplicates all collapse: if entry A absorbs CVE-3 and an entry B carrying CVE-3 already exists, B is merged into A as well. Merges union taxonomy mappings, references, tags, CVE/CWE IDs, and source IDs; take the highest severity; prefer the more specific date (YYYY-MM-DD beats year-only); and reject future-year dates.
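The exact-match keys (a) to (c) and the transitive collapse can be illustrated with a small sketch. It uses a union-find over shared keys, which is a swapped-in technique chosen for brevity and not necessarily how scripts/merge_and_dedupe.py rebuilds its indices. The URL normalization rule is an assumption, all function names are hypothetical, and fuzzy title matching, reference and taxonomy unions, and future-date rejection are left out.

```python
from urllib.parse import urlsplit

SEVERITY_ORDER = ["Info", "Low", "Medium", "High", "Critical"]


def exact_keys(entry):
    """Yield the exact-match dedupe keys (a)-(c) for one entry."""
    for cve in entry.get("cve_ids", []):
        yield ("cve", cve.upper())
    for sid in entry.get("source_ids", []):
        if sid.startswith("AIID-") and sid.endswith("-OECD"):
            sid = sid[: -len("-OECD")]  # AIID-N-OECD -> AIID-N
        yield ("source", sid)
    for ref in entry.get("references", []):
        parts = urlsplit(ref["url"])
        # Assumed normalization: ignore scheme, query, fragment, trailing slash.
        yield ("url", parts.netloc.lower() + parts.path.rstrip("/"))


def group_duplicates(entries):
    """Return lists of entry indices that share any exact key, transitively."""
    parent = list(range(len(entries)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}
    for i, entry in enumerate(entries):
        for key in exact_keys(entry):
            if key in first_seen:
                a, b = find(first_seen[key]), find(i)
                parent[max(a, b)] = min(a, b)  # earliest entry stays canonical
            else:
                first_seen[key] = i
    groups = {}
    for i in range(len(entries)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def merge_pair(a, b):
    """Union ID/tag lists, keep the highest severity and the more specific date."""
    merged = dict(a)
    for field in ("cve_ids", "cwe_ids", "source_ids", "tags"):
        merged[field] = sorted(set(a.get(field, [])) | set(b.get(field, [])))
    merged["severity"] = max(
        a.get("severity", "Info"), b.get("severity", "Info"), key=SEVERITY_ORDER.index
    )
    # A longer ISO string (YYYY-MM-DD) is more specific than YYYY-MM or YYYY.
    merged["date"] = max(a.get("date", ""), b.get("date", ""), key=len)
    return merged


# Made-up entries: the third one links the first two through shared CVE IDs.
entries = [
    {"cve_ids": ["CVE-0000-0001"]},
    {"cve_ids": ["CVE-0000-0003"]},
    {"cve_ids": ["CVE-0000-0001", "CVE-0000-0003"]},
    {"source_ids": ["AIID-5-OECD"]},
    {"source_ids": ["AIID-5"]},
]
print(group_duplicates(entries))  # -> [[0, 1, 2], [3, 4]]
```

Entries 0 and 1 share no key directly; they end up in the same group only because entry 2 carries both CVE IDs, which is the transitive case described above.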
added and updated are preserved from the previous output; updated only bumps when an entry's content actually changes. That keeps make build deterministic for CI drift checks.
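One way to get that behaviour is to compare entry content with the two bookkeeping fields excluded. The following is a sketch under that assumption; `carry_timestamps` is a hypothetical helper, and the real script may detect changes differently.

```python
import json


def carry_timestamps(new_entry, previous, today):
    """Return new_entry with added/updated set so unchanged content keeps its old stamps."""

    def content(entry):
        # Compare everything except the two bookkeeping fields.
        body = {k: v for k, v in entry.items() if k not in ("added", "updated")}
        return json.dumps(body, sort_keys=True)

    out = dict(new_entry)
    if previous is None:
        # First appearance: both stamps start at today's date.
        out["added"] = out["updated"] = today
    else:
        out["added"] = previous["added"]
        unchanged = content(new_entry) == content(previous)
        out["updated"] = previous["updated"] if unchanged else today
    return out


# Made-up previous output for one entry.
prev = {"id": "INC-00001", "title": "Example", "added": "2026-01-01", "updated": "2026-02-01"}

same = carry_timestamps({"id": "INC-00001", "title": "Example"}, prev, "2026-05-16")
edited = carry_timestamps({"id": "INC-00001", "title": "Example (revised)"}, prev, "2026-05-16")
print(same["updated"], edited["updated"])  # -> 2026-02-01 2026-05-16
```

With this rule, rebuilding from unchanged inputs reproduces the previous file byte for byte, so a CI job can fail on any diff.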
Adding entries
Two paths:
- Manual: append a properly-shaped object to data/incidents.json and run scripts/render_markdown.py. Ensure references has at least one resolvable URL.
- Automated: drop a JSON array of raw entries into ingest/<your_source>.json (any reasonable shape; see normalize_entry in scripts/merge_and_dedupe.py for the field tolerance), then re-run merge + render.
Always run scripts/validate.py before committing.
Taxonomy mappings
The mapping files in mappings/ document the controlled vocabulary used in this dataset. They are derived from the original sources:
- OWASP LLM Top 10 (2025): https://genai.owasp.org/llm-top-10/
- OWASP Agentic Top 10 (ASI / "Agentic AI - Threats and Mitigations"): https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/
- NIST AI Risk Management Framework (AI 100-1): https://www.nist.gov/itl/ai-risk-management-framework
- NIST AI 600-1 Generative AI Profile: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
- MITRE ATLAS: https://atlas.mitre.org/
- MAESTRO (companion): https://genai.owasp.org/resource/genai-security-project-maestro/
When a framework releases a new version, update the mapping JSON in mappings/ and re-run merge + validate.
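After a mapping update, a quick shape check can catch mistyped codes before the full schema validation runs. The patterns below are inferred from the example values in this README (LLM01 to LLM10, ASI01 to ASI10, AML.TA00xx, AML.T00xx with an optional sub-technique suffix, and FUNCTION-n.n for NIST); they are assumptions, not the project's definitions. The JSON files in mappings/ remain the authoritative vocabulary.

```python
import re

# Patterns inferred from the examples in this README; treat them as assumptions.
PATTERNS = {
    "owasp_llm": re.compile(r"LLM(0[1-9]|10)"),
    "owasp_asi": re.compile(r"ASI(0[1-9]|10)"),
    "nist_ai_rmf": re.compile(r"(GOVERN|MAP|MEASURE|MANAGE)-\d+\.\d+"),
    "mitre_atlas": re.compile(r"AML\.T\d{4}(\.\d{3})?"),
    "mitre_atlas_tactics": re.compile(r"AML\.TA\d{4}"),
}


def malformed_codes(incident: dict) -> list:
    """Return (field, code) pairs whose code does not match the expected shape."""
    bad = []
    for field, pattern in PATTERNS.items():
        for code in incident.get(field, []):
            if not pattern.fullmatch(code):
                bad.append((field, code))
    return bad


# Values taken from the schema summary, plus one deliberately wrong code.
entry = {
    "owasp_llm": ["LLM01", "LLM11"],
    "owasp_asi": ["ASI01", "ASI02"],
    "nist_ai_rmf": ["MEASURE-2.7", "MAP-3.5"],
    "mitre_atlas": ["AML.T0051", "AML.T0051.001"],
    "mitre_atlas_tactics": ["AML.TA0004"],
}
print(malformed_codes(entry))  # -> [('owasp_llm', 'LLM11')]
```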
Sources aggregated
The current dataset draws from the following public sources. Each entry retains links back to the originating advisory, post, or paper:
- OWASP GenAI Security Project: incident roundups + Top 10 references
- AI Incident Database (AIID) (incidentdatabase.ai, github.com/responsible-ai-collaborative/aiid): security-relevant subset of the full corpus, scraped via OG metadata
- OECD AI Incidents Monitor (AIM) (oecd.ai/en/incidents): cross-listed against AIID via the official AIID-OECD bridge file
- AIAAIC (aiaaic.org): AI, Algorithmic, and Automation Incidents and Controversies
- MITRE ATLAS (atlas.mitre.org, github.com/mitre-atlas/atlas-data): all case studies parsed from the YAML corpus
- AVID: AI Vulnerability Database (avidml.org)
- CSET-AIID Harm Taxonomy (github.com/georgetown-cset/CSET-AIID-harm-taxonomy): controlled vocabulary reference
- NVD / CVE.org / GitHub Security Advisories / OSV.dev / CISA KEV: AI/ML/LLM/agent CVEs pulled via REST API across 56 keywords
- NVIDIA garak (github.com/NVIDIA/garak): one entry per LLM vulnerability scanner probe (canonical attack classes)
- promptfoo (github.com/promptfoo/promptfoo): one entry per red-team plugin/strategy
- ModelOriented/CVE-AI (github.com/ModelOriented/CVE-AI): XAI-based AI model validation findings
- Researcher and vendor blogs: Embrace The Red, Tenable, Palo Alto Unit 42, Trail of Bits, Aim Security, Noma Security, Wiz Research, Lakera, Invariant Labs, PromptArmor, Pillar Security, Token Security, HiddenLayer, Robust Intelligence, Protect AI, Cato Networks CTRL, Endor Labs, Sysdig, Zenity Labs, JFrog, Datadog Security Labs, Reco, AppOmni, BeyondTrust, Oasis Security, Mindgard, Koi Security, Imperva, Sonar, Oligo Security, OX Security, SentinelOne, Check Point Research, Trend Micro, Tinfoil Security, ZeroPath, Cymulate, MaccariTA, and others.
- Vendor threat reports: Anthropic, OpenAI, Google Threat Intelligence (GTIG/TAG/Mandiant), Microsoft Threat Intelligence (MTAC/MSRC), AWS Security Bulletins, CrowdStrike, Recorded Future.
- Academic papers: selected USENIX Security / NDSS / S&P / CCS / arXiv entries with concrete adversarial PoCs.
If a source is missing or mis-attributed, open an issue or PR.
Contributing
PRs welcome. Please:
- Add at least one verifiable URL per entry.
- Map to all four taxonomies where applicable. If unsure, leave the field empty rather than guess.
- Run scripts/validate.py and scripts/render_markdown.py before opening a PR.
- Submitting incidents you authored or first reported is fine, but please link the canonical writeup.
License
- Code (scripts/, schema/): MIT
- Data and documentation (data/, INCIDENTS.md, mappings/): Creative Commons Attribution 4.0 International
If you use this dataset in research or tooling, please cite this repository.
File details
Details for the file genai_incidents-2.0.0.tar.gz.
File metadata
- Download URL: genai_incidents-2.0.0.tar.gz
- Upload date:
- Size: 908.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ad145d1f509ebbfb10feca2f7c708798f705d16ca9e337e8720388edd04f2be4 |
| MD5 | 9966e28aa46dc9455caa53310ef67066 |
| BLAKE2b-256 | 4052dc4e37d1f0e0b98c2559ba8d6590269e0532df8c6d6df375ba600594997b |
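A downloaded distribution file can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch; `sha256_of` and `matches` are hypothetical helper names, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA-256 digest with the published value, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest published for genai_incidents-2.0.0.tar.gz.
SDIST_SHA256 = "ad145d1f509ebbfb10feca2f7c708798f705d16ca9e337e8720388edd04f2be4"

# Usage, once the file has been downloaded:
#   matches("genai_incidents-2.0.0.tar.gz", SDIST_SHA256)
```

The same helper works for the wheel by passing its file name and its own SHA256 digest.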
Provenance
The following attestation bundles were made for genai_incidents-2.0.0.tar.gz:
Publisher: publish.yml on emmanuelgjr/genai_agentic_incidents
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: genai_incidents-2.0.0.tar.gz
- Subject digest: ad145d1f509ebbfb10feca2f7c708798f705d16ca9e337e8720388edd04f2be4
- Sigstore transparency entry: 1555325534
- Sigstore integration time:
- Permalink: emmanuelgjr/genai_agentic_incidents@ed6a9b78afd8827e6941428c2bf165d40e8374ae
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/emmanuelgjr
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ed6a9b78afd8827e6941428c2bf165d40e8374ae
- Trigger Event: release
File details
Details for the file genai_incidents-2.0.0-py3-none-any.whl.
File metadata
- Download URL: genai_incidents-2.0.0-py3-none-any.whl
- Upload date:
- Size: 479.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52ad6bf6059249e12890402db0c900ee51239e9399b71c77419262a2afe36faf |
| MD5 | cf190965ee20bbafb6a47ce51b504ec6 |
| BLAKE2b-256 | b39ddc9a25ae465d062287283336ee24ce7e27672183ba0785606cee75f1d443 |
Provenance
The following attestation bundles were made for genai_incidents-2.0.0-py3-none-any.whl:
Publisher: publish.yml on emmanuelgjr/genai_agentic_incidents
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: genai_incidents-2.0.0-py3-none-any.whl
- Subject digest: 52ad6bf6059249e12890402db0c900ee51239e9399b71c77419262a2afe36faf
- Sigstore transparency entry: 1555325543
- Sigstore integration time:
- Permalink: emmanuelgjr/genai_agentic_incidents@ed6a9b78afd8827e6941428c2bf165d40e8374ae
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/emmanuelgjr
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ed6a9b78afd8827e6941428c2bf165d40e8374ae
- Trigger Event: release