Curated dataset of GenAI & agentic-AI security incidents mapped to OWASP LLM Top 10, OWASP Agentic Top 10, NIST AI RMF, and MITRE ATLAS.

GenAI & Agentic AI Security Incidents

A single source of truth for GenAI and agentic AI security incidents, mapped to:

  • OWASP Top 10 for LLM Applications (2025): LLM01–LLM10
  • OWASP Agentic Top 10 (ASI): ASI01–ASI10
  • NIST AI Risk Management Framework (AI 100-1): GOVERN / MAP / MEASURE / MANAGE subcategories
  • MITRE ATLAS: tactics (AML.TA00xx) and techniques (AML.T00xx)
  • (Companion) MAESTRO architectural layers (L1–L7)

The dataset is published as both a machine-readable JSON (data/incidents.json) and a human-readable Markdown index (INCIDENTS.md).


Layout

.
├── data/
│   ├── incidents.json          ← full single source of truth (use this)
│   ├── incidents.min.json      ← slim variant: id, title, taxonomy mappings, primary reference
│   └── legacy_consolidated.json ← intermediate output from the legacy parser
├── schema/
│   └── incident.schema.json    ← JSON Schema for one incident
├── mappings/
│   ├── owasp_llm_top10_2025.json
│   ├── owasp_asi_top10.json
│   ├── nist_ai_rmf.json
│   ├── mitre_atlas.json
│   └── maestro_layers.json
├── legacy/                     ← original source files (preserved verbatim)
├── ingest/                     ← per-source aggregator outputs (CVE, AIID, ATLAS, etc.)
├── scripts/
│   ├── parse_existing.py             ← parse legacy/ → data/legacy_consolidated.json
│   ├── ingest_external.py            ← parse cloned source repos under ../_external/ → ingest/*.json
│   ├── scrape_aiid.py                ← fetch all AIID incident pages (OG metadata) → ingest/aiid_full.json
│   ├── ingest_airi_navigator.py      ← MIT FutureTech AI Risk Navigator CSV → ingest/airi_navigator_incidents.json
│   ├── ingest_aiaaic_sheet.py        ← AIAAIC Repository public Google Sheet → ingest/aiaaic_sheet_incidents.json
│   ├── ingest_oecd_aim.py            ← OECD AI Incidents Monitor (10k pages) → ingest/oecd_aim_full_incidents.json
│   ├── ingest_cve_nvd_expanded.py    ← pull AI-relevant CVEs from NVD/GHSA/OSV → ingest/cve_nvd_expanded.json
│   ├── merge_and_dedupe.py           ← merge legacy + ingest/* → data/incidents.json
│   ├── render_markdown.py            ← data/incidents.json → INCIDENTS.md
│   └── validate.py                   ← validate JSON against schema
├── INCIDENTS.md                ← rendered index: unified table, newest-first
├── docs/incidents/<year>.md    ← per-year detail shards linked from INCIDENTS.md
├── tests/                      ← pytest suite for merge/render helpers
├── LICENSE                     ← MIT (covers code in scripts/)
├── LICENSE-DATA                ← CC-BY-4.0 (covers the dataset under data/)
└── README.md

What counts as an incident?

Anything that is one or more of:

  1. A real-world exploitation, breach, or misuse involving GenAI or agentic AI systems.
  2. A publicly disclosed vulnerability (CVE or vendor advisory) affecting an AI/ML/LLM/agent stack.
  3. A research-demonstrated attack with a credible PoC and public write-up.
  4. A red-team finding released by a security researcher with sufficient detail to reproduce or replicate.

Each entry must have at least one verifiable external URL. Entries without sources are excluded.

This repository does not include broad fairness/bias-only AI harms unless they involve a security primitive (data exfiltration, integrity attack, account compromise, etc.).


Schema (summary)

See schema/incident.schema.json for the canonical version.

{
  "id": "INC-00001",                 // stable 5-digit ID
  "source_ids": ["AIID-123", "CVE-2025-..."],
  "cve_ids": ["CVE-2025-..."],
  "cwe_ids": ["CWE-918"],
  "cvss_score": 9.8,
  "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
  "aiid_id": 1234,                   // canonical AIID numeric ID when applicable
  "title": "...",
  "date": "2025-09",
  "disclosure_date": "2025-10-02",   // separate from incident date when known
  "year": 2025,
  "category": "real-world | research | red-team | vulnerability-disclosure | threat-report | policy",
  "description": "...",
  "attack_vector": "prompt-injection | rce | supply-chain | data-exfiltration | ...",
  "affected": "vendor/product",
  "impact": "...",
  "severity": "Critical | High | Medium | Low | Info",
  "owasp_llm": ["LLM01", "LLM06"],
  "owasp_asi": ["ASI01", "ASI02"],
  "nist_ai_rmf": ["MEASURE-2.7", "MAP-3.5"],
  "mitre_atlas": ["AML.T0051", "AML.T0051.001"],
  "mitre_atlas_tactics": ["AML.TA0004"],
  "maestro_layers": [{"layer":"L3","label":"Agent Frameworks & Tooling","role":"origin"}],
  "mitigations": ["..."],
  "references": [
    {"title":"Vendor advisory","url":"https://...","type":"vendor"}
  ],
  "tags": ["mcp","supply-chain"],
  "added": "2026-05-16",             // stable across re-runs
  "updated": "2026-05-16"            // only bumped when content actually changes
}
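
For a quick sanity check before running the real validator, the summary above can be approximated in a few lines. This is only a sketch: `check_incident` is a hypothetical helper, not part of the package, and which fields are mandatory is an assumption here (the canonical rules live in schema/incident.schema.json).

```python
import re

# Shapes taken from the schema summary: "INC-" plus five digits, and the
# five listed severity labels.
ID_PATTERN = re.compile(r"INC-\d{5}")
SEVERITIES = {"Critical", "High", "Medium", "Low", "Info"}


def check_incident(entry: dict) -> list[str]:
    """Return a list of problems found in one incident object (empty if none)."""
    problems = []
    if not ID_PATTERN.fullmatch(str(entry.get("id", ""))):
        problems.append("id must look like INC-00001")
    # Assumption: a title is mandatory; the real schema may differ.
    if not entry.get("title"):
        problems.append("title is required")
    if "severity" in entry and entry["severity"] not in SEVERITIES:
        problems.append(f"unknown severity: {entry['severity']!r}")
    # Entries without a source URL are excluded from the dataset.
    urls = [ref.get("url", "") for ref in entry.get("references", [])]
    if not any(u.startswith(("http://", "https://")) for u in urls):
        problems.append("at least one reference URL is required")
    return problems
```

This does not replace scripts/validate.py; it only catches the most obvious shape errors early.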

Using the dataset

As a Python library

pip install genai-incidents

from genai_incidents import query, by_cve, resolve_id

for inc in query(severity="Critical", attack_vector="prompt-injection", year=2026):
    print(inc["id"], "-", inc["title"])

print(by_cve("CVE-2026-21520"))   # all incidents that list this CVE
print(resolve_id("INC-00139"))    # follow merge history to the current canonical INC

As JSON
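
The JSON file can also be read with nothing but the standard library. A minimal sketch, assuming data/incidents.json is either a top-level array of incident objects or an object that wraps that array under an `incidents` key (both shapes are assumptions; `load_incidents` and `count_by_owasp_llm` are illustrative helpers, not part of the package):

```python
import json
from collections import Counter
from pathlib import Path


def load_incidents(path: str = "data/incidents.json") -> list[dict]:
    """Load incidents, accepting a bare array or an object wrapping one."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        # Assumption: a wrapper object would keep the list under "incidents".
        data = data.get("incidents", [])
    return data


def count_by_owasp_llm(incidents: list[dict]) -> Counter:
    """Count how many incidents map to each OWASP LLM Top 10 category."""
    counts = Counter()
    for inc in incidents:
        counts.update(inc.get("owasp_llm", []))
    return counts


if __name__ == "__main__":
    for category, n in count_by_owasp_llm(load_incidents()).most_common():
        print(category, n)
```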

As a website

Filterable, searchable, deep-linkable table at https://emmanuelgjr.github.io/genai_agentic_incidents/.

Regenerating the dataset

pip install -r requirements.txt
make build      # parse legacy, merge + dedupe, render, validate
make test       # pytest tests/
make ingest-all # (heavy: refresh AIID/AIRI/AIAAIC/OECD AIM/NVD from network)

Or run the steps individually:

python scripts/parse_existing.py     # legacy/ -> data/legacy_consolidated.json
python scripts/merge_and_dedupe.py   # legacy + ingest/* -> data/incidents.json
python scripts/render_markdown.py    # data/incidents.json -> INCIDENTS.md + docs/incidents/<year>.md
python scripts/validate.py           # schema check

Dedupe keys (first hit wins): (a) matching cve_ids, (b) matching source_ids (with AIID-N-OECD canonicalised to AIID-N), (c) matching normalized reference URL, (d) fuzzy title match within ±1 year. After each merge the key indices are rebuilt, so transitive duplicates also collapse: if entry A absorbs CVE-3 and an entry B with CVE-3 already exists, B is merged into A as well. Merges union taxonomy mappings, references, tags, CVE/CWE IDs, and source IDs; take the highest severity; prefer the more specific date (YYYY-MM-DD beats year-only); and reject future-year dates.
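
The key cascade can be sketched as follows. This is not the code in scripts/merge_and_dedupe.py: every function name, the URL normalization, and the 0.9 title threshold are assumptions. Instead of reindexing in place, the sketch repeats the pass until nothing more merges, which gives the same transitive collapse. Future-date rejection is left out.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlsplit

SEVERITY_RANK = {"Info": 0, "Low": 1, "Medium": 2, "High": 3, "Critical": 4}
UNION_FIELDS = ("cve_ids", "cwe_ids", "source_ids", "tags", "owasp_llm",
                "owasp_asi", "nist_ai_rmf", "mitre_atlas", "mitre_atlas_tactics")


def canon_source_id(source_id: str) -> str:
    """Map 'AIID-123-OECD' to 'AIID-123'; leave other IDs unchanged."""
    return re.sub(r"^(AIID-\d+)-OECD$", r"\1", source_id)


def norm_url(url: str) -> str:
    """Assumed normalization: drop scheme, 'www.', query, fragment, trailing slash."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    return host + parts.path.rstrip("/")


def ordered_keys(entry: dict) -> list:
    """Exact-match keys in priority order: CVE IDs, source IDs, reference URLs."""
    keys = [("cve", c) for c in entry.get("cve_ids", [])]
    keys += [("src", canon_source_id(s)) for s in entry.get("source_ids", [])]
    keys += [("url", norm_url(r["url"])) for r in entry.get("references", []) if r.get("url")]
    return keys


def similar_title(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Fuzzy title match, only for entries at most one year apart."""
    ta, tb = a.get("title", "").lower(), b.get("title", "").lower()
    if not ta or not tb or abs(a.get("year", 0) - b.get("year", 0)) > 1:
        return False
    return SequenceMatcher(None, ta, tb).ratio() >= threshold


def merge_into(target: dict, other: dict) -> None:
    """Union list fields and references, keep highest severity and longest date."""
    for field in UNION_FIELDS:
        combined = target.get(field, []) + other.get(field, [])
        if combined:
            target[field] = list(dict.fromkeys(combined))  # order-preserving union
    refs, seen = [], set()
    for ref in target.get("references", []) + other.get("references", []):
        key = norm_url(ref.get("url", ""))
        if key not in seen:
            seen.add(key)
            refs.append(ref)
    if refs:
        target["references"] = refs
    sevs = [e["severity"] for e in (target, other) if e.get("severity") in SEVERITY_RANK]
    if sevs:
        target["severity"] = max(sevs, key=SEVERITY_RANK.__getitem__)
    if len(other.get("date", "")) > len(target.get("date", "")):
        target["date"] = other["date"]  # YYYY-MM-DD beats YYYY-MM beats YYYY


def dedupe_pass(entries: list) -> list:
    merged, index = [], {}
    for entry in entries:
        entry = dict(entry)  # shallow copy so the caller's dicts are not modified
        hit = next((index[k] for k in ordered_keys(entry) if k in index), None)
        if hit is None:
            hit = next((m for m in merged if similar_title(m, entry)), None)
        if hit is None:
            merged.append(entry)
            hit = entry
        else:
            merge_into(hit, entry)
        for key in ordered_keys(hit):
            index.setdefault(key, hit)  # first owner of a key keeps it
    return merged


def dedupe(entries: list) -> list:
    """Repeat passes until no further merges happen (transitive collapse)."""
    current = list(entries)
    while True:
        reduced = dedupe_pass(current)
        if len(reduced) == len(current):
            return reduced
        current = reduced
```

The fixed-point loop trades some speed for simplicity; an in-place reindex avoids the repeated passes on large inputs.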

added and updated are preserved from the previous output; updated only bumps when an entry's content actually changes. That keeps make build deterministic for CI drift checks.
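
The timestamp rule can be illustrated with a short sketch. It assumes entries are matched to the previous output by `id` and that "content" means every field except the two timestamps; `stamp` and `content_of` are hypothetical names, not functions from the scripts.

```python
from datetime import date

VOLATILE = ("added", "updated")


def content_of(entry: dict) -> dict:
    """Everything except the bookkeeping timestamps."""
    return {k: v for k, v in entry.items() if k not in VOLATILE}


def stamp(new_entries: list, previous_entries: list, today: str = "") -> list:
    """Carry added/updated over from the previous output where possible."""
    today = today or date.today().isoformat()
    previous = {e["id"]: e for e in previous_entries}
    stamped = []
    for entry in new_entries:
        body = content_of(entry)
        old = previous.get(entry["id"])
        if old is None:
            # New entry: both timestamps start today.
            added, updated = today, today
        elif content_of(old) == body:
            # Unchanged content: keep both timestamps, so reruns are stable.
            added, updated = old.get("added", today), old.get("updated", today)
        else:
            # Changed content: keep added, bump updated.
            added, updated = old.get("added", today), today
        stamped.append({**body, "added": added, "updated": updated})
    return stamped
```

Running this twice over the same input yields identical output, which is the property a CI drift check relies on.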


Adding entries

Two paths:

  1. Manual: append a properly-shaped object to data/incidents.json and run scripts/render_markdown.py. Ensure references has at least one resolvable URL.
  2. Automated: drop a JSON array of raw entries into ingest/<your_source>.json (any reasonable shape; see normalize_entry in scripts/merge_and_dedupe.py for the field tolerance), then re-run merge + render.

Always run scripts/validate.py before committing.


Taxonomy mappings

The mapping files in mappings/ document the controlled vocabulary used in this dataset. They are derived from the original sources.

When a framework releases a new version, update the mapping JSON in mappings/ and re-run merge + validate.


Sources aggregated

The current dataset draws from the following public sources. Each entry retains links back to the originating advisory, post, or paper:

  • OWASP GenAI Security Project: incident roundups + Top 10 references
  • AI Incident Database (AIID) (incidentdatabase.ai, github.com/responsible-ai-collaborative/aiid): security-relevant subset of the full corpus, scraped via OG metadata
  • OECD AI Incidents Monitor (AIM) (oecd.ai/en/incidents): cross-listed against AIID via the official AIID-OECD bridge file
  • AIAAIC (aiaaic.org): AI, Algorithmic, and Automation Incidents and Controversies
  • MITRE ATLAS (atlas.mitre.org, github.com/mitre-atlas/atlas-data): all case studies parsed from the YAML corpus
  • AVID: AI Vulnerability Database (avidml.org)
  • CSET-AIID Harm Taxonomy (github.com/georgetown-cset/CSET-AIID-harm-taxonomy): controlled vocabulary reference
  • NVD / CVE.org / GitHub Security Advisories / OSV.dev / CISA KEV: AI/ML/LLM/agent CVEs pulled via REST API across 56 keywords
  • NVIDIA garak (github.com/NVIDIA/garak): one entry per LLM vulnerability scanner probe (canonical attack classes)
  • promptfoo (github.com/promptfoo/promptfoo): one entry per red-team plugin/strategy
  • ModelOriented/CVE-AI (github.com/ModelOriented/CVE-AI): XAI-based AI model validation findings
  • Researcher and vendor blogs: Embrace The Red, Tenable, Palo Alto Unit 42, Trail of Bits, Aim Security, Noma Security, Wiz Research, Lakera, Invariant Labs, PromptArmor, Pillar Security, Token Security, HiddenLayer, Robust Intelligence, Protect AI, Cato Networks CTRL, Endor Labs, Sysdig, Zenity Labs, JFrog, Datadog Security Labs, Reco, AppOmni, BeyondTrust, Oasis Security, Mindgard, Koi Security, Imperva, Sonar, Oligo Security, OX Security, SentinelOne, Check Point Research, Trend Micro, Tinfoil Security, ZeroPath, Cymulate, MaccariTA, and others.
  • Vendor threat reports: Anthropic, OpenAI, Google Threat Intelligence (GTIG/TAG/Mandiant), Microsoft Threat Intelligence (MTAC/MSRC), AWS Security Bulletins, CrowdStrike, Recorded Future.
  • Academic papers: selected USENIX Security / NDSS / S&P / CCS / arXiv entries with concrete adversarial PoCs.

If a source is missing or mis-attributed, open an issue or PR.


Contributing

PRs welcome. Please:

  • Add at least one verifiable URL per entry.
  • Map to all four taxonomies where applicable. If unsure, leave the field empty rather than guess.
  • Run scripts/validate.py and scripts/render_markdown.py before opening a PR.
  • Submitting incidents you authored or first reported is fine, but please link the canonical writeup.

License

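Code in scripts/ is released under the MIT license (see LICENSE); the dataset under data/ is released under CC-BY-4.0 (see LICENSE-DATA).

If you use this dataset in research or tooling, please cite this repository.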
