Curated dataset of GenAI & agentic-AI security incidents mapped to OWASP LLM Top 10, OWASP Agentic Top 10, NIST AI RMF, and MITRE ATLAS.

Maintainers

emmanuelgjr

GenAI & Agentic AI Security Incidents

🔎 Searchable site: https://emmanuelgjr.github.io/genai_incidents/
📦 Python: pip install genai-incidents
🤗 Hugging Face: emmanuelgjr/genai-incidents — load_dataset("emmanuelgjr/genai-incidents") (built with make huggingface)
🛰️ STIX 2.1 bundle (for OpenCTI / MISP / TAXII): https://emmanuelgjr.github.io/genai_incidents/data/incidents.stix.json. Incidents are published as x-genai-incident SDOs linked to MITRE ATLAS attack-pattern objects and CVE vulnerability objects. Build locally with make stix.
📡 TAXII 2.1 (static): discovery at https://emmanuelgjr.github.io/genai_incidents/taxii2/discovery.json — a read-only static mirror of the STIX collection (usage + caveats). Build locally with make taxii.
🛡️ MISP feed: subscribe a MISP instance to https://emmanuelgjr.github.io/genai_incidents/misp/ (Format: MISP Feed) — incidents grouped into year-events with genai-incidents:* / mitre-atlas:* tags. Build locally with make misp.
📖 Field reference: docs/DATA_DICTIONARY.md · Provenance & limitations: docs/DATASHEET.md
🎯 Scope — what's in/out: INCLUSION.md (the inclusion policy every entry must satisfy)
🛠️ Spot an error? Open a data correction or scope dispute — accepted changes are logged in CORRECTIONS.md
🪪 DOI: 10.5281/zenodo.20248676 — see CITATION.cff
📄 Methodology: docs/paper/genai-incidents-methods.md — how the dataset is built, mapped, and governed
📜 Changelog: CHANGELOG.md
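
The STIX 2.1 bundle listed above is plain JSON, so it can be inspected without a threat-intel platform. The following is a minimal sketch, assuming the bundle uses the standard STIX layout (a top-level objects array, with relationship objects carrying source_ref and target_ref). The object IDs, names, and the related-to relationship type in the sample are placeholders made up for illustration, not values from the published file, and summarize_bundle is a hypothetical helper.

```python
from collections import Counter, defaultdict

def summarize_bundle(bundle):
    """Count STIX objects by type and list what each incident links to."""
    objects = bundle.get("objects", [])
    counts = Counter(obj["type"] for obj in objects)
    by_id = {obj["id"]: obj for obj in objects}
    links = defaultdict(list)
    for obj in objects:
        if obj["type"] != "relationship":
            continue
        source = by_id.get(obj["source_ref"])
        target = by_id.get(obj["target_ref"])
        # Only follow edges that start at an incident and end at a known object.
        if source and target and source["type"] == "x-genai-incident":
            links[source["id"]].append((target["type"], target.get("name", target["id"])))
    return counts, dict(links)

# Hypothetical miniature bundle; real STIX IDs end in UUIDs.
SAMPLE_BUNDLE = {
    "type": "bundle",
    "id": "bundle--example",
    "objects": [
        {"type": "x-genai-incident", "id": "x-genai-incident--1", "name": "Example incident"},
        {"type": "attack-pattern", "id": "attack-pattern--2", "name": "AML.T0051"},
        {"type": "vulnerability", "id": "vulnerability--3", "name": "CVE-0000-0000"},
        {"type": "relationship", "id": "relationship--4", "relationship_type": "related-to",
         "source_ref": "x-genai-incident--1", "target_ref": "attack-pattern--2"},
        {"type": "relationship", "id": "relationship--5", "relationship_type": "related-to",
         "source_ref": "x-genai-incident--1", "target_ref": "vulnerability--3"},
    ],
}

counts, links = summarize_bundle(SAMPLE_BUNDLE)
for incident_id, targets in links.items():
    print(incident_id, targets)
```

To run this against the published file, load it with json.load and pass the resulting dict to summarize_bundle.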

A single source of truth for 12,500+ GenAI and agentic AI security incidents (the badge in the repository README shows the live count), each mapped to:

OWASP Top 10 for LLM Applications (2025) — LLM01–LLM10
OWASP Agentic Top 10 (ASI) — ASI01–ASI10
NIST AI Risk Management Framework (AI 100-1) — GOVERN / MAP / MEASURE / MANAGE subcategories
MITRE ATLAS — tactics (AML.TA00xx) and techniques (AML.T00xx)
(Companion) MAESTRO architectural layers (L1–L7)

The dataset is published as both a machine-readable JSON (data/incidents.json) and a human-readable Markdown index (INCIDENTS.md).

Layout

.
├── data/
│   ├── incidents.json          ← full single source of truth (use this)
│   ├── incidents.min.json      ← slim variant: id, title, taxonomy mappings, primary reference
│   └── legacy_consolidated.json ← intermediate output from the legacy parser
├── schema/
│   └── incident.schema.json    ← JSON Schema for one incident
├── mappings/
│   ├── owasp_llm_top10_2025.json
│   ├── owasp_asi_top10.json
│   ├── nist_ai_rmf.json
│   ├── mitre_atlas.json
│   └── maestro_layers.json
├── legacy/                     ← original source files (preserved verbatim)
├── ingest/                     ← per-source aggregator outputs (CVE, AIID, ATLAS, etc.)
├── scripts/
│   ├── parse_existing.py             ← parse legacy/ → data/legacy_consolidated.json
│   ├── ingest_external.py            ← parse cloned source repos under ../_external/ → ingest/*.json
│   ├── scrape_aiid.py                ← fetch all AIID incident pages (OG metadata) → ingest/aiid_full.json
│   ├── ingest_airi_navigator.py      ← MIT FutureTech AI Risk Navigator CSV → ingest/airi_navigator_incidents.json
│   ├── ingest_aiaaic_sheet.py        ← AIAAIC Repository public Google Sheet → ingest/aiaaic_sheet_incidents.json
│   ├── ingest_oecd_aim.py            ← OECD AI Incidents Monitor (10k pages) → ingest/oecd_aim_full_incidents.json
│   ├── ingest_cve_nvd_expanded.py    ← pull AI-relevant CVEs from NVD/GHSA/OSV → ingest/cve_nvd_expanded.json
│   ├── merge_and_dedupe.py           ← merge legacy + ingest/* → data/incidents.json
│   ├── render_markdown.py            ← data/incidents.json → INCIDENTS.md
│   └── validate.py                   ← validate JSON against schema
├── INCIDENTS.md                ← rendered index: unified table, newest-first
├── docs/incidents/<year>.md    ← per-year detail shards linked from INCIDENTS.md
├── tests/                      ← pytest suite for merge/render helpers
├── LICENSE                     ← MIT (covers code in scripts/)
├── LICENSE-DATA                ← CC-BY-4.0 (covers the dataset under data/)
└── README.md

What counts as an incident?

Anything that is one or more of:

A real-world exploitation, breach, or misuse involving GenAI or agentic AI systems.
A publicly disclosed vulnerability (CVE or vendor advisory) affecting an AI/ML/LLM/agent stack.
A research-demonstrated attack with a credible PoC and public write-up.
A red-team finding released by a security researcher with sufficient detail to reproduce or replicate.

Each entry must have at least one verifiable external URL. Entries without sources are excluded.

This repository does not include broad fairness/bias-only AI harms unless they involve a security primitive (data exfiltration, integrity attack, account compromise, etc.).

Schema (summary)

See schema/incident.schema.json for the canonical version.

{
  "id": "INC-00001",                 // stable 5-digit ID
  "source_ids": ["AIID-123", "CVE-2025-..."],
  "cve_ids": ["CVE-2025-..."],
  "cwe_ids": ["CWE-918"],
  "cvss_score": 9.8,
  "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
  "aiid_id": 1234,                   // canonical AIID numeric ID when applicable
  "title": "...",
  "date": "2025-09",
  "disclosure_date": "2025-10-02",   // separate from incident date when known
  "year": 2025,
  "category": "real-world | research | red-team | vulnerability-disclosure | threat-report | policy",
  "description": "...",
  "attack_vector": "prompt-injection | rce | supply-chain | data-exfiltration | ...",
  "affected": "vendor/product",
  "impact": "...",
  "severity": "Critical | High | Medium | Low | Info",
  "owasp_llm": ["LLM01", "LLM06"],
  "owasp_asi": ["ASI01", "ASI02"],
  "nist_ai_rmf": ["MEASURE-2.7", "MAP-3.5"],
  "mitre_atlas": ["AML.T0051", "AML.T0051.001"],
  "mitre_atlas_tactics": ["AML.TA0004"],
  "maestro_layers": [{"layer":"L3","label":"Agent Frameworks & Tooling","role":"origin"}],
  "mitigations": ["..."],
  "references": [
    {"title":"Vendor advisory","url":"https://...","type":"vendor"}
  ],
  "tags": ["mcp","supply-chain"],
  "added": "2026-05-16",             // stable across re-runs
  "updated": "2026-05-16"            // only bumped when content actually changes
}
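
A few of the conventions in the summary above can be checked with regular expressions before running the full schema validation. This is a minimal sketch, assuming id is always INC- followed by five digits (per the comment above), that date may be YYYY, YYYY-MM, or YYYY-MM-DD (inferred from the example values), and that severity is one of the five listed values. check_shape is a hypothetical helper; it does not replace scripts/validate.py or the canonical schema.

```python
import re

ID_RE = re.compile(r"^INC-\d{5}$")
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")
SEVERITIES = {"Critical", "High", "Medium", "Low", "Info"}

def check_shape(entry):
    """Return a list of problems; an empty list means the entry passed these checks."""
    problems = []
    if not ID_RE.match(str(entry.get("id", ""))):
        problems.append("id must look like INC-00001")
    if not DATE_RE.match(str(entry.get("date", ""))):
        problems.append("date must be YYYY, YYYY-MM, or YYYY-MM-DD")
    if entry.get("severity") not in SEVERITIES:
        problems.append("severity must be one of: " + ", ".join(sorted(SEVERITIES)))
    return problems

good = {"id": "INC-00001", "date": "2025-09", "severity": "High"}
bad = {"id": "INC-1", "date": "2025/09", "severity": "Severe"}
print(check_shape(good))
print(check_shape(bad))
```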

Using the dataset

As a Python library

pip install genai-incidents

from genai_incidents import query, by_cve, resolve_id

for inc in query(severity="Critical", attack_vector="prompt-injection", year=2026):
    print(inc["id"], "-", inc["title"])

print(by_cve("CVE-2026-21520"))   # all incidents that list this CVE
print(resolve_id("INC-00139"))    # follow merge history to the current canonical INC
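
The same kind of filter can be sketched in plain Python over the raw records, without installing the package. The sketch assumes data/incidents.json holds a top-level JSON array of objects shaped like the schema summary, which is not confirmed here. load_incidents and filter_incidents are hypothetical helpers, not the library's query implementation.

```python
import json

def load_incidents(path="data/incidents.json"):
    """Load the dataset; assumes the file is a top-level JSON array."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def filter_incidents(incidents, **criteria):
    """Yield incidents that satisfy every keyword criterion.

    Scalar fields must equal the wanted value; list fields
    (such as owasp_llm) match when they contain it.
    """
    for inc in incidents:
        for field, wanted in criteria.items():
            value = inc.get(field)
            if isinstance(value, list):
                if wanted not in value:
                    break
            elif value != wanted:
                break
        else:
            # No criterion failed, so the incident matches.
            yield inc

# Made-up records for illustration only.
SAMPLE = [
    {"id": "INC-00001", "severity": "Critical", "year": 2026, "owasp_llm": ["LLM01"]},
    {"id": "INC-00002", "severity": "High", "year": 2026, "owasp_llm": ["LLM01", "LLM06"]},
    {"id": "INC-00003", "severity": "Critical", "year": 2025, "owasp_llm": ["LLM06"]},
]

critical_2026 = [inc["id"] for inc in filter_incidents(SAMPLE, severity="Critical", year=2026)]
llm06 = [inc["id"] for inc in filter_incidents(SAMPLE, owasp_llm="LLM06")]
print(critical_2026, llm06)
```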

As JSON

Full: data/incidents.json
Slim (for UIs): data/incidents.min.json
Schema: schema/incident.schema.json
ID deprecations: data/id_deprecations.json — for resolving citations of merged-away IDs
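
Resolving a merged-away ID amounts to following old-to-new pointers until the chain ends. This is a sketch, assuming the deprecation data can be reduced to a flat mapping from a deprecated ID to its replacement. The actual layout of data/id_deprecations.json is not shown here, so the mapping below and the resolve helper are hypothetical.

```python
def resolve(incident_id, deprecations):
    """Follow old -> new pointers until reaching an ID that is not deprecated."""
    seen = set()
    current = incident_id
    while current in deprecations:
        if current in seen:
            # Guard against a malformed mapping that loops back on itself.
            raise ValueError(f"cycle in deprecation chain at {current}")
        seen.add(current)
        current = deprecations[current]
    return current

# Hypothetical mapping: INC-00139 was merged into INC-00042, later merged into INC-00007.
DEPRECATIONS = {"INC-00139": "INC-00042", "INC-00042": "INC-00007"}

print(resolve("INC-00139", DEPRECATIONS))
print(resolve("INC-00007", DEPRECATIONS))
```

An ID that was never deprecated resolves to itself, so the helper is safe to apply to every citation.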

As a website

Filterable, searchable, deep-linkable table at https://emmanuelgjr.github.io/genai_incidents/.

Regenerating the dataset

pip install -r requirements.txt
make build      # parse legacy, merge + dedupe, render, validate
make test       # pytest tests/
make ingest-all # (heavy: refresh AIID/AIRI/AIAAIC/OECD AIM/NVD from network)

Or run the steps individually:

python scripts/parse_existing.py     # legacy/ -> data/legacy_consolidated.json
python scripts/merge_and_dedupe.py   # legacy + ingest/* -> data/incidents.json
python scripts/render_markdown.py    # data/incidents.json -> INCIDENTS.md + docs/incidents/<year>.md
python scripts/validate.py           # schema check

Dedupe keys (first hit wins):

(a) matching cve_ids
(b) matching source_ids (with AIID-N-OECD canonicalised to AIID-N)
(c) matching normalized reference URL
(d) fuzzy title match within ±1 year

After each merge the lookup indices are rebuilt, so transitive duplicates collapse as well: if entry A absorbs CVE-3 and entry B also lists CVE-3, B is merged into A too. A merge unions taxonomy mappings, references, tags, CVE/CWE IDs, and source IDs; takes the highest severity; prefers the more specific date (YYYY-MM-DD beats year-only); and rejects future-year dates.
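
The exact-match part of this procedure can be sketched as follows. This is an illustration under stated assumptions, not the code in scripts/merge_and_dedupe.py: the URL normalization (lowercased host plus path, query dropped) and the use of string length as a proxy for date specificity are guesses, the fuzzy title match and the future-date check are omitted, and every function name is hypothetical. An entry that bridges two already-separate survivors is merged only into the first one it hits.

```python
import re
from urllib.parse import urlsplit

SEVERITY_RANK = {"Info": 0, "Low": 1, "Medium": 2, "High": 3, "Critical": 4}

def normalize_url(url):
    """Assumed normalization: lowercased host plus path without a trailing slash."""
    parts = urlsplit(url.strip())
    return parts.netloc.lower() + parts.path.rstrip("/")

def dedupe_keys(entry):
    """Yield exact-match keys in priority order: CVE, source ID, reference URL."""
    for cve in entry.get("cve_ids", []):
        yield ("cve", cve.upper())
    for sid in entry.get("source_ids", []):
        # AIID-N-OECD is treated as the same record as AIID-N.
        yield ("source", re.sub(r"^(AIID-\d+)-OECD$", r"\1", sid))
    for ref in entry.get("references", []):
        yield ("url", normalize_url(ref["url"]))

def merge_into(target, other):
    """Fold other into target: union lists, keep highest severity and longest date."""
    for field in ("cve_ids", "source_ids", "tags"):
        current = target.get(field, [])
        target[field] = current + [v for v in other.get(field, []) if v not in current]
    refs = target.get("references", [])
    known = {normalize_url(r["url"]) for r in refs}
    target["references"] = refs + [
        r for r in other.get("references", []) if normalize_url(r["url"]) not in known
    ]
    if SEVERITY_RANK.get(other.get("severity"), -1) > SEVERITY_RANK.get(target.get("severity"), -1):
        target["severity"] = other["severity"]
    if len(other.get("date", "")) > len(target.get("date", "")):
        target["date"] = other["date"]

def dedupe(entries):
    merged = []  # surviving entries, in first-seen order
    index = {}   # dedupe key -> position in merged
    for entry in entries:
        hit = next((index[k] for k in dedupe_keys(entry) if k in index), None)
        if hit is None:
            merged.append(dict(entry))
            hit = len(merged) - 1
        else:
            merge_into(merged[hit], entry)
        # Re-register every key of the survivor so later entries can match
        # on identifiers it absorbed (the transitive case).
        for key in dedupe_keys(merged[hit]):
            index.setdefault(key, hit)
    return merged

# Made-up entries: B matches A by URL, C matches via the CVE that B contributed.
ENTRIES = [
    {"id": "INC-00001", "cve_ids": ["CVE-0000-0001"], "severity": "Medium", "date": "2025",
     "references": [{"url": "https://example.com/a/"}]},
    {"id": "INC-00002", "cve_ids": ["CVE-0000-0003"], "source_ids": ["AIID-7-OECD"],
     "severity": "Critical", "date": "2025-03-14",
     "references": [{"url": "https://EXAMPLE.com/a"}]},
    {"id": "INC-00003", "cve_ids": ["CVE-0000-0003"], "severity": "Low", "date": "2025-03"},
    {"id": "INC-00004", "source_ids": ["AIID-7"]},
    {"id": "INC-00005", "cve_ids": ["CVE-0000-0009"]},
]

result = dedupe(ENTRIES)
print([e["id"] for e in result])
```

In the sample, the third entry shares no identifier with the first one directly; it collapses only because the survivor's keys are re-registered after the second entry is absorbed.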

added and updated are preserved from the previous output; updated only bumps when an entry's content actually changes. That keeps make build deterministic for CI drift checks.
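
One way to get this behaviour is to compare a fingerprint of each entry's content, ignoring the two bookkeeping fields, against the previous build. This is a sketch of the idea under that assumption and may differ from what the merge script does; content_fingerprint and carry_dates are hypothetical names.

```python
import json

VOLATILE = ("added", "updated")

def content_fingerprint(entry):
    """Canonical JSON of everything except the bookkeeping dates."""
    stable = {k: v for k, v in entry.items() if k not in VOLATILE}
    return json.dumps(stable, sort_keys=True, ensure_ascii=False)

def carry_dates(new_entry, previous, today):
    """Return a copy of new_entry with added/updated carried over where possible."""
    result = dict(new_entry)
    if previous is None:
        # First time this entry appears in the output.
        result["added"] = result["updated"] = today
        return result
    result["added"] = previous["added"]
    changed = content_fingerprint(new_entry) != content_fingerprint(previous)
    result["updated"] = today if changed else previous["updated"]
    return result

previous = {"id": "INC-00001", "title": "Old title",
            "added": "2026-05-16", "updated": "2026-05-16"}

unchanged = carry_dates({"id": "INC-00001", "title": "Old title"}, previous, "2026-07-02")
edited = carry_dates({"id": "INC-00001", "title": "New title"}, previous, "2026-07-02")
fresh = carry_dates({"id": "INC-00002", "title": "Brand new"}, None, "2026-07-02")
print(unchanged["updated"], edited["updated"], fresh["added"])
```

Because an unchanged entry keeps both dates from the previous output, rebuilding on a later day produces the same record, which is what a CI drift check needs.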

Adding entries

Two paths:

Manual: append a properly-shaped object to data/incidents.json and run scripts/render_markdown.py. Ensure references has at least one resolvable URL.
Automated: drop a JSON array of raw entries into ingest/<your_source>.json (any reasonable shape — see scripts/merge_and_dedupe.py normalize_entry for the field tolerance), then re-run merge + render.

Always run scripts/validate.py before committing.
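
Before running the validator, a contributor can pre-check that an entry carries at least one well-formed reference URL. A minimal sketch follows, assuming references is a list of objects with a url field as in the schema summary. has_usable_reference is a hypothetical helper and only checks the URL's form; it makes no network request, so it cannot confirm that the link actually resolves.

```python
from urllib.parse import urlsplit

def has_usable_reference(entry):
    """True if at least one reference has an absolute http(s) URL."""
    for ref in entry.get("references", []):
        parts = urlsplit(str(ref.get("url", "")))
        if parts.scheme in ("http", "https") and parts.netloc:
            return True
    return False

with_url = {"references": [{"title": "Vendor advisory", "url": "https://example.com/advisory"}]}
without_url = {"references": [{"title": "Vendor advisory", "url": "see vendor page"}]}
print(has_usable_reference(with_url), has_usable_reference(without_url))
```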

Taxonomy mappings

The mapping files in mappings/ document the controlled vocabulary used in this dataset. They are derived from the original sources:

OWASP LLM Top 10 (2025): https://genai.owasp.org/llm-top-10/
OWASP Agentic Top 10 (ASI / "Agentic AI – Threats and Mitigations"): https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/
NIST AI Risk Management Framework (AI 100-1): https://www.nist.gov/itl/ai-risk-management-framework
NIST AI 600-1 Generative AI Profile: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
MITRE ATLAS: https://atlas.mitre.org/
MAESTRO (companion): https://genai.owasp.org/resource/genai-security-project-maestro/

When a framework releases a new version, update the mapping JSON in mappings/ and re-run merge + validate.

Sources aggregated

The current dataset draws from the following public sources. Each entry retains links back to the originating advisory, post, or paper:

OWASP GenAI Security Project — incident roundups + Top 10 references
AI Incident Database (AIID) (incidentdatabase.ai, github.com/responsible-ai-collaborative/aiid) — security-relevant subset of the full corpus, scraped via OG metadata
OECD AI Incidents Monitor (AIM) (oecd.ai/en/incidents) — cross-listed against AIID via the official AIID-OECD bridge file
AIAAIC (aiaaic.org) — AI, Algorithmic, and Automation Incidents and Controversies
MITRE ATLAS (atlas.mitre.org, github.com/mitre-atlas/atlas-data) — all case studies parsed from the YAML corpus
AVID — AI Vulnerability Database (avidml.org)
CSET-AIID Harm Taxonomy (github.com/georgetown-cset/CSET-AIID-harm-taxonomy) — controlled vocabulary reference
NVD / CVE.org / GitHub Security Advisories / OSV.dev / CISA KEV — AI/ML/LLM/agent CVEs pulled via REST API across 56 keywords
NVIDIA garak (github.com/NVIDIA/garak) — one entry per LLM vulnerability scanner probe (canonical attack classes)
promptfoo (github.com/promptfoo/promptfoo) — one entry per red-team plugin/strategy
ModelOriented/CVE-AI (github.com/ModelOriented/CVE-AI) — XAI-based AI model validation findings
Researcher and vendor blogs — Embrace The Red, Tenable, Palo Alto Unit 42, Trail of Bits, Aim Security, Noma Security, Wiz Research, Lakera, Invariant Labs, PromptArmor, Pillar Security, Token Security, HiddenLayer, Robust Intelligence, Protect AI, Cato Networks CTRL, Endor Labs, Sysdig, Zenity Labs, JFrog, Datadog Security Labs, Reco, AppOmni, BeyondTrust, Oasis Security, Mindgard, Koi Security, Imperva, Sonar, Oligo Security, OX Security, SentinelOne, Check Point Research, Trend Micro, Tinfoil Security, ZeroPath, Cymulate, MaccariTA, and others.
Vendor threat reports — Anthropic, OpenAI, Google Threat Intelligence (GTIG/TAG/Mandiant), Microsoft Threat Intelligence (MTAC/MSRC), AWS Security Bulletins, CrowdStrike, Recorded Future.
Academic papers — selected USENIX Security / NDSS / S&P / CCS / arXiv entries with concrete adversarial PoCs.

If a source is missing or mis-attributed, open an issue or PR.

Contributing

PRs welcome. Please:

Add at least one verifiable URL per entry.
Map to all four taxonomies where applicable. If unsure, leave the field empty rather than guess.
Run scripts/validate.py and scripts/render_markdown.py before opening a PR.
Submitting incidents you authored or first reported is fine; please link the canonical writeup.

License

Code (scripts/, schema/): MIT
Data and documentation (data/, INCIDENTS.md, mappings/): Creative Commons Attribution 4.0 International

If you use this dataset in research or tooling, please cite this repository.

Release history

Version	Released
2.6.0 (this version)	Jul 2, 2026
2.5.0	Jun 11, 2026
2.4.0	Jun 11, 2026
2.3.1	Jun 11, 2026
2.3.0	Jun 10, 2026
2.2.0	Jun 10, 2026
2.1.0	Jun 3, 2026
2.0.0	May 17, 2026

Download files

Source Distribution

genai_incidents-2.6.0.tar.gz (4.1 MB), uploaded Jul 2, 2026 (Source)

Built Distribution

genai_incidents-2.6.0-py3-none-any.whl (2.1 MB), uploaded Jul 2, 2026 (Python 3)

File details

Details for the file genai_incidents-2.6.0.tar.gz.

File metadata

Download URL: genai_incidents-2.6.0.tar.gz
Upload date: Jul 2, 2026
Size: 4.1 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for genai_incidents-2.6.0.tar.gz
Algorithm	Hash digest
SHA256	`7e29f6c557f593ef33eaefbf5fe1f08d3c99ec104ea81d99526809031155f769`
MD5	`826d83426f1dc9a42cc5487a05bd14ba`
BLAKE2b-256	`46ddbba464c760382450f513bb0ed9d40b3d1b7a17b7a33506ff1dc10ceba6a4`

Provenance

The following attestation bundles were made for genai_incidents-2.6.0.tar.gz:

Publisher: publish.yml on emmanuelgjr/genai_incidents

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: genai_incidents-2.6.0.tar.gz
- Subject digest: 7e29f6c557f593ef33eaefbf5fe1f08d3c99ec104ea81d99526809031155f769
- Sigstore transparency entry: 2047536374
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: emmanuelgjr/genai_incidents@a1ec3c2f224c52235522bedb2701ce3d76060294
- Branch / Tag: refs/tags/v2.6.0
- Owner: https://github.com/emmanuelgjr
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a1ec3c2f224c52235522bedb2701ce3d76060294
- Trigger Event: release

File details

Details for the file genai_incidents-2.6.0-py3-none-any.whl.

File metadata

Download URL: genai_incidents-2.6.0-py3-none-any.whl
Upload date: Jul 2, 2026
Size: 2.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for genai_incidents-2.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`38232612ddb5411775fb0327325bfea1bf3c352d2ac05cc338a6863c3c12ac48`
MD5	`45688ec290371e0deb6a69aafb54a8a6`
BLAKE2b-256	`00b8a1ccdaa89afca1dab558e070f50e6887706e9bc6951929ccca235c912df4`

Provenance

The following attestation bundles were made for genai_incidents-2.6.0-py3-none-any.whl:

Publisher: publish.yml on emmanuelgjr/genai_incidents

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: genai_incidents-2.6.0-py3-none-any.whl
- Subject digest: 38232612ddb5411775fb0327325bfea1bf3c352d2ac05cc338a6863c3c12ac48
- Sigstore transparency entry: 2047536384
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: emmanuelgjr/genai_incidents@a1ec3c2f224c52235522bedb2701ce3d76060294
- Branch / Tag: refs/tags/v2.6.0
- Owner: https://github.com/emmanuelgjr
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a1ec3c2f224c52235522bedb2701ce3d76060294
- Trigger Event: release
