CVE Extractor

Extract, filter, and analyze CVE data from the official CVE List.

Features

  • Download & Extract: Automatically download and extract CVE data from the official CVE List
  • Filter: Identify and filter language/ecosystem-specific CVEs from the full dataset
  • Extract: Extract key information from CVE records including ID, type, and description
  • Analyze: Generate statistics and analysis of CVE distribution
  • Caching: Reuse previously downloaded and extracted data (7-day cache) to avoid redundant downloads and processing

Requirements

  • uv for dependency and environment management.

Installation

Clone the repository and install with uv:

# Install the package with dependencies
uv sync

# Install with development dependencies
uv sync --all-extras

Run the CLI via uv run cve-extractor or ensure the project's virtual environment is activated so the cve-extractor script is on PATH.

Usage

The commands below assume the project's virtual environment is activated (uv sync creates it but does not activate it). Otherwise, use uv run cve-extractor instead of cve-extractor.

Command Line Interface

Download and Extract CVE Data

Download the latest CVE data and extract CVEs for a given language/ecosystem:

# Basic usage (requires --language; uses cache if available)
cve-extractor download output/ --language php

# Force fresh download
cve-extractor download output/ --language python --no-use-cache

# Verbose output for debugging
cve-extractor download output/ --language php --verbose

Output: Creates output/collected.csv with extracted CVE data.

Analyze CVE Distribution

Generate statistics about CVE distribution:

# Basic analysis
cve-extractor analyze output/collected.csv

# Filter by minimum count
cve-extractor analyze output/collected.csv --min 5
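
As a rough illustration of the kind of statistic this command reports, the sketch below counts how often each CVE type occurs and drops types below a minimum count. It is not the project's analyzer: the function name cwe_distribution is hypothetical, and treating --min as an inclusive threshold is an assumption.

```python
from collections import Counter


def cwe_distribution(cve_types, min_count=1):
    """Count each CVE type and keep those seen at least min_count times.

    Hypothetical helper; assumes the threshold is inclusive.
    """
    counts = Counter(cve_types)
    # most_common() sorts by count, highest first
    return [(cve_type, n) for cve_type, n in counts.most_common() if n >= min_count]


sample = ["CWE-79", "CWE-89", "CWE-79", "CWE-22", "CWE-79", "CWE-89"]
for cve_type, n in cwe_distribution(sample, min_count=2):
    print(f"{cve_type}: {n}")
```

With the sample list, CWE-22 appears only once and is filtered out by min_count=2.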

Clean Cache

Remove cache and intermediate files:

# Interactive cleanup
cve-extractor clean

# Force cleanup without confirmation
cve-extractor clean --force

Project Structure

cve-extractor/
├── src/cve_extractor/          # Main package
│   ├── __init__.py
│   ├── config.py               # Configuration management
│   ├── logger.py               # Logging utilities
│   ├── cli.py                  # CLI interface
│   ├── core/                   # Core functionality
│   │   ├── __init__.py
│   │   ├── downloader.py       # CVE data download and extraction
│   │   ├── filter.py           # CVE filtering by language
│   │   └── extractor.py        # CVE information extraction
│   └── stats/                  # Statistics and analysis
│       ├── __init__.py
│       └── analyzer.py         # CVE distribution analysis
├── main.py                      # CLI entry point
├── pyproject.toml               # Project configuration (uv)
└── README.md                    # This file

Configuration

Configuration is managed through src/cve_extractor/config.py. Default paths:

  • CACHE_PATH: data/.cache/ - Stores downloaded CVE data
  • INTER_PATH: data/.inter/ - Stores intermediate files and logs
  • CVELISTV5_URL: Official CVE List v5 release URL
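
The cache directory is used together with the 7-day cache described under Performance Notes. The following sketch shows one way a freshness check against a cached file could look. It assumes freshness is judged by file modification time; the names is_cache_fresh and CACHE_TTL_SECONDS and the file name cvelist.zip are hypothetical and not taken from config.py.

```python
import tempfile
import time
from pathlib import Path

# Assumption: 7-day TTL expressed in seconds
CACHE_TTL_SECONDS = 7 * 24 * 60 * 60


def is_cache_fresh(path, ttl=CACHE_TTL_SECONDS, now=None):
    """Return True if path exists and was modified less than ttl seconds ago."""
    path = Path(path)
    if not path.exists():
        return False
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) < ttl


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "cvelist.zip"  # hypothetical cache file name
    cached.write_bytes(b"")
    fresh_now = is_cache_fresh(cached)
    # Simulate checking the same file eight days later
    fresh_later = is_cache_fresh(cached, now=time.time() + 8 * 24 * 60 * 60)
    missing = is_cache_fresh(Path(tmp) / "absent.zip")

print(fresh_now, fresh_later, missing)
```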

Output Format

CSV Output

The extracted CVE data is saved as CSV with the following columns:

Column       Description
cve_id       CVE identifier (e.g., CVE-2024-1234)
cve_type     CVE type/classification
description  CVE description (first 200 chars)

Example:

cve_id,cve_type,description
CVE-2024-1234,CWE-79,"Cross-site scripting (XSS) vulnerability in..."
CVE-2024-5678,CWE-89,"SQL injection vulnerability in..."
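
For downstream processing, the file can be read with Python's standard csv module. The sketch below parses the example rows above from an in-memory string; the helper name read_cves is hypothetical.

```python
import csv
import io

SAMPLE = '''cve_id,cve_type,description
CVE-2024-1234,CWE-79,"Cross-site scripting (XSS) vulnerability in..."
CVE-2024-5678,CWE-89,"SQL injection vulnerability in..."
'''


def read_cves(handle):
    """Read CVE rows from an open text handle into a list of dicts."""
    return list(csv.DictReader(handle))


rows = read_cves(io.StringIO(SAMPLE))
for row in rows:
    print(row["cve_id"], row["cve_type"])
```

To read a real output file, pass open("output/collected.csv", newline="", encoding="utf-8") instead of the in-memory handle.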

Logging

Log files are stored in data/.inter/.logs/. The two log outputs use different formats:

  • File logs: Detailed format with timestamps and line numbers
  • Console output: Formatted with colors and emojis for easy reading
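
A two-handler setup of this kind can be sketched with the standard logging module. This is not the code in logger.py: the function name build_logger, the file name run.log, and the format strings are assumptions, and the plain StreamHandler here stands in for the project's colored console output.

```python
import logging
import tempfile
from pathlib import Path


def build_logger(log_dir, name="cve_extractor_demo"):
    """Create a logger with a detailed file handler and a terse console handler."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False

    # File log: timestamp, level, logger name, and line number
    file_handler = logging.FileHandler(log_dir / "run.log", encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s:%(lineno)d %(message)s")
    )

    # Console: shorter format, INFO and above only
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    console.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger


demo_dir = Path(tempfile.mkdtemp())
log = build_logger(demo_dir)
log.info("download started")
log.debug("cache miss")  # written to the file only
log_text = (demo_dir / "run.log").read_text(encoding="utf-8")
```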

Dependencies

All dependencies are declared in pyproject.toml and managed by uv.

  • Core: requests, pydantic, typer, rich
  • Optional [core]: API-only install (uv sync --extra core)
  • Optional [full]: includes the CLI; this is the default install
  • Optional [dev]: pytest, black, pylint, isort, mypy (uv sync --all-extras)

Development

Setup Development Environment

# Install with dev extras (uv manages the environment)
uv sync --all-extras

# Format code
uv run black src/
uv run isort src/

# Lint code
uv run ruff check src/
uv run pylint src/

# Type check
uv run mypy src/

# Run tests
uv run pytest

Code Style

This project uses uv for all tooling. Before committing, run:

  • uv run black src/ and uv run isort src/ for formatting
  • uv run ruff check src/ and uv run pylint src/ for linting
  • uv run mypy src/ for type checking

Performance Notes

The extraction process is optimized for performance:

  • Batch processing: Processes CVE files in batches of 5000
  • Progress tracking: Real-time progress display with ETA
  • Caching: 7-day cache for extracted data and GitHub requests
  • Incremental updates: Only processes new CVEs since last run
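
Batch processing can be pictured as slicing a stream of CVE files into fixed-size groups so that only one group is held in memory at a time. The sketch below is a generic illustration, not the code in extractor.py; the names batched and BATCH_SIZE and the file name pattern are hypothetical.

```python
from itertools import islice

BATCH_SIZE = 5000  # batch size stated above


def batched(items, size=BATCH_SIZE):
    """Yield successive lists of at most size items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical file names standing in for CVE record files
paths = (f"CVE-2024-{n:05d}.json" for n in range(12000))
sizes = [len(batch) for batch in batched(paths)]
print(sizes)  # [5000, 5000, 2000]
```

Because the input is consumed lazily, lowering the batch size reduces peak memory use at the cost of more iterations.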

Troubleshooting

No CVEs Found

If no CVEs are found for the selected language:

  1. Check network connectivity
  2. Verify the CVE data source URL is accessible
  3. Try with --no-use-cache to force fresh download

Out of Memory

For large datasets:

  1. Reduce batch size in src/cve_extractor/core/extractor.py
  2. Run on a machine with more RAM
  3. Process data in smaller chunks

API Rate Limiting

The downloader includes automatic rate limit handling:

  • Automatic retries with exponential backoff
  • Caching to minimize API calls
  • 7-day cache TTL
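
Retrying with exponential backoff means waiting twice as long after each failed attempt. A minimal sketch follows, assuming a base delay of one second and retries on ConnectionError only; the function name retry_with_backoff and its parameters are hypothetical, and the downloader's actual policy may differ.

```python
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0,
                       retry_on=(ConnectionError,), sleep=time.sleep):
    """Call func, retrying on the given exceptions with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


# Demo: a call that fails twice, then succeeds. Delays are recorded
# instead of slept so the example runs instantly.
calls = {"n": 0}
delays = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("rate limited")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```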

License

This project is open source and available under the MIT License.

Contributing

Contributions are welcome! Please ensure that:

  1. All commands are run through uv, and code follows the project style (black, isort, ruff, pylint)
  2. Type hints are included where applicable
  3. Tests are added for new functionality
  4. Documentation is updated

Support

For issues, questions, or suggestions, please open an issue on the project repository.
