CVE Extractor
Extract, filter, and analyze CVE data from the official CVE List.
Features
- Download & Extract: Automatically download and extract CVE data from the official CVE List
- Filter: Identify and filter language/ecosystem-specific CVEs from the full dataset
- Extract: Extract key information from CVE records including ID, type, and description
- Analyze: Generate statistics and analysis of CVE distribution
- Caching: Intelligent caching system to avoid redundant downloads and processing
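As an illustration of the Extract and Filter steps, here is a minimal sketch that pulls the ID, type, and description out of a single record in the CVE JSON 5.x layout. The function names `extract_record` and `mentions_language` are hypothetical and are not taken from this package, and the substring-based language check is an assumed heuristic, not the filter the project actually uses.

```python
from typing import Any

MAX_DESCRIPTION_CHARS = 200  # mirrors the "first 200 chars" note under Output Format


def extract_record(record: dict[str, Any]) -> dict[str, str]:
    """Pull ID, type, and description out of one CVE JSON 5.x record."""
    cna = record.get("containers", {}).get("cna", {})

    # Prefer an English description; fall back to the first one available.
    descriptions = cna.get("descriptions", [])
    english = [d for d in descriptions if d.get("lang", "").lower().startswith("en")]
    chosen = (english or descriptions or [{}])[0]
    description = chosen.get("value", "")[:MAX_DESCRIPTION_CHARS]

    # Use the first CWE ID listed under problemTypes, if there is one.
    cve_type = ""
    for problem in cna.get("problemTypes", []):
        for entry in problem.get("descriptions", []):
            if entry.get("cweId"):
                cve_type = entry["cweId"]
                break
        if cve_type:
            break

    return {
        "cve_id": record.get("cveMetadata", {}).get("cveId", ""),
        "cve_type": cve_type,
        "description": description,
    }


def mentions_language(row: dict[str, str], language: str) -> bool:
    """Hypothetical filter: case-insensitive substring match on the description."""
    return language.lower() in row["description"].lower()


sample = {
    "cveMetadata": {"cveId": "CVE-2024-1234"},
    "containers": {
        "cna": {
            "descriptions": [
                {"lang": "en", "value": "Cross-site scripting (XSS) vulnerability in a PHP application"}
            ],
            "problemTypes": [
                {"descriptions": [{"cweId": "CWE-79", "lang": "en", "type": "CWE"}]}
            ],
        }
    },
}
row = extract_record(sample)
```

A plain substring match is prone to false positives for short language names such as "go" or "r", so a real filter would likely need to inspect affected products or package ecosystems as well.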
Requirements
- uv for dependency and environment management.
Installation
Clone the repository and install with uv:
# Install the package with dependencies
uv sync
# Install with development dependencies
uv sync --all-extras
Run the CLI via uv run cve-extractor or ensure the project's virtual environment is activated so the cve-extractor script is on PATH.
Usage
All commands assume the project environment is active (e.g. after uv sync). Otherwise use uv run cve-extractor instead of cve-extractor.
Command Line Interface
Download and Extract CVE Data
Download the latest CVE data and extract CVEs for a given language/ecosystem:
# Basic usage (requires --language; uses cache if available)
cve-extractor download output/ --language php
# Force fresh download
cve-extractor download output/ --language python --no-use-cache
# Verbose output for debugging
cve-extractor download output/ --language php --verbose
Output: Creates output/collected.csv with extracted CVE data.
Analyze CVE Distribution
Generate statistics about CVE distribution:
# Basic analysis
cve-extractor analyze output/collected.csv
# Filter by minimum count
cve-extractor analyze output/collected.csv --min 5
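For readers who want to reproduce this kind of distribution outside the CLI, the following is a rough sketch of counting rows per cve_type with a minimum-count threshold. It assumes that --min means "keep types that occur at least this many times"; the function name `type_distribution` and the "UNKNOWN" label for empty types are illustrative choices, not part of the package.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def type_distribution(csv_file: TextIO, min_count: int = 1) -> list[tuple[str, int]]:
    """Count rows per cve_type, keeping types seen at least min_count times."""
    counts = Counter(row["cve_type"] or "UNKNOWN" for row in csv.DictReader(csv_file))
    # most_common() returns (type, count) pairs sorted by descending count.
    return [(cve_type, n) for cve_type, n in counts.most_common() if n >= min_count]


sample = io.StringIO(
    "cve_id,cve_type,description\n"
    "CVE-2024-0001,CWE-79,first\n"
    "CVE-2024-0002,CWE-79,second\n"
    "CVE-2024-0003,CWE-89,third\n"
)
print(type_distribution(sample, min_count=2))  # [('CWE-79', 2)]
```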
Clean Cache
Remove cache and intermediate files:
# Interactive cleanup
cve-extractor clean
# Force cleanup without confirmation
cve-extractor clean --force
Project Structure
cve-extractor/
├── src/cve_extractor/      # Main package
│   ├── __init__.py
│   ├── config.py           # Configuration management
│   ├── logger.py           # Logging utilities
│   ├── cli.py              # CLI interface
│   ├── core/               # Core functionality
│   │   ├── __init__.py
│   │   ├── downloader.py   # CVE data download and extraction
│   │   ├── filter.py       # CVE filtering by language
│   │   └── extractor.py    # CVE information extraction
│   └── stats/              # Statistics and analysis
│       ├── __init__.py
│       └── analyzer.py     # CVE distribution analysis
├── main.py                 # CLI entry point
├── pyproject.toml          # Project configuration (uv)
└── README.md               # This file
Configuration
Configuration is managed through src/cve_extractor/config.py. Default paths:
- CACHE_PATH: data/.cache/ (stores downloaded CVE data)
- INTER_PATH: data/.inter/ (stores intermediate files and logs)
- CVELISTV5_URL: Official CVE List v5 release URL
Output Format
CSV Output
The extracted CVE data is saved as CSV with the following columns:
| Column | Description |
|---|---|
| cve_id | CVE identifier (e.g., CVE-2024-1234) |
| cve_type | CVE type/classification |
| description | CVE description (first 200 chars) |
Example:
cve_id,cve_type,description
CVE-2024-1234,CWE-79,"Cross-site scripting (XSS) vulnerability in..."
CVE-2024-5678,CWE-89,"SQL injection vulnerability in..."
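Descriptions often contain commas and quotation marks, which is why the description field is quoted in the example. The sketch below shows one way to produce rows in this shape with the standard csv module, which handles quoting automatically. The helper name `write_rows` is hypothetical, and the 200-character truncation is applied here only to match the column description above.

```python
import csv
import io

FIELDNAMES = ["cve_id", "cve_type", "description"]
MAX_DESCRIPTION_CHARS = 200


def write_rows(rows, out) -> None:
    """Write CVE rows as CSV, truncating each description to 200 characters."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "description": row["description"][:MAX_DESCRIPTION_CHARS]})


buffer = io.StringIO()
write_rows(
    [
        {
            "cve_id": "CVE-2024-1234",
            "cve_type": "CWE-79",
            "description": 'XSS via the "name" field, ' + "x" * 300,
        }
    ],
    buffer,
)

# Reading the data back recovers the embedded comma and quotes unchanged.
buffer.seek(0)
rows_back = list(csv.DictReader(buffer))
```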
Logging
Logs are stored in data/.inter/.logs/ with the following format:
- File logs: Detailed format with timestamps and line numbers
- Console output: Formatted with colors and emojis for easy reading
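A two-handler setup like the one described can be approximated with the standard logging module. This is only a sketch: the function name `build_logger`, the log file name, and the format strings are assumptions, and it omits the colors and emojis, which the project most likely produces through rich.

```python
import logging
import sys
from pathlib import Path


def build_logger(log_dir: Path, name: str = "cve_extractor") -> logging.Logger:
    """Create a logger with a detailed file handler and a terse console handler."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False

    # File log: timestamp, level, logger name, and source line number.
    file_handler = logging.FileHandler(log_dir / f"{name}.log", encoding="utf-8")
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s:%(lineno)d %(message)s")
    )
    logger.addHandler(file_handler)

    # Console log: INFO and above, in a short format.
    console_handler = logging.StreamHandler(sys.stderr)
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(console_handler)
    return logger
```

Calling `build_logger(Path("data/.inter/.logs"))` would write DEBUG and above to the file while showing only INFO and above on the console.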
Dependencies
All dependencies are declared in pyproject.toml and managed by uv.
- Core: requests, pydantic, typer, rich
- Optional [core]: uv sync --extra core (API-only install)
- Optional [full]: default install includes CLI
- Optional [dev]: uv sync --all-extras (pytest, black, pylint, isort, mypy)
Development
Setup Development Environment
# Install with dev extras (uv manages the environment)
uv sync --all-extras
# Format code
uv run black src/
uv run isort src/
# Lint code
uv run ruff check src/
uv run pylint src/
# Type check
uv run mypy src/
# Run tests
uv run pytest
Code Style
This project uses uv for all tooling. Before committing, run:
- uv run black src/ and uv run isort src/ for formatting
- uv run ruff check src/ and uv run pylint src/ for linting
- uv run mypy src/ for type checking
Performance Notes
The extraction process is optimized for performance:
- Batch processing: Processes CVE files in batches of 5000
- Progress tracking: Real-time progress display with ETA
- Caching: 7-day cache for extracted data and GitHub requests
- Incremental updates: Only processes new CVEs since last run
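The batching and cache-expiry ideas above can be sketched in a few lines. The helper names `batched` and `is_cache_fresh` are hypothetical, and using a file's modification time to decide freshness is an assumption about how the 7-day cache might work, not a description of the package's internals.

```python
import time
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
BATCH_SIZE = 5000
CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days


def batched(items: Iterable[T], size: int = BATCH_SIZE) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `size` items; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch


def is_cache_fresh(path: Path, ttl_seconds: float = CACHE_TTL_SECONDS) -> bool:
    """Return True if `path` exists and was modified within the TTL."""
    if not path.exists():
        return False
    return (time.time() - path.stat().st_mtime) < ttl_seconds
```

Because `batched` consumes its input lazily, only one batch of file paths or records needs to be held in memory at a time.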
Troubleshooting
No CVEs Found
If no CVEs are found for the selected language:
- Check network connectivity
- Verify the CVE data source URL is accessible
- Try with --no-use-cache to force fresh download
Out of Memory
For large datasets:
- Reduce batch size in src/cve_extractor/core/extractor.py
- Run on a machine with more RAM
- Process data in smaller chunks
API Rate Limiting
The downloader includes automatic rate limit handling:
- Automatic retries with exponential backoff
- Caching to minimize API calls
- 7-day cache TTL
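To make the retry behavior concrete, here is a generic sketch of exponential backoff. The function name `retry_with_backoff`, its parameters, and the default delays are illustrative assumptions; the downloader's actual retry policy and the exceptions it handles may differ.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on exceptions with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))  # small random jitter
    raise RuntimeError("max_attempts must be at least 1")
```

In practice `operation` would wrap a single HTTP request, and the broad `except Exception` would be narrowed to the errors that are worth retrying, such as timeouts or rate-limit responses.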
License
This project is open source and available under the MIT License.
Contributing
Contributions are welcome! Please ensure:
- All commands are run through uv, and code follows the project style (black, isort, ruff, pylint)
- Type hints are included where applicable
- Tests are added for new functionality
- Documentation is updated
Support
For issues, questions, or suggestions, please open an issue on the project repository.