
Generate datasets and models based on vulnerability data from Vulnerability-Lookup.

VulnTrain

VulnTrain offers a suite of commands to generate diverse AI datasets and train models using comprehensive vulnerability data from Vulnerability-Lookup. It harnesses over one million JSON records from all supported advisory sources (CVE, GitHub advisories, CSAF, PySecDB, CNVD) to build high-quality, domain-specific models.

Additionally, data from the vulnerability-lookup:meta container, including enrichment sources such as vulnrichment and Fraunhofer FKIE, is incorporated to enhance model quality.

The generated datasets and trained models are published on Hugging Face.

For more information about the use of AI in Vulnerability-Lookup, please refer to the user manual.

Installation

pipx install VulnTrain

For development:

git clone https://github.com/vulnerability-lookup/VulnTrain.git
cd VulnTrain/
poetry install

Usage

Three types of commands are available:

  • Dataset generation: Create and prepare datasets from vulnerability sources.
  • Model training: Train models using the prepared datasets.
  • Model validation: Assess the performance of trained models (validations, benchmarks, etc.).

CLI commands

Command                                       Purpose
vulntrain-dataset-generation                  Generate datasets from vulnerability sources
vulntrain-train-severity-classification       Train severity classifier (RoBERTa/DistilBERT)
vulntrain-train-severity-cnvd-classification  Train severity classifier for CNVD data
vulntrain-train-description-generation        Train GPT-2 vulnerability description generator
vulntrain-train-cwe-classification            Train CWE classifier from patches
vulntrain-validate-severity-classification    Validate severity model
vulntrain-validate-text-generation            Validate text generation model
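
After installation, it can be useful to confirm that the commands listed above are actually on your PATH. The following is a minimal sketch using only the Python standard library. The command names are copied from the table; the helper name locate_commands is hypothetical and not part of VulnTrain. No command-line options are shown because they are not described here.

```python
import shutil

# Command names copied from the CLI table above.
VULNTRAIN_COMMANDS = [
    "vulntrain-dataset-generation",
    "vulntrain-train-severity-classification",
    "vulntrain-train-severity-cnvd-classification",
    "vulntrain-train-description-generation",
    "vulntrain-train-cwe-classification",
    "vulntrain-validate-severity-classification",
    "vulntrain-validate-text-generation",
]


def locate_commands(names):
    """Map each command name to its resolved path, or None if it is not on PATH.

    Hypothetical helper for illustration; it is not provided by VulnTrain.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_commands(VULNTRAIN_COMMANDS).items():
        status = path if path else "not found (is VulnTrain installed?)"
        print(f"{name}: {status}")
```

A command reported as not found usually means the package was installed into an environment that is not active in the current shell.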

Models

  • Severity classification: available on Hugging Face
  • Description generation: available on Hugging Face

Distributed training on HPC clusters

VulnTrain supports distributed multi-GPU training via SLURM, making it suitable for EuroHPC-style GPU clusters. See the HPC documentation for Conda environment setup, single-node and multi-node SLURM job scripts, and NCCL configuration.
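
As a rough illustration of what multi-node SLURM launches typically involve, the sketch below translates the per-task variables that SLURM sets (SLURM_PROCID, SLURM_NTASKS, SLURM_LOCALID) into the RANK, WORLD_SIZE and LOCAL_RANK names commonly read by PyTorch-style distributed launchers. This is a general pattern, not VulnTrain's own code: whether VulnTrain's job scripts do this, and the helper name slurm_to_torch_env, are assumptions. The HPC documentation remains the authoritative reference.

```python
import os

# SLURM task variable -> name commonly read by PyTorch-style launchers.
# Assumption: this mapping illustrates a general pattern, not VulnTrain's code.
SLURM_TO_TORCH = {
    "SLURM_PROCID": "RANK",         # global index of this task
    "SLURM_NTASKS": "WORLD_SIZE",   # total number of tasks in the job
    "SLURM_LOCALID": "LOCAL_RANK",  # index of this task on its node
}


def slurm_to_torch_env(environ):
    """Return the translated variables for every SLURM variable present.

    Hypothetical helper; variables missing from `environ` are skipped.
    """
    return {
        torch_name: environ[slurm_name]
        for slurm_name, torch_name in SLURM_TO_TORCH.items()
        if slurm_name in environ
    }


if __name__ == "__main__":
    translated = slurm_to_torch_env(os.environ)
    if not translated:
        print("No SLURM task variables found; not running under srun?")
    for key, value in translated.items():
        print(f"{key}={value}")
```

Run outside a SLURM allocation, the script finds no task variables and says so.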

Documentation

Check out the full documentation for detailed usage instructions, dataset generation examples, and training recipes.

How to cite

Bonhomme, C., & Dulaunoy, A. (2025). VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification (Version 1.4.0) [Computer software]. https://doi.org/10.48550/arXiv.2507.03607

@misc{bonhomme2025vlai,
    title={VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification},
    author={Cédric Bonhomme and Alexandre Dulaunoy},
    year={2025},
    eprint={2507.03607},
    archivePrefix={arXiv},
    primaryClass={cs.CR}
}

License

VulnTrain is licensed under the GNU General Public License version 3.

Copyright (c) 2025-2026 Computer Incident Response Center Luxembourg (CIRCL)
Copyright (C) 2025-2026 Cédric Bonhomme - https://github.com/cedricbonhomme
Copyright (C) 2025 Léa Ulusan - https://github.com/3LS3-1F
