
Generate datasets and models based on vulnerability data from Vulnerability-Lookup.


VulnTrain


VulnTrain offers a suite of commands to generate diverse AI datasets and train models using comprehensive vulnerability data from Vulnerability-Lookup. It harnesses over one million JSON records from all supported advisory sources (CVE, GitHub advisories, CSAF, PySecDB, CNVD) to build high-quality, domain-specific models.

Additionally, data from the vulnerability-lookup:meta container, including enrichment sources such as vulnrichment and Fraunhofer FKIE, is incorporated to enhance model quality.

The datasets and models are published on Hugging Face.

For more information about the use of AI in Vulnerability-Lookup, please refer to the user manual.

Installation

pipx install VulnTrain

For development:

git clone https://github.com/vulnerability-lookup/VulnTrain.git
cd VulnTrain/
poetry install

Usage

Three types of commands are available:

  • Dataset generation: Create and prepare datasets from vulnerability sources.
  • Model training: Train models using the prepared datasets.
  • Model validation: Assess the performance of trained models (validations, benchmarks, etc.).

CLI commands

  • vulntrain-dataset-generation: Generate datasets from vulnerability sources.
  • vulntrain-train-severity-classification: Train a severity classifier (RoBERTa/DistilBERT).
  • vulntrain-train-severity-cnvd-classification: Train a severity classifier for CNVD data.
  • vulntrain-train-description-generation: Train a GPT-2 vulnerability description generator.
  • vulntrain-train-cwe-classification: Train a CWE classifier from patches.
  • vulntrain-validate-severity-classification: Validate a severity model.
  • vulntrain-validate-text-generation: Validate a text generation model.

Models

  • Severity classification: published on Hugging Face
  • Description generation: published on Hugging Face
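The severity classifier predicts a qualitative severity label for a vulnerability. As a rough illustration of where such labels can come from, the sketch below maps a CVSS v3.x base score to the standard qualitative rating scale (None, Low, Medium, High, Critical). This is an assumption for illustration only: it is not taken from VulnTrain's code, and the helper name `cvss_to_severity` is hypothetical.

```python
def cvss_to_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating.

    Hypothetical helper: the ranges follow the CVSS v3.x specification,
    not necessarily VulnTrain's own labelling logic.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS base score out of range: {score}")
    if score == 0.0:
        return "none"
    # Base scores have one decimal place, so "< 4.0" covers 0.1 to 3.9.
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


if __name__ == "__main__":
    for s in (0.0, 3.9, 4.0, 6.9, 7.0, 8.9, 9.0, 10.0):
        print(s, cvss_to_severity(s))
```

Labels derived this way would then serve as the classification targets, with the advisory description as the model input.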

Distributed training on HPC clusters

VulnTrain supports distributed multi-GPU training via SLURM, making it suitable for EuroHPC-style GPU clusters. See the HPC documentation for Conda environment setup, single-node and multi-node SLURM job scripts, and NCCL configuration.
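In a SLURM job, each task learns its position in the job from environment variables such as `SLURM_PROCID` (global rank), `SLURM_NTASKS` (total number of tasks) and `SLURM_LOCALID` (rank on the current node). The sketch below shows one way to turn these into the coordinates a distributed training process needs, assuming one task per GPU. It is not VulnTrain's implementation; `DistInfo` and `slurm_dist_info` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DistInfo:
    rank: int        # global rank of this process across all nodes
    world_size: int  # total number of processes in the job
    local_rank: int  # index of this process on its node (used to pick the GPU)


def slurm_dist_info(env: Mapping[str, str] = os.environ) -> DistInfo:
    """Derive distributed-training coordinates from SLURM variables.

    Hypothetical helper: assumes one SLURM task per GPU and falls back
    to a single-process setup when the variables are not set.
    """
    return DistInfo(
        rank=int(env.get("SLURM_PROCID", "0")),
        world_size=int(env.get("SLURM_NTASKS", "1")),
        local_rank=int(env.get("SLURM_LOCALID", "0")),
    )


if __name__ == "__main__":
    print(slurm_dist_info())
```

With PyTorch, these values would typically be passed to `torch.distributed.init_process_group(backend="nccl", rank=..., world_size=...)`, with `MASTER_ADDR` and `MASTER_PORT` set for the default `env://` initialisation, and `local_rank` used to select the CUDA device.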

Documentation

Check out the full documentation for detailed usage instructions, dataset generation examples, and training recipes.

How to cite

Bonhomme, C., & Dulaunoy, A. (2025). VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification (Version 1.4.0) [Computer software]. https://doi.org/10.48550/arXiv.2507.03607

@misc{bonhomme2025vlai,
    title={VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification},
    author={Cédric Bonhomme and Alexandre Dulaunoy},
    year={2025},
    eprint={2507.03607},
    archivePrefix={arXiv},
    primaryClass={cs.CR}
}

License

VulnTrain is licensed under the GNU General Public License version 3.

Copyright (c) 2025-2026 Computer Incident Response Center Luxembourg (CIRCL)
Copyright (C) 2025-2026 Cédric Bonhomme - https://github.com/cedricbonhomme
Copyright (C) 2025 Léa Ulusan - https://github.com/3LS3-1F


Download files


Source Distribution

vulntrain-3.0.0.tar.gz (266.1 kB)


Built Distribution


vulntrain-3.0.0-py3-none-any.whl (276.6 kB)


File details

Details for the file vulntrain-3.0.0.tar.gz.

File metadata

  • Download URL: vulntrain-3.0.0.tar.gz
  • Upload date:
  • Size: 266.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vulntrain-3.0.0.tar.gz
Algorithm Hash digest
SHA256 6872d7df408da2c32602d85f40a933c89ecdf0ddd5bdaf003170a8f2445f2f27
MD5 6934fe24fa11d2da383da9a27a314383
BLAKE2b-256 59dd8ee8d7fcde49833791049f877fcf133565e4891f00390fba55615a136f65
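A downloaded archive can be checked against the published SHA256 digest before installing it. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify` are illustrative, and the file is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for the source distribution.
EXPECTED_SHA256 = "6872d7df408da2c32602d85f40a933c89ecdf0ddd5bdaf003170a8f2445f2f27"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    archive = Path("vulntrain-3.0.0.tar.gz")
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

The same check can be delegated to pip by pinning `vulntrain==3.0.0 --hash=sha256:<digest>` in a requirements file and installing with `--require-hashes`, although pip then requires hashes for every dependency as well.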


Provenance

The following attestation bundles were made for vulntrain-3.0.0.tar.gz:

Publisher: release.yml on vulnerability-lookup/VulnTrain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vulntrain-3.0.0-py3-none-any.whl.

File metadata

  • Download URL: vulntrain-3.0.0-py3-none-any.whl
  • Upload date:
  • Size: 276.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vulntrain-3.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f340447db08e592abb3870a11ac125812734bfa204adccb1f223f926e07d556f
MD5 7b3d5e9911b19c3e9dae3b3c14ae3079
BLAKE2b-256 dd4696149a8c8361e630b0efdb16e6455c834c87757165f278656fbd7c057cbe


Provenance

The following attestation bundles were made for vulntrain-3.0.0-py3-none-any.whl:

Publisher: release.yml on vulnerability-lookup/VulnTrain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
