Generate datasets and models based on vulnerability data from Vulnerability-Lookup.

These details have been verified by PyPI

Project links

Owner

CIRCL

GitHub Statistics

These details have not been verified by PyPI

Project description

VulnTrain

VulnTrain offers a suite of commands to generate diverse AI datasets and train models using comprehensive vulnerability data from Vulnerability-Lookup. It harnesses over one million JSON records from all supported advisory sources (CVE, GitHub advisories, CSAF, PySecDB, CNVD) to build high-quality, domain-specific models.

Additionally, data from the vulnerability-lookup:meta container, including enrichment sources such as vulnrichment and Fraunhofer FKIE, is incorporated to enhance model quality.

Check out the datasets and models on Hugging Face:

For more information about the use of AI in Vulnerability-Lookup, please refer to the user manual.

Installation

pipx install VulnTrain

For development:

git clone https://github.com/vulnerability-lookup/VulnTrain.git
cd VulnTrain/
poetry install
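
To confirm which version ended up installed, a minimal sketch along these lines can help. It relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of VulnTrain. Because pipx installs into an isolated environment, the check is meaningful only when run with the interpreter of the environment that holds the package, for example via `poetry run python` in a development checkout.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="VulnTrain"):
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not a VulnTrain API.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found:
        print(f"VulnTrain version: {found}")
    else:
        print("VulnTrain is not installed in this environment")
```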

Usage

Three types of commands are available:

- Dataset generation: Create and prepare datasets from vulnerability sources.
- Model training: Train models using the prepared datasets.
- Model validation: Assess the performance of trained models (validations, benchmarks, etc.).

CLI commands

Command	Purpose
`vulntrain-dataset-generation`	Generate datasets from vulnerability sources
`vulntrain-train-severity-classification`	Train severity classifier (RoBERTa/DistilBERT)
`vulntrain-train-severity-cnvd-classification`	Train severity classifier for CNVD data
`vulntrain-train-description-generation`	Train GPT-2 vulnerability description generator
`vulntrain-train-cwe-classification`	Train CWE classifier from patches
`vulntrain-validate-severity-classification`	Validate severity model
`vulntrain-validate-text-generation`	Validate text generation model
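
As a quick sanity check after installation, the sketch below looks up each command from the table on the current `PATH` using `shutil.which`. The command names are copied from the table; the helper `find_commands` is hypothetical, and the sketch makes no assumption about the options each command accepts.

```python
import shutil

# Command names copied from the CLI table above.
VULNTRAIN_COMMANDS = [
    "vulntrain-dataset-generation",
    "vulntrain-train-severity-classification",
    "vulntrain-train-severity-cnvd-classification",
    "vulntrain-train-description-generation",
    "vulntrain-train-cwe-classification",
    "vulntrain-validate-severity-classification",
    "vulntrain-validate-text-generation",
]


def find_commands(commands=VULNTRAIN_COMMANDS):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in commands}


if __name__ == "__main__":
    for name, path in find_commands().items():
        print(f"{name}: {path or 'not found'}")
```

Each command's own help output and the full documentation remain the reference for arguments.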

Models

Severity classification:
Description generation:

Distributed training on HPC clusters

VulnTrain supports distributed multi-GPU training via SLURM, making it suitable for EuroHPC-style GPU clusters. See the HPC documentation for Conda environment setup, single-node and multi-node SLURM job scripts, and NCCL configuration.
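
To illustrate how a SLURM task can describe its place in a distributed job, here is a minimal sketch that maps SLURM's per-task variables (`SLURM_PROCID`, `SLURM_LOCALID`, `SLURM_NTASKS`) onto the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` values commonly used by PyTorch-style launchers. This is an assumption about one possible wiring, not a description of how VulnTrain's job scripts work; the helper `slurm_to_torch_env` is hypothetical, and the HPC documentation remains the reference.

```python
import os


def slurm_to_torch_env(env=None):
    """Derive rank information from SLURM's per-task environment variables.

    Falls back to a single-process layout when the variables are missing,
    for example when running outside of a SLURM job step.
    Illustrative helper only, not part of VulnTrain.
    """
    env = os.environ if env is None else env
    return {
        "RANK": int(env.get("SLURM_PROCID", "0")),        # global task index
        "LOCAL_RANK": int(env.get("SLURM_LOCALID", "0")),  # task index on this node
        "WORLD_SIZE": int(env.get("SLURM_NTASKS", "1")),   # total number of tasks
    }


if __name__ == "__main__":
    print(slurm_to_torch_env())
```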

Documentation

Check out the full documentation for detailed usage instructions, dataset generation examples, and training recipes.

How to cite

Bonhomme, C., & Dulaunoy, A. (2025). VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification (Version 1.4.0) [Computer software]. https://doi.org/10.48550/arXiv.2507.03607

@misc{bonhomme2025vlai,
    title={VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification},
    author={Cédric Bonhomme and Alexandre Dulaunoy},
    year={2025},
    eprint={2507.03607},
    archivePrefix={arXiv},
    primaryClass={cs.CR}
}

License

VulnTrain is licensed under the GNU General Public License version 3.

Copyright (c) 2025-2026 Computer Incident Response Center Luxembourg (CIRCL)
Copyright (C) 2025-2026 Cédric Bonhomme - https://github.com/cedricbonhomme
Copyright (C) 2025 Léa Ulusan - https://github.com/3LS3-1F

Project details

Release history

Version	Release date
3.1.0 (this version)	Apr 6, 2026
3.0.0	Apr 3, 2026
2.2.0	Feb 19, 2026
2.1.0	Nov 18, 2025
2.0.0	Sep 5, 2025
1.5.0	Jul 25, 2025
1.4.0	Jul 1, 2025
1.3.1	Apr 28, 2025
1.3.0	Apr 28, 2025
1.2.0	Mar 11, 2025
1.1.0	Feb 27, 2025
1.0.0	Feb 25, 2025
0.5.1	Feb 21, 2025
0.5.0	Feb 21, 2025
0.4.0	Feb 21, 2025
0.3.0	Feb 20, 2025
0.2.0	Feb 20, 2025
0.1.0	Feb 19, 2025

Download files

Source Distribution

vulntrain-3.1.0.tar.gz (267.0 kB)

Uploaded Apr 6, 2026 Source

Built Distribution

vulntrain-3.1.0-py3-none-any.whl (279.0 kB)

Uploaded Apr 6, 2026 Python 3

File details

Details for the file vulntrain-3.1.0.tar.gz.

File metadata

Download URL: vulntrain-3.1.0.tar.gz
Upload date: Apr 6, 2026
Size: 267.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vulntrain-3.1.0.tar.gz
Algorithm	Hash digest
SHA256	`1b4d4cd6c7f7c63a380c5d058582b081bab1e8179dcdb4132b4d225a1c923c64`
MD5	`add0f29d2fcb6143a68bdcee94e1b72d`
BLAKE2b-256	`28da0675186995209cfcdf3a6cd1f17239669f69c52753478c8d1d48d5dfaae0`
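
A downloaded file can be checked against the published SHA256 digest with the standard library alone. The sketch below is one way to do it; the helper names `sha256_of` and `matches` are hypothetical, and the file path in the usage section assumes the archive sits in the current directory.

```python
import hashlib
import hmac

# SHA256 digest published above for vulntrain-3.1.0.tar.gz.
EXPECTED_SHA256 = "1b4d4cd6c7f7c63a380c5d058582b081bab1e8179dcdb4132b4d225a1c923c64"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected.lower())


if __name__ == "__main__":
    import sys

    # Assumed default location: the archive in the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "vulntrain-3.1.0.tar.gz"
    try:
        print("OK" if matches(target) else "MISMATCH")
    except FileNotFoundError:
        print(f"File not found: {target}")
```

The same approach applies to the wheel by substituting its file name and the SHA256 digest listed for it.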

Provenance

The following attestation bundles were made for vulntrain-3.1.0.tar.gz:

Publisher: release.yml on vulnerability-lookup/VulnTrain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vulntrain-3.1.0.tar.gz
- Subject digest: 1b4d4cd6c7f7c63a380c5d058582b081bab1e8179dcdb4132b4d225a1c923c64
- Sigstore transparency entry: 1242560546
- Sigstore integration time: Apr 6, 2026
Source repository:
- Permalink: vulnerability-lookup/VulnTrain@b3e874a403517432528548b745bd59631b81efc2
- Branch / Tag: refs/tags/v3.1.0
- Owner: https://github.com/vulnerability-lookup
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b3e874a403517432528548b745bd59631b81efc2
- Trigger Event: release

File details

Details for the file vulntrain-3.1.0-py3-none-any.whl.

File metadata

Download URL: vulntrain-3.1.0-py3-none-any.whl
Upload date: Apr 6, 2026
Size: 279.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vulntrain-3.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`93911f19c7facc805fd62199a919eb46639133f2b882c91064a9e619239bd1aa`
MD5	`6fa3efeb4f8f62a0db8ef5d99f8e3119`
BLAKE2b-256	`aec6aa08af7134380eae3e57c6099fde1a7a280c1aa3874bba113258b1bdde29`

Provenance

The following attestation bundles were made for vulntrain-3.1.0-py3-none-any.whl:

Publisher: release.yml on vulnerability-lookup/VulnTrain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vulntrain-3.1.0-py3-none-any.whl
- Subject digest: 93911f19c7facc805fd62199a919eb46639133f2b882c91064a9e619239bd1aa
- Sigstore transparency entry: 1242560557
- Sigstore integration time: Apr 6, 2026
Source repository:
- Permalink: vulnerability-lookup/VulnTrain@b3e874a403517432528548b745bd59631b81efc2
- Branch / Tag: refs/tags/v3.1.0
- Owner: https://github.com/vulnerability-lookup
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b3e874a403517432528548b745bd59631b81efc2
- Trigger Event: release
