Finds virulence genes in NGS samples based on databases for several organisms.

VirulenceFinder documentation

This project documents the VirulenceFinder service. VirulenceFinder identifies virulence genes in totally or partially sequenced bacterial isolates. At the moment, databases are available for E. coli, Enterococcus, S. aureus and Listeria.

Important if you are updating from a previous VirulenceFinder version

It is no longer recommended to clone the VirulenceFinder bitbucket repository unless you plan to do development work on VirulenceFinder.

Instead we recommend installing VirulenceFinder using pip as described below.

There are several good reasons why the recommended installation procedure has changed: it is easier for users, and it ensures that your installation is a tested release of the application.

Installation

VirulenceFinder consists of an application and a database. The database can be used without the application, but not the other way around. The steps below install the application first, then install the database and configure it to work with the application.

Dependencies

VirulenceFinder uses two external alignment tools that must be installed.

  • BLAST
  • KMA

BLAST

If you don't want to specify the path of BLAST every time you run VirulenceFinder, make sure that "blastn" is in your PATH or set the environment variable specified in the "Environment Variables Table" in this README.

Blastn can be obtained from:

https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
# Example of setting the environment variable in the bash shell. This is only temporary; to set it every time you log in, add this line to your .bashrc or .zshrc file.
export CGE_BLASTN="/path/to/some/dir/blastn"

KMA

If you don't want to specify the path of KMA every time you run VirulenceFinder, make sure that KMA is in your PATH or set the environment variable specified in the "Environment Variables Table" in this README.

KMA can be obtained from:

https://bitbucket.org/genomicepidemiology/kma.git
# Example of setting the environment variable in the bash shell. This is only temporary; to set it every time you log in, add this line to your .bashrc or .zshrc file.
export CGE_KMA="/path/to/some/dir/kma/kma"
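
Before a first run it can help to check that both tools can actually be located. The following is a minimal sketch, not part of VirulenceFinder itself: the helper name find_tool is hypothetical, and the lookup order (environment variable first, then PATH) is an assumption based on the description above.

```python
import os
import shutil


def find_tool(env_var, default_name):
    """Return the path to an executable, or None if it cannot be found.

    Hypothetical helper: uses the value of env_var if it is set,
    otherwise searches PATH for default_name.
    """
    candidate = os.environ.get(env_var, default_name)
    # shutil.which handles both bare names (searched on PATH)
    # and explicit paths (checked directly).
    return shutil.which(candidate)


if __name__ == "__main__":
    for env_var, name in (("CGE_BLASTN", "blastn"), ("CGE_KMA", "kma")):
        path = find_tool(env_var, name)
        print(f"{name}: {path or 'not found'}")
```

If either tool is reported as not found, set the corresponding variable as shown above or pass the path on the command line.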

Install VirulenceFinder the application using pip

Important: This will install VirulenceFinder in the environment where you run pip and potentially update the Python modules VirulenceFinder depends on. It is recommended to run VirulenceFinder in its own environment, in order to avoid breaking existing installations and to prevent VirulenceFinder from being broken by future unrelated pip installations. This is described in the optional step below.

Optional: Create virtual environment

Go to the location where you want to store your environment.

# Create environment
python3 -m venv virulencefinder_env

# Activate environment
source virulencefinder_env/bin/activate

# When you are finished using VirulenceFinder, deactivate the environment
deactivate

Install VirulenceFinder

pip install virulencefinder

Databases

If you don't want to specify the path to the database every time you run VirulenceFinder, you need to set the environment variable specified in the "Environment Variables Table" in this README.

Go to the location where you want to store the database. Clone the databases you need.

Note: We are currently working on hosting a tarballed version of the database that can be downloaded, so that cloning can be avoided.

git clone https://bitbucket.org/genomicepidemiology/virulencefinder_db/

Set temporary environment variables.

# Example of setting the environment variable in the bash shell. This is only temporary; to set it every time you log in, add this line to, for example, your .bashrc file.
export CGE_VIRULENCEFINDER_DB="/path/to/some/dir/virulencefinder_db"

Install VirulenceFinder with Docker

The VirulenceFinder application and the database have been built into a single image on Docker Hub named "genomicepidemiology/virulencefinder". Below is an example run, where the current working directory is bound to the container path "/app", which is the container's working directory.

docker run -v "$(pwd):/app" genomicepidemiology/virulencefinder:3.2.0 -o test_out -d s.aureus_exoenzyme,s.aureus_hostimm,s.aureus_toxin -ifq tests/data/s_aureus/*.gz

Usage

You can run VirulenceFinder from the command line using Python.

# Example of running virulencefinder
python -m virulencefinder -ifa data/test_isolate_01.fa -o "."
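
When VirulenceFinder is called from a script, the same invocation can be assembled as an argument list. This is a minimal sketch under stated assumptions: the function names build_command and run_virulencefinder are hypothetical, only flags listed in this README are used, and the comma-separated form for -d follows the Docker example above.

```python
import subprocess
import sys


def build_command(fasta, out_dir, databases=None, db_path=None):
    """Assemble the argument list for a FASTA run (hypothetical helper)."""
    cmd = [sys.executable, "-m", "virulencefinder", "-ifa", fasta, "-o", out_dir]
    if databases:
        # Database names are joined with commas, as in the Docker example.
        cmd += ["-d", ",".join(databases)]
    if db_path:
        cmd += ["-p", db_path]
    return cmd


def run_virulencefinder(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_command("data/test_isolate_01.fa", ".")
    # Only print here; calling run_virulencefinder(cmd) requires an
    # installed VirulenceFinder, BLAST and a database.
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.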

Run the program with the -h option to list all available options:

options:

  -h, --help            show this help message and exit

  -ifa INPUTFASTA, --inputfasta INPUTFASTA
                        Input fasta file.

  -ifq INPUTFASTQ [INPUTFASTQ ...], --inputfastq INPUTFASTQ [INPUTFASTQ ...]
                        Input fastq file(s). Assumed to be single-end fastq if only one file is provided,
                        and assumed to be paired-end data if two files are provided.

  --nanopore
                        If nanopore data is used

  -p DB_PATH, --databasePath DB_PATH
                        Path to the database

  -d DATABASES, --databases DATABASES
                        Databases chosen to search in - if none or all is specified, all are used.

  -o OUTPUTPATH, --outputPath OUTPUTPATH
                        Output directory. If it doesn't exist, it will be created.

  -j OUT_JSON, --out_json OUT_JSON
                        Specify JSON filename and output directory.
                        If the directory doesn't exist, it will be created.

  -b BLASTPATH, --blastPath BLASTPATH
                        Path to blastn

  -k KMAPATH, --kmaPath KMAPATH
                        Path to KMA

  --speciesinfo_json SPECIES
                        Argument used by the cge pipeline. It takes a list in json format consisting of taxonomy, from
                        domain -> species. A database is chosen based on the taxonomy. Default is none.

  -db_vir_kma KMA_DB, --db_path_vir_kma KMA_DB
                        Path to the virulencefinder databases indexed with
                        KMA. Defaults to the value of the --db_res flag.

  -l MIN_COV, --min_cov MIN_COV
                        Minimum (breadth-of) coverage within the range 0-1. Default is 0.60.

  -t THRESHOLD, --threshold THRESHOLD
                        Minimum threshold for identity within the range 0-1. Default is 0.90.

  -v, --version
                        Show the VirulenceFinder version and exit

  --pickle
                        Create a pickle dump of the Isolate object. Currently needed in the CGE webserver.
                        This dependency and option are being removed.

  -x, --extented_output
                        Give extended output with alignment files, template and query hits in fasta and a tab
                        separated file with gene profile results

  -tmp TMP_DIR, --tmp_dir TMP_DIR
                        Temporary directory for storage of the results from the external software. Defaults to 'tmp'
                        dir in the given output dir.

  -q, --quiet
                        Don't show results

  --overlap OVERLAP_NUM
                        Genes are allowed to overlap this number of
                        nucleotides.
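
Two of the rules in the help text can be checked before launching a run: how the number of fastq files is interpreted, and the 0-1 range for --min_cov and --threshold. The sketch below only mirrors the documented wording and is not VirulenceFinder's internal code. The function names are hypothetical, and rejecting any other number of fastq files is an assumption, since the help text does not describe that case.

```python
def fastq_mode(fastq_files):
    """Describe how a list of fastq files would be interpreted.

    One file is treated as single-end, two files as paired-end.
    Other counts are rejected here (an assumption of this sketch).
    """
    if len(fastq_files) == 1:
        return "single-end"
    if len(fastq_files) == 2:
        return "paired-end"
    raise ValueError(f"expected 1 or 2 fastq files, got {len(fastq_files)}")


def check_fraction(name, value):
    """Validate that a coverage or identity value lies within 0-1."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be within the range 0-1, got {value}")
    return value


if __name__ == "__main__":
    print(fastq_mode(["reads_1.fq.gz", "reads_2.fq.gz"]))
    print(check_fraction("min_cov", 0.60))
    print(check_fraction("threshold", 0.90))
```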

Environment Variables

Environment variables recognized by VirulenceFinder, the flag they replace and the default value for the flag. Command-line flags that are provided always take precedence. Environment variables that are set take precedence over default flag values.

Additional environment variables can be added by appending entries to the file named "environment_variables.md".

Environment Variables Table

Environment Variable     Flag              Default Value
CGE_BLASTN               blastPath         blastn
CGE_VIRULENCEFINDER_DB   db_path           databases
CGE_VIRFINDER_GENE_COV   min_cov           0.60
CGE_VIRFINDER_POINT_ID   threshold         0.90
CGE_VIRFINDER_JSON       speciesinfo_json  None
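
The precedence order described in this section (command-line flag, then environment variable, then default) can be illustrated with a short sketch. It is an illustration only, not the application's actual implementation; the function name resolve_setting and its signature are hypothetical.

```python
import os


def resolve_setting(flag_value, env_var, default, environ=None):
    """Pick a setting value following the documented precedence.

    flag_value: value given on the command line, or None if the flag was omitted.
    env_var:    name of the environment variable that can replace the flag.
    default:    fallback used when neither of the above is set.
    environ:    mapping to read from; defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    if flag_value is not None:
        return flag_value          # 1. explicit command-line flag
    if env_var in environ:
        return environ[env_var]    # 2. environment variable
    return default                 # 3. default flag value


if __name__ == "__main__":
    # Uses the min_cov row of the table above.
    print(resolve_setting(None, "CGE_VIRFINDER_GENE_COV", "0.60"))
```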

Web-server

A webserver implementing the methods is available at the CGE website and can be found here: https://cge.food.dtu.dk/services/VirulenceFinder/

For developers:

Use pdm install

If needed, use pdm add DEPENDENCY (for instance, to update the cgecore library).

To run the tool: pdm run virulencefinder [OPTIONS], e.g. pdm run virulencefinder -h

Citation

When using the method please cite:

Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli. Joensen KG, Scheutz F, Lund O, Hasman H, Kaas RS, Nielsen EM, Aarestrup FM. J. Clin. Microbiol. 2014. 52(5): 1501-1510. [Epub ahead of print]

References

  1. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics 2009; 10:421.
  2. Clausen PTLC, Aarestrup FM, Lund O. Rapid and precise alignment of raw reads against redundant databases with KMA. BMC Bioinformatics 2018; 19:307.

License

Copyright (c) 2025, DTU Food™ All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
