GENIAL: GENes Identification with Abricate for Lucky biologists

Authors: Barbet Pauline, Felten Arnaud

Affiliation: Food Safety Laboratory - ANSES Maisons Alfort (France)

You can find the latest version of the tool at https://github.com/p-barbet/GENIAL

GENIAL

GENIAL identifies antimicrobial resistance and virulence genes in bacterial genomes by matching them, with ABRicate, against a database of genes of interest.

Databases

The default databases available are Resfinder, CARD, ARG-ANNOT, NCBI, EcOH, PlasmidFinder, Ecoli_VF and VFDB.

In addition to these databases, it is possible to use your own database.

The tool is divided into two scripts.

GENIALanalysis

GENIALanalysis runs ABRicate. It takes as input a tsv file containing the paths of the genome FASTA files and their IDs. If you want to use your own database, you also need to provide a multifasta file with gene IDs as headers. The script then runs ABRicate and produces one ABRicate result file per genome: a tsv file listing the genes found in that sample.
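The step described above can be sketched as follows. This is a minimal illustration, not the GENIALanalysis code itself: the function names, the input layout (no header, FASTA path in the first column, genome ID in the second) and the `<genome ID>.tsv` output naming are assumptions made for the example. Only the ABRicate flags `--db`, `--minid`, `--mincov`, `--threads` and `--datadir` are taken from ABRicate's command line.

```python
import csv
import subprocess
from pathlib import Path


def read_genomes(tsv_path):
    """Return (fasta_path, genome_id) pairs from the input tsv file.

    Assumed layout (not confirmed by the source): no header line,
    FASTA path in column 1, genome ID in column 2.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return [(row[0], row[1]) for row in reader if len(row) >= 2]


def abricate_command(fasta_path, db, minid=90, mincov=80, threads=1, datadir=None):
    """Build the ABRicate argument list for one genome (nothing is executed)."""
    cmd = [
        "abricate",
        "--db", db,
        "--minid", str(minid),
        "--mincov", str(mincov),
        "--threads", str(threads),
    ]
    if datadir is not None:
        # Only needed for a private database stored outside the default folder.
        cmd += ["--datadir", str(datadir)]
    cmd.append(str(fasta_path))
    return cmd


def run_all(tsv_path, results_dir, db, **options):
    """Run ABRicate once per genome and write one result file per genome.

    The output file name <genome_id>.tsv is a hypothetical convention.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    for fasta_path, genome_id in read_genomes(tsv_path):
        with open(results_dir / f"{genome_id}.tsv", "w") as out:
            subprocess.run(abricate_command(fasta_path, db, **options),
                           stdout=out, check=True)
```

Calling `run_all("input_file.tsv", "ABRicate_results", "vfdb")` would require ABRicate to be installed and on the PATH; `abricate_command` alone can be used to inspect the command that would be launched.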

GENIALresults

GENIALresults formats the ABRicate results as presence/absence matrices and heatmaps. It takes as input a temporary file produced by the ABRicate analysis, containing the paths of the ABRicate result files and the genome IDs. For the VFDB database, a file containing the virulence factor names, their family and species is automatically included in the script.

The output depends on the database used:

  • In all cases, a matrix in tsv format and a heatmap in png format with all genes found are created.

In addition:

  • If you use one of the default databases Resfinder or VFDB, an additional matrix and heatmap by gene type are produced, along with a correspondence table between the gene name, its family and its count across all genomes.

  • If you use any other default database or your own database, only a correspondence table between the gene name and its count across all genomes is produced in addition.
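To make the outputs listed above concrete, here is a small sketch of how a presence/absence matrix and a gene count table can be derived from (genome ID, gene) pairs. It uses only the standard library and is not the GENIALresults implementation, which relies on pandas and seaborn; all function names are hypothetical. `drop_core_genes` illustrates the idea behind the `--R` option (removing genes present in all genomes).

```python
def presence_absence(hits):
    """Build a presence/absence matrix from (genome_id, gene) pairs.

    Returns the sorted gene list and a dict mapping each genome ID to a
    row of 0/1 values, one per gene. Genomes without any hit do not
    appear unless they are added by the caller.
    """
    genes_by_genome = {}
    for genome_id, gene in hits:
        genes_by_genome.setdefault(genome_id, set()).add(gene)
    genes = sorted(set().union(*genes_by_genome.values()))
    matrix = {
        genome_id: [int(gene in found) for gene in genes]
        for genome_id, found in genes_by_genome.items()
    }
    return genes, matrix


def gene_counts(genes, matrix):
    """Number of genomes in which each gene was found."""
    return {gene: sum(row[i] for row in matrix.values())
            for i, gene in enumerate(genes)}


def drop_core_genes(genes, matrix):
    """Remove genes present in every genome (the idea behind --R)."""
    keep = [i for i in range(len(genes))
            if not all(row[i] for row in matrix.values())]
    return ([genes[i] for i in keep],
            {genome_id: [row[i] for i in keep] for genome_id, row in matrix.items()})


def to_tsv(genes, matrix):
    """Serialise the matrix as tab-separated text, one genome per line."""
    lines = ["\t".join(["genome"] + genes)]
    for genome_id, row in matrix.items():
        lines.append("\t".join([genome_id] + [str(value) for value in row]))
    return "\n".join(lines)
```

A matrix of this shape can then be passed to a plotting library to draw the heatmap, with genomes as rows and genes as columns.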

Dependencies

The script was developed with Python 3.6 (tested with 3.6.6).

External dependencies

Parameters

Command line options

Option | Description | Required | Default
-f | tsv file with FASTA file paths and strain IDs | Yes |
-dbp | Path to the ABRicate databases directory. Implies -dbf and --privatedb | Yes if --privatedb |
-dbf | Multifasta file containing the private database sequences. Implies -dbp and --privatedb | Yes if --privatedb |
-T | Number of threads to use | No | 1
-w | Working directory | No | .
-r | Results directory name | No | ABRicate_results
--defaultdb | Default database to use: resfinder, card, argannot, ecoh, ecoli_vf, plasmidfinder, vfdb or ncbi. Incompatible with --privatedb | Yes if not --privatedb |
--privatedb | Private database name. Implies -dbp and -dbf. Incompatible with --defaultdb | Yes if not --defaultdb |
--mincov | Minimum percentage of the gene covered | No | 80
--minid | Minimum percentage of exact nucleotide matches | No | 90
--R | Remove genes present in all genomes from the matrix | No | False
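The constraints between options (exactly one of --defaultdb or --privatedb, with -dbp and -dbf required alongside --privatedb) can be expressed with argparse roughly as below. This is a sketch built from the option table, not the tool's actual parser; the function names and error messages are assumptions.

```python
import argparse

DEFAULT_DBS = ["resfinder", "card", "argannot", "ecoh",
               "ecoli_vf", "plasmidfinder", "vfdb", "ncbi"]


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the GENIAL options")
    parser.add_argument("-f", required=True,
                        help="tsv file with FASTA file paths and strain IDs")
    parser.add_argument("-dbp", help="path to the ABRicate databases directory")
    parser.add_argument("-dbf", help="multifasta file of the private database")
    parser.add_argument("-T", type=int, default=1, help="number of threads")
    parser.add_argument("-w", default=".", help="working directory")
    parser.add_argument("-r", default="ABRicate_results",
                        help="results directory name")
    # Exactly one of --defaultdb / --privatedb must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--defaultdb", choices=DEFAULT_DBS)
    group.add_argument("--privatedb", help="private database name")
    parser.add_argument("--mincov", type=float, default=80)
    parser.add_argument("--minid", type=float, default=90)
    parser.add_argument("--R", action="store_true",
                        help="remove genes present in all genomes")
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # A private database needs both its directory and its multifasta,
    # and those two options make no sense with a default database.
    if args.privatedb and not (args.dbp and args.dbf):
        parser.error("--privatedb requires -dbp and -dbf")
    if args.defaultdb and (args.dbp or args.dbf):
        parser.error("-dbp and -dbf can only be used with --privatedb")
    return args
```

With this sketch, `parse_args(["-f", "input_file.tsv", "--defaultdb", "vfdb"])` succeeds with the defaults from the table, while passing --privatedb without -dbp and -dbf exits with a usage error.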

Test

After installing ABRicate, pandas and seaborn, you can test the script with the following command lines:

Default database

python AntiViruce.py -f input_file.tsv --defaultdb vfdb -r results_directory --minid 90 --mincov 80

Private database

python AntiViruce.py -f input_file.tsv  --privatedb private_db_name -T 10 -r results_directory --minid 90 --mincov 80 -dbp path_to_abricate_databases_repertory -dbf private_db_multifasta_path
