Skip to main content

GENIAL: GENes Indentification with Abricate for Lucky biologists

Project description

GENIAL : GENes Identification with Abricate for Lucky biologists

Authors : Barbet Pauline, Felten Arnaud

Affiliation: Food Safety Laboratory - ANSES Maisons Alfort (France)

You can find the latest version of the tool at https://github.com/p-barbet/GENIAL

GENIAL

GENIAL aims to identify antimicrobial resistance and virulence genes from bacterial genomes matching them to a database gathering genes of interest using ABRicate.

Databases

Default databases available are (Resfinder, CARD, ARG-ANNOT, NCBI, EcOH, PlasmidFinder, Ecoli_VF and VFDB)

As well as this databes, it's posible to use your own database.

The tool is divided into two scripts.

GENIALanalysis

GENIALanalysis aims to run ABricate. It takes in input a tsv file containing genomes fasta files paths and IDs.If you want to use your own database you also need to provide a multifasta whith genes IDs as headers. Then the script run ABricate and produce in output one ABRicate result file per genome, corresponding to a tsv file including genes found in each sample.

GENIALresults

GENIALresults aims to conditionning ABRicate results in the form of matrixes and heatmaps of presence/absence. It takes in input a temporary file produced by the Abricate analysis containing the genomes Abricate results paths and IDs. In the case of vfdb database a file containing the virulence factors names, their family and species is automticaly included in the script.

The output depending on the database used :

  • In any cases a matrix in tsv format and a heatmap in png format with all genes found are created

On top of that:

  • If you use one of the default databases Resfinder or VFDB news matrix and heatmap by gene type are produced with a correspondace table between the gene name, its family and its number in all genomes.

  • If you don't use one of the two previous databases or if you use your own database, only a corespondance table between the gene name and its number in all genomes is produced in addition.

Dependencies

The script has been developed with python 3.6 (tested with 3.6.6)

External dependencies

Parameters

Command line options

Options Description Required Default
-f tsv file with FASTA files paths ans strains IDs Yes
-dbp Path to ABRicate databases repertory. Implies -dbf and --privatedb Yes if --privatedb
-dbf Multifasta containing the private database sequences. Implies -dbp and --privatedb Yes if --privatedb
-T Number of thread to use No 1
-w Working directory No .
-r Results directory name No ABRicate_results
--defaultdb default databases available : resfinder, card, argannot, acoh, ecoli_vf, plasmidfinder, vfdb or ncbi. Incompatible with --privatedb Yes if not --privatedb
--privatedb Private database name. Implies -dbp and -dbf. Incompatible with --defaultdb Yes if not --defaultdb
--mincov Minimum proportion of gene covered No 80
--minid Minimum proportion of exact nucleotide matches No 90
--R Remove genes present in all genomes from the matrix No False

Test

After installing ABRicate and Pandas and seaborn you can test the script with the command line :

Default database

python AntiViruce.py -f input_file.tsv --defaultdb vfdb -r results_directory --minid 90 --mincov 80

Private database

python AntiViruce.py -f input_file.tsv  --privatedb private_db_name -T 10 -r results_directory --minid 90 --mincov 80 -dbp path_to_abricate_databases_repertory -dbf private_db_multifasta_path

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

GENIALbiologists-0.9.0.tar.gz (12.8 kB view hashes)

Uploaded source

Built Distribution

GENIALbiologists-0.9.0-py3-none-any.whl (27.0 kB view hashes)

Uploaded py3

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page