Creates a table with the presence/absence of genes in samples

Project description

Gene2Tab

Description

This script converts the tabular output of ABRicate into a matrix with samples as rows, genes as columns, and the presence/absence of each gene as the values.

The input can be either a single .tab file or a directory containing multiple .tab files. The output is a .csv file. Each input file should contain the #FILE, SEQUENCE, and GENE columns; the other columns are optional.

By default, the script will only consider genes with a coverage and identity of 90% or more. This can be changed with the --min_coverage and --min_identity flags.
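The thresholding described above can be sketched in a few lines of Python. This is a minimal illustration, not gene2tab's actual code: the function name `passes_thresholds` is hypothetical, and it assumes the thresholds are given as fractions (0.9 for 90%) while the %COVERAGE and %IDENTITY columns hold percentages.

```python
import csv
import io

def passes_thresholds(row, min_coverage=0.9, min_identity=0.9):
    """Return True if an ABRicate row meets both thresholds.

    Assumption: thresholds are fractions (0.9 == 90%), while the
    %COVERAGE and %IDENTITY columns hold percentages.
    """
    coverage = float(row["%COVERAGE"]) / 100.0
    identity = float(row["%IDENTITY"]) / 100.0
    return coverage >= min_coverage and identity >= min_identity

# Two abbreviated rows with only the columns the filter needs.
sample = (
    "#FILE\tGENE\t%COVERAGE\t%IDENTITY\n"
    "Isolate1\tceoA\t99.92\t99.67\n"
    "Isolate1\tamrA\t97.25\t80.84\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
kept = [r["GENE"] for r in rows if passes_thresholds(r)]
print(kept)  # ['ceoA']
```

In this sketch a hit must pass both thresholds, so amrA (80.84% identity) is dropped at the default settings; how gene2tab itself combines the two checks is not stated here.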

Example:

File1.tab

#FILE	SEQUENCE	START	END	STRAND	GENE	COVERAGE	COVERAGE_MAP	GAPS	%COVERAGE	%IDENTITY	DATABASE	ACCESSION	PRODUCT	RESISTANCE
Isolate1	S1-length_513	53228	54445	+	ceoA	1-1218/1218	========/======	2/2	99.92	99.67	card	U97042:0-1218	ceoA is a periplasmic linker subunit of the CeoAB-OpcM efflux pump	aminoglycoside;fluoroquinolone
Isolate1	S1-length_513	54491	57575	+	ceoB	1-3084/3084	========/======	1/1	100	99.87	card	U97042:1263-4347	ceoB is a cytoplasmic membrane component of the CeoAB-OpcM efflux pump	aminoglycoside;fluoroquinolone
Isolate1	S1-length_513	57702	59240	+	opcM	1-1536/1536	========/======	5/5	99.93	99.16	card	U38944.1:0-1536	OpcM is an outer membrane factor protein found in Burkholderia cepacia. It is part of the CeoAB-OpcM complex.	aminoglycoside;fluoroquinolone
Isolate1	S1-length_233	145199	146378	+	amrA	24-1200/1200	========/======	15/23	97.25	80.84	card	BX571965.1:2152165-2150965	amrA is the efflux pump subunit of the AmrAB-OprM multidrug efflux complex. amrA corresponds to 1 locus in Pseudomonas aeruginosa PAO1 and 1 locus in Pseudomonas aeruginosa LESB58.	aminoglycoside
Isolate1	S1-length_233	146394	149503	+	amrB	1-3110/3132	========/======	2/2	99.27	88.17	card	BX571965.1:2150949-2147817	amrB is the membrane fusion protein of the AmrAB-OprM multidrug efflux complex.	aminoglycoside
Isolate1	S1-length_265	25329	26405	+	Burkholderia_pseudomallei_Omp38	1-1122/1122	========/======	14/59	95.37	80.78	card	AY312416:0-1122	Heterologous expression of Burkholderia pseudomallei Omp38 (BpsOmp38) in Omp-deficient E. coli host cells lowers their permeability and in consequence their antimicrobial susceptibility to penicillin G cefoxitin ceftazidime and imipenem.	carbapenem;cephalosporin;cephamycin;monobactam;penam;penem
Isolate2	S2-length_512	25329	26405	+	Burkholderia_pseudomallei_Omp38	1-1122/1122	========/======	14/59	95.37	80.78	card	AY312416:0-1122	Heterologous expression of Burkholderia pseudomallei Omp38 (BpsOmp38) in Omp-deficient E. coli host cells lowers their permeability and in consequence their antimicrobial susceptibility to penicillin G cefoxitin ceftazidime and imipenem.	carbapenem;cephalosporin;cephamycin;monobactam;penam;penem

File2.tab

#FILE	SEQUENCE	START	END	STRAND	GENE	COVERAGE	COVERAGE_MAP	GAPS	%COVERAGE	%IDENTITY	DATABASE	ACCESSION	PRODUCT	RESISTANCE
Isolate2	S2-length_512	25329	26405	+	Burkholderia_pseudomallei_Omp38	1-1122/1122	========/======	14/59	95.37	80.78	card	AY312416:0-1122	Heterologous expression of Burkholderia pseudomallei Omp38 (BpsOmp38) in Omp-deficient E. coli host cells lowers their permeability and in consequence their antimicrobial susceptibility to penicillin G cefoxitin ceftazidime and imipenem.	carbapenem;cephalosporin;cephamycin;monobactam;penam;penem

These files will be converted to:

Sample,ceoA,ceoB,opcM,amrA,amrB,Burkholderia_pseudomallei_Omp38
Isolate1,1,1,1,1,1,1
Isolate2,0,0,0,0,0,1
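As a rough sketch of how such a matrix can be built, the snippet below collects (sample, gene) pairs and writes one row per sample. It assumes the sample name comes from the #FILE column and that genes are listed in order of first appearance; the function name `presence_absence` is hypothetical and this is not necessarily how gene2tab implements the conversion.

```python
import csv
import io

def presence_absence(rows):
    """Build (header, matrix_rows) from dicts with '#FILE' and 'GENE' keys."""
    samples, genes, seen = [], [], set()
    for row in rows:
        sample, gene = row["#FILE"], row["GENE"]
        if sample not in samples:
            samples.append(sample)
        if gene not in genes:
            genes.append(gene)
        seen.add((sample, gene))
    header = ["Sample"] + genes
    # 1 if the gene was reported for the sample, 0 otherwise.
    matrix = [[s] + [1 if (s, g) in seen else 0 for g in genes] for s in samples]
    return header, matrix

# Abbreviated hits taken from the example files above.
hits = [
    {"#FILE": "Isolate1", "GENE": "ceoA"},
    {"#FILE": "Isolate1", "GENE": "Burkholderia_pseudomallei_Omp38"},
    {"#FILE": "Isolate2", "GENE": "Burkholderia_pseudomallei_Omp38"},
]
header, matrix = presence_absence(hits)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(matrix)
print(out.getvalue(), end="")
# Sample,ceoA,Burkholderia_pseudomallei_Omp38
# Isolate1,1,1
# Isolate2,0,1
```

Because pairs are stored in a set, a gene reported for the same sample in more than one file still counts once.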

If the --transpose flag is used, the output will be:

Sample,Isolate1,Isolate2
ceoA,1,0
ceoB,1,0
opcM,1,0
amrA,1,0
amrB,1,0
Burkholderia_pseudomallei_Omp38,1,1
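Transposing swaps rows and columns, so each gene becomes a row and each sample a column. A minimal sketch using only built-in Python, on an abbreviated table with two genes (an illustration of the operation, not gene2tab's own code):

```python
# Abbreviated sample-by-gene table, header row first.
table = [
    ["Sample", "ceoA", "Burkholderia_pseudomallei_Omp38"],
    ["Isolate1", 1, 1],
    ["Isolate2", 0, 1],
]

# zip(*table) yields the columns of the table, which become the new rows.
transposed = [list(col) for col in zip(*table)]

for row in transposed:
    print(",".join(str(v) for v in row))
# Sample,Isolate1,Isolate2
# ceoA,1,0
# Burkholderia_pseudomallei_Omp38,1,1
```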

Installation

pip install gene2tab

Usage

See all the available options with:

gene2tab -h

To run the script, pass either a single .tab file or a directory containing multiple .tab files as input. The output will be a .csv file.

gene2tab -i [output_directory or file.tab] -o output.csv --min_coverage 0.9 --min_identity 0.9 

You can also transpose the output with the --transpose flag.

gene2tab -i [output_directory or file.tab] -o output.csv --min_coverage 0.9 --min_identity 0.9 --transpose

If your files use a delimiter other than tab, you can specify it with the --input_file_delimiter flag.

gene2tab -i [output_directory or file.tab] -o output.csv --min_coverage 0.9 --min_identity 0.9 --input_file_delimiter ','

If you want a different delimiter in the output file, you can specify it with the --output_file_delimiter flag.

gene2tab -i [output_directory or file.tab] -o output.csv --min_coverage 0.9 --min_identity 0.9 --output_file_delimiter ';'
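To illustrate what the two delimiter options mean in practice, the sketch below reads comma-delimited text and writes it back out with semicolons using Python's standard csv module. The function name `convert_delimiter` is hypothetical and the snippet is independent of gene2tab; it only shows the effect of choosing a different input and output delimiter.

```python
import csv
import io

def convert_delimiter(text, in_delim=",", out_delim=";"):
    """Re-serialise delimited text with a different field delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=in_delim)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=out_delim, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

print(convert_delimiter("Sample,ceoA\nIsolate1,1\n"), end="")
# Sample;ceoA
# Isolate1;1
```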

License

Apache License 2.0
