tax2peptide creates a taxon-specific database in FASTA format from given taxon IDs and a reference database.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

tax2peptide

tax2peptide creates a taxon-specific database in FASTA format from given taxon IDs and a reference database. Depending on the selected options, this taxon-specific database contains all FASTA entries of:

the given taxon IDs and their descendant taxon IDs in the phylogenetic tree
the given taxon IDs only (option --no_descendants)
the given taxon IDs raised to the specified level higher up in the phylogenetic tree, together with their descendant taxon IDs (option --level)
the given taxon IDs and their descendant taxon IDs down to the level species; taxon IDs below species level are not included (option --species)
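The selection modes listed above can be pictured as operations on a child-to-parent map of the taxonomy. The following is a minimal sketch under stated assumptions, not tax2peptide's actual implementation: the toy tree with made-up taxon IDs, the dictionary layout and the function names `descendants` and `lift_to_rank` are all hypothetical and serve only as illustration.

```python
from collections import defaultdict

# Toy taxonomy with made-up IDs (assumption): tax_id -> (parent_tax_id, rank)
NODES = {
    1: (1, "no rank"),   # the root points to itself
    10: (1, "family"),
    20: (10, "genus"),
    30: (20, "species"),
    31: (20, "species"),
    40: (30, "strain"),
}


def build_children(nodes):
    """Invert the child -> parent map into parent -> set of children."""
    children = defaultdict(set)
    for tax_id, (parent, _rank) in nodes.items():
        if tax_id != parent:
            children[parent].add(tax_id)
    return children


def descendants(tax_id, nodes, stop_at_species=False):
    """Return tax_id plus every taxon below it.

    With stop_at_species=True, taxa below rank 'species' are not followed
    (comparable in spirit to --species).
    """
    children = build_children(nodes)
    found, stack = set(), [tax_id]
    while stack:
        current = stack.pop()
        found.add(current)
        if stop_at_species and nodes[current][1] == "species":
            continue
        stack.extend(children[current] - found)
    return found


def lift_to_rank(tax_id, nodes, rank):
    """Walk up the tree to the ancestor with the given rank (comparable to --level).

    Returns None if the root is reached without finding that rank.
    """
    current = tax_id
    while nodes[current][1] != rank:
        parent = nodes[current][0]
        if parent == current:
            return None
        current = parent
    return current


print(descendants(20, NODES))                        # all taxa below the genus
print(descendants(20, NODES, stop_at_species=True))  # stops at species level
print(lift_to_rank(40, NODES, "genus"))              # ancestor of rank genus
```

Selecting only the given IDs (--no_descendants) corresponds to skipping the `descendants` call altogether.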

The matching entries are read from one of the following databases: the NCBI non-redundant peptide database, Swiss-Prot, UniProt or TrEMBL. User-defined databases are also possible, as long as the database headers contain taxon IDs in the form "OX=NUMBER" or contain NCBI/UniProt accession numbers. Using uncompressed databases speeds up the program considerably.
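To illustrate the "OX=NUMBER" requirement for user-defined databases, here is a minimal sketch of how a taxon ID could be read from a FASTA header and used to filter entries. It is an assumption about one possible approach, not the code tax2peptide uses; the names `taxon_id_from_header` and `filter_fasta` and the example accessions are hypothetical.

```python
import re

# Matches the taxon ID in headers that carry an "OX=NUMBER" field.
OX_PATTERN = re.compile(r"\bOX=(\d+)")


def taxon_id_from_header(header):
    """Return the taxon ID found in a FASTA header, or None if there is none."""
    match = OX_PATTERN.search(header)
    return int(match.group(1)) if match else None


def filter_fasta(lines, wanted_taxa):
    """Yield the lines of all FASTA entries whose header carries a wanted taxon ID."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new entry starts: decide once per header whether to keep it.
            keep = taxon_id_from_header(line) in wanted_taxa
        if keep:
            yield line


# Made-up entries in UniProt-like header layout (accessions are hypothetical).
fasta = [
    ">sp|X00001|A_TEST Protein A OS=Test organism one OX=9606 PE=1 SV=1",
    "MKTAYIAK",
    ">sp|X00002|B_TEST Protein B OS=Test organism two OX=10090 PE=1 SV=1",
    "MAAGLLRS",
]
for line in filter_fasta(fasta, {10090}):
    print(line)
```

Headers without an "OX=" field yield None here and are therefore dropped; databases of that kind would have to be matched through their accession numbers instead.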

Getting Started

Tax2Peptide is a Python 3 command line tool. It can be installed as a PyPI package or as a conda package (https://anaconda.org/jschmacht/tax2peptide).

Prerequisites

Python3

Installing

pip install tax2peptide

pip install -i https://pypi.org/simple/tax2peptide-jschmacht

Deployment

Tax2Peptide is a command line tool and is started with:

python -m tax2peptide [options]

Options:

	option	description
-i	--input	TaxID input file: tabular file containing a column of NCBI taxon IDs. Columns tab separated.
-c	--column	The column (zero-based) in the tabular file that contains Taxon IDs. Default = 0.
-t	--taxon	NCBI taxon ID(s) for database extraction. Multiple taxon IDs are separated by spaces.
-d	--database	Database choice for analysis or for download. Choices: ncbi, uniprot, tremble, swissprot.
-p	--path	Path to the folder with all required databases: taxdump.tar.gz (for all databases), prot.accession2taxid or prot.accession2taxid.gz and pdb.accession2taxid.gz (for NCBI databases). Optional: peptide database named nr/nr.gz or uniprot_trembl.fasta/uniprot_trembl.fasta.gz or uniprot_sprot.fasta/uniprot_sprot.fasta.gz or uniprot.fasta/uniprot.fasta.gz
-o	--out	File name and directory of the resulting taxon-specific peptide database. Default = /taxon_specified_db_DATE/taxon_database.fasta
-n	--dbname	Database name and directory. Use if the database is in a folder other than --path or its name deviates from the standard names.
-l	--level	Hierarchy level up in the ancestral tree. Choices: species, section, genus, tribe, subfamily, family, superfamily, order, superorder, class, phylum, kingdom, superkingdom
-r	--non_redundant	Makes the final database non-redundant with regard to sequences; headers are concatenated.
-z	--no_descendants	Select the peptide database only by the given taxon IDs; descendant taxa are excluded.
-s	--species	Select the peptide database only down to the taxonomic level "species"; descendants of species are excluded.
-u	--threads	Number of threads for using multiprocessing. Default = number of cores.
-x	--reduce_header	Reduce the long headers of NCBI entries to accession IDs. Use only for NCBI databases.
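The interplay of --input and --column can be illustrated with a short sketch that collects taxon IDs from one zero-based column of a tab-separated file. This is not taken from tax2peptide itself; the function name `read_taxon_ids` is hypothetical, and skipping non-numeric cells (such as a header row) is an assumption made for this example.

```python
import csv
import io


def read_taxon_ids(handle, column=0):
    """Collect integer taxon IDs from one zero-based column of a tab-separated file."""
    taxon_ids = []
    for row in csv.reader(handle, delimiter="\t"):
        if column >= len(row):
            continue  # row is too short to contain the requested column
        cell = row[column].strip()
        if cell.isdigit():  # assumption: header cells and empty cells are skipped
            taxon_ids.append(int(cell))
    return taxon_ids


# In-memory stand-in for a file passed with --input; the IDs are in column 1.
example = io.StringIO("name\ttaxid\nHomo sapiens\t9606\nMus musculus\t10090\n")
print(read_taxon_ids(example, column=1))  # [9606, 10090]
```

With a real file, the handle would come from `open(path, newline="")` instead of `io.StringIO`.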

Dependencies:

Required databases for generation of taxon-specific databases from the NCBI reference database:

prot.accession2taxid.gz / prot.accession2taxid
pdb.accession2taxid.gz
taxdump.tar.gz
nr.gz / nr

Required databases for generation of taxon-specific databases from the uniprot/swissprot/trembl reference database:

taxdump.tar.gz
uniprot.fasta.gz / uniprot.fasta / uniprot_sprot.fasta.gz / uniprot_sprot.fasta / uniprot_trembl.fasta.gz / uniprot_trembl.fasta

All database files should be downloaded on the same day and stored in the same folder.
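Before a run, the folder can be checked against these requirements. The sketch below is only an illustration of such a check, not tax2peptide's own logic; the file names follow the description of the --path option, while the dictionary `REQUIRED`, its keys and the function `missing_files` are hypothetical.

```python
import tempfile
from pathlib import Path

# Each tuple lists the accepted variants of one required file
# (compressed first, then uncompressed where the text allows it).
TAXDUMP = ("taxdump.tar.gz",)
REQUIRED = {
    "ncbi": [
        TAXDUMP,
        ("prot.accession2taxid.gz", "prot.accession2taxid"),
        ("pdb.accession2taxid.gz",),
        ("nr.gz", "nr"),
    ],
    "uniprot": [TAXDUMP, ("uniprot.fasta.gz", "uniprot.fasta")],
    "swissprot": [TAXDUMP, ("uniprot_sprot.fasta.gz", "uniprot_sprot.fasta")],
    "trembl": [TAXDUMP, ("uniprot_trembl.fasta.gz", "uniprot_trembl.fasta")],
}


def missing_files(folder, database):
    """Return the required files (first variant each) that are absent from folder."""
    folder = Path(folder)
    missing = []
    for alternatives in REQUIRED[database]:
        if not any((folder / name).is_file() for name in alternatives):
            missing.append(alternatives[0])
    return missing


# Demonstration in a temporary folder that only holds the taxdump archive.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "taxdump.tar.gz").touch()
    print(missing_files(tmp, "uniprot"))  # ['uniprot.fasta.gz']
```

Such a check only confirms that the files exist; it cannot tell whether they were downloaded on the same day.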

Databases

All databases should be downloaded on the same date as the peptide database to ensure successful accession matching. The databases can be downloaded manually or by tax2peptide with the option --database {ncbi, uniprot, trembl, swissprot}.

database name	description	source	address
NCBI	non-redundant peptide database	NCBI	ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
Swissprot	curated peptide database	Uniprot	ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
Trembl	peptide database	Uniprot	ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz
Uniprot	concatenated swissprot and trembl database	Uniprot
prot.accession2taxid	contains links between accession IDs and taxonomic lineage (taxon IDs)	NCBI	ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
pdb.accession2taxid	contains links between accession IDs and taxonomic lineage (taxon IDs)	NCBI	ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/pdb.accession2taxid.gz
taxdump	tar-gz-compressed taxdump file containing information about the phylogenetic lineage and links between taxon IDs and scientific names etc.	NCBI	ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
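The lineage information in the taxdump archive is stored in the file nodes.dmp, whose fields are delimited by a tab, a pipe and a tab, beginning with the taxon ID, the parent taxon ID and the rank. As a minimal sketch of how this file could be read without unpacking the archive (an illustration, not tax2peptide's own parser; the function names `parse_nodes` and `load_nodes` are hypothetical, and the member name "nodes.dmp" is assumed to sit at the top level of the archive):

```python
import io
import tarfile


def parse_nodes(lines):
    """Parse nodes.dmp lines into {tax_id: (parent_tax_id, rank)}."""
    nodes = {}
    for line in lines:
        # Fields are separated by "\t|\t"; splitting on "|" and stripping
        # whitespace handles the trailing "\t|" at the line end as well.
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip empty or malformed lines
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def load_nodes(taxdump_path):
    """Read nodes.dmp directly from a taxdump.tar.gz archive."""
    with tarfile.open(taxdump_path, "r:gz") as archive:
        member = archive.extractfile("nodes.dmp")
        return parse_nodes(io.TextIOWrapper(member, encoding="utf-8"))


# Two lines in nodes.dmp layout (only the first three fields are used here).
sample = [
    "1\t|\t1\t|\tno rank\t|\t\t|\n",
    "9606\t|\t9605\t|\tspecies\t|\tHS\t|\n",
]
print(parse_nodes(sample))
```

The resulting dictionary maps every taxon ID to its parent and rank, which is the information needed to walk up or down the phylogenetic tree.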

Usage of tax2peptide with database download:

All required databases are downloaded to the specified path (option --path). If no path is specified, a folder named databases_DATE is used by default.

Examples of usage:

python -m tax2peptide -d uniprot -i path/to/input/taxon_ID_file  -> new folder databases_DATE with: taxdump.tar.gz, uniprot.fasta

python -m tax2peptide -i path/to/input/taxon_ID_file  -> new folder databases_DATE with: taxdump.tar.gz, uniprot.fasta

python -m tax2peptide -d ncbi -p path/to/my_new_databases -i path/to/input/taxon_ID_file  -> new or existing folder my_new_databases with: prot.accession2taxid.gz, pdb.accession2taxid.gz, taxdump.tar.gz, nr.gz

Usage of tax2peptide if all database files are already downloaded:

positional arguments:
--path determines the folder with all needed databases
--taxon AND/OR --input: at least one taxon ID or a taxon ID input file must be provided

optional arguments:
--dbname determines the location/name of the database (if the reference database is not in --path or has a different name; see the table for standard names)

--path is checked for all required database files, and missing databases are downloaded.

Examples of usage:

python -m tax2peptide -p path/to/folder -n path/to/reference_database -t 11111 22222 -o path/my_taxon_specified_database.fasta

python -m tax2peptide -p path/to/folder -n path/to/uniprot.fasta -t 11111 22222 -i path/to/input

python -m tax2peptide -d ncbi -p path/to/folder -i path/to/input

python -m tax2peptide -d uniprot -p path/to/folder -i path/to/input -o path/to/user_specified_db.fasta

Once the path has been set, it does not need to be specified again as long as the same folder is to be used.

Authors

Juliane Schmachtenberg

project_on_github

License

This project is licensed under the MIT License - see the LICENSE file for details

Release history

version	date
0.0.21 (this version)	Jan 20, 2020
0.0.20	Jan 17, 2020
0.0.19	Jan 17, 2020
0.0.18	Jan 17, 2020
0.0.17	Jan 17, 2020
0.0.16	Jan 13, 2020
0.0.15	Jan 3, 2020
0.0.14	Nov 26, 2019
0.0.13	Nov 26, 2019
0.0.12	Nov 26, 2019
0.0.11	Nov 15, 2019
0.0.10	Nov 15, 2019
0.0.9	Nov 15, 2019
0.0.8	Nov 13, 2019
0.0.7	Nov 6, 2019
0.0.6	Nov 6, 2019
0.0.5	Nov 5, 2019
0.0.4	Nov 5, 2019
0.0.3	Nov 5, 2019
0.0.2	Nov 5, 2019

Download files

Source Distribution

tax2peptide-0.0.21.tar.gz (20.9 kB view details)

Uploaded Jan 20, 2020 Source

Built Distribution

tax2peptide-0.0.21-py3-none-any.whl (23.5 kB view details)

Uploaded Jan 20, 2020 Python 3

File details

Details for the file tax2peptide-0.0.21.tar.gz.

File metadata

Download URL: tax2peptide-0.0.21.tar.gz
Upload date: Jan 20, 2020
Size: 20.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.7

File hashes

Hashes for tax2peptide-0.0.21.tar.gz
Algorithm	Hash digest
SHA256	`679e32efa3cccc1c1c44754bdc0cbfb757285f352ffc83297db3ba8a25d2e397`
MD5	`ccebe7b75564cfb54fe5388487340b4f`
BLAKE2b-256	`77be35d1012e0f165af2285cf6d6ac45c427991cd606d49cff9bccea7fddb4f9`

File details

Details for the file tax2peptide-0.0.21-py3-none-any.whl.

File metadata

Download URL: tax2peptide-0.0.21-py3-none-any.whl
Upload date: Jan 20, 2020
Size: 23.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.7

File hashes

Hashes for tax2peptide-0.0.21-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d2fc837eefc6046f913723da058732e22bde8d16c0ff532d1c121b497dbec09d`
MD5	`19d86a413c9567f9a8fc361d77a68b0c`
BLAKE2b-256	`a44bec6d631ec83f6bce40c414536e8da0ecad15b2d7696972e853c65406da12`
