A package for creating variant protein databases for bacteria from Pool-seq experiments
PoolSeqProGen: a Python package for Pool-seq-driven proteogenomics protein database creation in bacteria
This Python package creates a proteogenomic database from pooled sequencing (Pool-seq) experiments. It interrogates sorted BAM alignment files so that more than one version of a tryptic peptide (differing because of mutations) can be included. This keeps the database small: only variant sequences carrying mutations in the combinations actually observed in the alignment file are used, rather than all possible combinations of variant sequences. In addition to the tryptic peptides containing mutations, their 'wild-type' protein sequences are written to the database. Proteins that do not contain mutations are also recorded.
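To illustrate why restricting the database to observed combinations keeps it small, here is a minimal pure-Python sketch. It is not PoolSeqProGen's implementation: reads are simplified to hypothetical dicts mapping reference positions to bases, and the helper names `observed_haplotypes` and `all_combinations` are illustrative only.

```python
from itertools import product


def observed_haplotypes(reads, variant_positions):
    """Return the distinct allele combinations seen on reads spanning every variant position.

    `reads` is a list of dicts mapping a reference position to the base the read
    carries there (a simplified, hypothetical stand-in for real alignments).
    """
    seen = set()
    for read in reads:
        # Only reads covering all variant positions show which alleles occur together.
        if all(pos in read for pos in variant_positions):
            seen.add(tuple(read[pos] for pos in variant_positions))
    return sorted(seen)


def all_combinations(alleles_per_position):
    """Return every possible allele combination (what an exhaustive database would contain)."""
    return sorted(product(*alleles_per_position))


# Two variant positions inside one tryptic peptide's coding region (hypothetical data).
positions = [100, 106]
reads = [
    {100: "A", 103: "G", 106: "C"},  # reference / reference
    {100: "G", 103: "G", 106: "T"},  # alternative / alternative
    {100: "A", 103: "G", 106: "C"},  # same combination as the first read
    {103: "G", 106: "T"},            # does not cover position 100, so it is ignored
]

observed = observed_haplotypes(reads, positions)
exhaustive = all_combinations([("A", "G"), ("C", "T")])
print(observed)         # [('A', 'C'), ('G', 'T')]
print(len(exhaustive))  # 4
```

With n biallelic sites in one peptide, the exhaustive approach produces 2^n sequences, whereas the observed set is bounded both by 2^n and by the number of reads spanning all n sites.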
Requirements:
It uses the pysam, pyteomics and Biopython packages.
If you use this package, please cite:
Weldatsadik RG, Datta N, Kolmeder C, Vuopio J, Kere J, Wilkman SV et al. Pool-seq driven proteogenomic database for Group G Streptococcus. Journal of Proteomics. 2019 June 15; 201: 84-92. https://doi.org/10.1016/j.jprot.2019.04.015
Installation: pip install PoolSeqProGen
usage: generate_variants [-h] --genbankFile GENBANKFILE
                         --bamFile BAMFILE --SnpEffTextOutputFile
                         SNPEFFTEXTOUTPUTFILE
                         [--geneticCodeID GENETICCODEID]
                         [--fastaFile FASTAFILE] [--poolID POOLID]
optional arguments:
  -h, --help            show this help message and exit
  --genbankFile GENBANKFILE
                        the path to the GenBank file for the reference genome
  --bamFile BAMFILE     the path to the sorted BAM alignment file from which
                        reads are retrieved
  --SnpEffTextOutputFile SNPEFFTEXTOUTPUTFILE
                        the path to the text output file from SnpEff. Make sure
                        the chromosome name is the same as the one in the BAM
                        file.
  --geneticCodeID GENETICCODEID
                        the genetic code ID of your species, see
                        https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
  --fastaFile FASTAFILE
                        the path to the FASTA output file
  --poolID POOLID       qualifiers to add to the FASTA header

Note that --genbankFile, --bamFile and --SnpEffTextOutputFile are required even though argparse lists them under "optional arguments"; the usage line shows them without square brackets. For most bacteria, the genetic code ID is 11 (the Bacterial, Archaeal and Plant Plastid Code).
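As an illustration of the options above, this sketch assembles a `generate_variants` argument list from Python using only the documented flags. The file names, the genetic code 11 and the pool ID are placeholder assumptions, and `build_command` is a hypothetical helper, not part of PoolSeqProGen.

```python
def build_command(genbank, bam, snpeff_txt, fasta=None, genetic_code=None, pool_id=None):
    """Assemble a generate_variants command line from the documented flags (hypothetical helper)."""
    cmd = [
        "generate_variants",
        "--genbankFile", genbank,
        "--bamFile", bam,
        "--SnpEffTextOutputFile", snpeff_txt,
    ]
    # Optional arguments are appended only when a value is given.
    if genetic_code is not None:
        cmd += ["--geneticCodeID", str(genetic_code)]
    if fasta is not None:
        cmd += ["--fastaFile", fasta]
    if pool_id is not None:
        cmd += ["--poolID", pool_id]
    return cmd


# Placeholder file names for one pool.
cmd = build_command(
    "reference.gbk", "pool1.sorted.bam", "pool1_snpeff.txt",
    fasta="pool1_variants.fasta", genetic_code=11, pool_id="pool1",
)
print(" ".join(cmd))
```

Assuming the `generate_variants` entry point is on your PATH after installation, the list can be run with `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems with paths containing spaces.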