
A package for creating variant protein databases for bacteria from Pool-seq experiments


PoolSeqProGen: a Python package for Pool-seq driven proteogenomic protein database creation in bacteria

This Python package creates a proteogenomic database from pooled sequencing (Pool-seq) experiments. It interrogates sorted BAM alignment files so that more than one version of a tryptic peptide can be included when mutations are present. Only those variant sequences whose mutations occur in the order seen in the alignment file are used, rather than all combinations of variant sequences, which reduces the database size. In addition to the tryptic peptides containing mutations, their 'wild type' protein sequences are written to the database, as are the proteins that do not contain mutations.
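The size reduction described above can be illustrated with a small sketch. This is not the package's implementation: the peptide, the mutation sites, the per-read observations, and all helper names are hypothetical. It only compares how many variant peptides result from taking every combination of mutations with how many result from keeping the combinations that are seen together on at least one read.

```python
from itertools import product


def apply_mutations(peptide, mutations):
    """Return the peptide with each (position, residue) substitution applied."""
    residues = list(peptide)
    for position, residue in mutations:
        residues[position] = residue
    return "".join(residues)


def all_combinations(peptide, sites):
    """Every variant obtained by choosing wild type or mutant at each site."""
    variants = set()
    for choice in product([False, True], repeat=len(sites)):
        chosen = [site for site, use in zip(sites, choice) if use]
        variants.add(apply_mutations(peptide, chosen))
    variants.discard(peptide)  # the wild type is not a variant
    return variants


def observed_variants(peptide, reads):
    """Only the variants whose mutations co-occur on at least one read."""
    return {apply_mutations(peptide, read) for read in reads if read}


# Hypothetical tryptic peptide with three known mutation sites (0-based).
wild_type = "AEGLSTVDK"
sites = [(1, "D"), (4, "T"), (6, "I")]

# Hypothetical reads covering the peptide: each lists the mutations it carries.
reads = [
    [(1, "D")],
    [(1, "D"), (4, "T")],
    [],                      # a read matching the wild type
    [(1, "D"), (4, "T")],
]

exhaustive = all_combinations(wild_type, sites)
observed = observed_variants(wild_type, reads)
print(len(exhaustive), "variants from all combinations")
print(len(observed), "variants supported by reads:", sorted(observed))
```

With three sites, the exhaustive approach yields seven variant peptides, while the read-supported approach keeps only the two that appear in the made-up alignment data.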

Requirements:

It uses the pysam, pyteomics and biopython packages.

If you use this package, please cite:

Weldatsadik RG, Datta N, Kolmeder C, Vuopio J, Kere J, Wilkman SV et al. Pool-seq driven proteogenomic database for Group G Streptococcus. Journal of Proteomics. 2019 June 15; 201: 84-92. https://doi.org/10.1016/j.jprot.2019.04.015

Installation: pip install PoolSeqProGen

usage: generate_variants  [-h] --genbankFile GENBANKFILE
                            --bamFile BAMFILE --SnpEffTextOutputFile
                            SNPEFFTEXTOUTPUTFILE
                            [--geneticCodeID GENETICCODEID]
                            [--fastaFile FASTAFILE] [--poolID POOLID]


optional arguments:
  -h, --help            show this help message and exit

  --genbankFile GENBANKFILE
                        the path to genbank file for the reference genome
  --bamFile BAMFILE     the path to the sorted alignment bam file for
                        retrieving reads from
  --SnpEffTextOutputFile SNPEFFTEXTOUTPUTFILE
                        the path to the text output file from SnpEff. Make sure the
                        chromosome is the same as that found in the bam file.
  --geneticCodeID GENETICCODEID
                        the genetic code id of your species; see
                        https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
  --fastaFile FASTAFILE
                        the path to the fasta output file
  --poolID POOLID       qualifiers to add to the fasta header
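
Although the help text lists every flag under "optional arguments", the usage line shows --genbankFile, --bamFile and --SnpEffTextOutputFile without brackets, which means they are required. The sketch below shows how an interface of this shape could be declared with argparse. It is an illustration based only on the help text above, not the package's source; the argument types and defaults are assumptions, and the file names and pool label are hypothetical.

```python
import argparse


def build_parser():
    """Declare flags mirroring the help text; types and defaults are assumed."""
    parser = argparse.ArgumentParser(prog="generate_variants")
    parser.add_argument("--genbankFile", required=True,
                        help="path to the genbank file for the reference genome")
    parser.add_argument("--bamFile", required=True,
                        help="path to the sorted alignment bam file")
    parser.add_argument("--SnpEffTextOutputFile", required=True,
                        help="path to the text output file from SnpEff")
    parser.add_argument("--geneticCodeID",
                        help="genetic code id of the species")
    parser.add_argument("--fastaFile",
                        help="path to the fasta output file")
    parser.add_argument("--poolID",
                        help="qualifiers to add to the fasta header")
    return parser


# Hypothetical invocation: every path and the pool label are placeholders.
example_argv = [
    "--genbankFile", "reference.gbk",
    "--bamFile", "pool1.sorted.bam",
    "--SnpEffTextOutputFile", "pool1.snpeff.txt",
    "--geneticCodeID", "11",   # NCBI table 11 is the bacterial code
    "--fastaFile", "pool1_variants.fasta",
    "--poolID", "pool1",
]
args = build_parser().parse_args(example_argv)
print(args.genbankFile, args.bamFile, args.geneticCodeID)
```

The same flags and values, passed to generate_variants on the command line, would correspond to this argument list.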
