Python tools for proteogenomics
Python tools for ProteoGenomics Analysis Toolkit
pypgatk is a Python library that is part of the ProteoGenomics Analysis Toolkit. It provides bioinformatics tools for proteogenomics data analysis.
Requirements:
The requirements of this package depend on how you install it. The four options below are independent, so you only need the requirements for the one you choose:
- pip: installing through pip requires Python 3 and pip3.
- Bioconda: installing through Bioconda requires conda, configured to use the bioconda channels.
- Docker container: using pypgatk from its Docker container requires Docker.
- Source code: installing directly from the source code requires git, Python 3 and pip.
Installation
pip
You can install pypgatk with pip:
pip install pypgatk
Bioconda
You can install pypgatk with Bioconda (if you haven't already, first set up conda and the bioconda channel, as explained in the Bioconda documentation):
conda install pypgatk
Available as a container
You can use pypgatk already set up in a Docker container. Choose one of the available tags for the image and use it in place of <tag> in the call below:
docker pull quay.io/biocontainers/pypgatk:<tag>
NOTE: Biocontainers containers do not have a latest tag, so a docker pull/run without an explicit tag will fail. For instance, a valid call for version 0.0.2 would be:
docker run -it quay.io/biocontainers/pypgatk:0.0.2--py_0
Inside the container, you can either use the Python interactive shell or the command line version (see below).
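Because Biocontainers images carry no latest tag, a script that launches the container has to pin a tag explicitly. The following is a minimal Python sketch of that check. The helper names biocontainer_image and docker_run_command are hypothetical and not part of pypgatk, and the tag used is simply the example tag from the call above.

```python
from typing import List

# Image repository taken from the docker pull call above.
IMAGE_REPOSITORY = "quay.io/biocontainers/pypgatk"


def biocontainer_image(tag: str) -> str:
    """Return the full image reference for an explicit tag.

    Hypothetical helper, not part of pypgatk. It rejects an empty tag and
    "latest", because Biocontainers does not publish a latest tag.
    """
    tag = tag.strip()
    if not tag or tag == "latest":
        raise ValueError("an explicit Biocontainers tag is required, e.g. 0.0.2--py_0")
    return f"{IMAGE_REPOSITORY}:{tag}"


def docker_run_command(tag: str) -> List[str]:
    """Build the argument list for an interactive docker run of the pinned image."""
    return ["docker", "run", "-it", biocontainer_image(tag)]


if __name__ == "__main__":
    # Print the command instead of running it, so Docker is not needed here.
    print(" ".join(docker_run_command("0.0.2--py_0")))
```

Building the command as a list keeps it ready to pass to subprocess.run without shell quoting, and failing early on a missing tag gives a clearer message than the pull error from the registry.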
Use latest source code
Alternatively, to get the latest version, clone this repository, change into its directory, and install it with pip3:
git clone https://github.com/bigbio/py-pgatk
cd py-pgatk
# you might want to create a virtualenv for pypgatk before installing
pip3 install .
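If you prefer to isolate the source installation in a virtual environment, the steps could be scripted roughly as follows. This is only a sketch under assumptions: the helper names venv_python and install_plan and the environment directory name are illustrative, and the commands are printed rather than executed.

```python
import sys
from pathlib import Path
from typing import List


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside an environment created by the venv module."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_plan(checkout: Path, env_dir: Path) -> List[List[str]]:
    """Return the commands that create the environment and install the checkout into it.

    Hypothetical helper: it only builds the argument lists and runs nothing.
    """
    return [
        # Create the virtual environment with the standard library venv module.
        [sys.executable, "-m", "venv", str(env_dir)],
        # Install the cloned repository using the environment's own pip.
        [str(venv_python(env_dir)), "-m", "pip", "install", str(checkout)],
    ]


if __name__ == "__main__":
    # "py-pgatk" is the directory produced by the git clone above;
    # ".venv-pypgatk" is an arbitrary name chosen for this example.
    for command in install_plan(Path("py-pgatk"), Path(".venv-pypgatk")):
        print(" ".join(command))
```

Each printed command can be run by hand or passed to subprocess.run with check=True once the repository has been cloned.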
Usage
pypgatk combines multiple modules and tools into one framework. All commands are accessible through the command-line tool pypgatk_cli.py:
$: pypgatk_cli.py -h
Usage: pypgatk [OPTIONS] COMMAND [ARGS]...

  This is the main tool that give access to all commands and options
  provided by the pypgatk

Options:
  -h, --help  Show this message and exit.

Commands:
  cbioportal-downloader    Command to download the the cbioportal studies
  cbioportal-to-proteindb  Command to translate cbioportal mutation data into
                           proteindb
  cosmic-downloader        Command to download the cosmic mutation database
  cosmic-to-proteindb      Command to translate Cosmic mutation data into
                           proteindb
  dnaseq-to-proteindb      Generate peptides based on DNA sequences
  ensembl-downloader       Command to download the ensembl information
  generate-decoy           Create decoy protein sequences. Each protein is
                           reversed and the cleavage sites switched with
                           preceding amino acid. Peptides are checked for
                           existence in target sequences if foundthe tool will
                           attempt to shuffle them. James.Wright@sanger.ac.uk
                           2015
  threeframe-translation   Command to perform 3frame translation
  vcf-to-proteindb         Generate peptides based on DNA variants from
                           ENSEMBL VEP VCF files
The library provides multiple commands to download, translate and generate protein sequence databases from reference and mutation genome databases.
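When the command-line tool is driven from a script, it can help to check the subcommand name before calling it. The sketch below is one possible way to do that from Python. The names help_command and show_help are hypothetical, the subcommand list is copied from the help output above, and passing --help to a subcommand is an assumption, since only the top-level -h/--help option is shown in this README.

```python
import shutil
import subprocess
from typing import List, Optional

CLI_NAME = "pypgatk_cli.py"

# Subcommand names copied from the help output above.
KNOWN_COMMANDS = {
    "cbioportal-downloader",
    "cbioportal-to-proteindb",
    "cosmic-downloader",
    "cosmic-to-proteindb",
    "dnaseq-to-proteindb",
    "ensembl-downloader",
    "generate-decoy",
    "threeframe-translation",
    "vcf-to-proteindb",
}


def help_command(subcommand: Optional[str] = None) -> List[str]:
    """Build the argument list that asks the tool, or one subcommand, for help."""
    if subcommand is None:
        return [CLI_NAME, "-h"]
    if subcommand not in KNOWN_COMMANDS:
        raise ValueError(f"unknown pypgatk command: {subcommand}")
    # Assumption: subcommands accept --help like the top-level tool does.
    return [CLI_NAME, subcommand, "--help"]


def show_help(subcommand: Optional[str] = None) -> Optional[str]:
    """Return the help text, or None when pypgatk_cli.py is not on the PATH."""
    command = help_command(subcommand)
    if shutil.which(CLI_NAME) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    text = show_help()
    print(text if text is not None else f"{CLI_NAME} is not installed")
```

The options of each subcommand are not listed here, so consult the help text of the individual command or the full documentation before building longer invocations.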
Full Documentation
https://pgatk.readthedocs.io/en/latest/pypgatk.html
Cite as
Husen M Umer, Enrique Audain, Yafeng Zhu, Julianus Pfeuffer, Timo Sachsenberg, Janne Lehtiö, Rui M Branca, Yasset Perez-Riverol, Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides, Bioinformatics, Volume 38, Issue 5, 1 March 2022, Pages 1470–1472, https://doi.org/10.1093/bioinformatics/btab838