pepdata

Python interface to IEDB and other immune epitope data

These details have been verified by PyPI

Maintainers

hammerlab iskander openvax tavinathanson timodonnell

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

An important aspect of computational immunology is modeling the properties of peptides (short strings of amino acids). Peptides can arise as substrings cut out of a larger protein, occur naturally as small proteins, or be synthesized for therapeutic purposes. To make useful clinical and research predictions (e.g. “which peptides should go in this vaccine?”), we need to partition the combinatorial space of peptides into classes such as T-cell epitopes or MHC ligands. One way to capture such distinctions is to collect large volumes of data about peptides and use that data to build statistical models of their immune properties. This library helps you build such models by providing simple Python/NumPy/Pandas interfaces to commonly used immunology and bioinformatics datasets.

Data Sources

iedb: Immune Epitope Database, large collection of epitope assay results for MHC binding as well as T-cell/B-cell responses
tcga: Variant peptide substrings extracted from TCGA mutations across all cancer types
reference: Peptide substrings from the human reference protein sequence
imma2: IMMA2 epitope immunogenic vs. non-immunogenic data set used by Tung et al. for evaluating the POPISK immunogenicity predictor
calis: Two datasets used in Calis et al.’s “Properties of MHC Class I Presented Peptides That Enhance Immunogenicity”
hpv: Human Papillomavirus T cell Antigen Database
toxin: Toxic protein sequences from the Animal Toxin Database
danafarber: Dana Farber Repository for Machine Learning in Immunology
tantigen: Tumor T-cell Antigen Database
hiv_frahm: Reactions to HIV epitopes across different ethnicities (from LANL HIV Databases)
cri_tumor_antigens: Tumor associated peptides from Cancer Immunity
fritsch_neoepitopes: Mutated and wildtype tumor epitopes from Fritsch et al.’s “HLA-binding properties of tumor neoepitopes in humans”

Planned:

bcipep: B-cell epitopes

Dataset API

When a dataset consists only of an unlabeled list of epitopes, it needs only two functions (“wuzzle” stands in for the dataset name):
- load_wuzzle: Returns a set of amino acid strings
- load_wuzzle_ngrams: Returns an array whose rows are amino acid strings transformed into n-gram vector space
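To make "n-gram vector space" concrete, here is a minimal pure-Python sketch of one way to turn peptide strings into n-gram count vectors. It is not pepdata's implementation; the names `ngram_counts` and `ngram_matrix`, the fixed 20-letter alphabet, and the use of raw counts rather than normalized frequencies are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ngram_counts(peptide, n=2):
    """Count overlapping n-grams in a single peptide string."""
    return Counter(peptide[i:i + n] for i in range(len(peptide) - n + 1))


def ngram_matrix(peptides, n=2, alphabet=AMINO_ACIDS):
    """Return (vocabulary, rows) with one count vector per peptide."""
    vocabulary = ["".join(p) for p in product(alphabet, repeat=n)]
    index = {gram: j for j, gram in enumerate(vocabulary)}
    rows = []
    for peptide in peptides:
        row = [0] * len(vocabulary)
        for gram, count in ngram_counts(peptide, n).items():
            if gram in index:  # ignore n-grams with non-standard residues
                row[index[gram]] = count
        rows.append(row)
    return vocabulary, rows


# Example peptides chosen only for illustration.
vocabulary, rows = ngram_matrix(["SIINFEKL", "AAAA"], n=2)
```

With n=2 the vocabulary has 400 entries, so each peptide becomes a sparse 400-dimensional vector. A real pipeline would more likely build a NumPy array, possibly over a reduced alphabet.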

If the dataset contains additional data about the epitopes (such as HLA type or source protein):
- load_wuzzle: Returns a data frame with epitope strings and additional properties
- load_wuzzle_set: Returns a set of epitope amino acid strings
- load_wuzzle_ngrams: Returns an array whose rows are amino acid strings transformed into n-gram vector space

If the dataset is labeled (contains positive and negative assay results), then the following functions should be available:
- load_wuzzle: Load all available data from the “wuzzle” dataset (filtered by options such as mhc_class).
- load_wuzzle_values: Group the dataset by epitope string and associate each epitope with the positive and negative counts, along with percentage of positive results (in a column called “value”).
- load_wuzzle_classes: Split the epitopes into positive and negative classes, return a set of strings for each.
- load_wuzzle_ngrams: Transform the amino acid string representation (or some reduced alphabet) into vectors of n-gram frequencies, return a sklearn-compatible (samples, labels) pair of arrays.
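The grouping and class-splitting steps described for labeled datasets can be sketched in plain Python. This is an illustration of the idea only, not the library's code: the records are invented, `group_values` and `split_classes` are hypothetical names, "value" is stored as a fraction between 0 and 1, and the 0.5 threshold for calling an epitope positive is an assumption.

```python
from collections import defaultdict

# Invented assay records: (epitope string, was the assay result positive?)
records = [
    ("SIINFEKL", True),
    ("SIINFEKL", True),
    ("SIINFEKL", False),
    ("GILGFVFTL", True),
    ("AAAAAAAA", False),
]


def group_values(records):
    """Group by epitope and compute positive/negative counts and 'value'."""
    counts = defaultdict(lambda: [0, 0])  # epitope -> [positive, negative]
    for epitope, positive in records:
        counts[epitope][0 if positive else 1] += 1
    return {
        epitope: {"pos": pos, "neg": neg, "value": pos / (pos + neg)}
        for epitope, (pos, neg) in counts.items()
    }


def split_classes(values, threshold=0.5):
    """Return (positive_set, negative_set) of epitope strings."""
    positive = {e for e, row in values.items() if row["value"] >= threshold}
    negative = set(values) - positive
    return positive, negative


values = group_values(records)
positive, negative = split_classes(values)
```

In a Pandas setting the same grouping would normally be a groupby on the epitope column followed by an aggregation, which yields the data frame with a "value" column that the text describes.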

Amino Acid Properties

The amino_acid module contains a variety of physical/chemical properties for both single amino acid residues and interactions between pairs of residues.

Single residue feature tables are parsed into StringTransformer objects, which can be treated as dictionaries or will vectorize a string when you call their method transform_string.
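As a rough picture of an object that "can be treated as a dictionary" and can also vectorize a string, consider the sketch below. `ResidueTable` is a hypothetical stand-in, not the library's StringTransformer class; only the method name transform_string comes from the description above. The values are the Kyte-Doolittle hydropathy scale, which is assumed here for illustration and may differ from the table pepdata ships.

```python
class ResidueTable(dict):
    """Hypothetical stand-in: maps residue -> value and vectorizes strings."""

    def transform_string(self, peptide):
        """Return one feature value per residue, in sequence order."""
        return [self[residue] for residue in peptide]


# Kyte-Doolittle hydropathy values (assumed scale, for illustration only).
hydropathy = ResidueTable({
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
})

single_value = hydropathy["I"]                    # dictionary-style lookup
vector = hydropathy.transform_string("SIINFEKL")  # one value per residue
```

Subclassing dict keeps lookups cheap and familiar while adding the vectorizing method. A residue missing from the table raises KeyError in this sketch.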

Examples of single residue features:
- hydropathy
- volume
- polarity
- pK_side_chain
- prct_exposed_residues
- hydrophilicity
- accessible_surface_area
- refractivity
- local_flexibility
- accessible_surface_area_folded
- alpha_helix_score (Chou-Fasman)
- beta_sheet_score (Chou-Fasman)
- turn_score (Chou-Fasman)

Pairwise interaction tables are parsed into nested dictionaries, so that the interaction between amino acids x and y can be determined from d[x][y].

Pairwise interaction dictionaries:
- strand_vs_coil (and its transpose coil_vs_strand)
- helix_vs_strand (and its transpose strand_vs_helix)
- helix_vs_coil (and its transpose coil_vs_helix)
- blosum30
- blosum50
- blosum62
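The nested-dictionary layout and the notion of a transposed table can be shown with a tiny example. The numbers below are placeholders invented for this sketch and are not taken from any published interaction table; `transpose`, `toy_table`, and `toy_transpose` are hypothetical names.

```python
def transpose(table):
    """Swap the two index levels so that result[y][x] == table[x][y]."""
    flipped = {}
    for x, row in table.items():
        for y, value in row.items():
            flipped.setdefault(y, {})[x] = value
    return flipped


# Placeholder values for two residues only (not real interaction scores).
toy_table = {
    "A": {"A": 1.0, "G": -2.0},
    "G": {"A": 3.0, "G": 0.5},
}
toy_transpose = transpose(toy_table)

score = toy_table["A"]["G"]              # interaction of x="A" with y="G"
same_score = toy_transpose["G"]["A"]     # same entry via the transpose
```

For a symmetric matrix such as a BLOSUM table the transpose equals the original, which is presumably why only the asymmetric tables are listed with a separately named transpose.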

There is also a function to parse the coefficients of the PMBEC similarity matrix, though this currently lives in the separate pmbec module.


Release history

- 1.0.7: Feb 26, 2018
- 1.0.4: Feb 24, 2018
- 1.0.2: Feb 21, 2018
- 0.7.0: Jan 9, 2017
- 0.6.8: Sep 1, 2016
- 0.6.7: Nov 3, 2015
- 0.6.6: Nov 3, 2015
- 0.6.5: Nov 3, 2015
- 0.6.4: Jun 2, 2015
- 0.6.3: May 4, 2015
- 0.6.2: May 4, 2015
- 0.6.1: May 4, 2015
- 0.6.0: Mar 18, 2015 (this version)
- 0.5.0: Mar 12, 2015
- 0.4.1: Feb 25, 2015
- 0.4.0: Oct 30, 2014

Download files


Source Distribution

pepdata-0.6.0.tar.gz (23.0 kB)

Uploaded Mar 18, 2015 Source

File details

Details for the file pepdata-0.6.0.tar.gz.

File metadata

Download URL: pepdata-0.6.0.tar.gz
Upload date: Mar 18, 2015
Size: 23.0 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for pepdata-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`c3ce7182c40104183bc2867c63561e25b1aa9f2128a2bd4fc996069880304dbd`
MD5	`790d1e609708e75fa08dc0d98c10a57b`
BLAKE2b-256	`7b729b5ee89abe8b0b85541914b32d9d89886dbc85ade69615d8320d1b5e9308`