Skip to main content

PhIP-seq analysis library

Project description

Build Status

phipkit

PhIP-seq data analysis toolkit

This package implements a data analysis workflow for processing the results of phage immunoprecipitation sequencing (PhIP-Seq) experiments.

This workflow gets you from a matrix of read counts to a list of potential epitopes for each sample. Downstream analyses (not implemented here) may use these epitopes in case vs. control comparisons or other analyses appropriate to the study.

Installation

All dependencies except BLAST (required by phipkit-blast command) can be installed by pip:

$ pip install phipkit

or from a checkout of the repo:

$ pip install .

In order to run the phipkit-blast command, you will need the BLAST commands blastp and makeblastdb in your PATH. I recommend installing BLAST using bioconda. After setting up bioconda, you can just run:

conda install blast

This package has been tested using BLAST versions 2.6.0 and 2.10.1. Most likely any recent version will work.

You can run the phipkit unit tests with:

$ pip install pytest
$ pytest

Assumptions and data requirements

Key assumptions for our analysis approach:

  • Your PhIP-seq run included several (ideally 5 or more) "beads-only" negative control wells, i.e. wells without any antibody added.
  • Your PhIP-seq library was designed by tiling source proteins with overlapping peptides, i.e. each position in the source protein is covered by more than one phage clone

To use this package, you will need three things:

  • A library annotation file (CSV) giving the amino acid insert sequence (peptide) for each clone
  • A reference proteome (FASTA) giving the amino acid sequences of the full-length proteins that the library covers
  • A sample x clones matrix (CSV) of read counts for your experiment. This can be generated for example using the phip-stat package.

Note: in most cases where a CSV file is expected, you can also also pass a tab-separated file (TSV), or a bzip2-compressed CSV or TSV.

Example analysis

We will walk through analyzing a toy synthetic PhIP-seq dataset. First go into the example-data directory:

$ cd example-data

Step 1. Blast PhIP-seq library against itself

You only need to do this once for each PhIP-seq library you run. If you mix libraries in a single run, I recommend running this command on the combined mixture of libraries.

To blast phage clone sequences against themselves, run:

$ phipkit-blast annotations.csv --out blast.all_against_all.csv

Example results: blast.all_against_all.csv.

The first argument is your library annotation file. This is expected to have phage clone names on the first column, and another column named seq_aa that gives the amino acid sequence of the peptide. This column name can be changed; see phipkit-blast -h.

Running this on a real library can be quite slow. I usually run it overnight. Passing '--blast-parameter num_threads 4' can speed this up (the default is 1 thread).

Step 2. Blast PhIP-seq library against reference proteome

There's a second blast step that you also need to do only once per library. Here we blast the library against a set of full length protein sequences (referred to as "antigens" later).

$ phipkit-blast annotations.csv --reference proteins.fasta --out blast.reference.csv

Example results: blast.reference.csv.

The proteins you use here should roughly correspond to the proteins used originally to generate the PhIP-seq library. They don't have to exactly match though. For example, for an autoantigen PhIP-seq library, it's reasonable to use the current human proteome from Uniprot.

Step 3. Calculate scores

This step is independent of the two BLAST steps. It calculates "scores" that capture enrichment over beads-only negative controls. The results are conceptually analogous to the z-scores introduced in Yuan et al 2018, although a different model is used based on a robust fit of a Poisson distribution. See score.py for the implementation.

$ phipkit-score counts.csv --out scores.csv

Example results: scores.csv.

Step 4. Call hits

This step identifies phage clones (actually pairs of phage clones with overlapping sequences) that have high enough scores to be called hits.

Our method for hit calling uses only the relative ranking of the scores, not the values themselves. You can therefore use any alternate scoring method you like, as long as clones with higher read counts have higher scores. The method is implemented in call_hits.py.

$ phipkit-call-hits blast.all_against_all.csv scores.csv --counts counts.csv --fdr 0.1 --out hits.csv

Example results: hits.csv.

Step 5. Call antigens

This step looks for regions of source proteins that have multiple overlapping hits. It's implemented in call_antigens.py.

$ phipkit-call-antigens blast.reference.csv hits.csv --out antigens.csv

Example results: antigens.csv.

Step 6 (optional). Plot antigen calls

Some rudimentary plotting support is also available.

$ phipkit-plot-antigens blast.reference.csv hits.csv antigens.csv --out antigens.pdf

Example results: antigens.pdf.

Contributing

PRs with bugfixes or new functionality are welcome. Before embarking on a major change, please file an issue.

To push a new release to PyPI:

  • Make sure the package version specified in __init__.py is a new version greater than what's on PyPI.
  • Tag a new release on GitHub matching this version

Travis should deploy the release to PyPI automatically.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

phipkit-0.0.1.tar.gz (27.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

phipkit-0.0.1-py3-none-any.whl (25.7 kB view details)

Uploaded Python 3

File details

Details for the file phipkit-0.0.1.tar.gz.

File metadata

  • Download URL: phipkit-0.0.1.tar.gz
  • Upload date:
  • Size: 27.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/54.1.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2

File hashes

Hashes for phipkit-0.0.1.tar.gz
Algorithm Hash digest
SHA256 eab51a010fd22cab1ee8976d6e946af34bf294bd8ccafc6690efee37820adcf1
MD5 06846d3997e3cf92c0d11c1a50d2de8c
BLAKE2b-256 57e068014f1e52fdc74ac701322376baf767b22383f72f6a30c9519aa4c81bdd

See more details on using hashes here.

File details

Details for the file phipkit-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: phipkit-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 25.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/54.1.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2

File hashes

Hashes for phipkit-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8dde65b208ff897f93183cda0c13fe309dca493a733397494f63b71ed0c0b59e
MD5 a931eb8c41095c14f4e4a628f90c2058
BLAKE2b-256 f1a5bcdd8ab493926e155e749bfa6bb6be9491194b1c851f2d1a5dc1f4d54ae4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page