Utilities for analyzing mutations and neoepitopes in patient cohorts

|PyPI| |Build Status| |Coverage Status|

Cohorts
=======

Cohorts is a library for analyzing and plotting clinical data, mutations
and neoepitopes in patient cohorts.

It calls out to external libraries like
`topiary <https://github.com/hammerlab/topiary>`__ and caches the
results for easy manipulation.

Cohorts requires Python 3 (3.3+). We no longer maintain
compatibility with Python 2. For context, see this `Python 3
statement <http://www.python3statement.org>`__.

Installation
------------

You can install Cohorts using
`pip <https://pip.pypa.io/en/latest/quickstart.html>`__:

.. code:: bash

    pip install cohorts

Features
--------

- Data management: construct a ``Cohort`` consisting of ``Patient``\ s
  with ``Sample``\ s.
- Use ``varcode`` and ``topiary`` to generate and cache variant effects
  and predicted neoantigens.
- Provenance: track the state of the world (package and data versions)
  for a given analysis.
- Aggregation functions: built-in functions such as
  ``missense_snv_count``, ``neoantigen_count``,
  ``expressed_neoantigen_count``; or create your own functions.
- Plotting: survival curves via ``lifelines``, response/no response
  plots (with Mann-Whitney and Fisher’s Exact results), ROC curves.
  Example: ``cohort.plot_survival(on=missense_snv_count, how="pfs")``.
- Filtering: filter collections of variants/effects/neoantigens by, for
  example, variant statistics.
- Pre-define data sets to work with. Example:
  ``cohort.as_dataframe(join_with=["tcr", "pdl1"])``.
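
As a rough illustration of the provenance idea in the list above, the
following sketch records the installed version of each named package
using only the standard library (Python 3.8+ for ``importlib.metadata``).
It is not how Cohorts implements provenance; the helper name
``collect_versions`` is hypothetical and not part of the Cohorts API.

.. code:: python

    from importlib.metadata import PackageNotFoundError, version


    def collect_versions(package_names):
        """Map each package name to its installed version, or None if absent.

        Hypothetical helper for illustration; not part of the Cohorts API.
        """
        versions = {}
        for name in package_names:
            try:
                versions[name] = version(name)
            except PackageNotFoundError:
                # Package is not installed in this environment.
                versions[name] = None
        return versions


    # Snapshot the packages an analysis depends on.
    snapshot = collect_versions(["cohorts", "varcode", "topiary"])
    print(snapshot)

Storing such a snapshot next to cached results makes it possible to
notice when a result was produced under different package versions.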

In addition, several other libraries make use of ``cohorts``:

- `pygdc <http://github.com/hammerlab/pygdc>`__
- `query_tcga <http://github.com/jburos/query_tcga>`__

Quick Start
-----------

One way to get started using Cohorts is to use it to analyze TCGA data.

As an example, we can create a cohort using
`query_tcga <http://github.com/jburos/query_tcga>`__:

.. code:: python

    from query_tcga import cohort, config

    # provide authentication token
    config.load_config('config.ini')

    # load patient data
    blca_patients = cohort.prep_patients(project_name='TCGA-BLCA',
                                         project_data_dir='data')

    # create cohort
    blca_cohort = cohort.prep_cohort(patients=blca_patients,
                                     cache_dir='data-cache')

Then, use ``plot_survival()`` to summarize a potential biomarker (e.g.
``snv_count``) by survival:

.. code:: python

    from cohorts.functions import snv_count
    blca_cohort.plot_survival(snv_count, how='os', threshold='median')

This should produce a summary of results including this plot:

.. figure:: /docs/survival_plot_example.png
   :alt: Survival plot example

   Survival plot example
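
To make ``threshold='median'`` more concrete, here is a minimal sketch
of a median split, assuming the threshold divides patients into a
"high" and a "low" group before the survival curves are compared. The
counts are made up, and which group receives patients exactly at the
median is an assumption here, not something taken from the Cohorts
source.

.. code:: python

    from statistics import median

    # Hypothetical per-patient SNV counts; not real TCGA data.
    snv_counts = {"p1": 120, "p2": 45, "p3": 300, "p4": 80, "p5": 210}

    cutoff = median(snv_counts.values())

    # Assumption: patients strictly above the cutoff form the "high" group.
    high = sorted(pid for pid, n in snv_counts.items() if n > cutoff)
    low = sorted(pid for pid, n in snv_counts.items() if n <= cutoff)

    print(cutoff, high, low)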

Alternatively, use ``plot_benefit()`` to summarize benefit (here,
OS > 12 months) instead of survival:

.. code:: python

    blca_cohort.plot_benefit(snv_count)

.. figure:: /docs/benefit_plot_example.png
   :alt: Benefit plot example

   Benefit plot example
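
The Features list mentions Mann-Whitney results on response/no response
plots. As a sketch of what that statistic measures, the snippet below
computes the Mann-Whitney U statistic by direct pair counting. It uses
made-up counts, does not compute a p-value, and is not the code Cohorts
runs; the function name ``mann_whitney_u`` is hypothetical.

.. code:: python

    def mann_whitney_u(xs, ys):
        """U statistic for xs versus ys.

        Counts the (x, y) pairs with x > y; ties contribute 0.5 each.
        """
        u = 0.0
        for x in xs:
            for y in ys:
                if x > y:
                    u += 1.0
                elif x == y:
                    u += 0.5
        return u


    # Hypothetical SNV counts for illustration only.
    benefit = [310, 250, 180]
    no_benefit = [90, 200, 120, 60]

    u = mann_whitney_u(benefit, no_benefit)
    print(u, len(benefit) * len(no_benefit))

A value of U close to the number of pairs indicates that the first
group tends to have higher values than the second. A statistics library
such as SciPy (``scipy.stats.mannwhitneyu``) would normally be used to
obtain a p-value.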

See the full example in the `quick-start
notebook <http://nbviewer.jupyter.org/github/hammerlab/tcga-blca/blob/master/Quick-start%20-%20using%20Cohorts%20with%20TCGA%20data.ipynb>`__.

Building from Scratch
---------------------

.. code:: python

    from cohorts import Cohort, Patient

    patient_1 = Patient(
        id="patient_1",
        os=70,
        pfs=24,
        deceased=True,
        progressed=True,
        benefit=False
    )

    patient_2 = Patient(
        id="patient_2",
        os=100,
        pfs=50,
        deceased=False,
        progressed=True,
        benefit=False
    )

    cohort = Cohort(
        patients=[patient_1, patient_2],
        cache_dir="/where/cohorts/results/get/saved"
    )

    cohort.plot_survival(on="os")
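
For intuition about what a survival curve over these two patients looks
like, here is a minimal Kaplan-Meier sketch in plain Python. It is only
an illustration: Cohorts delegates survival curves to ``lifelines``, and
the function name ``kaplan_meier`` below is hypothetical. The inputs are
the ``os`` and ``deceased`` values of the two patients above.

.. code:: python

    def kaplan_meier(durations, observed):
        """Return [(time, survival_probability)] at each observed event time.

        durations: follow-up time per patient.
        observed: True if the event (death) was observed, False if censored.
        """
        event_times = sorted({t for t, o in zip(durations, observed) if o})
        survival = 1.0
        curve = []
        for t in event_times:
            # Patients still under observation just before time t.
            at_risk = sum(1 for d in durations if d >= t)
            # Events that occur exactly at time t.
            deaths = sum(1 for d, o in zip(durations, observed) if d == t and o)
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        return curve


    # patient_1: os=70, deceased=True; patient_2: os=100, deceased=False
    curve = kaplan_meier(durations=[70, 100], observed=[True, False])
    print(curve)  # [(70, 0.5)]

The censored patient (``deceased=False``) counts toward the at-risk set
at time 70 but never triggers a drop in the curve.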

.. code:: python

    from cohorts import Cohort, Patient, Sample

    sample_1_tumor = Sample(
        is_tumor=True,
        bam_path_dna="/path/to/dna/bam",
        bam_path_rna="/path/to/rna/bam"
    )

    patient_1 = Patient(
        id="patient_1",
        ...
        snv_vcf_paths=["/where/my/mutect/vcfs/live",
                       "/where/my/strelka/vcfs/live"],
        indel_vcfs_paths=[...],
        tumor_sample=sample_1_tumor,
        ...
    )

    cohort = Cohort(
        ...
        patients=[patient_1]
    )

.. |PyPI| image:: https://img.shields.io/pypi/v/cohorts.svg?maxAge=21600
   :target:
.. |Build Status| image:: https://travis-ci.org/hammerlab/cohorts.svg?branch=master
   :target: https://travis-ci.org/hammerlab/cohorts
.. |Coverage Status| image:: https://coveralls.io/repos/hammerlab/cohorts/badge.svg?branch=master&service=github
   :target: https://coveralls.io/github/hammerlab/cohorts?branch=master
