Utilities for analyzing mutations and neoepitopes in patient cohorts
|PyPI| |Build Status| |Coverage Status|
Cohorts
=======
Cohorts is a library for analyzing and plotting clinical data, mutations
and neoepitopes in patient cohorts.
It calls out to external libraries like
`topiary <https://github.com/hammerlab/topiary>`__ and caches the
results for easy manipulation.
Cohorts requires Python 3 (3.3+). We no longer maintain
compatibility with Python 2. For context, see the `Python 3
statement <http://www.python3statement.org>`__.
Installation
------------
You can install Cohorts using
`pip <https://pip.pypa.io/en/latest/quickstart.html>`__:
.. code:: bash

    pip install cohorts

Features
--------
- Data management: construct a ``Cohort`` consisting of ``Patient``\ s
  with ``Sample``\ s.
- Use ``varcode`` and ``topiary`` to generate and cache variant effects
  and predicted neoantigens.
- Provenance: track the state of the world (package and data versions)
  for a given analysis.
- Aggregation functions: built-in functions such as
  ``missense_snv_count``, ``neoantigen_count``,
  ``expressed_neoantigen_count``; or create your own functions.
- Plotting: survival curves via ``lifelines``, response/no response
  plots (with Mann-Whitney and Fisher's Exact results), ROC curves.
  Example: ``cohort.plot_survival(on=missense_snv_count, how="pfs")``.
- Filtering: filter collections of variants/effects/neoantigens by, for
  example, variant statistics.
- Pre-define data sets to work with. Example:
  ``cohort.as_dataframe(join_with=["tcr", "pdl1"])``.

In addition, several other libraries make use of ``cohorts``:

- `pygdc <http://github.com/hammerlab/pygdc>`__
- `query\_tcga <http://github.com/jburos/query_tcga>`__

Quick Start
-----------
One way to get started using Cohorts is to use it to analyze TCGA data.
As an example, we can create a cohort using
`query\_tcga <http://github.com/jburos/query_tcga>`__:
.. code:: python

    from query_tcga import cohort, config

    # provide authentication token
    config.load_config('config.ini')

    # load patient data
    blca_patients = cohort.prep_patients(project_name='TCGA-BLCA',
                                         project_data_dir='data')

    # create cohort
    blca_cohort = cohort.prep_cohort(patients=blca_patients,
                                     cache_dir='data-cache')

Then, use ``plot_survival()`` to summarize a potential biomarker (e.g.
``snv_count``) by survival:
.. code:: python

    from cohorts.functions import snv_count

    blca_cohort.plot_survival(snv_count, how='os', threshold='median')

This should produce a summary of results including this plot:

.. figure:: /docs/survival_plot_example.png
   :alt: Survival plot example

   Survival plot example

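To make ``threshold='median'`` more concrete, the sketch below shows the
kind of split a median threshold implies: patients fall into a "low" or
"high" group depending on whether their biomarker value exceeds the cohort
median. This is a standalone illustration using only the standard library,
not the Cohorts implementation. The function name ``split_by_median``, the
sample values, and the handling of values equal to the median are all
assumptions made for the example.

.. code:: python

    from statistics import median


    def split_by_median(values):
        """Split {patient_id: value} into low/high groups around the median.

        Assumption: values equal to the median are placed in the low group.
        """
        cutoff = median(values.values())
        low = {pid: v for pid, v in values.items() if v <= cutoff}
        high = {pid: v for pid, v in values.items() if v > cutoff}
        return cutoff, low, high


    # Hypothetical per-patient SNV counts
    snv_counts = {"p1": 120, "p2": 45, "p3": 300, "p4": 80}
    cutoff, low, high = split_by_median(snv_counts)
    print(cutoff, sorted(low), sorted(high))

The two groups are what a survival plot would then compare as separate
curves.
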
We could alternatively use ``plot_benefit()`` to summarize OS>12mo
instead of survival:
.. code:: python

    blca_cohort.plot_benefit(snv_count)

.. figure:: /docs/benefit_plot_example.png
   :alt: Benefit plot example

   Benefit plot example

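The feature list mentions that response/no response plots report
Mann-Whitney results. As a rough illustration of what that statistic
measures, the sketch below computes the Mann-Whitney U statistic for a
biomarker in two groups by counting pairwise wins. It is a plain-Python
sketch under stated assumptions, not how Cohorts computes it: the function
name ``mann_whitney_u`` and the sample values are hypothetical, and no
p-value is derived.

.. code:: python

    def mann_whitney_u(group_a, group_b):
        """Return U for group_a: the number of (a, b) pairs where a > b.

        Ties count as half a win, following the usual definition.
        """
        u = 0.0
        for a in group_a:
            for b in group_b:
                if a > b:
                    u += 1.0
                elif a == b:
                    u += 0.5
        return u


    # Hypothetical SNV counts for patients with and without benefit
    benefit = [120, 300, 200]
    no_benefit = [45, 80, 150]
    u_stat = mann_whitney_u(benefit, no_benefit)
    print(u_stat, "out of", len(benefit) * len(no_benefit), "pairs")

A U close to the total number of pairs (or close to zero) indicates that
one group tends to have larger values than the other.
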
See the full example in the `quick-start
notebook <http://nbviewer.jupyter.org/github/hammerlab/tcga-blca/blob/master/Quick-start%20-%20using%20Cohorts%20with%20TCGA%20data.ipynb>`__.
Building from Scratch
---------------------
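.. code:: python

    from cohorts import Cohort, Patient

    # Each patient carries clinical outcomes: overall survival (os),
    # progression-free survival (pfs), and event/benefit flags.
    patient_1 = Patient(
        id="patient_1",
        os=70,
        pfs=24,
        deceased=True,
        progressed=True,
        benefit=False
    )

    patient_2 = Patient(
        id="patient_2",
        os=100,
        pfs=50,
        deceased=False,
        progressed=True,
        benefit=False
    )

    # Results computed for the cohort are cached under cache_dir.
    cohort = Cohort(
        patients=[patient_1, patient_2],
        cache_dir="/where/cohorts/results/get/saved"
    )

    cohort.plot_survival(on="os")
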
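For intuition about what a survival curve over these two patients
represents, here is a minimal Kaplan-Meier sketch in plain Python. Cohorts
plots survival curves via ``lifelines``; the code below is not that
implementation, and the function name ``kaplan_meier`` is hypothetical. It
only shows how a deceased patient lowers the estimated survival probability
while a censored (still living) patient does not.

.. code:: python

    def kaplan_meier(durations, events):
        """Return [(time, survival_probability)] at each time with an event.

        durations: observed time per patient
        events: True if the event (e.g. death) was observed, False if censored
        """
        data = sorted(zip(durations, events))
        at_risk = len(data)
        survival = 1.0
        curve = []
        i = 0
        while i < len(data):
            t = data[i][0]
            deaths = 0
            removed = 0
            # Group all patients sharing the same observed time
            while i < len(data) and data[i][0] == t:
                if data[i][1]:
                    deaths += 1
                removed += 1
                i += 1
            if deaths:
                survival *= 1 - deaths / at_risk
                curve.append((t, survival))
            # Both events and censored patients leave the risk set
            at_risk -= removed
        return curve


    # os and deceased values from patient_1 and patient_2 above
    curve = kaplan_meier([70, 100], [True, False])
    print(curve)

With only two patients the curve has a single step, at the time of the one
observed death; the censored patient contributes to the risk set but adds
no step.
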
.. code:: python

    from cohorts import Cohort, Patient, Sample

    # A tumor sample points at the DNA and RNA alignments for a patient.
    sample_1_tumor = Sample(
        is_tumor=True,
        bam_path_dna="/path/to/dna/bam",
        bam_path_rna="/path/to/rna/bam"
    )

    patient_1 = Patient(
        id="patient_1",
        ...
        snv_vcf_paths=["/where/my/mutect/vcfs/live",
                       "/where/my/strelka/vcfs/live"],
        indel_vcfs_paths=[...],
        tumor_sample=sample_1_tumor,
        ...
    )

    cohort = Cohort(
        ...
        patients=[patient_1]
    )

.. |PyPI| image:: https://img.shields.io/pypi/v/cohorts.svg?maxAge=21600
:target:
.. |Build Status| image:: https://travis-ci.org/hammerlab/cohorts.svg?branch=master
:target: https://travis-ci.org/hammerlab/cohorts
.. |Coverage Status| image:: https://coveralls.io/repos/hammerlab/cohorts/badge.svg?branch=master&service=github
:target: https://coveralls.io/github/hammerlab/cohorts?branch=master