finds mutants in your scRNA-seq experiment

cerebra

What is cerebra?

This tool allows you to quickly extract meaningful variant information from a DNA or RNA sequencing experiment. If you're interested in learning what mutations are present in your DNA/RNA samples, variant callers such as GATK HaplotypeCaller can be used to generate variant call format (.vcf) files following a sequencing experiment. However, a single sequencing run can generate on the order of 10^8 unique vcf entries, only a small portion of which contain meaningful biological signal. Drawing conclusions from .vcf files therefore remains a substantial challenge. cerebra provides a fast and intuitive framework for summarizing vcf entries across samples. It comprises four modules that do the following:

    1) remove germline mutations from samples of interest
    2) count the total number of mutations in a given sample
    3) report amino acid level SNPs and indels for each sample
    4) report the ratio of total to variant reads at each mutation site
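
The first two modules in the list above can be pictured as simple set operations. The following is a minimal sketch, not cerebra's actual implementation: the variant records, sample names, and the helper functions `filter_germline` and `count_mutations` are all hypothetical and exist only for illustration.

```python
# Each variant is keyed by (chromosome, position, reference allele, alternate allele).
# All records below are made-up illustration data, not real variant calls.
germline = {
    ("chr1", 10, "A", "G"),
    ("chr2", 55, "C", "T"),
}
samples = {
    "cell_1": {("chr1", 10, "A", "G"), ("chr7", 140, "T", "A")},
    "cell_2": {("chr2", 55, "C", "T")},
}


def filter_germline(sample_variants, germline_variants):
    """Return the variants in a sample that are absent from the germline set."""
    return sample_variants - germline_variants


def count_mutations(filtered_samples):
    """Count the variants remaining in each sample after filtering."""
    return {name: len(variants) for name, variants in filtered_samples.items()}


filtered = {name: filter_germline(v, germline) for name, v in samples.items()}
counts = count_mutations(filtered)
print(counts)  # {'cell_1': 1, 'cell_2': 0}
```

Keying each variant on position plus alleles means two samples only share a variant when the exact same change occurs at the same site, which is one reasonable (assumed) definition of a "common" variant.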

cerebra gets its name from the eponymous X-Men character, who had the ability to detect mutant individuals among the general public.

If you're working with tumor data, it may be useful to limit the mutational search space to known cancer variants. cerebra therefore implements an optional method for restricting results to variants also found in the COSMIC database.

NOTE: this framework was developed for, but is certainly not limited to, single-cell RNA sequencing data.

  • Free software: MIT license

What makes cerebra different from traditional vcf parsers?

Python libraries (e.g. PyVCF and vcfpy) exist for extracting information from vcf files, and GATK has its own tool for the task. In fact, we integrate vcfpy into our tool. What makes cerebra different is that it reports the RNA transcript and amino acid change associated with each variant. GATK VariantsToTable produces a file that looks like:

CHROM   POS   ID     QUAL   AC
1       10    .      50     1
1       20    rs10   99     10

Such a table contains only genomic (i.e. DNA-level) coordinates. Often the next question is which specific gene and protein-level mutation each variant is associated with. cerebra queries a reference genome (.fa) and annotation (.gtf) to match each DNA-level variant with its associated gene, probable transcript and probable amino-acid level mutation. cerebra produces a table that looks like the following:

[image: example cerebra output table]
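
The gene-matching step described above can be illustrated with a minimal sketch of an annotation lookup. This is not cerebra's code: the two GTF records, the gene names, and the helpers `parse_gene_records` and `genes_at` are hypothetical. The sketch only relies on the standard 9-column, tab-separated GTF layout with 1-based, inclusive coordinates.

```python
import io

# Two made-up GTF "gene" records on chromosome 1 (tab-separated, 9 columns).
GTF_TEXT = (
    '1\tdemo\tgene\t5\t15\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n'
    '1\tdemo\tgene\t18\t40\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";\n'
)


def parse_gene_records(handle):
    """Collect (chrom, start, end, name) tuples for every 'gene' feature."""
    genes = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, _source, feature, start, end = fields[:5]
        if feature != "gene":
            continue
        # The attribute column holds 'key "value";' pairs.
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        name = attrs.get("gene_name", attrs.get("gene_id"))
        genes.append((chrom, int(start), int(end), name))
    return genes


def genes_at(genes, chrom, pos):
    """Return names of genes whose interval contains the given position."""
    return [name for c, start, end, name in genes if c == chrom and start <= pos <= end]


genes = parse_gene_records(io.StringIO(GTF_TEXT))
print(genes_at(genes, "1", 10))  # ['GENE_A']
print(genes_at(genes, "1", 20))  # ['GENE_B']
```

A linear scan is fine for a toy annotation; for a full genome annotation, an interval index would likely be needed to keep lookups fast.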

cerebra adheres to HGVS sequence variant nomenclature in reporting peptide-level variants.
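
As a rough illustration of what an HGVS protein-level description looks like, the sketch below formats a single amino-acid substitution using three-letter codes. The function `hgvs_substitution` is hypothetical and is not part of cerebra; the exact strings cerebra emits (for example, whether a protein accession is prefixed) are not shown in this document, so treat the output format here as an assumption. In HGVS, parentheses mark a consequence that is predicted rather than experimentally confirmed at the protein level.

```python
# One-letter to three-letter amino acid codes; "*" maps to the HGVS stop symbol "Ter".
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def hgvs_substitution(ref_aa, position, alt_aa, predicted=True):
    """Format a single amino-acid substitution in HGVS protein notation.

    Hypothetical helper for illustration only.
    """
    change = f"{THREE_LETTER[ref_aa]}{position}{THREE_LETTER[alt_aa]}"
    return f"p.({change})" if predicted else f"p.{change}"


print(hgvs_substitution("L", 858, "R"))                   # p.(Leu858Arg)
print(hgvs_substitution("L", 858, "R", predicted=False))  # p.Leu858Arg
```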

Installation

The latest version can be installed from PyPI:
pip install cerebra

Before running that command, you'll need to install a few dependencies.

For OSX:

sudo pip install setuptools
brew update
brew install openssl
brew install zlib

For Linux:

apt-get install libbz2-dev
apt-get install zlib1g-dev
apt-get install libssl-dev

Usage

cerebra should now be installed as a command-line executable. Running $ cerebra should print the following help information:

Usage: cerebra  <command>

  high-throughput summarizing of vcf entries following a sequencing
  experiment

Options:
  -h, --help  Show this message and exit.

Commands:
  count-mutations    count total number of mutations in each sample
  find-aa-mutations  report amino-acid level SNPs and indels in each sample
  germline-filter    filter out common SNPs/indels between germline samples...

Features

count-mutations: count total number of mutations in each sample
find-aa-mutations: report amino-acid level SNPs and indels in each sample
germline-filter: filter out common SNPs/indels between germline samples and samples of interest

Authors

This work was produced by Lincoln Harris, Rohan Vanheusden, Olga Botvinnik and Spyros Darmanis of the Chan Zuckerberg Biohub. For questions, please contact lincoln.harris@czbiohub.org.

Contributing

We welcome any bug reports, feature requests or other contributions. Please submit a well-documented report on our issue tracker. For substantial changes, please fork this repo and submit a pull request for review.

Feel free to clone, but NOTE that this project is still a work in progress.
