A package for filtering candidate mutations for spontaneous mutation rate estimates.

Installation

CaMu.py can easily be installed using pip. Just run the command:

pip install camu

pip will automatically install the required versions of the Python packages that CaMu.py depends on.

Additionally, samtools >= 0.1.19 and gatk >= 4.1.2 must be installed and available on the PATH.
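Before running CaMu.py, it can help to confirm that both external tools are reachable. The following is a minimal sketch using only the Python standard library; the helper name find_missing_tools is hypothetical and not part of CaMu.py, and the sketch only checks that the executables exist on the PATH, not that they meet the minimum versions.

```python
import shutil

# Tool names as given in the text. The minimum versions
# (samtools >= 0.1.19, gatk >= 4.1.2) are NOT checked here.
REQUIRED_TOOLS = ("samtools", "gatk")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("samtools and gatk were found on PATH")
```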

Usage

After installing CaMu.py, the main module camu can be run via

python3 -m camu <additional parameter>

Passing -h or --help as the additional parameter prints the help page.

The help page lists all parameters you need to pass in order to run camu.

To run the overall script, use -i to provide a text file that lists the paths to all samples' VCF files in column 1 and the paths to the corresponding BAM files in column 2.

Additionally, the path to the control BAM file has to be provided via -c.

Finally, the path to the reference genome has to be given with -r.
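The three required parameters can be combined into a single call. The sketch below only assembles the command line from the flags named above (-i, -c, -r); the helper name build_camu_command and all file names are hypothetical placeholders, and the command is printed rather than executed.

```python
import shlex


def build_camu_command(sample_list, control_bam, reference):
    """Assemble the camu call from the three required inputs.

    sample_list: text file with VCF paths (column 1) and BAM paths (column 2)
    control_bam: BAM file of the control sample
    reference:   reference genome file
    """
    return [
        "python3", "-m", "camu",
        "-i", sample_list,
        "-c", control_bam,
        "-r", reference,
    ]


if __name__ == "__main__":
    # All paths below are made-up examples.
    command = build_camu_command("samples.txt", "control.bam", "reference.fasta")
    print(shlex.join(command))
```

Passing the resulting list to subprocess.run would start the pipeline, provided camu and its external tools are installed.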

If you want to run any script separately, you can call the script using

python3 -m <scriptname>

where <scriptname> has to be replaced by one of the 5 modules listed below (preprocessing.py, filterDupAndLinked.py, etc.).

Filtering false-positive candidate mutations to accelerate DNM-counting for direct µ estimates

For direct estimation of the spontaneous mutation rate µ, it is necessary to calculate the rate of spontaneous de-novo mutations (DNM) occurring per site per generation. Consequently, counting DNM is essential for estimating µ.
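To make "per site per generation" concrete, the sketch below divides a DNM count by the number of analysed sites and the number of generations. This is a simplified illustration, not a calculation performed by CaMu.py: the function name, its parameters, and the example numbers are assumptions, and real studies may need further factors (for example ploidy or the number of independent lines).

```python
def estimate_mu(dnm_count, callable_sites, generations):
    """Simplified direct estimate: DNM per site per generation."""
    if callable_sites <= 0 or generations <= 0:
        raise ValueError("callable_sites and generations must be positive")
    if dnm_count < 0:
        raise ValueError("dnm_count must not be negative")
    return dnm_count / (callable_sites * generations)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    mu = estimate_mu(dnm_count=10, callable_sites=100_000_000, generations=100)
    print(f"estimated mu = {mu:.2e} per site per generation")
```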

The basic approach is:

  • Sequencing of samples and control --> FASTQ files
  • Assembly of sequencing results --> BAM files
  • Filtering steps
  • Variant calling
  • Extraction of variants occurring in samples but not in control --> candidate mutations
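The last step of the list above is essentially a set difference. The following sketch assumes that each variant can be represented as a (chromosome, position, reference allele, alternative allele) tuple; this representation and the function name are illustrative assumptions, not CaMu.py's internal data model.

```python
def candidate_mutations(sample_variants, control_variants):
    """Return variants present in the sample but absent from the control."""
    return sorted(set(sample_variants) - set(control_variants))


if __name__ == "__main__":
    # Made-up variants: (chromosome, position, ref, alt)
    sample = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")]
    control = [("chr1", 250, "C", "T")]
    for variant in candidate_mutations(sample, control):
        print(variant)
```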

The resulting list of candidate mutations (CM) currently has to be manually curated using a genome browser like IGV.

Unfortunately, approx. 90 % of these CM are not true DNM; they turn out to be false positives.

CaMu.py aims to accelerate the whole procedure of DNM counting by filtering out the vast majority of false-positive CM and by preparing the remaining CM for fast manual curation with IGV.

CaMu.py consists of 5 main Python modules:

  1. preprocessing.py
  2. filterDupAndLinked.py
  3. detectFIO.py
  4. snapshotIGV.py
  5. IGVsessions.py

preprocessing.py starts with an input text file containing paths to all samples' VCF files in column 1 and paths to all corresponding BAM files in column 2. These VCF and BAM files are read and further processed in order to find variants that are possible DNM - the candidate mutations (CM).
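A two-column input file of this kind could be read as sketched below. The sketch assumes that the columns are separated by whitespace and that paths contain no spaces; the actual delimiter expected by preprocessing.py is not stated here, and read_sample_list is a hypothetical helper, not a CaMu.py function.

```python
def read_sample_list(path):
    """Read (vcf_path, bam_path) pairs from a two-column text file.

    Assumes whitespace-separated columns; blank lines are skipped.
    """
    pairs = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) < 2:
                raise ValueError(f"line {line_number}: expected a VCF and a BAM path")
            pairs.append((fields[0], fields[1]))
    return pairs
```

Each returned pair holds the VCF path from column 1 and the matching BAM path from column 2.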

The rough approach is the following:

[Figure: Preprocessing pipeline]

The subsequent scripts within CaMu.py filter the CM further. They remove CM that are fully linked to other mutations, CM that are supported only by reads that are most probably PCR duplicates, and CM that also occur in other samples or several times in the control sample's BAM file.
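One of these criteria, removing CM that also occur in other samples, can be illustrated with a short sketch. It is not CaMu.py's implementation: the function name and the tuple representation of variants are assumptions, and the linkage, PCR-duplicate, and control-BAM checks are not covered.

```python
from collections import Counter


def drop_shared_candidates(candidates_by_sample):
    """Keep only candidates that were called in exactly one sample."""
    counts = Counter(
        variant
        for variants in candidates_by_sample.values()
        for variant in set(variants)  # count each sample at most once
    )
    return {
        sample: sorted(v for v in set(variants) if counts[v] == 1)
        for sample, variants in candidates_by_sample.items()
    }


if __name__ == "__main__":
    # Made-up candidates: (chromosome, position, ref, alt)
    candidates = {
        "sample_A": [("chr1", 100, "A", "G"), ("chr3", 7, "T", "C")],
        "sample_B": [("chr3", 7, "T", "C"), ("chr2", 55, "G", "T")],
    }
    print(drop_shared_candidates(candidates))
```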

Finally, IGVsessions.py and snapshotIGV.py create IGV sessions and IGV snapshots for all remaining CM to further simplify their manual curation.

Here is an overview:

[Figure: CaMu overview]

Download files

Source Distribution

camu-0.1.6.tar.gz (15.7 kB)

Uploaded Source

Built Distribution

camu-0.1.6-py3-none-any.whl (18.9 kB)

Uploaded Python 3
