A package for filtering candidate mutations for spontaneous mutation rate estimates.
Installation
CaMu.py can easily be installed using pip. Just run the command:
pip install camu
pip will automatically install the required versions of the Python packages that CaMu.py depends on.
Additionally, samtools >= 0.1.19 and gatk >= 4.1.2 must be installed and added to the PATH.
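A quick way to confirm that both external tools are reachable before running CaMu.py is sketched below. It is not part of CaMu.py; the function name missing_tools is hypothetical, and the check only tests whether the executables are found on the PATH, not whether their versions meet the minimums stated above.

```python
import shutil


def missing_tools(tools=("samtools", "gatk"), which=shutil.which):
    """Return the names of required executables that are not found on PATH.

    `which` is injectable so the check can be tested without the tools installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("samtools and gatk were found on PATH")
```

The versions themselves still have to be checked by hand against the requirements above.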
Usage
After installing CaMu.py, the main module camu can be run via
python3 -m camu <additional parameter>
Passing -h or --help as the additional parameter opens the help page.
It lists all additional parameters you need to pass in order to run camu.
For the overall script, it is necessary to provide with -i a text file giving the paths to all samples' VCF files in column 1 and to the corresponding BAM files in column 2.
Additionally, the path to the control BAM file has to be provided via -c.
Finally, the path to the reference genome has to be given with -r.
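As an illustration of the three parameters described above, the following sketch builds the content of the two-column sample list and the matching command line. The flags -i, -c and -r come from the text above; everything else is an assumption: the tab as column separator, the helper names, and all file names are hypothetical.

```python
def format_sample_list(pairs, sep="\t"):
    """Format (vcf_path, bam_path) pairs as a two-column text block.

    Assumption: columns are separated by a tab; check the help page
    (-h) for the separator that camu actually expects.
    """
    return "".join(f"{vcf}{sep}{bam}\n" for vcf, bam in pairs)


def build_camu_command(sample_list, control_bam, reference):
    """Assemble the camu call with the -i, -c and -r parameters."""
    return ["python3", "-m", "camu",
            "-i", sample_list, "-c", control_bam, "-r", reference]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(format_sample_list([("sample1.vcf", "sample1.bam"),
                              ("sample2.vcf", "sample2.bam")]), end="")
    print(" ".join(build_camu_command("samples.txt", "control.bam", "reference.fa")))
```

The command is returned as a list so that it could be handed to subprocess.run without shell quoting issues.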
If you want to run any script separately, you can call the script using
python3 -m <scriptname>
where <scriptname> has to be replaced by one of the 5 modules given below (preprocessing.py, filterDupAndLinked.py, etc.).
Filtering false-positive candidate mutations to accelerate DNM-counting for direct µ estimates
For direct estimation of the spontaneous mutation rate µ, it is necessary to calculate the rate of spontaneous de-novo mutations (DNM) occurring per site per generation. Consequently, counting DNM is essential for estimating µ.
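The definition above (DNM per site per generation) can be written as a one-line calculation. The sketch below is only an illustration of that definition, not code from CaMu.py; the function name, the parameter names, and the numbers in the example are made up.

```python
def mutation_rate(dnm_count, callable_sites, generations):
    """Direct estimate of µ: DNM per site per generation.

    dnm_count       number of confirmed de-novo mutations
    callable_sites  number of genomic sites at which a DNM could be detected
    generations     number of generations separating control and sample
    """
    if callable_sites <= 0 or generations <= 0:
        raise ValueError("callable_sites and generations must be positive")
    return dnm_count / (callable_sites * generations)


if __name__ == "__main__":
    # Invented numbers: 12 DNM, 100 million sites, 60 generations.
    print(mutation_rate(12, 100_000_000, 60))
```

Because the DNM count sits in the numerator, every false-positive that survives curation inflates the estimate directly, which is why the filtering described below matters.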
The rough approach is:
- Sequencing samples and control --> FASTQ files
- Assembly of sequencing results --> BAM files
- Perform some filtering steps
- Variant calling
- Extraction of variants occurring in samples but not in control --> candidate mutations
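The last step of the list above, keeping variants that occur in a sample but not in the control, amounts to a set difference. A minimal sketch follows, assuming each variant is represented as a (chromosome, position, ref, alt) tuple; that representation and the function name are assumptions for illustration and do not describe how CaMu.py stores variants.

```python
def candidate_mutations(sample_variants, control_variants):
    """Return sample variants that are absent from the control.

    Variants are assumed to be hashable tuples such as
    (chromosome, position, ref_allele, alt_allele).
    The order of the sample variants is preserved.
    """
    control = set(control_variants)
    return [variant for variant in sample_variants if variant not in control]


if __name__ == "__main__":
    sample = [("chr1", 100, "A", "T"), ("chr1", 250, "G", "C")]
    control = [("chr1", 250, "G", "C")]
    print(candidate_mutations(sample, control))
```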
The resulting list of candidate mutations (CM) currently has to be manually curated using a genome browser like IGV.
Unfortunately, approx. 90 % of these CM are not true DNM; they turn out to be false positives.
CaMu.py aims to accelerate the whole procedure of DNM counting by filtering out the vast majority of false-positive CM and by preparing the remaining CM for fast manual curation with IGV.
CaMu.py consists of 5 main Python modules:
preprocessing.py
filterDupAndLinked.py
detectFIO.py
snapshotIGV.py
IGVsessions.py
preprocessing.py starts with an input text file containing paths to all samples' VCF files in column 1 and paths to all corresponding BAM files in column 2.
These VCF and BAM files are read and further processed in order to find variants that are possible DNM: the candidate mutations (CM).
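Reading such a two-column input file could look like the sketch below. It is not taken from preprocessing.py: the function name is hypothetical, and it assumes that the columns are separated by whitespace, which means paths containing spaces would not be handled.

```python
def parse_sample_list(text):
    """Parse the two-column sample list into (vcf_path, bam_path) pairs.

    Assumption: columns are separated by whitespace (tabs or spaces).
    Blank lines are skipped; any other line must have exactly two columns.
    """
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        columns = line.split()
        if len(columns) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 columns, found {len(columns)}"
            )
        pairs.append((columns[0], columns[1]))
    return pairs


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example = "sample1.vcf\tsample1.bam\nsample2.vcf\tsample2.bam\n"
    for vcf_path, bam_path in parse_sample_list(example):
        print(vcf_path, bam_path)
```

Failing early on malformed lines keeps a mismatched VCF/BAM pairing from propagating into the later filtering steps.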
The rough approach is the following:
The following scripts within CaMu.py further filter the CM by removing those that are fully linked to other mutations, those that are only included due to reads that are most probably PCR duplicates, and those variants occurring in other samples or several times in the control sample's BAM file.
Finally, for all the remaining CM, IGV sessions and IGV snapshots are created within IGVsessions.py and snapshotIGV.py to further simplify the manual curation of the remaining CM.
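To give an idea of how snapshots of candidate positions can be automated, the sketch below generates an IGV batch script using IGV's batch commands (new, genome, load, snapshotDirectory, goto, snapshot, exit). This is an independent illustration and not necessarily how snapshotIGV.py works; the function name, the 50 bp flank, and the file names are assumptions.

```python
def igv_batch_script(bam_paths, reference, candidates, snapshot_dir, flank=50):
    """Build an IGV batch script that snapshots each candidate position.

    candidates  iterable of (chromosome, position) pairs
    flank       bases shown on each side of the position (assumed default)
    """
    lines = ["new", f"genome {reference}"]
    lines += [f"load {bam}" for bam in bam_paths]
    lines.append(f"snapshotDirectory {snapshot_dir}")
    for chrom, pos in candidates:
        start = max(1, pos - flank)  # IGV coordinates start at 1
        end = pos + flank
        lines.append(f"goto {chrom}:{start}-{end}")
        lines.append(f"snapshot {chrom}_{pos}.png")
    lines.append("exit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names and position, for illustration only.
    print(igv_batch_script(["sample1.bam", "control.bam"], "reference.fa",
                           [("chr1", 1200)], "snapshots"), end="")
```

Loading the control BAM next to the sample BAM puts both tracks in each snapshot, which is what manual curation typically compares.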
Here is an overview: