ClearCNV: Clinical sequencing panel CNV caller and visualizer
- Code Formatting: black
Installation
conda
ClearCNV is available on conda: https://anaconda.org/bioconda/clearcnv
We recommend creating a conda environment:
mamba create -n clearcnv clearcnv -c conda-forge -c bioconda
or
conda create -n clearcnv clearcnv -c conda-forge -c bioconda
Then clone this repo to a location of your choice:
git clone git@github.com:bihealth/clear-cnv.git
and change into it:
cd clear-cnv
Now you can run the commands listed below.
Quick run checks and examples
Sample reassignment:
Create all files
Execute the shell command (from within the cloned repo directory):
clearCNV workflow_reassignment --workdir tests/testdata/ --reference tests/testdata/test_reassignment_ref.fa --metafile tests/testdata/test_reassign_meta.tsv --coverages tests/testdata/test_reassignment_coverages.tsv --bedfile tests/testdata/test_reassignment_union.bed --cores 2
- INPUT: the working directory given by --workdir and the files given by --reference and --metafile.
- OUTPUT: the files created at --coverages and --bedfile. They are used in the next step.
If you want to create the necessary files for your own data, edit the meta.tsv file analogously to the example at clearCNV/tests/testdata/meta.tsv, where you can add more rows for each targets file (BED file). It is recommended to use absolute paths in the meta file.
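The recommendation to use absolute paths can be checked with a few lines of Python. This is a minimal sketch, not part of clearCNV: the function name `relative_path_fields` and the example row are hypothetical, and no particular column layout of the meta file is assumed. As a heuristic, any field containing `/` is treated as a path, so relative file names without a slash are not detected.

```python
import csv
import io
from pathlib import PurePosixPath


def relative_path_fields(tsv_text):
    """Return (row_number, field) pairs for path-like fields that are not absolute."""
    hits = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row_number, row in enumerate(reader, start=1):
        for field in row:
            field = field.strip()
            # Heuristic (assumption): a field containing "/" is treated as a path.
            if "/" in field and not PurePosixPath(field).is_absolute():
                hits.append((row_number, field))
    return hits


# Hypothetical row; the real meta file layout may differ.
example = "panelA\t/data/panels/panelA.bed\tbams/panelA.txt\n"
print(relative_path_fields(example))
```

To check a real file, pass its contents, for example `relative_path_fields(open("meta.tsv").read())`.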
Optionally, drmaa can be used if these two flags are present:
--drmaa_mem 1600 --drmaa_time 4:00
Here drmaa is given 16 Gb memory per core and a maximum running time of four hours.
Also, a cluster config file in .json format can be given with --cluster_configfile config.json
Visualize sample reassignment:
Visualize and adjust the clusterings and final panel assignments
Execute the shell command (from within the cloned repo directory):
clearCNV visualize_reassignment --metafile tests/testdata/meta.tsv --coverages tests/testdata/cov_reassignment.tsv --bedfile tests/testdata/reassignment_union.bed --new_panel_assignments_directory tests/testdata/panel_assignments
- INPUT: the files given by --metafile, --coverages and --bedfile.
- OUTPUT: the files found in the directory given by --new_panel_assignments_directory.
CNV calling
Match scores
First, match scores are calculated. Go to the directory clear-cnv/ and execute the shell command:
clearCNV matchscores -p testpanel -c tests/testdata/cov.tsv -m tests/testdata/matchscores.tsv
This creates a match score matrix which is used in the CNV calling step.
CNV calls
Now execute this shell command:
clearCNV cnv_calling -p testpanel -c tests/testdata/cov.tsv -a tests/testdata/testpanel/analysis -m tests/testdata/matchscores.tsv -C tests/testdata/testpanel/results/cnv_calls.tsv -r tests/testdata/testpanel/results/rscores.tsv -z tests/testdata/testpanel/results/zscores.tsv -g 15 -u 3
This creates the file tests/testdata/testpanel/results/cnv_calls.tsv, which shows one called deletion. If you copy and paste this command for your own data, please don't use the -g 15 -u 3 configuration. We use it here only to be able to work with a tiny example.
More files for analysis can now be found in tests/testdata/testpanel/analysis.
HOW TO and WORKFLOW
clearCNV comprises two major workflows and three major commands:
workflow
- re-assignment
  a) clearCNV workflow_reassignment
  b) clearCNV visualize_reassignment
- CNV calling
  a) clearCNV workflow_cnv_calling
preparations
Some files have to be acquired or created before these commands can be run:
- re-assignment:
  a) For each sequencing panel, a .bed file is needed following this form. Such a file should always exist in the case of targeted sequencing.
  b) For each sequencing panel (or .bed file containing all target information), a simple list of the corresponding .bam files is needed. An example can be found here. Make sure to use absolute paths in this file for custom data.
  c) A meta file. This is a tab-separated file; one example can be found here. To avoid any confusion, we again recommend using absolute paths.
- CNV calling:
  a) A reference file. It must be the same one that was used to create the alignment files (.bam files).
  b) workflow_cnv_calling does CNV calling for each batch (or sequencing-panel-associated data set) separately. A text file with all .bam file paths for each batch and panel must be created. Here is an example showing only one .bam file path. Multiple paths are separated with a newline. This file is usually an output of clearCNV visualize_reassignment.
  c) The .bed file for the sequencing panel for which this batch is put to CNV calling. An example can be found here. Note that gene is ideally replaced with the real name of the exon, gene or target.
  d) A k-mer alignability file in .bed format. Such files can be downloaded from UCSC (e.g. for Hg19 here). A k-mer mappability track can also be created, for example using GenMap. In both cases the resulting Wig or BigWig files need to be converted to .bed to be used by clearCNV.
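The list of .bam files described above (one absolute path per line) can be produced with a short script. This is a sketch under assumptions: the helper name `write_bam_list` is hypothetical and not part of clearCNV, and it assumes that all .bam files of one panel or batch live under a single directory.

```python
from pathlib import Path


def write_bam_list(bam_dir, list_file):
    """Write the absolute path of every .bam file under bam_dir, one per line."""
    # resolve() turns each hit into an absolute path; sorting keeps the list stable.
    bam_paths = sorted(p.resolve() for p in Path(bam_dir).rglob("*.bam"))
    Path(list_file).write_text("".join(f"{p}\n" for p in bam_paths))
    return bam_paths
```

A call such as `write_bam_list("/data/panelA/bams", "/data/panelA/bams.txt")` (hypothetical paths) would then write one list file for that panel.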
notes
The chromosome name scheme in the reference and .bed-file should be of the forms: ChrX, chrX, X or Chr1, chr1, 1.
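A quick way to check chromosome names before a run is sketched below. The pattern covers only the forms named above (an optional Chr or chr prefix followed by a number, X or Y); other names, such as unplaced contigs, are ignored. The function names are hypothetical, and the requirement that the reference and the .bed file use one and the same prefix is an assumption of this sketch, not something clearCNV is documented to enforce.

```python
import re

# Optional "Chr"/"chr" prefix, then a chromosome number, X or Y.
CHROM_PATTERN = re.compile(r"^(Chr|chr)?(\d+|X|Y)$")


def naming_scheme(names):
    """Return the set of prefixes ("Chr", "chr" or "") used by recognised names."""
    prefixes = set()
    for name in names:
        match = CHROM_PATTERN.match(name)
        if match:
            prefixes.add(match.group(1) or "")
    return prefixes


def same_scheme(reference_names, bed_names):
    """True if both inputs use exactly one prefix style and it is the same one."""
    ref = naming_scheme(reference_names)
    bed = naming_scheme(bed_names)
    return len(ref) == 1 and ref == bed


print(same_scheme(["chr1", "chr2", "chrX"], ["chr1", "chrX"]))
print(same_scheme(["chr1", "chr2", "chrX"], ["1", "X"]))
```

The reference names could be taken from the FASTA header lines and the .bed names from the first column of the targets file.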
CNV calling on chr X or chr Y: clearCNV automatically determines the copy number of the gonosomes. If your panel targets only a single gene there, it is better to delete the corresponding targets from the original .bed file to exclude them. Meaningful CNV calling on the X or Y chromosomes requires about twice as many samples in your data set, with roughly equal numbers of women and men among the samples.
NOTE
If you do sample re-assignment on your own data, followed by CNV calling, then only one metafile, one coverages file, and one bedfile will be used. This means that --metafile, --coverages and --bedfile are given the same file paths in both workflow steps, clearCNV workflow_reassignment and clearCNV visualize_reassignment.
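One way to keep these three paths identical across both steps is to define them once and build both command lines from them. The sketch below only assembles and prints the commands; all paths and the helper `with_shared` are hypothetical, while the subcommands and flags are the ones used in the examples in this README.

```python
import shlex

# Hypothetical paths; replace them with your own files.
shared = {
    "--metafile": "/data/run1/meta.tsv",
    "--coverages": "/data/run1/coverages.tsv",
    "--bedfile": "/data/run1/union.bed",
}


def with_shared(command, extra):
    """Build an argument list: clearCNV <command>, shared flags, then extra flags."""
    args = ["clearCNV", command]
    for flag, value in {**shared, **extra}.items():
        args += [flag, str(value)]
    return args


reassign = with_shared(
    "workflow_reassignment",
    {"--workdir": "/data/run1", "--reference": "/data/ref/genome.fa", "--cores": 2},
)
visualize = with_shared(
    "visualize_reassignment",
    {"--new_panel_assignments_directory": "/data/run1/panel_assignments"},
)

print(shlex.join(reassign))
print(shlex.join(visualize))
```

The resulting lists can be passed to `subprocess.run(reassign, check=True)` once the paths point at real files.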
Running Checks
Checks are automatically run on the master branch and pull requests.
Unit and integration tests are based on pytest and formatting is enforced with black.
$ make test
History
v0.0.1
- Everything is new!