License: MIT

ClearCNV: Clinical sequencing panel CNV caller and visualizer

Installation

conda

ClearCNV is available on conda: https://anaconda.org/bioconda/clearcnv

We recommend creating a dedicated conda environment:

mamba create -n clearcnv clearcnv -c conda-forge -c bioconda

or

conda create -n clearcnv clearcnv -c conda-forge -c bioconda

Then clone this repo to a location of your choice with git clone git@github.com:bihealth/clear-cnv.git and change into it with cd clear-cnv. Now you can run the commands listed below.

Quick run checks and examples

Sample reassignment:

Create all files

Execute the shell command (from within the cloned repo directory):

clearCNV workflow_reassignment --workdir tests/testdata/ --reference tests/testdata/test_reassignment_ref.fa --metafile tests/testdata/test_reassign_meta.tsv --coverages tests/testdata/test_reassignment_coverages.tsv --bedfile tests/testdata/test_reassignment_union.bed --cores 2

  • INPUT: working directory given by --workdir, the files given by --reference and --metafile.
  • OUTPUT: files created at --coverages and --bedfile. They are used in the next step.

To create the necessary files for your own data, edit the meta.tsv file analogously to the example at clearCNV/tests/testdata/meta.tsv, adding one row for each targets file (BED file). We recommend using absolute paths in the meta file.

Optionally, drmaa can be used if both of these flags are present: --drmaa_mem 1600 --drmaa_time 4:00, where drmaa is given 16 Gb memory per core and four hours maximum running time. A cluster config file in .json format can also be given with --cluster_configfile config.json
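When the reassignment workflow is started from a script, the command line can be assembled programmatically. The following is a minimal sketch, assuming clearCNV is installed and on the PATH. The helper name build_reassignment_cmd is hypothetical and not part of clearCNV; it only uses the flags shown above and passes the two drmaa flags either together or not at all.

```python
import subprocess
from typing import List, Optional


def build_reassignment_cmd(
    workdir: str,
    reference: str,
    metafile: str,
    coverages: str,
    bedfile: str,
    cores: int = 2,
    drmaa_mem: Optional[int] = None,
    drmaa_time: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `clearCNV workflow_reassignment`.

    Hypothetical helper: it only arranges flags documented in this README.
    """
    cmd = [
        "clearCNV", "workflow_reassignment",
        "--workdir", workdir,
        "--reference", reference,
        "--metafile", metafile,
        "--coverages", coverages,
        "--bedfile", bedfile,
        "--cores", str(cores),
    ]
    # drmaa is only used if both flags are present, so reject a lone flag.
    if (drmaa_mem is None) != (drmaa_time is None):
        raise ValueError("--drmaa_mem and --drmaa_time must be given together")
    if drmaa_mem is not None and drmaa_time is not None:
        cmd += ["--drmaa_mem", str(drmaa_mem), "--drmaa_time", drmaa_time]
    return cmd


if __name__ == "__main__":
    cmd = build_reassignment_cmd(
        workdir="tests/testdata/",
        reference="tests/testdata/test_reassignment_ref.fa",
        metafile="tests/testdata/test_reassign_meta.tsv",
        coverages="tests/testdata/test_reassignment_coverages.tsv",
        bedfile="tests/testdata/test_reassignment_union.bed",
    )
    print(" ".join(cmd))
    # Uncomment to actually run the workflow (requires clearCNV):
    # subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.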

Visualize sample reassignment:

Visualize and adjust the clusterings and final panel assignments

Execute the shell command (from within the cloned repo directory):

clearCNV visualize_reassignment --metafile tests/testdata/meta.tsv --coverages tests/testdata/cov_reassignment.tsv --bedfile tests/testdata/reassignment_union.bed --new_panel_assignments_directory tests/testdata/panel_assignments

  • INPUT: files given by --metafile, --coverages and --bedfile.
  • OUTPUT: files found in given directory --new_panel_assignments_directory.

CNV calling

Match scores

First, match scores are calculated. Go to the directory clear-cnv/ and execute the shell command:

clearCNV matchscores -p testpanel -c tests/testdata/cov.tsv -m tests/testdata/matchscores.tsv

This creates a match score matrix which is used in the CNV calling step.

CNV calls

Now execute this shell command:

clearCNV cnv_calling -p testpanel -c tests/testdata/cov.tsv -a tests/testdata/testpanel/analysis -m tests/testdata/matchscores.tsv -C tests/testdata/testpanel/results/cnv_calls.tsv -r tests/testdata/testpanel/results/rscores.tsv -z tests/testdata/testpanel/results/zscores.tsv -g 15 -u 3

This creates the file tests/testdata/testpanel/results/cnv_calls.tsv, which shows one called deletion. If you copy and paste this command for your own data, please do not use the -g 15 -u 3 configuration. It is used here only so that the tiny example works.

More files for analysis can now be found in tests/testdata/testpanel/analysis.
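The two steps above depend on each other: cnv_calling reads the match score matrix that matchscores writes. A minimal sketch of chaining them from Python follows, assuming clearCNV is on the PATH. The function names and the results/ and analysis/ layout under out_dir are choices of this sketch that mirror the example paths. The -g and -u options are left out on purpose, as advised above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_cnv_calling_cmds(
    panel: str, coverages: str, matchscores: str, out_dir: str
) -> List[List[str]]:
    """Return the matchscores and cnv_calling commands, in run order.

    Hypothetical helper; it only arranges flags shown in this README.
    """
    out = Path(out_dir)
    results = out / "results"
    matchscores_cmd = [
        "clearCNV", "matchscores",
        "-p", panel, "-c", coverages, "-m", matchscores,
    ]
    calling_cmd = [
        "clearCNV", "cnv_calling",
        "-p", panel, "-c", coverages,
        "-a", str(out / "analysis"),
        "-m", matchscores,
        "-C", str(results / "cnv_calls.tsv"),
        "-r", str(results / "rscores.tsv"),
        "-z", str(results / "zscores.tsv"),
    ]
    return [matchscores_cmd, calling_cmd]


def run_all(cmds: List[List[str]]) -> None:
    """Run the commands in order; check=True stops at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = build_cnv_calling_cmds(
        panel="testpanel",
        coverages="tests/testdata/cov.tsv",
        matchscores="tests/testdata/matchscores.tsv",
        out_dir="tests/testdata/testpanel",
    )
    for cmd in cmds:
        print(" ".join(cmd))
    # run_all(cmds)  # requires clearCNV to be installed
```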

HOW TO and WORKFLOW

clearCNV comprises two major workflows and three major commands:

workflow

  1. re-assignment

    a) clearCNV workflow_reassignment

    b) clearCNV visualize_reassignment

  2. CNV calling

    a) clearCNV workflow_cnv_calling

preparations

Some files have to be acquired or created before these commands can be run:

  1. re-assignment:

    a) For each sequencing panel a .bed file is needed following this form. Such a file should always exist in the case of targeted sequencing.

    b) For each sequencing panel (or .bed file containing all target information) a simple list of the corresponding .bam files is needed. An example can be found here. Make sure to use absolute paths in this file for custom data.

    c) meta-file. This file is a tab-separated file and one example can be found here. To avoid any confusion, we recommend using absolute paths here again.

  2. CNV calling:

    a) A reference file. It must be the same that was used to create the alignment files (.bam files).

    b) workflow_cnv_calling does CNV calling for each batch (or sequencing panel associated data set) separately. A text file with all .bam file paths for each batch and panel must be created. Here is an example showing only one .bam file path. Multiple paths are separated with a newline. This file is usually an output of clearCNV visualize_reassignment.

    c) The .bed file of the sequencing panel whose batch is submitted to CNV calling. An example can be found here. Note that gene is ideally replaced with the real name of the exon, gene or target.

    d) A k-mer alignability file in .bed format. Such files can be downloaded from UCSC (e.g. for Hg19 here). A k-mer mappability track can also be created for example using GenMap. In both cases the resulting Wig or BigWig files need to be converted to .bed to be used by clearCNV.
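The lists of .bam file paths mentioned above (one path per line, absolute paths recommended) can be generated rather than typed by hand. The following is a minimal sketch, assuming all .bam files of one panel or batch sit in a single directory. The function name write_bam_list and the directory layout are assumptions of this example.

```python
from pathlib import Path
from typing import List


def write_bam_list(bam_dir: str, list_path: str) -> List[str]:
    """Write the absolute paths of all .bam files in bam_dir, one per line.

    Hypothetical helper. Index files such as sample.bam.bai are not matched,
    because the pattern must match the whole file name.
    """
    bams = sorted(str(p.resolve()) for p in Path(bam_dir).glob("*.bam"))
    # Multiple paths are separated with a newline.
    Path(list_path).write_text("".join(b + "\n" for b in bams))
    return bams


if __name__ == "__main__":
    # Example call with placeholder locations; adjust to your own data.
    paths = write_bam_list(".", "bam_list.txt")
    print(f"wrote {len(paths)} paths to bam_list.txt")
```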

notes

Chromosome names in the reference and the .bed file should follow one of these forms: ChrX, chrX, X or Chr1, chr1, 1.
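A quick check of the chromosome names in a .bed file can catch naming problems before a run. The following is a minimal sketch: it accepts an optional Chr or chr prefix followed by a number, X or Y, which is this example's reading of the forms listed above. The function names are hypothetical and not part of clearCNV.

```python
import re
from typing import Iterable, Set

# Optional "Chr"/"chr" prefix followed by a number, X or Y.
# Restricting names to exactly these forms is an assumption of this sketch.
_CHROM_RE = re.compile(r"^(Chr|chr)?(\d+|X|Y)$")


def naming_scheme(name: str) -> str:
    """Return the prefix of a chromosome name: 'Chr', 'chr' or ''."""
    match = _CHROM_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected chromosome name: {name!r}")
    return match.group(1) or ""


def schemes_used(names: Iterable[str]) -> Set[str]:
    """Return the set of prefixes used by the given chromosome names."""
    return {naming_scheme(n) for n in names}


def bed_chromosomes(bed_path: str) -> Set[str]:
    """Collect the distinct values of the first column of a BED file."""
    chroms = set()
    with open(bed_path) as handle:
        for line in handle:
            # Skip blank lines and optional BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chroms.add(line.split()[0])
    return chroms


if __name__ == "__main__":
    # More than one prefix in the result indicates mixed naming.
    print(schemes_used(["chr1", "chr2", "chrX"]))
```

Comparing schemes_used for the .bed file with the same set computed from the reference sequence names shows whether both files use the same prefix.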

CNV calling on chr X or chr Y: clearCNV automatically determines the copy number of the gonosomes. If your panel targets only a single gene there, it is better to delete the corresponding targets from the original .bed file to exclude them. Meaningful CNV calling on the X or Y chromosomes requires about twice as many samples in your data set, with roughly equal numbers of women and men.

NOTE

If you do sample re-assignment on your own data, followed by CNV calling, then only one metafile, one coverages file, and one bedfile are used. This means that --metafile, --coverages and --bedfile are given the same file paths in both steps, clearCNV workflow_reassignment and clearCNV visualize_reassignment.

Running Checks

Checks are automatically run on the master branch and pull requests. Unit and integration tests are based on pytest and formatting is enforced with black.

$ make test

History

v0.0.1

  • Everything is new!
