Calculate coverage metrics from a GATK3 DepthOfCoverage file and a bedfile
CoverageCalculatorPy
Given (i) a tabix-indexed per-base 'depth of coverage' file (similar to that generated by GATK3) and (ii) a bedfile, CoverageCalculatorPy will generate four text reports:
- .coverage file containing the mean depth of coverage across each interval in the bedfile, and the percentage of bases which meet a given depth (default is 100x) across each interval.
- .totalcoverage file containing the same metrics as above, summarised over all intervals in the given bedfile. Summaries of additional subsets of the input bedfile can be included using a groupfile (see below).
- .gaps file containing the intervals which do not meet the given depth of coverage threshold.
- .missing file containing the intervals which do not have a corresponding coordinate in the 'depth of coverage' file, and therefore cannot be evaluated.
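The metrics in these reports can be illustrated with a minimal Python sketch. It is not the package's own code: the function names `interval_metrics` and `find_gaps` are hypothetical, and it assumes that "meeting" the threshold means a depth greater than or equal to it.

```python
from statistics import mean


def interval_metrics(depths, threshold=100):
    """Return (mean depth, percent of bases at or above threshold) for one interval.

    Assumption: a base 'meets' the threshold when depth >= threshold.
    """
    if not depths:
        # No coordinates found in the depth file: such an interval
        # would belong in the .missing report rather than .coverage.
        return None
    mean_depth = mean(depths)
    pct_at_depth = 100.0 * sum(d >= threshold for d in depths) / len(depths)
    return mean_depth, pct_at_depth


def find_gaps(positions, depths, threshold=100):
    """Merge consecutive below-threshold positions into (start, end) runs."""
    gaps = []
    run_start = run_end = None
    for pos, depth in zip(positions, depths):
        if depth < threshold:
            if run_start is not None and pos == run_end + 1:
                run_end = pos  # extend the current gap
            else:
                if run_start is not None:
                    gaps.append((run_start, run_end))
                run_start = run_end = pos  # start a new gap
        elif run_start is not None:
            gaps.append((run_start, run_end))  # gap closed by a covered base
            run_start = run_end = None
    if run_start is not None:
        gaps.append((run_start, run_end))
    return gaps


positions = [1, 2, 3, 4, 5, 6]
depths = [120, 80, 90, 150, 50, 100]
print(interval_metrics(depths))      # mean depth and percent of bases >= 100x
print(find_gaps(positions, depths))  # runs of bases below 100x
```

With the example values, three of the six bases reach 100x, and the below-threshold bases form two separate runs.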
Input Arguments
-D/--depthfile
path to tabix indexed depth-of-coverage file
-B/--bedfile
path to bedfile. Chromosomes must not be prefixed with 'chr'
-d/--depth
depth threshold for percentage horizontal coverage calculation (default: 100)
-o/--outname
output name to prefix on text reports (default: output)
-O/--outdir
directory to save output files to (default: current)
-g/--groupfile
path to groupfile (see below)
Tabix indexing a GATK3 DepthOfCoverage file
The 'depth of coverage' file must be tabix indexed. The first three columns of the depthfile must be chromosome, coordinate and depth. A file generated in GATK3 can be indexed as follows:
sed 's/:/\t/g' <GATK depthOfCoverage file> | grep -v 'Locus' | sort -k1,1 -k2,2n | bgzip > <filename.gz>
(on macOS)
sed "s/:/$(printf '\t')/g" <GATK depthOfCoverage file> | grep -v 'Locus' | sort -k1,1 -k2,2n | bgzip > <filename.gz>
tabix -b 2 -e 2 -s 1 <filename.gz>
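For readers who want to see what the text-processing part of the pipeline does, the following Python sketch mirrors the `sed`, `grep -v` and `sort` steps. It is illustrative only: the function name `reformat_depth_lines` is hypothetical, the sort is by chromosome name as a string and then by numeric coordinate (an approximation of `sort -k1,1 -k2,2n`), and compression and indexing with `bgzip` and `tabix` are still needed afterwards.

```python
def reformat_depth_lines(lines):
    """Convert GATK3-style 'chrom:pos<TAB>depth...' lines to tab-separated rows.

    Mirrors the shell pipeline: drop any line containing 'Locus' (the header),
    replace every ':' with a tab, then sort by chromosome and coordinate.
    """
    rows = []
    for line in lines:
        if "Locus" in line:
            continue  # equivalent of grep -v 'Locus'
        rows.append(line.rstrip("\n").replace(":", "\t"))  # equivalent of sed 's/:/\t/g'
    # Sort on column 1 (string) then column 2 (numeric).
    rows.sort(key=lambda row: (row.split("\t")[0], int(row.split("\t")[1])))
    return rows


example = [
    "Locus\tTotal_Depth\n",
    "1:101\t60\n",
    "1:100\t57\n",
]
for row in reformat_depth_lines(example):
    print(row)
```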
Adding a groupfile
The groupfile is a way of generating combined metrics across a number of intervals (e.g. combined across all exons in a gene). These metrics will appear in the .totalcoverage file. The groupfile must have a header (this will be included in the output) and be a single column containing the same number of rows as the bedfile it will be analysed with.
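The idea of grouping can be sketched in Python as follows. This is an assumption-laden illustration, not the package's implementation: the function name `summarise_groups` is hypothetical, the groupfile is assumed to hold one header line followed by one label per bed interval, and group metrics are assumed to be computed by pooling all bases of the intervals that share a label.

```python
from collections import defaultdict


def summarise_groups(group_lines, interval_depths, threshold=100):
    """Pool per-base depths by group label and summarise each group.

    group_lines: lines of the groupfile (assumed: header, then one label per interval).
    interval_depths: one list of per-base depths per bed interval, in bedfile order.
    """
    header, *labels = [line.strip() for line in group_lines if line.strip()]
    if len(labels) != len(interval_depths):
        raise ValueError("expected one group label per bed interval")

    pooled = defaultdict(list)
    for label, depths in zip(labels, interval_depths):
        pooled[label].extend(depths)

    summary = {}
    for label, depths in pooled.items():
        if not depths:
            continue  # nothing to evaluate for this group
        mean_depth = sum(depths) / len(depths)
        pct_at_depth = 100.0 * sum(d >= threshold for d in depths) / len(depths)
        summary[label] = (mean_depth, pct_at_depth)
    return header, summary


group_lines = ["gene\n", "BRCA1\n", "BRCA1\n", "TP53\n"]
interval_depths = [[100, 200], [50, 150], [90, 95]]
header, summary = summarise_groups(group_lines, interval_depths)
print(header)
for label, (mean_depth, pct) in summary.items():
    print(label, mean_depth, pct)
```

Here the two BRCA1 intervals are combined into a single summary, while TP53 is summarised on its own.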