Quickly get coverage statistics given reads and an assembly

Motivation

While there are tools that will calculate read-coverage statistics, they do not scale particularly well for large datasets, large sample numbers, or large reference FASTAs. Koverage is designed to place minimal burden on I/O and RAM to allow for maximum scalability.

Install

Koverage is still in development, but is available on PyPI. Easy install:

pip install koverage

Developer install:

git clone https://github.com/beardymcjohnface/Koverage.git
cd Koverage
pip install -e .

Usage

Get coverage statistics from mapped reads (default method).

koverage run --reads readDir --ref assembly.fasta

Get coverage statistics using kmers (scales much better for very large reference FASTAs).

koverage run --reads readDir --ref assembly.fasta kmer

Any unrecognised arguments are passed on to Snakemake. For example, to run Koverage on an HPC cluster using a Snakemake profile:

koverage run --reads readDir --ref assembly.fasta --profile mySlurmProfile

Test

You can test the methods like so.

# test default method
koverage test

# test all methods
koverage test map kmer bench

Coverage methods

Mapping-based (default)

koverage run ...
# or 
koverage run ... map

This method will map reads using minimap2 and use the mapping coordinates to calculate coverage. This method is suitable for most applications.
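
As a rough illustration of how mapping coordinates can be turned into coverage figures, here is a minimal Python sketch. It is not Koverage's implementation (Koverage reports fast estimates, whereas this computes exact per-base depth). The function names and the 0-based, half-open (start, end) interval convention are assumptions made for the example.

```python
from statistics import mean, pvariance


def depth_from_alignments(contig_length, alignments):
    """Return per-base depth for one contig.

    `alignments` is a list of (start, end) pairs, assumed here to be
    0-based and half-open (end is exclusive).
    """
    # Difference array: +1 where an alignment starts, -1 where it ends.
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1

    # A running sum of the difference array gives the depth at each base.
    depth = []
    running = 0
    for delta in diff[:contig_length]:
        running += delta
        depth.append(running)
    return depth


def summarise(depth):
    """Summarise a depth vector as mean, hitrate and variance."""
    covered = sum(1 for d in depth if d > 0)
    return {
        "mean": mean(depth),
        "hitrate": covered / len(depth),  # fraction of bases with depth > 0
        "variance": pvariance(depth),     # population variance of depth
    }


# Two hypothetical alignments on a 10 bp contig.
depth = depth_from_alignments(10, [(0, 4), (2, 8)])
stats = summarise(depth)
print(depth)
print(stats)
```

Exact per-base depth needs memory proportional to contig length, which is one reason a tool aiming for low RAM use might prefer estimates.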

Kmer-based

koverage run ... kmer

This method builds Jellyfish kmer-count databases from the sequencing reads. It then samples kmers from every reference contig and looks up their counts in the Jellyfish databases to calculate coverage statistics. Because no read mapping is needed, this method is exceptionally fast for very large reference genomes.
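
The idea can be sketched in plain Python, with a Counter standing in for a Jellyfish database. This is only an illustration under stated assumptions: the function names, the evenly spaced sampling scheme and the `step` parameter are hypothetical, reverse-complement (canonical) kmers are ignored, and Koverage's actual sampling strategy may differ.

```python
from collections import Counter
from statistics import mean, median


def count_kmers(reads, k):
    """Count every kmer in the reads (a stand-in for a Jellyfish database)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_coverage(contig, read_counts, k, step=1):
    """Sample kmers from a contig every `step` bases and look up their depth."""
    depths = [
        read_counts[contig[i:i + k]]  # a Counter returns 0 for unseen kmers
        for i in range(0, len(contig) - k + 1, step)
    ]
    return {
        "sum": sum(depths),
        "mean": mean(depths),
        "median": median(depths),
        "hitrate": sum(1 for d in depths if d > 0) / len(depths),
    }


# Toy data: two short reads and one contig, with k = 3.
reads = ["ACGTAC", "CGTACG"]
read_counts = count_kmers(reads, k=3)
stats = kmer_coverage("ACGTACTT", read_counts, k=3)
print(stats)
```

Sampling only a subset of kmers (a larger `step` in this sketch) trades some precision for fewer lookups.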

CoverM

koverage run ... bench

We've included a wrapper for CoverM, which you may find useful. The wrapper runs minimap2 itself and then invokes CoverM on the sorted BAM file. It then combines the output from all samples, as the other methods do.

Outputs

Mapping-based

The default output files use fast estimates of contig coverage and variance.

sample_coverage.tsv: Per sample and per contig counts.

Column     Description
Sample     Sample name derived from read file name
Contig     Contig ID from assembly FASTA
Count      Raw mapped read count
RPM        Reads per million
RPKM       Reads per kilobase per million
RPK        Reads per kilobase
TPM        Transcripts per million
Hitrate    Estimated fraction of contig with depth > 0
Variance   Estimated read depth variance
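
For readers unfamiliar with the normalised columns, the sketch below applies the standard definitions of RPM, RPK, RPKM and TPM to raw counts. It is an illustration, not Koverage's code: the function name is hypothetical, and using the sum of mapped counts as the library size is an assumption (Koverage may use a different denominator).

```python
def normalise_counts(counts, lengths):
    """Compute RPM, RPKM, RPK and TPM for one sample.

    counts  : dict of contig ID -> raw mapped read count
    lengths : dict of contig ID -> contig length in bp
    Assumes at least one read mapped, so the totals are non-zero.
    """
    # Assumption: library size is the total number of mapped reads.
    total_reads = sum(counts.values())

    # Reads per kilobase of contig.
    rpk = {c: counts[c] / (lengths[c] / 1000) for c in counts}
    rpk_total = sum(rpk.values())

    table = {}
    for contig, count in counts.items():
        table[contig] = {
            "Count": count,
            "RPM": count / total_reads * 1e6,
            "RPKM": rpk[contig] / total_reads * 1e6,
            "RPK": rpk[contig],
            "TPM": rpk[contig] / rpk_total * 1e6,
        }
    return table


# Toy example with two contigs.
table = normalise_counts(
    counts={"contig_1": 100, "contig_2": 300},
    lengths={"contig_1": 1000, "contig_2": 2000},
)
for contig, row in table.items():
    print(contig, row)
```

TPM values sum to one million within a sample, which makes them easier to compare between samples than RPKM.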

all_coverage.tsv: Per contig counts (all samples).

Column     Description
Contig     Contig ID from assembly FASTA
Count      Raw mapped read count
RPM        Reads per million
RPKM       Reads per kilobase per million
RPK        Reads per kilobase
TPM        Transcripts per million

(more outputs to come, watch this space)

Kmer-based

Outputs for kmer-based coverage metrics. Kmer outputs are gzipped because this method is expected to be used with very large reference FASTA files.

sample_kmer_coverage.NNmer.tsv.gz: Per sample and contig kmer coverage.

Column     Description
Sample     Sample name derived from read file name
Contig     Contig ID from assembly FASTA
Sum        Sum of sampled kmer depths
Mean       Mean sampled kmer depth
Median     Median sampled kmer depth
Hitrate    Fraction of kmers with depth > 0
Variance   Variance of lowest 95% of sampled kmer depths
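
The Variance column discards the highest 5% of depths, presumably so that a few very abundant (for example, repetitive) kmers do not dominate the figure. A minimal sketch of such a trimmed variance follows. The function name and `keep_percent` parameter are hypothetical, and the use of population rather than sample variance is an assumption.

```python
from statistics import pvariance


def trimmed_variance(depths, keep_percent=95):
    """Variance of the lowest `keep_percent` percent of kmer depths."""
    ordered = sorted(depths)
    # Integer arithmetic avoids floating-point rounding when sizing the slice.
    keep = max(1, len(ordered) * keep_percent // 100)
    return pvariance(ordered[:keep])


# Nineteen kmers at depth 1 plus one extreme outlier.
depths = [1] * 19 + [1000]
print(pvariance(depths))         # dominated by the outlier
print(trimmed_variance(depths))  # outlier excluded
```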

all_kmer_coverage.NNmer.tsv.gz: Contig kmer coverage (all samples).

Column     Description
Contig     Contig ID from assembly FASTA
Sum        Sum of sampled kmer depths
Mean       Mean sampled kmer depth
Median     Median sampled kmer depth

(more outputs to come, watch this space)
