Quickly get coverage statistics given reads and an assembly

Motivation

While there are tools that will calculate read-coverage statistics, they do not scale particularly well for large datasets, large sample numbers, or large reference FASTAs. Koverage is designed to place minimal burden on I/O and RAM to allow for maximum scalability.

Install

Koverage is available on PyPI and Bioconda.

We recommend creating a dedicated conda environment for the installation:

conda create -n koverage python=3.11
conda activate koverage

Install with pip:

pip install koverage

Install with Bioconda:

conda install -c bioconda koverage

Test the installation:

koverage test

Developer install:

git clone https://github.com/beardymcjohnface/Koverage.git
cd Koverage
pip install -e .

Usage

Get coverage statistics from mapped reads (default method).

koverage run --reads readDir --ref assembly.fasta

Get coverage statistics using kmers (scales much better for very large reference FASTAs).

koverage run --reads readDir --ref assembly.fasta kmer

Any unrecognised arguments are passed on to Snakemake. For example, you can run Koverage on an HPC using a Snakemake profile.

koverage run --reads readDir --ref assembly.fasta --profile mySlurmProfile
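If you launch Koverage from a Python script, the commands above can be assembled programmatically. The following is a minimal sketch, not part of Koverage itself: `build_koverage_command` is a hypothetical helper, it only uses the arguments shown above, and placing the forwarded Snakemake arguments before the method name is an assumption.

```python
def build_koverage_command(reads, ref, method=None, snakemake_args=()):
    """Assemble a `koverage run` command line as a list of arguments.

    Hypothetical helper for illustration; only flags shown in this README are used.
    """
    cmd = ["koverage", "run", "--reads", str(reads), "--ref", str(ref)]
    # Arguments Koverage does not recognise (e.g. --profile) are forwarded to Snakemake.
    cmd.extend(snakemake_args)
    if method is not None:
        # Method names taken from this README: "map", "kmer" or "coverm".
        cmd.append(method)
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once Koverage is installed in the active environment.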

Parsing samples with `--reads`

You can pass either a directory of reads or a TSV file to `--reads`. Note that Koverage expects your read file names to include R1 or R2, e.g. Tynes-BDA-rw-1_S14_L001_R1_001.fastq.gz or SRR7141305_R2.fastq.gz.

Directory: Koverage will infer sample names and _R1/_R2 pairs from the filenames.
TSV file: Koverage expects 2 or 3 columns, with column 1 being the sample name and columns 2 and (optionally) 3 the read files.
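If you prefer to build the TSV file yourself, the pairing logic can be sketched roughly as follows. This is an illustration only: the regular expression and the helper names `pair_reads` and `to_tsv` are assumptions, and Koverage's own filename parsing may differ.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern: "_R1" or "_R2" followed by "_" or "." in the file name.
READ_TAG = re.compile(r"_(R[12])(?=[_.])")

def pair_reads(filenames):
    """Group read files by sample name, keyed on the text before _R1/_R2."""
    samples = defaultdict(dict)
    for name in filenames:
        base = Path(name).name
        match = READ_TAG.search(base)
        if match is None:
            continue  # skip files without an R1/R2 tag
        sample = base[: match.start()]
        samples[sample][match.group(1)] = name
    return dict(samples)

def to_tsv(samples):
    """Render one line per sample: name, then the R1 and R2 files that exist."""
    lines = []
    for sample in sorted(samples):
        reads = samples[sample]
        fields = [sample] + [reads[tag] for tag in ("R1", "R2") if tag in reads]
        lines.append("\t".join(fields) + "\n")
    return "".join(lines)
```

Writing the result of `to_tsv(pair_reads(files))` to a file gives a 2- or 3-column table that can be passed to `--reads`.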

More information and examples are available here

Test

You can test the methods with the inbuilt dataset like so.

# test default method
koverage test

# test all methods
koverage test map kmer coverm

Coverage methods

Mapping-based (default)

koverage run ...
# or 
koverage run ... map

This method will map reads using minimap2 and use the mapping coordinates to calculate coverage. This method is suitable for most applications.
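To illustrate how mapping coordinates translate into coverage, here is a minimal sketch that computes exact per-base depth for one contig from alignment intervals. It is not Koverage's implementation (Koverage reports fast estimates rather than exact values); the function names, the 0-based half-open intervals, and the use of population variance are all assumptions.

```python
from statistics import mean, median, pvariance

def depth_from_alignments(contig_length, alignments):
    """Per-base depth from (start, end) intervals, 0-based and half-open."""
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, contig_length)
        if start >= end:
            continue  # alignment lies outside the contig
        diff[start] += 1   # depth rises where the alignment begins
        diff[end] -= 1     # and falls just past where it ends
    depth, running = [], 0
    for delta in diff[:contig_length]:
        running += delta
        depth.append(running)
    return depth

def summarise(depth):
    """Exact versions of the statistics that Koverage estimates."""
    return {
        "Mean": mean(depth),
        "Median": median(depth),
        "Hitrate": sum(d > 0 for d in depth) / len(depth),
        "Variance": pvariance(depth),
    }
```

The difference-array approach touches each alignment once and each base once, which is why coordinates alone are enough to derive depth.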

Kmer-based

koverage run ... kmer

This method builds Jellyfish databases of kmer counts from the sequencing reads. It then samples kmers from all reference contigs and queries them against the Jellyfish databases to calculate coverage statistics. This method is exceptionally fast for very large reference genomes.
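The sampling idea can be sketched in a few lines. In this illustration a plain dictionary stands in for the Jellyfish database, the kmer size and sampling step are hypothetical parameters, and reverse complements are ignored; none of this reflects Koverage's actual code.

```python
from statistics import mean, median

def sample_kmers(sequence, k, step):
    """Yield every `step`-th kmer of length `k` from a contig sequence."""
    for i in range(0, len(sequence) - k + 1, step):
        yield sequence[i : i + k]

def kmer_coverage(sequence, kmer_counts, k, step):
    """Look up sampled kmers in a count table and summarise their depths.

    `kmer_counts` is a dict standing in for a Jellyfish database query.
    """
    depths = [kmer_counts.get(kmer, 0) for kmer in sample_kmers(sequence, k, step)]
    if not depths:
        return None  # contig shorter than k
    return {
        "Sum": sum(depths),
        "Mean": mean(depths),
        "Median": median(depths),
        "Hitrate": sum(d > 0 for d in depths) / len(depths),
    }
```

Because only a sample of kmers per contig is queried, the work per contig stays small even when the reference FASTA is very large.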

CoverM

koverage run ... coverm

We've included a wrapper for CoverM, which you may find useful. The wrapper manually runs minimap2 and then invokes CoverM on the sorted BAM file. It then combines the output from all samples, as the other methods do. If you have a large tempfs/ you'll probably find it faster to run CoverM directly on your reads. CoverM is not currently available for macOS.

Outputs

Mapping-based

Default output files using fast estimations for mean, median, hitrate, and variance.

sample_coverage.tsv

Per sample and per contig counts.

Column	description
Sample	Sample name derived from read file name
Contig	Contig ID from assembly FASTA
Count	Raw mapped read count
RPM	Reads per million reads
RPKM	Reads per kilobase of contig per million reads
RPK	Reads per kilobase of contig
TPM	Transcripts per million
Mean	Estimated mean read depth
Median	Estimated median read depth
Hitrate	Estimated fraction of contig with depth > 0
Variance	Estimated read depth variance
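For readers unfamiliar with the normalised count columns, the sketch below applies the conventional definitions of RPM, RPKM, RPK, and TPM to a set of raw counts for one sample. The function name and the choice of library size (the sum of the counts supplied) are assumptions; Koverage's exact normalisation may differ.

```python
def normalise_counts(counts, lengths):
    """Conventional RPM, RPKM, RPK and TPM for one sample.

    counts:  {contig: raw mapped read count}
    lengths: {contig: contig length in bp}
    """
    total = sum(counts.values())  # assumed library size
    rpk = {c: counts[c] / (lengths[c] / 1_000) for c in counts}
    rpk_total = sum(rpk.values())
    table = {}
    for contig in counts:
        rpm = counts[contig] * 1_000_000 / total if total else 0.0
        table[contig] = {
            "Count": counts[contig],
            "RPM": rpm,
            "RPKM": rpm / (lengths[contig] / 1_000),
            "RPK": rpk[contig],
            # TPM rescales RPK so the values sum to one million per sample.
            "TPM": rpk[contig] * 1_000_000 / rpk_total if rpk_total else 0.0,
        }
    return table
```

RPM corrects only for sequencing depth, RPK only for contig length, and RPKM and TPM correct for both.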

all_coverage.tsv

Per contig counts (all samples).

Column	description
Contig	Contig ID from assembly FASTA
Count	Raw mapped read count
RPM	Reads per million reads
RPKM	Reads per kilobase of contig per million reads
RPK	Reads per kilobase of contig
TPM	Transcripts per million

Kmer-based

Outputs for kmer-based coverage metrics. Kmer outputs are gzipped as it is anticipated that this method will be used with very large reference FASTA files.

sample_kmer_coverage.NNmer.tsv.gz

Per sample and contig kmer coverage.

Column	description
Sample	Sample name derived from read file name
Contig	Contig ID from assembly FASTA
Sum	Sum of sampled kmer depths
Mean	Mean sampled kmer depth
Median	Median sampled kmer depth
Hitrate	Fraction of kmers with depth > 0
Variance	Variance of lowest 95 % of sampled kmer depths
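Since the kmer outputs are gzipped, they need to be opened accordingly in downstream scripts. The sketch below reads a coverage table with the standard library and filters contigs by hitrate. It assumes the file has a header row with the column names listed above; the helper names are hypothetical.

```python
import csv
import gzip

def parse_coverage_lines(lines):
    """Parse tab-separated coverage lines (header first) into dicts."""
    return list(csv.DictReader(lines, delimiter="\t"))

def read_coverage_table(path):
    """Read a coverage TSV, transparently handling .gz files."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return parse_coverage_lines(handle)

def contigs_above_hitrate(rows, threshold):
    """Return sorted contig IDs with a hitrate at or above `threshold` in any sample."""
    return sorted(
        {row["Contig"] for row in rows if float(row["Hitrate"]) >= threshold}
    )
```

The same reader works for the uncompressed mapping-based tables, because the opener is chosen from the file extension.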

all_kmer_coverage.NNmer.tsv.gz

Contig kmer coverage (all samples).

Column	description
Contig	Contig ID from assembly FASTA
Sum	Sum of sampled kmer depths
Mean	Mean sampled kmer depth
Median	Median sampled kmer depth

Release history

Version	Release date
0.1.11	Feb 8, 2024 (this version)
0.1.10	Feb 3, 2024
0.1.9	Jan 29, 2024
0.1.8	Jan 23, 2024
0.1.7	Nov 20, 2023
0.1.6	Sep 14, 2023
0.1.5	Aug 18, 2023
0.1.4	Aug 8, 2023
0.1.3	Jun 30, 2023
0.1.2	Jun 28, 2023
0.1.1	Jun 26, 2023
0.1.0	Jun 16, 2023

Download files

Source Distribution

koverage-0.1.11.tar.gz (19.8 MB), uploaded Feb 8, 2024

Built Distribution

koverage-0.1.11-py3-none-any.whl (19.8 MB), uploaded Feb 8, 2024, Python 3

Hashes for koverage-0.1.11.tar.gz

Algorithm	Hash digest
SHA256	`24d785d654524a59c109f6fb3b3f6fc211b5f7177d643597f144fc6954a74d57`
MD5	`bfa0567200efd9d8b6a7c9b0c280b665`
BLAKE2b-256	`901f754fbcaf32305ad0578604100b8c0619fbcb82eafb85801514e0728130a5`

Hashes for koverage-0.1.11-py3-none-any.whl

Algorithm	Hash digest
SHA256	`bb8d5e7573aa4710f43c40ea7da470eb99ea3329b10a413f9b49e86cf88ac02a`
MD5	`3c430ff5be99ba8ca1c335e3eebe6d6b`
BLAKE2b-256	`4a882c5af99fcb8de272e6ca812547519d540adbcb51f1267807e603dd79d214`
