Calculate d2s scores from short reads

d2ssect

d2ssect calculates an alignment-free distance between samples based on the frequencies of shared kmers. Specifically, it provides a fast implementation of the D2S statistic, which can be used as a standalone distance measure or as input to a range of methods (e.g. see these tools) for phylogenetic and network analysis.

Installation

d2ssect is available via PyPI. Installation requires Python 3.7 or greater, as well as the Jellyfish program and libraries. We recommend installing into a conda environment as follows:

conda create -n d2ssect python=3.7 kmer-jellyfish
conda activate d2ssect
pip install d2ssect
d2ssect -h

Alternatively, you may use an existing Jellyfish installation, or install Jellyfish without using conda. If using this method, please note that:

  • Jellyfish version 2 is required (Jellyfish 1 will not work)
  • Installing Jellyfish via Linux package managers will not work, as this installs the jellyfish binary but not the libraries and headers needed by d2ssect
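
Both requirements can be checked before running pip. The sketch below is one way to do so. It assumes that `jellyfish --version` prints a string such as `jellyfish 2.3.0` and that the Jellyfish build installed a `jellyfish-2.0` pkg-config file. The helper name `jf_major_version` is illustrative and not part of d2ssect, and d2ssect itself may locate the libraries by a different route.

```shell
# Illustrative helper (not part of d2ssect): extract the major version
# number from a string such as "jellyfish 2.3.0".
jf_major_version() {
  printf '%s\n' "$1" | sed -E 's/^[^0-9]*([0-9]+).*/\1/'
}

# Check that the binary on PATH is Jellyfish 2.
if command -v jellyfish >/dev/null 2>&1; then
  major=$(jf_major_version "$(jellyfish --version)")
  if [ "$major" = "2" ]; then
    echo "Jellyfish 2 found"
  else
    echo "Jellyfish major version is $major; version 2 is required"
  fi
else
  echo "jellyfish not found on PATH"
fi

# Assumption: the headers and libraries are advertised through pkg-config.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists jellyfish-2.0; then
  echo "Jellyfish headers and libraries are visible to pkg-config"
else
  echo "Jellyfish development files were not found via pkg-config"
fi
```

A negative pkg-config result does not prove the libraries are absent, only that they are not registered where pkg-config looks.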

Once Jellyfish is installed, you should be able to install d2ssect using pip or pip3 as follows:

pip install d2ssect

Usage

Let's say we have a collection of fastq files corresponding to sequencing reads from different samples, and we want to compare these with d2ssect. First, count kmers in these files using jellyfish:

for f in *.fastq; do jellyfish count -m 21 -s 10000000 "$f" -o "${f%.fastq}.jf"; done

Note that the command above will create a corresponding .jf file for every .fastq file in the current directory. By keeping the base names of the .jf and .fastq files identical, we can then run d2ssect as follows:

d2ssect -l *.jf -f *.fastq
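
Because both file lists come from shell globs, a missing or misnamed .jf file could shift the pairing between the two lists. This assumes d2ssect matches the lists by position, which the note on base names implies but does not state. A minimal sketch of a pre-flight check follows; `missing_jf` is a hypothetical helper, not a d2ssect command.

```shell
# Hypothetical helper: print every .fastq file in a directory that has
# no .jf file with the same base name.
missing_jf() {
  dir=${1:-.}
  for fq in "$dir"/*.fastq; do
    [ -e "$fq" ] || continue          # the glob matched nothing
    jf="${fq%.fastq}.jf"
    [ -e "$jf" ] || printf '%s\n' "$fq"
  done
}

# Report any unpaired files in the current directory on stderr.
missing=$(missing_jf .)
if [ -n "$missing" ]; then
  echo "No matching .jf file for:" >&2
  printf '%s\n' "$missing" >&2
fi
```

If the check prints nothing, every .fastq file has a .jf counterpart and the two globs should expand in the same order.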

Outputs

d2ssect reports progress on stderr and, once finished, writes a matrix of pairwise D2S values (one for each pair of samples) to stdout.
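
Since the matrix and the progress messages arrive on separate streams, they can be captured separately with ordinary shell redirection. The sketch below wraps this in a small function. `capture_streams` and the file names `d2s_matrix.txt` and `d2ssect.log` are hypothetical choices, not anything d2ssect defines.

```shell
# Hypothetical helper: run a command, writing its stdout to the file
# named by $1 and its stderr to the file named by $2.
capture_streams() {
  out=$1
  log=$2
  shift 2
  "$@" > "$out" 2> "$log"
}

# Example usage (requires d2ssect plus the .jf and .fastq files):
#   capture_streams d2s_matrix.txt d2ssect.log d2ssect -l *.jf -f *.fastq
```

Keeping the streams apart means the matrix file holds only the D2S values and can be passed straight to downstream tools.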
