Calculate d2s scores from short reads

d2ssect

A tool to calculate d2s scores from short FASTQ reads. This repo tests and benchmarks existing alignment-free tools and improved versions of them.

The original version of this pipeline consisted of three main steps:

  1. get jellyfish count results
  2. calculate d2s using jellyfish dump results of every pair of samples
  3. generate a matrix

Our goal is to integrate these three steps and try to increase the speed of d2s calculation.
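
To make the three steps concrete, here is a minimal pure-Python sketch. It is not the d2ssect implementation. It assumes the commonly used definition of the d2S dissimilarity, in which each k-mer count is centred by its expected value under an i.i.d. nucleotide background model, and as a simplification it sums only over k-mers observed in at least one of the two samples. The function names, and the in-memory counter standing in for jellyfish, are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def count_kmers(reads, k):
    """Step 1 stand-in: count every k-mer in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def base_frequencies(reads):
    """Estimate single-nucleotide frequencies for the background model."""
    bases = Counter("".join(reads))
    total = sum(bases.values())
    return {base: n / total for base, n in bases.items()}


def centred_count(counts, total, freqs, kmer):
    """Observed count minus the count expected under the background model."""
    p = 1.0
    for base in kmer:
        p *= freqs.get(base, 0.0)
    return counts.get(kmer, 0) - total * p


def d2s_distance(reads_x, reads_y, k):
    """Step 2: d2S dissimilarity between two samples (assumed definition)."""
    cx, cy = count_kmers(reads_x, k), count_kmers(reads_y, k)
    fx, fy = base_frequencies(reads_x), base_frequencies(reads_y)
    nx, ny = sum(cx.values()), sum(cy.values())
    num = norm_x = norm_y = 0.0
    # Simplification: only k-mers seen in at least one sample are visited.
    for kmer in set(cx) | set(cy):
        x = centred_count(cx, nx, fx, kmer)
        y = centred_count(cy, ny, fy, kmer)
        denom = sqrt(x * x + y * y)
        if denom == 0.0:
            continue
        num += x * y / denom
        norm_x += x * x / denom
        norm_y += y * y / denom
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("d2S is undefined when all centred counts are zero")
    return 0.5 * (1.0 - num / (sqrt(norm_x) * sqrt(norm_y)))


def distance_matrix(samples, k):
    """Step 3: symmetric matrix of pairwise distances, keyed by sample name."""
    names = sorted(samples)
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = d2s_distance(samples[a], samples[b], k)
    return names, matrix


if __name__ == "__main__":
    toy = {
        "s1": ["ACGTACGTAC", "TTGACCA"],
        "s2": ["GGGGCCCCAA", "ATATATAT"],
    }
    names, matrix = distance_matrix(toy, k=3)
    for name in names:
        print(name, [round(matrix[name][other], 3) for other in names])
```

Every pair of samples repeats the counting and centring work in this sketch, which illustrates why doing the three steps in one integrated pass can save time.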

Installation

d2ssect relies heavily on jellyfish. You need both the jellyfish program and the jellyfish libraries. To check that jellyfish is installed, run:

jellyfish --version

This should report version 2 or later. In addition, you need the jellyfish libraries and headers. If you installed jellyfish via conda or by compiling from source, these will be present in the right locations. If you installed it with your Linux package manager, they probably won't be present.
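
If you prefer to script the version check, a sketch along the following lines may help. It assumes that `jellyfish --version` prints text containing a dotted version number such as `2.3.0`. The helper names are hypothetical, and the check covers only the executable, not the libraries and headers.

```python
import re
import shutil
import subprocess


def parse_major_version(version_output):
    """Extract the major version from text such as 'jellyfish 2.3.0'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def jellyfish_is_usable(minimum_major=2):
    """Return True if a jellyfish executable on PATH reports a suitable version."""
    exe = shutil.which("jellyfish")
    if exe is None:
        return False
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    if result.returncode != 0:
        return False
    try:
        # Assumption: the version string may appear on stdout or stderr.
        major = parse_major_version(result.stdout + result.stderr)
    except ValueError:
        return False
    return major >= minimum_major


if __name__ == "__main__":
    print("jellyfish usable:", jellyfish_is_usable())
```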

If you do not want to use conda, we recommend installing jellyfish from source. Once that is done, you should be able to install d2ssect using pip:

pip3 install d2ssect

Usage

Let's say we have a collection of FASTA files corresponding to sequencing reads from samples that we want to compare with d2ssect. First, count k-mers in these files using jellyfish:

for f in *.fasta; do jellyfish count -m 21 -s 10000000 "$f" -o "${f%.fasta}.jf"; done

Note that the command above creates a corresponding .jf file for every .fasta file in the current directory. Because the base names of the .jf and .fasta files are identical, we can then run d2ssect as follows:

python3 ../d2ssect/d2ssect/main.py -l *.jf -f *.fasta
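
Because the two globs must line up, it can be worth confirming before a long run that every .jf file has a .fasta file with the same base name. The following is a small pre-flight sketch. The function name is hypothetical, and the source does not say how d2ssect itself pairs the two lists, so treat this purely as a sanity check.

```python
from pathlib import Path


def pair_inputs(jf_files, fasta_files):
    """Pair .jf and .fasta paths by base name, failing loudly on a mismatch."""
    jf_by_stem = {Path(p).stem: p for p in jf_files}
    fasta_by_stem = {Path(p).stem: p for p in fasta_files}
    # Base names present in one list but not the other.
    unmatched = sorted(set(jf_by_stem) ^ set(fasta_by_stem))
    if unmatched:
        raise ValueError("no matching partner for: " + ", ".join(unmatched))
    return [(jf_by_stem[s], fasta_by_stem[s]) for s in sorted(jf_by_stem)]


if __name__ == "__main__":
    jf = sorted(str(p) for p in Path(".").glob("*.jf"))
    fasta = sorted(str(p) for p in Path(".").glob("*.fasta"))
    for jf_path, fasta_path in pair_inputs(jf, fasta):
        print(jf_path, fasta_path)
```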

Building from source

CC=g++ pip install .
