
Calculate d2s scores from short reads


d2ssect


A tool to calculate d2s scores from short fastq reads. This repo also tests and benchmarks existing alignment-free tools alongside improved versions of them.

The original version of this pipeline consisted of three main steps:

  1. count k-mers in each sample with jellyfish count
  2. calculate d2s for every pair of samples from the jellyfish dump output
  3. assemble the pairwise d2s scores into a matrix
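Steps 2 and 3 can be sketched in plain Python as below. This is an illustrative sketch, not d2ssect's implementation, and it rests on stated assumptions: it uses the common d2S definition, centring each count as X~_w = X_w - N * p_w, where N is the sample's total k-mer count and p_w is the product of base frequencies under an i.i.d. (order-0) background estimated from that sample's own k-mer counts; and it sums only over k-mers observed in at least one of the two samples rather than over all 4^k possible words. The function names (letter_freqs, centred_counts, d2s, d2s_matrix) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def letter_freqs(counts):
    """Estimate i.i.d. base frequencies from k-mer counts (an approximation)."""
    letters = Counter()
    for kmer, c in counts.items():
        for base in kmer:
            letters[base] += c
    total = sum(letters.values())
    if total == 0:
        raise ValueError("empty k-mer count table")
    return {base: n / total for base, n in letters.items()}


def centred_counts(counts, kmers):
    """X~_w = X_w - N * p_w, with N = total k-mers and p_w = product of base freqs."""
    freqs = letter_freqs(counts)
    n_total = sum(counts.values())
    return {
        w: counts.get(w, 0) - n_total * math.prod(freqs.get(b, 0.0) for b in w)
        for w in kmers
    }


def d2s(x_counts, y_counts):
    """d2S dissimilarity in [0, 1].

    Assumption: sums only over k-mers seen in x or y, not all 4^k words.
    """
    kmers = set(x_counts) | set(y_counts)
    xt = centred_counts(x_counts, kmers)
    yt = centred_counts(y_counts, kmers)
    num = sx = sy = 0.0
    for w in kmers:
        norm = math.sqrt(xt[w] ** 2 + yt[w] ** 2)
        if norm == 0.0:
            continue  # this word contributes nothing; skip to avoid 0/0
        num += xt[w] * yt[w] / norm
        sx += xt[w] ** 2 / norm
        sy += yt[w] ** 2 / norm
    if sx == 0.0 or sy == 0.0:
        raise ValueError("d2S is undefined for these inputs")
    # Cauchy-Schwarz keeps num / sqrt(sx * sy) in [-1, 1], so the result is in [0, 1].
    return 0.5 * (1.0 - num / math.sqrt(sx * sy))


def d2s_matrix(samples):
    """Step 3: symmetric matrix of pairwise d2S values with a zero diagonal."""
    names = sorted(samples)
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0.0] * len(names) for _ in names]
    for a, b in combinations(names, 2):
        d = d2s(samples[a], samples[b])
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = d
    return names, matrix
```

Because pandas is already a dependency, the returned pair can be wrapped as pandas.DataFrame(matrix, index=names, columns=names) for writing to disk. Looping over every pair in pure Python is exactly the kind of cost the integrated tool aims to reduce, so treat this as a reference for the arithmetic rather than a fast path.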

Our goal is to integrate these three steps into a single tool and to speed up the d2s calculation.

Installation

  1. Install dependencies
    • Jellyfish 2.3.0
    • Python 3.8
    • pandas

Usage

Let's say we have a collection of fasta files containing sequencing reads from the samples we want to compare with d2ssect. First, count k-mers in each file using jellyfish:

for f in *.fasta; do jellyfish count -m 21 -s 10000000 -o "${f%.fasta}.jf" "$f"; done
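In the original pipeline, step 2 worked from the text produced by jellyfish dump. A minimal sketch of loading a .jf file into a Python dict, assuming jellyfish 2.x is on the PATH and that its column format (dump -c) prints one k-mer and its count per line; parse_dump_column and load_jf are hypothetical helpers, not part of d2ssect.

```python
import subprocess


def parse_dump_column(lines):
    """Parse `jellyfish dump -c` output: one 'KMER COUNT' pair per line."""
    counts = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kmer, count = fields
        counts[kmer] = int(count)
    return counts


def load_jf(path):
    """Run `jellyfish dump -c` on a .jf file and parse its text output."""
    result = subprocess.run(
        ["jellyfish", "dump", "-c", path],
        capture_output=True, text=True, check=True,
    )
    return parse_dump_column(result.stdout.splitlines())
```

Holding every 21-mer of a read set in a Python dict is only practical for small inputs; avoiding this dump-and-parse round trip is part of why the pipeline steps are being integrated.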

Note that the command above creates a corresponding .jf file for every .fasta file in the current directory. As long as the base names of the .jf and .fasta files stay identical, we can then run d2ssect as follows:

python3 ../d2ssect/d2ssect/main.py -l *.jf -f *.fasta
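The -l and -f lists only line up if every .jf file has a .fasta file with the same base name. The check below is a hypothetical pre-flight sketch (pair_by_basename is not part of d2ssect, and how d2ssect itself matches the files is not described here) that pairs the two lists by base name and reports any sample left without a partner.

```python
from pathlib import Path


def pair_by_basename(jf_paths, fasta_paths):
    """Match each .jf file to the .fasta file with the same base name."""
    jf_by_stem = {Path(p).stem: p for p in jf_paths}
    fa_by_stem = {Path(p).stem: p for p in fasta_paths}
    if len(jf_by_stem) != len(jf_paths) or len(fa_by_stem) != len(fasta_paths):
        raise ValueError("duplicate base names in input lists")
    unpaired = set(jf_by_stem) ^ set(fa_by_stem)  # names present on only one side
    if unpaired:
        raise ValueError(f"unpaired base names: {sorted(unpaired)}")
    return {stem: (jf_by_stem[stem], fa_by_stem[stem]) for stem in sorted(jf_by_stem)}
```

Running it on the same globs passed to -l and -f (for example, the expanded results of *.jf and *.fasta) catches a missing or misnamed file before any d2s work starts.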

Building from source

CC=g++ pip install .

Download files

Source Distribution

d2ssect-0.0.2.tar.gz (8.7 kB)

Built Distribution

d2ssect-0.0.2-cp39-cp39-macosx_10_9_x86_64.whl (36.5 kB), CPython 3.9, macOS 10.9+ x86-64
