Calculate d2s scores from short reads
d2ssect
A tool to calculate d2s scores using short fastq reads. This repo will test and benchmark the existing alignment-free tools and the improved versions.
The original version of this pipeline included three main steps:
- get jellyfish count results
- calculate d2s using jellyfish dump results of every pair of samples
- generate a matrix
Our goal is to integrate these three steps and try to increase the speed of d2s calculation.
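The three steps above can be sketched in plain Python on toy data. This is a minimal illustration, not the d2ssect implementation: it assumes the commonly published D2S definition with an i.i.d. nucleotide background model, counts k-mers in memory instead of using jellyfish, and sums only over k-mers observed in at least one of the two samples. All function names (`count_kmers`, `base_freqs`, `d2s_distance`, `distance_matrix`) are hypothetical.

```python
from collections import Counter
from math import prod, sqrt


def count_kmers(reads, k):
    """Step 1 (stand-in for jellyfish): count every k-mer in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def base_freqs(reads):
    """Nucleotide frequencies, used as an i.i.d. background model (assumption)."""
    chars = Counter("".join(reads))
    total = sum(chars.values())
    return {base: n / total for base, n in chars.items()}


def d2s_distance(cx, fx, cy, fy):
    """Step 2: D2S-based dissimilarity between two k-mer count tables."""
    nx, ny = sum(cx.values()), sum(cy.values())
    num = sx = sy = 0.0
    for w in set(cx) | set(cy):
        # Centre each count by its expectation under the background model.
        xt = cx.get(w, 0) - nx * prod(fx.get(b, 0.0) for b in w)
        yt = cy.get(w, 0) - ny * prod(fy.get(b, 0.0) for b in w)
        norm = sqrt(xt * xt + yt * yt)
        if norm == 0.0:
            continue
        num += xt * yt / norm
        sx += xt * xt / norm
        sy += yt * yt / norm
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return 0.5 * (1.0 - num / (sqrt(sx) * sqrt(sy)))


def distance_matrix(samples, k):
    """Step 3: symmetric matrix of pairwise distances between samples."""
    names = sorted(samples)
    tables = {n: (count_kmers(samples[n], k), base_freqs(samples[n])) for n in names}
    matrix = [[0.0] * len(names) for _ in names]
    for i, a in enumerate(names):
        for j in range(i + 1, len(names)):
            d = d2s_distance(*tables[a], *tables[names[j]])
            matrix[i][j] = matrix[j][i] = d
    return names, matrix


# Toy data: s1 and s2 are identical, s3 shares no 3-mers with them.
toy_samples = {
    "s1": ["ACGTACGTAC", "GTACGTACGT"],
    "s2": ["ACGTACGTAC", "GTACGTACGT"],
    "s3": ["AAAAACCCCC", "GGGGGTTTTT"],
}
names, matrix = distance_matrix(toy_samples, k=3)
```

Identical samples get a distance of (approximately) zero, and the matrix is symmetric, so only the upper triangle needs to be computed. Every pair is independent of the others, which is what makes the pairwise step a natural target for speeding up.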
Installation
- Install dependencies
  - Jellyfish 2.3.0
  - Python 3.8
  - pandas
Usage
Let's say we have a collection of fasta files corresponding to sequencing reads from samples that we want to compare with d2ssect. First, count k-mers in these files using jellyfish:
for f in *.fasta; do jellyfish count -m 21 -s 10000000 "$f" -o "${f%.fasta}.jf"; done
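The same loop can be driven from Python, which makes the naming convention explicit. This is a minimal sketch: the helper names `jellyfish_count_cmd` and `count_all` are hypothetical, and only the jellyfish options already used in the command above (`-m`, `-s`, `-o`) are assumed.

```python
import subprocess
from pathlib import Path


def jellyfish_count_cmd(fasta, kmer_len=21, hash_size=10_000_000):
    """Build the jellyfish command for one fasta file (nothing is executed)."""
    fasta = Path(fasta)
    output = fasta.with_suffix(".jf")  # sample.fasta -> sample.jf
    return ["jellyfish", "count", "-m", str(kmer_len), "-s", str(hash_size),
            str(fasta), "-o", str(output)]


def count_all(directory="."):
    """Run jellyfish count on every .fasta file in a directory."""
    for fasta in sorted(Path(directory).glob("*.fasta")):
        # check=True raises CalledProcessError if jellyfish exits non-zero.
        subprocess.run(jellyfish_count_cmd(fasta), check=True)
```

Calling `count_all()` in the directory that holds the reads would produce one `.jf` file per `.fasta` file, provided jellyfish is on the `PATH`.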
Note that the command above will create a corresponding .jf file for every .fasta file in the current directory. By keeping the base names of the .jf and .fasta files identical, we can then run d2ssect as follows:
python3 ../d2ssect/d2ssect/main.py -l *.jf -f *.fasta
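Because the .jf and .fasta files are matched by base name, it can help to check the pairing before running the command. The following is a minimal sketch with a hypothetical helper, `unpaired_samples`; it is not part of d2ssect.

```python
from pathlib import Path


def unpaired_samples(paths):
    """Return base names that have only a .jf or only a .fasta file."""
    jf = {Path(p).stem for p in paths if Path(p).suffix == ".jf"}
    fasta = {Path(p).stem for p in paths if Path(p).suffix == ".fasta"}
    return sorted(jf ^ fasta)  # symmetric difference: present in one set only


# Example file listing: "b" has no .jf file and "c" has no .fasta file.
files = ["a.fasta", "a.jf", "b.fasta", "c.jf"]
missing = unpaired_samples(files)
```

An empty result means every sample has both files; for the current directory, the listing could be built with `[p.name for p in Path(".").iterdir()]`.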
Building from source
CC=g++ pip install .