Calculate d2s scores from short reads
d2ssect
A tool to calculate d2s scores from short fastq reads. This repo will test and benchmark existing alignment-free tools and improved versions of them.
The original version of this pipeline consists of three main steps:
- get jellyfish count results
- calculate d2s for every pair of samples using jellyfish dump output
- generate a matrix of the pairwise d2s scores
Our goal is to integrate these three steps and increase the speed of the d2s calculation.
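The second and third steps can be sketched in Python to show what is being computed. This is a minimal illustration of the d2S dissimilarity as it is commonly defined in the alignment-free literature, not d2ssect's own code. It assumes a zero-order (i.i.d.) background model in which a k-mer's expected probability is the product of its base frequencies, and it sums only over k-mers seen in at least one of the two samples rather than over all 4^k words. The helpers `count_kmers`, `base_frequencies`, `d2s_distance` and `d2s_matrix` are hypothetical names; `count_kmers` is a toy stand-in for jellyfish count, not an interface to it.

```python
import math
from collections import Counter
from itertools import combinations

BASES = "ACGT"


def count_kmers(sequences, k):
    """Toy stand-in for `jellyfish count`: counts every k-mer (not canonical)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(base in BASES for base in kmer):  # skip k-mers containing N etc.
                counts[kmer] += 1
    return counts


def base_frequencies(sequences):
    """Nucleotide frequencies for the assumed zero-order (i.i.d.) background model."""
    counts = Counter(b for seq in sequences for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    return {base: counts[base] / total for base in BASES}


def centred(count, n_kmers, kmer, freqs):
    """X~_w = X_w - n * p_w, where p_w is the product of the base frequencies of w."""
    p_w = math.prod(freqs[base] for base in kmer)
    return count - n_kmers * p_w


def d2s_distance(x_counts, x_freqs, y_counts, y_freqs):
    """d2S = 0.5 * (1 - D2S / sqrt(sum X~^2/r * sum Y~^2/r)), r = sqrt(X~^2 + Y~^2).

    D2S = sum over k-mers of X~ * Y~ / r. Sums run over the union of observed k-mers.
    """
    nx, ny = sum(x_counts.values()), sum(y_counts.values())
    d2s = norm_x = norm_y = 0.0
    for kmer in set(x_counts) | set(y_counts):
        xt = centred(x_counts.get(kmer, 0), nx, kmer, x_freqs)
        yt = centred(y_counts.get(kmer, 0), ny, kmer, y_freqs)
        r = math.sqrt(xt * xt + yt * yt)
        if r == 0.0:
            continue  # both centred counts are zero: the 0/0 term is treated as zero
        d2s += xt * yt / r
        norm_x += xt * xt / r
        norm_y += yt * yt / r
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("d2S is undefined when a sample has no non-zero centred counts")
    return 0.5 * (1.0 - d2s / math.sqrt(norm_x * norm_y))


def d2s_matrix(samples):
    """Symmetric distance matrix from {name: (kmer_counts, base_freqs)}."""
    names = sorted(samples)
    matrix = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        d = d2s_distance(*samples[names[i]], *samples[names[j]])
        matrix[i][j] = matrix[j][i] = d
    return names, matrix
```

Each pair needs one pass over the union of both samples' k-mers, so in this sketch the total work grows with the square of the number of samples.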
Installation
d2ssect relies heavily on jellyfish. You need both the jellyfish program and the jellyfish libraries. To check that jellyfish is installed, run:
jellyfish --version
This should report version 2 or later. You also need the jellyfish libraries and headers. If you installed jellyfish via conda or by compiling from source, these will be in the right locations. If you installed it with your Linux package manager, they probably won't be present.
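The checks in this section can be scripted before running pip. The sketch below is a hypothetical helper, not part of d2ssect: it parses the output of `jellyfish --version` and, as a rough test for the headers, asks pkg-config whether it knows a package named `jellyfish-2.0`. That package name is an assumption based on what jellyfish 2 source installs normally provide.

```python
import re
import shutil
import subprocess


def parse_jellyfish_version(text):
    """Pull (major, minor, patch) out of text such as 'jellyfish 2.3.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def jellyfish_version():
    """Run `jellyfish --version` and return the parsed version tuple."""
    exe = shutil.which("jellyfish")
    if exe is None:
        raise RuntimeError("jellyfish was not found on PATH")
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    # Read both streams in case a build prints the version to stderr.
    return parse_jellyfish_version(result.stdout + result.stderr)


def jellyfish_dev_files_found():
    """Assumption: jellyfish 2 installs a pkg-config file named 'jellyfish-2.0'."""
    if shutil.which("pkg-config") is None:
        return False
    result = subprocess.run(["pkg-config", "--exists", "jellyfish-2.0"])
    return result.returncode == 0
```

A check such as `jellyfish_version()[0] >= 2` mirrors the version requirement. A `False` from `jellyfish_dev_files_found()` is not conclusive, since pkg-config may not search a conda environment's `lib/pkgconfig` directory.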
If you do not want to use conda, we recommend installing jellyfish from source. Once that is done, you should be able to install d2ssect using pip:
pip3 install d2ssect
Usage
Let's say we have a collection of fasta files containing sequencing reads from samples that we want to compare with d2ssect. First, count k-mers in these files using jellyfish:
for f in *.fasta; do jellyfish count -m 21 -s 10000000 -o "${f%.fasta}.jf" "$f"; done
Note that the command above creates a corresponding .jf file for every .fasta file in the current directory. Because the base names of the .jf and .fasta files are identical, we can then run d2ssect as follows:
python3 ../d2ssect/d2ssect/main.py -l *.jf -f *.fasta
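The base-name convention can be illustrated with a short Python sketch. It builds the same jellyfish count arguments as the shell loop and shows how .jf and .fasta files line up by base name. Whether d2ssect itself matches files by name or by position on the command line is not stated here, so `count_command`, `strip_ext` and `pair_by_basename` are hypothetical helpers, not d2ssect functions.

```python
import os

FASTA_EXT = ".fasta"
JF_EXT = ".jf"


def strip_ext(path, ext):
    """Base name without the given extension, like the shell's ${f%.fasta}."""
    name = os.path.basename(path)
    if not name.endswith(ext):
        raise ValueError(f"{path!r} does not end with {ext!r}")
    return name[: -len(ext)]


def count_command(fasta_path, mer_len=21, hash_size=10_000_000):
    """argv for the jellyfish count call in the shell loop; output sits next to the input."""
    jf_path = os.path.join(os.path.dirname(fasta_path), strip_ext(fasta_path, FASTA_EXT) + JF_EXT)
    return ["jellyfish", "count", "-m", str(mer_len), "-s", str(hash_size),
            "-o", jf_path, fasta_path]


def pair_by_basename(jf_files, fasta_files):
    """Match each .jf file to the .fasta file with the same base name."""
    jf = {strip_ext(p, JF_EXT): p for p in jf_files}
    fasta = {strip_ext(p, FASTA_EXT): p for p in fasta_files}
    unmatched = sorted(set(jf) ^ set(fasta))  # names present in only one list
    if unmatched:
        raise ValueError(f"samples without both a .jf and a .fasta file: {unmatched}")
    return [(name, jf[name], fasta[name]) for name in sorted(jf)]
```

Each argument list from `count_command` can be passed to `subprocess.run(..., check=True)`; reporting unmatched names up front catches a missing or misnamed file before any pairwise work starts.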
Building from source
CC=g++ pip install .
Hashes for d2ssect-0.0.3-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 0a38e11c691c5de98e59efb199e8f164a271f551b13d08575f3582fd25262619
MD5 | 6a3d9556d3a134d9945102d27acc750d
BLAKE2b-256 | d4a458db073c8580115c071b7c5f5de8612ebdec89c90fef524c45e0b744c7e1
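To check a downloaded copy of this wheel against the digests listed for it, one option is Python's standard hashlib module. The function names below are illustrative, and the path you pass in is simply wherever you saved the file.

```python
import hashlib

# SHA256 digest published for d2ssect-0.0.3-cp39-cp39-macosx_10_9_x86_64.whl
EXPECTED_SHA256 = "0a38e11c691c5de98e59efb199e8f164a271f551b13d08575f3582fd25262619"


def file_digests(path, chunk_size=1 << 20):
    """Compute the three digests from the table, reading the file in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def wheel_matches(path, expected_sha256=EXPECTED_SHA256):
    """True if the file's SHA256 equals the published digest."""
    return file_digests(path)["SHA256"] == expected_sha256.lower()
```

pip can also enforce this at install time through its hash-checking mode, using a requirements line of the form `d2ssect==0.0.3 --hash=sha256:<digest>`, though this particular digest matches only the macOS cp39 wheel.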