Calculate d2s scores from short reads
d2ssect
A tool to calculate d2s scores from short FASTQ reads. This repo will test and benchmark existing alignment-free tools and their improved versions.
The original version of this pipeline included three main steps:
- get jellyfish count results
- calculate d2s using jellyfish dump results of every pair of samples
- generate a matrix
Our goal is to integrate these three steps and try to increase the speed of d2s calculation.
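To make the three steps concrete, here is a minimal pure-Python sketch of them on small in-memory samples. It is not the d2ssect implementation. This README does not define the score, so the sketch assumes the commonly used D2S dissimilarity with an i.i.d. background model estimated from each sample's base frequencies. The function names (`count_kmers`, `base_freqs`, `d2s_distance`, `d2s_matrix`) are hypothetical, k-mers are counted in Python rather than with jellyfish, and the sum is restricted to k-mers observed in at least one sample, which is a simplification.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def count_kmers(seqs, k):
    """Step 1 stand-in: count every k-mer in a list of read strings."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def base_freqs(seqs):
    """Estimate single-base frequencies for the i.i.d. background model."""
    c = Counter()
    for s in seqs:
        c.update(s)
    total = sum(c.values())
    return {b: n / total for b, n in c.items()}


def kmer_prob(kmer, freqs):
    """Probability of a k-mer under the i.i.d. model (assumed, see lead-in)."""
    p = 1.0
    for b in kmer:
        p *= freqs.get(b, 0.0)
    return p


def d2s_distance(cx, cy, fx, fy):
    """Step 2: assumed D2S dissimilarity between two k-mer count tables."""
    nx, ny = sum(cx.values()), sum(cy.values())
    num = sx = sy = 0.0
    # Simplification: only k-mers seen in at least one sample are summed.
    for w in set(cx) | set(cy):
        xt = cx.get(w, 0) - nx * kmer_prob(w, fx)  # centred count, sample X
        yt = cy.get(w, 0) - ny * kmer_prob(w, fy)  # centred count, sample Y
        denom = sqrt(xt * xt + yt * yt)
        if denom == 0:
            continue
        num += xt * yt / denom
        sx += xt * xt / denom
        sy += yt * yt / denom
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when a sample matches its model exactly
    return 0.5 * (1 - num / (sqrt(sx) * sqrt(sy)))


def d2s_matrix(samples, k):
    """Step 3: symmetric matrix of pairwise distances, as a list of lists."""
    names = list(samples)
    counts = {n: count_kmers(samples[n], k) for n in names}
    freqs = {n: base_freqs(samples[n]) for n in names}
    mat = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        a, b = names[i], names[j]
        mat[i][j] = mat[j][i] = d2s_distance(counts[a], counts[b], freqs[a], freqs[b])
    return names, mat
```

Calling `d2s_matrix({"s1": reads1, "s2": reads2}, k=21)` returns the sample names and a square matrix. Each pair is computed independently, so the pairwise step is the natural target for speeding up.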
Installation
- Install dependencies
  - Jellyfish 2.3.0
  - Python 3.8
  - pandas
Usage
Let's say we have a collection of FASTA files containing sequencing reads from the samples that we want to compare with d2ssect. First, count k-mers in these files using jellyfish:
for f in *.fasta; do jellyfish count -m 21 -s 10000000 "$f" -o "${f%.fasta}.jf"; done
Note that the command above creates a corresponding .jf file for every .fasta file in the current directory. Because the base names of the .jf and .fasta files are identical, we can then run d2ssect as follows:
python3 ../d2ssect/d2ssect/main.py -l *.jf -f *.fasta
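Because the command relies on the .jf and .fasta files sharing base names, it can help to check the pairing before a long run. The helper below is a hypothetical sketch, not part of d2ssect, and this README does not say whether d2ssect performs a similar check itself. It matches files by their name without the extension and fails if any file has no partner.

```python
from pathlib import Path


def pair_by_stem(jf_paths, fasta_paths):
    """Pair .jf and .fasta files by base name (hypothetical helper)."""
    jf = {Path(p).stem: p for p in jf_paths}
    fa = {Path(p).stem: p for p in fasta_paths}
    # Base names present on only one side have no partner file.
    unmatched = sorted(set(jf) ^ set(fa))
    if unmatched:
        raise ValueError(f"files without a partner: {', '.join(unmatched)}")
    # Sort by base name so both lists line up in the same order.
    return [(jf[s], fa[s]) for s in sorted(jf)]
```

For example, `pair_by_stem(glob.glob("*.jf"), glob.glob("*.fasta"))` raises a `ValueError` naming any sample whose counts or reads are missing.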
Building from source
CC=g++ pip install .