kitsune

A toolkit for evaluating k-mer length in a given genome dataset for alignment-free phylogenomic analysis

KITSUNE is a toolkit for evaluating k-mer length in a given genome dataset for alignment-free phylogenomic analysis.

The k-mer-based approach is simple and fast, and it is widely used in many applications, including biological sequence comparison. However, selecting an appropriate k-mer length that gives good information content for comparison is often overlooked. We therefore developed KITSUNE to aid k-mer length selection, based on the three-step approach described in Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer.

KITSUNE uses the Jellyfish software for k-mer counting. Thanks to the Jellyfish developers.
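To make the counting step concrete, here is a minimal pure-Python sketch of k-mer counting with an optional canonical mode (a k-mer and its reverse complement are counted as one). It is only an illustration of the idea: KITSUNE delegates counting to Jellyfish, and the function names below are hypothetical, not part of KITSUNE.

```python
from collections import Counter

# Translation table used to build the reverse complement of a DNA string.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(seq, k, canonical=False):
    """Count all k-mers of length k in seq (hypothetical helper, not KITSUNE's API).

    With canonical=True, each k-mer is replaced by the lexicographically
    smaller of itself and its reverse complement before counting.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) - set("ACGT"):
            continue
        if canonical:
            kmer = min(kmer, reverse_complement(kmer))
        counts[kmer] += 1
    return counts


if __name__ == "__main__":
    print(count_kmers("ACGTAC", 2))
    print(count_kmers("ACGTAC", 2, canonical=True))
```

For real genomes a dedicated counter such as Jellyfish is far faster and more memory-efficient than this loop.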

KITSUNE calculates three metrics across the considered k-mer range:

Cumulative Relative Entropy (CRE)
Average number of Common Features (ACF)
Observed Common Features (OCF)
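The intuition behind the common-feature metrics is that the number of k-mers shared between genomes shrinks as k grows. The sketch below counts distinct shared k-mers between two toy sequences over a range of k. It is not KITSUNE's definition of ACF or OCF, which is given in the cited paper; the function names and sequences are made up for illustration.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers_by_k(seq_a, seq_b, k_start, k_end):
    """Map each k in [k_start, k_end] to the number of distinct shared k-mers.

    Illustrative only: this is a simple pairwise overlap count, not the
    metric computed by KITSUNE.
    """
    return {
        k: len(kmer_set(seq_a, k) & kmer_set(seq_b, k))
        for k in range(k_start, k_end + 1)
    }


if __name__ == "__main__":
    # Two toy sequences that differ at a single position.
    for k, shared in shared_kmers_by_k("ACGTACGT", "ACGTTCGT", 2, 5).items():
        print(k, shared)
```

Short k-mers are shared almost everywhere and carry little signal, while long k-mers are rarely shared at all; choosing k means finding a balance between the two.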

Moreover, KITSUNE provides various genomic distance calculations from k-mer frequency vectors, which can be used for species identification or phylogenomic tree construction.

If you use KITSUNE in your research, please cite: Reference

Installation

Install KITSUNE from PyPI through pip:

pip install kitsune

Usage

Calculate CRE, ACF, and OCF values for a specific k-mer range

KITSUNE provides three commands to help choose an appropriate k-mer length using CRE, ACF, and OCF.

kitsune cre genome_fasta/* -ks 5 -ke 10
kitsune acf genome_fasta/* -ks 5 -ke 10
kitsune ocf genome_fasta/* -ks 5 -ke 10
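When the three commands need to be run from a script, they can be assembled programmatically. The sketch below uses only the subcommands and flags shown above (`cre`, `acf`, `ocf`, `-ks`, `-ke`); the helper name `build_metric_commands` is hypothetical, and the commands are executed only if a `kitsune` executable is found on the PATH.

```python
import glob
import shutil
import subprocess


def build_metric_commands(fasta_files, k_start, k_end):
    """Return one argument list per metric command (cre, acf, ocf).

    Hypothetical helper: it only mirrors the command lines shown in this
    README and does not call any KITSUNE Python API.
    """
    commands = []
    for metric in ("cre", "acf", "ocf"):
        commands.append(
            ["kitsune", metric, *fasta_files, "-ks", str(k_start), "-ke", str(k_end)]
        )
    return commands


if __name__ == "__main__":
    # Expand the same pattern the shell would expand for genome_fasta/*.
    files = sorted(glob.glob("genome_fasta/*"))
    for cmd in build_metric_commands(files, 5, 10):
        print(" ".join(cmd))
        # Run only when there are input files and kitsune is installed.
        if files and shutil.which("kitsune"):
            subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.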

Calculate genomic distance at a specific k-mer length from the k-mer frequency vectors of two genomes

KITSUNE provides a command to calculate genomic distance using different distance estimation methods.

distance option	name
braycurtis	Bray-Curtis distance
canberra	Canberra distance
chebyshev	Chebyshev distance
cityblock	City Block (Manhattan) distance
correlation	Correlation distance
cosine	Cosine distance
euclidean	Euclidean distance
jensenshannon	Jensen-Shannon distance
sqeuclidean	Squared Euclidean distance
dice	Dice dissimilarity
hamming	Hamming distance
jaccard	Jaccard-Needham dissimilarity
kulsinski	Kulsinski dissimilarity
rogerstanimoto	Rogers-Tanimoto dissimilarity
russellrao	Russell-Rao dissimilarity
sokalmichener	Sokal-Michener dissimilarity
sokalsneath	Sokal-Sneath dissimilarity
yule	Yule dissimilarity
mash	MASH distance
jsmash	MASH Jensen-Shannon distance
jaccarddistp	Jaccard-Needham dissimilarity Probability

kitsune dmatrix genome1.fna genome2.fna -k 17 -d jaccard --canonical --fast -o output.txt
kitsune dmatrix genome1.fna genome2.fna -k 17 -d jensenshannon --canonical --fast -o output.txt
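To show what a distance between two k-mer frequency vectors looks like, the sketch below computes a Jaccard dissimilarity (on k-mer presence or absence) and a Bray-Curtis distance (on k-mer counts) for two toy sequences using textbook formulas. It is an illustration under those standard definitions, not KITSUNE's implementation, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every k-mer of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard_dissimilarity(counts_a, counts_b):
    """1 - |A & B| / |A | B| over the sets of observed k-mers."""
    a, b = set(counts_a), set(counts_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def bray_curtis(counts_a, counts_b):
    """sum(|a_i - b_i|) / sum(a_i + b_i) over all observed k-mers."""
    keys = set(counts_a) | set(counts_b)
    # Counter returns 0 for k-mers that are absent from one genome.
    numerator = sum(abs(counts_a[key] - counts_b[key]) for key in keys)
    denominator = sum(counts_a[key] + counts_b[key] for key in keys)
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    a = kmer_counts("ACGTACGT", 2)
    b = kmer_counts("ACGTTCGT", 2)
    print("jaccard:", jaccard_dissimilarity(a, b))
    print("braycurtis:", bray_curtis(a, b))
```

Presence-based measures such as Jaccard ignore how often a k-mer occurs, whereas count-based measures such as Bray-Curtis are sensitive to abundance, so the two can rank genome pairs differently.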

Find the optimum k-mer length for a given set of genomes

KITSUNE provides a command to find the optimum k-mer length for a given set of genomes.

First, download the example files.

Then use the kitsune kopt command:

-i: path to the list of genome files

-ks: the smallest k-mer length to consider

-kl: the largest k-mer length to consider

-o: output file

Note: this command uses substantial computational resources when the input contains a large number of genomes and/or large genomes.

kitsune kopt -i genome_list -ks 7 -kl 15 --canonical --fast -o output.txt
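The `genome_list` file passed to `-i` can be generated rather than written by hand. The sketch below assumes the list is a plain-text file with one genome path per line; the source does not state the format, so check it against the example files before relying on it. The helper name `write_genome_list` and the file extensions are also assumptions.

```python
from pathlib import Path


def write_genome_list(genome_dir, list_path, patterns=("*.fna", "*.fasta", "*.fa")):
    """Write the absolute paths of genome files in genome_dir, one per line.

    Assumption: kitsune kopt reads a list file with one path per line.
    Returns the sorted list of paths that were written.
    """
    genome_dir = Path(genome_dir)
    # A set removes duplicates in case a file matches more than one pattern.
    paths = sorted(
        {p.resolve() for pattern in patterns for p in genome_dir.glob(pattern)}
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths


if __name__ == "__main__":
    # Only write the list when the example directory actually exists.
    if Path("genome_fasta").is_dir():
        found = write_genome_list("genome_fasta", "genome_list")
        print(f"wrote {len(found)} paths to genome_list")
```

Sorting the paths keeps the list stable between runs, which makes results easier to reproduce.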
