
A toolkit for evaluating the k-mer length in a given genome dataset for alignment-free phylogenomic analysis

Project description

KITSUNE

KITSUNE is a toolkit for evaluating the k-mer length in a given genome dataset for alignment-free phylogenomic analysis.

The k-mer-based approach is simple and fast, and it has been widely used in many applications, including biological sequence comparison. However, selecting an appropriate k-mer length, one that gives good information content for comparison, is often overlooked. We therefore developed KITSUNE to aid the k-mer length selection process using the three-step approach described in "Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer".

KITSUNE uses the Jellyfish software for k-mer counting. Thanks to the Jellyfish developers.
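KITSUNE delegates counting to Jellyfish, so the following is only an illustration of what a k-mer count table contains. It is a minimal pure-Python sketch, with hypothetical function names, and it assumes that "canonical" counting (as in the --canonical flag used in the commands below) means merging each k-mer with its reverse complement under the lexicographically smaller string; we have not checked that this matches Jellyfish's exact convention.

```python
from collections import Counter

# Translation table that complements DNA bases (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of an uppercase DNA k-mer."""
    return kmer.translate(_COMPLEMENT)[::-1]


def count_kmers(sequence: str, k: int, canonical: bool = False) -> Counter:
    """Count the k-mers of a DNA sequence (hypothetical helper).

    With canonical=True, a k-mer and its reverse complement are counted
    together under whichever of the two strings sorts first.
    K-mers containing characters other than A, C, G, T are skipped.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers with N or other ambiguity codes
        if canonical:
            kmer = min(kmer, reverse_complement(kmer))
        counts[kmer] += 1
    return counts


if __name__ == "__main__":
    print(count_kmers("ACGTAC", 3))
    print(count_kmers("ACGTAC", 3, canonical=True))
```

Canonical counting roughly halves the number of distinct k-mers and makes the counts independent of which strand a genome was sequenced from.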

KITSUNE calculates three metrics across the considered k-mer range:

  1. Cumulative Relative Entropy (CRE)

  2. Average number of Common Features (ACF)

  3. Observed Common Features (OCF)
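As a rough illustration of the first metric, the sketch below computes a cumulative relative entropy for a single DNA sequence. It assumes one common formulation, not confirmed to be KITSUNE's exact implementation: the relative entropy RE(k) compares the observed frequency f(w) of each k-mer w with the frequency expected from shorter words under a Markov model, f(w[1..k-1]) * f(w[2..k]) / f(w[2..k-1]), and CRE(k) is the sum of RE(i) for i from k up to a chosen maximum k_max. The function names are hypothetical, and how KITSUNE aggregates CRE over a multi-genome dataset is not shown here.

```python
import math
from collections import Counter


def kmer_freqs(seq: str, k: int) -> dict:
    """Relative frequencies of the k-mers in seq; the empty word has frequency 1."""
    if k == 0:
        return {"": 1.0}
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def relative_entropy(seq: str, k: int) -> float:
    """Relative entropy (bits) of observed k-mer frequencies versus a
    Markov-model expectation built from (k-1)- and (k-2)-mer frequencies."""
    if k < 2:
        raise ValueError("k must be at least 2")
    f_k = kmer_freqs(seq, k)
    f_km1 = kmer_freqs(seq, k - 1)
    f_km2 = kmer_freqs(seq, k - 2)
    re = 0.0
    for w, f in f_k.items():
        # Prefix, suffix and middle of an observed k-mer are always observed too.
        expected = f_km1[w[:-1]] * f_km1[w[1:]] / f_km2[w[1:-1]]
        re += f * math.log2(f / expected)
    return re


def cumulative_relative_entropy(seq: str, k_start: int, k_max: int) -> dict:
    """Map each k in [k_start, k_max] to CRE(k) = RE(k) + ... + RE(k_max)."""
    re = {k: relative_entropy(seq, k) for k in range(k_start, k_max + 1)}
    return {k: sum(re[i] for i in range(k, k_max + 1)) for k in re}


if __name__ == "__main__":
    print(cumulative_relative_entropy("ACGTTGCAACGGTACGTTAG", 2, 6))
```

Under this formulation, a CRE value that has dropped close to zero means longer k-mers add little information beyond what shorter ones already capture.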

Moreover, KITSUNE provides various genomic distance calculations from k-mer frequency vectors that can be used for species identification or phylogenomic tree construction.

If you use KITSUNE in your research, please cite: Reference

Installation

Install KITSUNE through pip:

pip install kitsune

Usage

Calculate CRE, ACF, and OCF values over a range of k-mer lengths

Kitsune provides three commands to help determine an appropriate k-mer length using CRE, ACF, and OCF.

kitsune cre genome_fasta/* -ks 5 -ke 10
kitsune acf genome_fasta/* -ks 5 -ke 10
kitsune ocf genome_fasta/* -ks 5 -ke 10
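ACF can be illustrated in the same spirit. The sketch below assumes, for illustration only, that ACF at a given k is the average number of distinct k-mers shared by each pair of genomes in the dataset; the function names are hypothetical. The loop at the end scans a range of k values, analogous to the -ks/-ke range in the commands above.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set:
    """Distinct k-mers (features) present in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def average_common_features(genomes: list, k: int) -> float:
    """Mean number of k-mers shared by each pair of genomes."""
    sets = [kmer_set(g, k) for g in genomes]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two genomes")
    return sum(len(a & b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    genomes = ["ACGTACGTTA", "ACGTACGATA", "TTGCACGTAC"]
    for k in range(3, 6):  # analogous to -ks 3 -ke 5
        print(k, average_common_features(genomes, k))
```

As k grows, shared k-mers become rarer, so this quantity tends to fall; that trade-off against information content is what the metrics are meant to expose.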

Calculate the genomic distance at a specific k-mer length from the k-mer frequency vectors of two genomes

Kitsune provides a command to calculate genomic distance using different distance estimation methods.

distance option   name
braycurtis        Bray-Curtis distance
canberra          Canberra distance
chebyshev         Chebyshev distance
cityblock         City Block (Manhattan) distance
correlation       Correlation distance
cosine            Cosine distance
euclidean         Euclidean distance
jensenshannon     Jensen-Shannon distance
sqeuclidean       Squared Euclidean distance
dice              Dice dissimilarity
hamming           Hamming distance
jaccard           Jaccard-Needham dissimilarity
kulsinski         Kulsinski dissimilarity
rogerstanimoto    Rogers-Tanimoto dissimilarity
russellrao        Russell-Rao dissimilarity
sokalmichener     Sokal-Michener dissimilarity
sokalsneath       Sokal-Sneath dissimilarity
yule              Yule dissimilarity
mash              MASH distance
jsmash            MASH Jensen-Shannon distance
jaccarddistp      Jaccard-Needham dissimilarity Probability

kitsune dmatrix genome1.fna genome2.fna -k 17 -d jaccard --canonical --fast -o output.txt
kitsune dmatrix genome1.fna genome2.fna -k 17 -d jensenshannon --canonical --fast -o output.txt
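Most option names in the table above coincide with function names in SciPy's scipy.spatial.distance module. To make a few of the definitions concrete, the sketch below hand-implements Euclidean, cosine, Jensen-Shannon, Jaccard and a Mash-style distance on two aligned k-mer count vectors. We assume, but have not confirmed, that KITSUNE uses the same definitions. The Mash formula D = -(1/k) ln(2J / (1 + J)) is applied here to an exact Jaccard index J computed from full k-mer sets rather than MinHash sketches, and returning 1.0 when J = 0 is an assumed convention. All function names are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def to_vectors(a: Counter, b: Counter):
    """Align two k-mer count tables on the sorted union of their k-mers."""
    keys = sorted(set(a) | set(b))
    return [a.get(w, 0) for w in keys], [b.get(w, 0) for w in keys]


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def jensenshannon(u, v):
    """Square root of the Jensen-Shannon divergence (natural log)."""
    su, sv = sum(u), sum(v)
    p = [x / su for x in u]
    q = [y / sv for y in v]
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a, b):  # Kullback-Leibler divergence, skipping zero terms
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt((kl(p, m) + kl(q, m)) / 2)


def jaccard_index(u, v):
    """|A & B| / |A | B| on k-mer presence/absence."""
    both = sum(1 for x, y in zip(u, v) if x and y)
    either = sum(1 for x, y in zip(u, v) if x or y)
    return both / either


def jaccard(u, v):
    return 1.0 - jaccard_index(u, v)


def mash(u, v, k):
    j = jaccard_index(u, v)
    if j == 0:
        return 1.0  # no shared k-mers: report maximal distance
    return -math.log(2 * j / (1 + j)) / k


if __name__ == "__main__":
    u, v = to_vectors(kmer_counts("ACGTACGGTA", 3), kmer_counts("ACGTTCGGTA", 3))
    for name, d in [("euclidean", euclidean(u, v)), ("cosine", cosine(u, v)),
                    ("jensenshannon", jensenshannon(u, v)),
                    ("jaccard", jaccard(u, v)), ("mash", mash(u, v, 3))]:
        print(name, d)
```

In practice, the SciPy functions (for example scipy.spatial.distance.jensenshannon) can be called on the same aligned vectors; the hand-written versions only make the formulas explicit.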

Find the optimal k-mer length for a given set of genomes

Kitsune provides a command to find the optimal k-mer length for a given set of genomes.

First, download the example files.

Then run the kitsune kopt command with the following options:

-i: path to a list of genome files

-ks: the smallest k-mer length to consider

-kl: the largest k-mer length to consider

-o: output file

Please be aware that this command requires substantial computational resources when a large number of genomes and/or large genomes are used as input.
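Two facts that hold independently of KITSUNE's internals help explain this: there are 4^k possible DNA k-mers of length k, and n genomes form n(n-1)/2 pairs. The hypothetical helper below only tabulates these numbers for a planned run; it is a rough indicator, not KITSUNE's actual cost model.

```python
def kopt_workload(n_genomes: int, k_small: int, k_large: int) -> dict:
    """Rough size indicators for scanning k_small..k_large over n genomes."""
    return {
        "genome_pairs": n_genomes * (n_genomes - 1) // 2,
        "k_values": k_large - k_small + 1,
        # upper bound on distinct k-mers at the largest k (non-canonical)
        "max_possible_kmers_at_k_large": 4 ** k_large,
    }


if __name__ == "__main__":
    print(kopt_workload(100, 7, 15))
```

For example, 100 genomes already give 4,950 pairs, and each extra unit of k multiplies the space of possible k-mers by four.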

kitsune kopt -i genome_list -ks 7 -kl 15 --canonical --fast -o output.txt
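The -i option takes a path to a list of genome files. Assuming that list is a plain-text file with one FASTA path per line (the exact format is not specified here), a hypothetical helper like the one below can build it from a directory of FASTA files; the file-name patterns are also assumptions and can be adjusted.

```python
from pathlib import Path


def write_genome_list(fasta_dir: str, list_path: str,
                      patterns=("*.fna", "*.fa", "*.fasta")) -> list:
    """Write matching FASTA paths, one per line and sorted, to list_path.

    Returns the list of paths written. Assumes one path per line is the
    format expected by kitsune kopt -i.
    """
    paths = sorted({str(p) for pat in patterns for p in Path(fasta_dir).glob(pat)})
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths


if __name__ == "__main__":
    written = write_genome_list("genome_fasta", "genome_list")
    print(f"{len(written)} genome files listed in genome_list")
```

The resulting genome_list file can then be passed as the -i argument in the command above.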


Download files


Source Distribution

kitsune-1.2.10.tar.gz (3.0 MB)

Uploaded Source

Built Distribution


kitsune-1.2.10-py2.py3-none-any.whl (3.0 MB)

Uploaded Python 2, Python 3

File details

Details for the file kitsune-1.2.10.tar.gz.

File metadata

  • Download URL: kitsune-1.2.10.tar.gz
  • Upload date:
  • Size: 3.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for kitsune-1.2.10.tar.gz
Algorithm Hash digest
SHA256 04a125478a699528017d6fc672b980f3cfc5948ba4009aaf08f26a5a65876cf4
MD5 807a58d24d2ccb5b1a5af6a1a458ff66
BLAKE2b-256 43cc0092419783de9f2766d1d160221271398c65f101b03f24b33b84336579a4


File details

Details for the file kitsune-1.2.10-py2.py3-none-any.whl.

File metadata

  • Download URL: kitsune-1.2.10-py2.py3-none-any.whl
  • Upload date:
  • Size: 3.0 MB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for kitsune-1.2.10-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 0251c90fa083de264d2f0c17664eb30e860c7bf2668ae773e594dff01e4780d7
MD5 41aec496777483e4735b88883f79b223
BLAKE2b-256 9b7b4cba21ff2e8fbbd9abe2aae26f29c6edce48ea672839dc077c1e532fe4a6

