
A package for calculating the k-mer complexity of sequences using Burrows-Wheeler compressibility.

Project description

KRIS Index

The beauty of science is to make things simple.
A basic principle of information theory is that the compressibility (reducibility) of a piece of information is inversely related to its information content, or entropy. The Burrows-Wheeler transform (BWT) is a reversible transform that makes data far easier to compress, and it is so effective on genomic sequence that it underlies many fundamental bioinformatic applications as well as the bzip2 compression program. Building on this, we have developed the K-mer Reducible Information Statistic (KRIS) index, a new tool for analyzing the entropy, complexity, or information content of genomic reads or k-mers. It is designed to process k-mers efficiently and in a highly parallel manner, providing a fast filtering system that can be incorporated into bioinformatic pipelines.

The obvious application is filtering out low-complexity, low-information reads, and we do recommend it for that. We also encourage you to keep these low-scoring reads and use them as a benchmark of how your favorite bioinformatic pipelines handle them. Additionally, the program can be run on reference genome-derived k-mers to identify regions of high and low information content.
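The link between compressibility and information content can be seen with Python's standard bz2 module alone. The following is a minimal sketch of the principle, not the KRIS calculation itself: the helper name compressed_ratio and the 1000-base test sequences are assumptions made for illustration, and the package may normalize or score differently.

```python
import bz2
import random


def compressed_ratio(sequence: str) -> float:
    """Return compressed size / original size (lower means more reducible)."""
    raw = sequence.encode("ascii")
    return len(bz2.compress(raw)) / len(raw)


random.seed(0)  # fixed seed so the example is repeatable

# Hypothetical test inputs: a homopolymer run and a random DNA string.
low_complexity = "A" * 1000
high_complexity = "".join(random.choice("ACGT") for _ in range(1000))

low_ratio = compressed_ratio(low_complexity)
high_ratio = compressed_ratio(high_complexity)

print(f"homopolymer ratio: {low_ratio:.3f}")
print(f"random DNA ratio:  {high_ratio:.3f}")
```

The homopolymer compresses to a much smaller fraction of its original size than the random sequence. Note that bzip2 output carries a fixed header, so on very short sequences a raw ratio like this is dominated by overhead; that is one reason a dedicated statistic is useful for k-mers.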

The mission of this application is identical to the ZymoBIOMICS mission: improvement of all aspects of microbiome research.

Publication

Please watch this spot for the preprint and eventual publication.

Quick Start Guide

Installation

pip3 install zymoKrisIndex

Usage

import zymoKrisIndex

sequence = "ATGCATGCATGCATGC" # Creating an arbitrary sequence

sequenceList = [sequence] * 10  # Creating a list of arbitrary sequences (any number of sequences can be supplied)

dnaKris = zymoKrisIndex.KrisIndexCalculator()

krisIndex = dnaKris.calculateKrisIndex(sequence)  # returns the KRIS index as a float

krisIndices = dnaKris.calculateKrisIndexParallel(sequenceList)  # returns a list of KRIS indices

rnaKris = zymoKrisIndex.KrisIndexCalculator(zymoKrisIndex.RNA1000ALPHABET)  # Instantiates a KrisIndexCalculator with an RNA alphabet

proteinKris = zymoKrisIndex.KrisIndexCalculator(zymoKrisIndex.AMINO1000ALPHABET)  # Instantiates a KrisIndexCalculator with an amino acid alphabet
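The scores returned by the calls above can feed a simple filtering step. The sketch below does not import zymoKrisIndex, so it runs on its own: sliding_kmers and partition_by_score are hypothetical helpers, the threshold of 0.5 is an arbitrary assumption, and the scores are made-up placeholders standing in for the output of calculateKrisIndexParallel, not real KRIS values.

```python
def sliding_kmers(sequence: str, k: int) -> list[str]:
    """Return every overlapping k-mer of a sequence, left to right."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def partition_by_score(sequences, scores, threshold):
    """Split sequences into (kept, low_scoring) around a score threshold."""
    if len(sequences) != len(scores):
        raise ValueError("sequences and scores must have the same length")
    kept, low_scoring = [], []
    for seq, score in zip(sequences, scores):
        (kept if score >= threshold else low_scoring).append(seq)
    return kept, low_scoring


# Hypothetical k-mers cut from an arbitrary reference fragment.
kmers = sliding_kmers("ATGCA", 3)

# Placeholder scores for illustration only; in practice these would come
# from dnaKris.calculateKrisIndexParallel(kmers).
placeholder_scores = [0.9, 0.2, 0.7]

kept, low_scoring = partition_by_score(kmers, placeholder_scores, threshold=0.5)
print("kept:", kept)
print("low-scoring (retain for benchmarking):", low_scoring)
```

Returning the low-scoring sequences instead of discarding them makes it easy to keep them as a benchmark set, as suggested in the project description.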

Prerequisites

This application requires Python 3.10 or later. It also depends on the Pydantic and mpire packages and on the bz2 module from the Python standard library.

Installation

Installation of this package can be done using pip3 as shown above.

API Reference

The API reference is available in API.md.

Contributing

We welcome and encourage contributions to this project from the microbiomics community and will happily accept and acknowledge input (and possibly provide some free kits as a thank you). We aim to provide a positive and inclusive environment for contributors that is free of any harassment or excessively harsh criticism. Our Golden Rule: Treat others as you would like to be treated.

Versioning

We use a modification of Semantic Versioning to identify our releases.

Release identifiers will be major.minor.patch

  • Major release: Newly required parameter or other change that is not entirely backwards compatible
  • Minor release: New optional parameter
  • Patch release: No changes to parameters
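As a rough illustration of how a downstream pipeline might act on this scheme, the sketch below classifies the change between two release identifiers. The function names parse_release and change_type are hypothetical and not part of the package; the sketch assumes identifiers are always three dot-separated integers.

```python
def parse_release(identifier: str) -> tuple[int, int, int]:
    """Split a 'major.minor.patch' identifier into three integers."""
    major, minor, patch = (int(part) for part in identifier.split("."))
    return major, minor, patch


def change_type(installed: str, candidate: str) -> str:
    """Name the most significant component that differs between releases."""
    old, new = parse_release(installed), parse_release(candidate)
    if new[0] != old[0]:
        return "major"  # may not be backwards compatible
    if new[1] != old[1]:
        return "minor"  # new optional parameter
    if new[2] != old[2]:
        return "patch"  # no changes to parameters
    return "same"


print(change_type("0.0.3", "0.0.4"))
print(change_type("0.0.3", "0.1.0"))
print(change_type("0.0.3", "1.0.0"))
```

Under the rules above, only a change reported as "major" should require reviewing how the calculator is called.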

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the GNU GPLv3 License - see the LICENSE file for details. This license restricts the use of this application in non-open-source systems. Please contact the authors with questions about relicensing this software for use in non-open-source systems.

Acknowledgments

We would like to thank the following, without whom this would not have happened:

  • The Python Foundation
  • The staff at Zymo Research
  • The IMMSA bioinformatics interest group who suggested making this a full project
  • Our customers

Microbial Dark Matter Symposium

If you are reading this, you are also probably interested in pushing the limits of microbiome research and being able to study what we know is there, but cannot yet see. If so, the Microbial Dark Matter Symposium is for you!

This symposium will cover emerging technologies, case studies, and practical applications in:

  • Microbial Dark Matter in Metagenomics to Explore Microbial Frontiers
  • Exploring Microbial Dark Matter in Extreme and Low Biomass Environments
  • Understanding Microbial Communities in the Built Environment
  • Bioinformatics Tools for Large-Scale Exploration of Hidden Microbial Life
  • Novel Methods for Cultivating ‘Unculturable’ Microbes

Register Here


If you like this software, please let us know at info@zymoresearch.com.

Please support our continued development of free and open-source microbiomics applications by checking out the latest microbiomics offerings from ZymoBIOMICS.
