
A tool for classifying metagenomic data


Tiara

Deep-learning-based approach for identifying eukaryotic sequences in metagenomic data, powered by PyTorch.

The sequences are classified in two stages:

  • In the first stage, sequences are classified into one of six classes: archaea, bacteria, prokarya, eukarya, organelle and unknown.
  • In the second stage, sequences labeled as organelle in the first stage are classified as mitochondria, plastid or unknown.
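The two-stage routing can be pictured with the minimal Python sketch below. It is not Tiara's implementation: the rule that a sequence receives its most probable class only when that probability reaches the stage's cutoff (and is otherwise labeled unknown) is an assumption made for illustration, and pick_label and classify are hypothetical names.

```python
"""Hypothetical sketch of two-stage labeling; not Tiara's actual code.
Assumed rule: take the most probable class if it reaches the cutoff, else 'unknown'."""
from typing import Dict, Optional, Tuple


def pick_label(probs: Dict[str, float], cutoff: float) -> str:
    """Return the most probable class, or 'unknown' if its probability is below cutoff."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= cutoff else "unknown"


def classify(
    first_probs: Dict[str, float],
    second_probs: Optional[Dict[str, float]] = None,
    *,
    cutoffs: Tuple[float, float],
) -> Tuple[str, Optional[str]]:
    """Stage 1 runs on every sequence; stage 2 only on sequences labeled 'organelle'."""
    first = pick_label(first_probs, cutoffs[0])
    if first != "organelle" or second_probs is None:
        return first, None  # no second-stage label for non-organelle sequences
    return first, pick_label(second_probs, cutoffs[1])
```

For instance, with cutoffs=(0.65, 0.60) an organelle sequence whose best second-stage probability is 0.55 would end up as organelle / unknown under this assumed rule.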

For more information, please refer to our paper: Tiara: Deep learning-based classification system for eukaryotic sequences.

Supplementary data

Supplementary sequences

Requirements

  • Python >= 3.7 (release 1.0.3 requires Python < 3.10; see Version history)
  • numpy, biopython, torch, skorch, tqdm, joblib, numba
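Before installing, it may help to check the interpreter and the listed dependencies. The sketch below assumes the 3.7 to 3.9 range implied by Requirements and the 1.0.3 note in Version history; python_supported and missing_modules are hypothetical helper names, and biopython is checked under its import name Bio.

```python
"""Pre-install check for the Python version and Tiara's listed dependencies (a sketch)."""
import importlib.util
import sys
from typing import Iterable, List, Tuple

# Import names for the packages listed under Requirements (biopython imports as "Bio").
REQUIRED_MODULES = ("numpy", "Bio", "torch", "skorch", "tqdm", "joblib", "numba")


def python_supported(version: Tuple[int, ...]) -> bool:
    """True for 3.7 <= version < 3.10 (lower bound from Requirements, upper from 1.0.3 notes)."""
    return (3, 7) <= tuple(version[:2]) < (3, 10)


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    ok = python_supported(sys.version_info[:2])
    print(f"Python {sys.version.split()[0]}: {'ok' if ok else 'outside 3.7-3.9'}")
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```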

Installation

More detailed installation instructions can be found here.

Using pip

Run pip install tiara, preferably in a fresh environment.

Using setup.py

Latest stable release
Latest developer version
git clone https://github.com/ibe-uw/tiara.git
cd tiara
python setup.py install

Testing the installation

After the installation, run tiara-test to see if the installation was successful.
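If the installation check is part of a larger Python workflow, tiara-test can be invoked from a script. This sketch assumes only that tiara-test is on PATH and reports failure through a non-zero exit status; command_succeeds is a hypothetical helper.

```python
"""Run a command-line check such as tiara-test and report whether it passed (a sketch)."""
import shutil
import subprocess
import sys


def command_succeeds(*argv: str) -> bool:
    """Return True if argv[0] is found on PATH and the command exits with status 0."""
    if shutil.which(argv[0]) is None:
        return False  # command not installed or not on PATH
    return subprocess.run(list(argv)).returncode == 0


if __name__ == "__main__":
    print("tiara-test passed" if command_succeeds("tiara-test") else "tiara-test failed or not found")
```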

Usage

Basic usage:

tiara -i sample_input.fasta -o out.txt

By default, sequences in the fasta file should be at least 3000 bases long. We do not recommend classifying sequences shorter than 1000 base pairs.
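To see how an input file relates to these thresholds before running tiara, its sequences can be bucketed by length. This is a hypothetical pre-check, assuming the 3000-base default acts as a minimum-length filter; it reads plain or gzipped fasta (chosen by the .gz suffix, mirroring the input option added in 1.0.3), and open_fasta, read_fasta and length_summary are illustrative names.

```python
"""Hypothetical pre-check: bucket fasta records by length (3000-base default, 1000 bp floor)."""
import gzip
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def open_fasta(path: str) -> TextIO:
    """Open a fasta file as text, decompressing it if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence id, sequence length) for each fasta record."""
    seq_id = None
    length = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""  # id is the first word of the header
            length = 0
        elif line and seq_id is not None:
            length += len(line)
    if seq_id is not None:
        yield seq_id, length


def length_summary(
    records: Iterable[Tuple[str, int]], min_len: int = 3000, floor: int = 1000
) -> Dict[str, int]:
    """Count records at or above min_len, between floor and min_len, and below floor."""
    counts = {"at_least_min_len": 0, "below_min_len": 0, "below_floor": 0}
    for _, length in records:
        if length >= min_len:
            counts["at_least_min_len"] += 1
        elif length >= floor:
            counts["below_min_len"] += 1
        else:
            counts["below_floor"] += 1
    return counts
```

Typical use is length_summary(read_fasta(handle)) inside a with open_fasta("sample_input.fasta") as handle block.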

This command creates two files:

  • out.txt, a tab-separated file with a header; its columns are the sequence id, the first-stage classification result and the second-stage classification result.
  • log_out.txt, containing the model parameters and a classification summary.
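A short sketch for tallying labels from out.txt. It reads columns by position (sequence id, first stage, second stage) rather than by header name, assumes those three columns come first, and ignores any further columns; read_results and tally are hypothetical helpers.

```python
"""Sketch: tally first- and second-stage labels from a tab-separated out.txt."""
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def read_results(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (sequence id, first-stage label, second-stage label) rows, skipping the header."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # drop the header row
    return [(row[0], row[1], row[2]) for row in reader if len(row) >= 3]


def tally(rows: List[Tuple[str, str, str]]) -> Tuple[Counter, Counter]:
    """Count first-stage labels, and second-stage labels for organelle sequences only."""
    first = Counter(row[1] for row in rows)
    second = Counter(row[2] for row in rows if row[1] == "organelle")
    return first, second
```

Used as: with open("out.txt", newline="") as handle: first, second = tally(read_results(handle)).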

Advanced:

tiara -i sample_input.fasta -o out.txt --tf mit pla pro -t 4 -p 0.65 0.60 --probabilities

In addition to the files above, this command writes three files to the folder where tiara is run. They contain the sequences from sample_input.fasta classified as mitochondria, plastid and prokarya, respectively (--tf mit pla pro option).

The number of threads is set to 4 (-t 4), and the probability cutoffs for the first and second stage of classification are set to 0.65 and 0.60, respectively (-p 0.65 0.60).

Thanks to the --probabilities option, the probabilities of belonging to each class are also written to out.txt.
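When tiara is driven from a Python pipeline, the advanced command can be assembled as an argument list for subprocess. The sketch uses only flags that appear in this README (-i, -o, --tf, -t, -p, --probabilities); tiara_command and run_tiara are hypothetical names, and any option left out falls back to tiara's own default.

```python
"""Sketch: build and run the advanced tiara command from Python."""
import subprocess
from typing import List, Optional, Sequence


def tiara_command(
    fasta: str,
    out: str,
    to_fasta: Sequence[str] = (),
    threads: Optional[int] = None,
    cutoffs: Optional[Sequence[float]] = None,
    probabilities: bool = False,
) -> List[str]:
    """Build the argv list in the same flag order as the advanced example."""
    cmd = ["tiara", "-i", fasta, "-o", out]
    if to_fasta:
        cmd += ["--tf", *to_fasta]  # e.g. mit pla pro
    if threads is not None:
        cmd += ["-t", str(threads)]
    if cutoffs is not None:
        cmd += ["-p", *(str(c) for c in cutoffs)]  # first- and second-stage cutoffs
    if probabilities:
        cmd.append("--probabilities")
    return cmd


def run_tiara(cmd: List[str]) -> None:
    """Run tiara, raising subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

Calling tiara_command("sample_input.fasta", "out.txt", ["mit", "pla", "pro"], 4, [0.65, 0.60], True) reproduces the advanced example's arguments, with the second cutoff rendered as 0.6.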

For more usage examples, go here.

Citation

Michał Karlicki, Stanisław Antonowicz, Anna Karnkowska, Tiara: deep learning-based classification system for eukaryotic sequences, Bioinformatics, Volume 38, Issue 2, 15 January 2022, Pages 344–350, https://doi.org/10.1093/bioinformatics/btab672

License

Tiara is released under the open-source MIT license.

Version history:

  • 1.0.3: added pyproject.toml and restricted the dependencies to Python < 3.10. Unfortunately, tiara currently does not work with Python newer than 3.9 because of compatibility issues with torch 1.7.0. Added an option to use a gzipped fasta file as input (detected automatically by the .gz suffix).
  • 1.0.2: added Python 3.9 compatibility and an option to gzip the results. Added this README section.
  • 1.0.0, 1.0.1: initial releases.



Download files


Source Distribution

tiara-1.0.3.tar.gz (102.4 MB)

Built Distribution

tiara-1.0.3-py3-none-any.whl (102.4 MB)

File details

Details for the file tiara-1.0.3.tar.gz.

File metadata

  • Download URL: tiara-1.0.3.tar.gz
  • Size: 102.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for tiara-1.0.3.tar.gz:

  • SHA256: 77585ae73a019dc73d7f0397082685a662c07ab15bdba7ee18c1cda41738ad76
  • MD5: b68f325424204ca4f4d70d25b9210948
  • BLAKE2b-256: 2eb433c52258af0a9db829313394c3ace6aaff9a2ba65ebd3ff6760c55802e27


File details

Details for the file tiara-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: tiara-1.0.3-py3-none-any.whl
  • Size: 102.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for tiara-1.0.3-py3-none-any.whl:

  • SHA256: 5b149e5fca7ddc324c31b4bc1f1139c2ab64a5ff95841a1409b093a63832aac9
  • MD5: aef0e0925ed8747bde319e959d5f4a4c
  • BLAKE2b-256: bbef1b3fac9bfdcaabdfacd9d5daf7aed5485fe61be4b2edde024514126b5def
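A downloaded 1.0.3 distribution can be checked against the SHA256 digests listed for it. This sketch hashes the file in chunks with Python's hashlib so the roughly 100 MB archives are not read into memory at once; the helper names are illustrative, and the digests are copied from the file details for each distribution.

```python
"""Verify a downloaded tiara 1.0.3 file against its published SHA256 digest (a sketch)."""
import hashlib
from pathlib import Path
from typing import Iterable, Iterator, Union

EXPECTED_SHA256 = {
    "tiara-1.0.3.tar.gz": "77585ae73a019dc73d7f0397082685a662c07ab15bdba7ee18c1cda41738ad76",
    "tiara-1.0.3-py3-none-any.whl": "5b149e5fca7ddc324c31b4bc1f1139c2ab64a5ff95841a1409b093a63832aac9",
}


def file_chunks(path: Union[str, Path], size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file's contents in 1 MiB chunks."""
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(size), b""):
            yield chunk


def sha256_hexdigest(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks incrementally and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path]) -> bool:
    """True if the file's SHA256 matches the digest published for its file name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_hexdigest(file_chunks(path)) == expected
```

Running verify("tiara-1.0.3.tar.gz") in the download folder should return True for an intact archive; unknown file names return False without being read.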

