Transposable Element Repeat Result classifIER

Terrier is a neural network model for classifying transposable element sequences.

It is based on ‘corgi’, which was trained to perform hierarchical taxonomic classification of DNA sequences.

Terrier was trained on the Repbase library of repetitive DNA elements to perform hierarchical classification according to the RepeatMasker schema.

Installation

Install using pip:

pip install bio-terrier

Or install the latest version from GitHub:

pip install git+https://github.com/rbturnbull/terrier.git

Google Colab Version

A Google Colab notebook is available where you can run the model on your own data.

Usage

To run inference on a FASTA file, run this command:

terrier --file INPUT.fa --output-fasta OUTPUT.fa

This adds the classification after the sequence ID of each record in the OUTPUT.fa FASTA file.
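To check what was written, it can help to list the header lines of OUTPUT.fa. The following is a minimal sketch using only the Python standard library. The helper names `fasta_headers` and `print_output_headers` are hypothetical, and the sketch makes no assumption about the exact separator Terrier places between the sequence ID and the classification; it only prints each header in full.

```python
from typing import Iterable, List


def fasta_headers(lines: Iterable[str]) -> List[str]:
    """Return the header text (without the leading '>') of every FASTA record."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def print_output_headers(path: str = "OUTPUT.fa") -> None:
    """Print each header in the file written by `terrier --output-fasta`."""
    with open(path) as handle:
        for header in fasta_headers(handle):
            print(header)


# Small in-memory demonstration with made-up records (not real Terrier output).
example = [">seq1 hypothetical-annotation\n", "ACGTACGT\n", ">seq2\n", "TTGACA\n"]
print(fasta_headers(example))
```

Calling `print_output_headers("OUTPUT.fa")` after running Terrier shows how the classification appears in your own output.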

If you want to save the probabilities for all classes, run this:

terrier --file INPUT.fa --output-csv OUTPUT.csv

Each row corresponds to a sequence in INPUT.fa, and each column gives the probability of one classification.
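As an illustration of working with such a table, the sketch below picks the most probable classification for each row using the standard library. It is based on assumptions rather than on the documented file layout: that the CSV has a header row, that one column (passed as `id_column`) identifies the sequence, and that every other numeric column is a class probability. The function name `top_predictions`, the column name `id` and the sample values are hypothetical, so check the header of your own OUTPUT.csv before relying on it.

```python
import csv
import io
from typing import Dict, Tuple


def top_predictions(csv_text: str, id_column: str) -> Dict[str, Tuple[str, float]]:
    """Map each sequence identifier to its (most probable column, probability)."""
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        probabilities = {}
        for name, value in row.items():
            if name == id_column:
                continue
            try:
                probabilities[name] = float(value)
            except (TypeError, ValueError):
                continue  # skip columns that do not hold numbers
        if not probabilities:
            continue  # nothing numeric in this row
        best = max(probabilities, key=probabilities.get)
        results[row[id_column]] = (best, probabilities[best])
    return results


# Made-up table for demonstration; the real column names may differ.
sample = "id,LINE,LTR,DNA\nseq1,0.1,0.7,0.2\nseq2,0.8,0.1,0.1\n"
print(top_predictions(sample, id_column="id"))
```

For a real file, read it with `open("OUTPUT.csv").read()` and pass the text together with the name of the identifier column.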

If you want to output a visualization of the prediction probabilities, run this:

terrier --file INPUT.fa --image-dir OUTPUT-IMAGES/

These output options can be combined in a single command. For more options, run:

terrier --help
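Since the three output options can be combined, one way to script a combined run from Python is to build the command line and pass it to `subprocess`. This is a sketch that uses only the flags shown above (`--file`, `--output-fasta`, `--output-csv`, `--image-dir`); the helper names `build_terrier_command` and `run_terrier` are hypothetical and are not part of Terrier.

```python
import shutil
import subprocess
from typing import List, Optional


def build_terrier_command(
    input_fasta: str,
    output_fasta: Optional[str] = None,
    output_csv: Optional[str] = None,
    image_dir: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the terrier CLI from the requested outputs."""
    command = ["terrier", "--file", input_fasta]
    if output_fasta is not None:
        command += ["--output-fasta", output_fasta]
    if output_csv is not None:
        command += ["--output-csv", output_csv]
    if image_dir is not None:
        command += ["--image-dir", image_dir]
    return command


def run_terrier(command: List[str]) -> None:
    """Run the command, failing early if terrier is not installed."""
    if shutil.which(command[0]) is None:
        raise RuntimeError("terrier was not found on PATH; install bio-terrier first")
    subprocess.run(command, check=True)


command = build_terrier_command(
    "INPUT.fa",
    output_fasta="OUTPUT.fa",
    output_csv="OUTPUT.csv",
    image_dir="OUTPUT-IMAGES/",
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. Call `run_terrier(command)` to execute the combined run.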

To see the options to train the model, run:

terrier-tools --help

Programmatic Usage

You can also use the model programmatically:

from terrier import Terrier

terrier = Terrier()
terrier(file="INPUT.fa", output_fasta="OUTPUT.fa")

Potential Use Case

A potential workflow is to first use RepeatModeler to generate a repeat library, then use Terrier to attempt to classify the repeats that remain unknown. If you only want highly confident classifications from Terrier, set the threshold to 0.9 or higher. If you want more coverage, set the threshold lower or keep it at the default value of 0.7. The modified repeat library can then be used with RepeatMasker to mask the repeats in your genome assembly.
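The trade-off between confidence and coverage can be illustrated with a small sketch. It is not Terrier's internal logic, which operates on a hierarchical classification and may apply the threshold differently. It assumes you already have one predicted label and probability per sequence, and that a prediction is accepted when its probability is at least the threshold. The function name `split_by_threshold` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_threshold(
    predictions: Dict[str, Tuple[str, float]],
    threshold: float = 0.7,
) -> Tuple[Dict[str, str], List[str]]:
    """Split predictions into accepted labels and sequences left unclassified."""
    accepted = {}
    unclassified = []
    for seq_id, (label, probability) in predictions.items():
        if probability >= threshold:  # assumption: the comparison is inclusive
            accepted[seq_id] = label
        else:
            unclassified.append(seq_id)
    return accepted, unclassified


# Made-up predictions for three repeat families.
predictions = {
    "family-1": ("LTR", 0.95),
    "family-2": ("LINE", 0.75),
    "family-3": ("DNA", 0.40),
}

for threshold in (0.7, 0.9):
    accepted, unclassified = split_by_threshold(predictions, threshold)
    print(threshold, accepted, unclassified)
```

With these sample values, a threshold of 0.7 accepts two families and a threshold of 0.9 accepts only one, which is the coverage cost of asking for higher confidence.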

Credits

Terrier was developed by:

If you use this software, please cite the following preprint:

Robert Turnbull, Neil D. Young, Edoardo Tescari, Lee F. Skerratt, and Tiffany A. Kosch. (2025). ‘Terrier: A Deep Learning Repeat Classifier’. arXiv:2503.09312.

This command generates a bibliography for the Terrier project:

terrier --bibliography

Here it is in BibTeX format:

@article{terrier,
    title = {{Terrier: A Deep Learning Repeat Classifier}},
    author = {Turnbull, Robert and Young, Neil D. and Tescari, Edoardo and Skerratt, Lee F. and Kosch, Tiffany A.},
    year = {2025},
    journal = {arXiv},
    url = {https://arxiv.org/abs/2503.09312},
    doi = {10.48550/arXiv.2503.09312}
}

Run the following command to get the latest BibTeX entry:

terrier --bibtex

This will be updated with the final publication details when available.

Created using torchapp (https://github.com/rbturnbull/torchapp).
