Transposable Element Repeat Result classifIER

Terrier is a neural network model for classifying transposable element sequences.

It is based on ‘corgi’ which was trained to do hierarchical taxonomic classification of DNA sequences.

Terrier was trained on the Repbase library of repetitive DNA elements to perform hierarchical classification according to the RepeatMasker schema.

An online version of Terrier (using CPUs only) is available at https://portal.cpg.unimelb.edu.au/tools/terrier.

Latest Results

Terrier v0.4 has been released. It was trained on RepBase31.04 (released 04-23-2026), which has substantially more sequences than RepBase29.10, the version used to train the Terrier release described in the Briefings in Bioinformatics publication (2025).

This has altered the performance of the model.

See the Latest Results page for the performance of both versions of Terrier on the test data.

Installation

Install using pip:

pip install bio-terrier

Or install the latest version from GitHub:

pip install git+https://github.com/rbturnbull/terrier.git

Google Colab Version

A Google Colab notebook is available for running the model on your own data.

Usage

To run inference on a FASTA file, run this command:

terrier --input INPUT.fa --output-fasta OUTPUT.fa

That will add the classification after the sequence ID in the OUTPUT.fa FASTA file.

If you want to save the probabilities for all classes run this:

terrier --input INPUT.fa --output-csv OUTPUT.csv

Each column holds the probability of one classification, and each row corresponds to a sequence in INPUT.fa.
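As a rough illustration of working with such a probability table, the sketch below picks the highest-probability column for each row. It is not part of Terrier. It assumes the first column identifies the sequence and every other numeric column is a class probability; the helper name `top_classifications` and the column names in the example are hypothetical, so check the header row of your own OUTPUT.csv first. Because the classification is hierarchical, the real columns may mix broad and nested categories, in which case the maximum is only a coarse summary.

```python
import csv
import io


def top_classifications(csv_text, id_column=None):
    """Return {sequence_id: (best_class, probability)} from a probability table.

    Assumption (not confirmed by the Terrier docs): the first column holds the
    sequence identifier and the remaining numeric columns are probabilities.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    if not fieldnames:
        return {}
    id_column = id_column or fieldnames[0]

    results = {}
    for row in reader:
        probabilities = {}
        for name in fieldnames:
            if name == id_column:
                continue
            try:
                probabilities[name] = float(row[name])
            except (TypeError, ValueError):
                continue  # skip any non-numeric column
        if probabilities:
            best = max(probabilities, key=probabilities.get)
            results[row[id_column]] = (best, probabilities[best])
    return results


# Made-up table for illustration; real column names will differ.
EXAMPLE = """sequence,LINE,LTR,DNA
seq1,0.10,0.85,0.05
seq2,0.60,0.30,0.10
"""

best = top_classifications(EXAMPLE)
for sequence_id, (label, probability) in best.items():
    print(sequence_id, label, probability)
```

To apply this to real output, pass the text of the file, for example `top_classifications(open("OUTPUT.csv").read())`.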

You can also use a URL as the input:

terrier --input https://example.com/INPUT.fasta.gz --output-fasta OUTPUT.fa

If you want to output a visualization of the prediction probabilities:

terrier --input INPUT.fa --image-dir OUTPUT-IMAGES/

The outputs above can be combined in a single run. For more options, run:

terrier --help
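Combining the outputs in one run can be sketched from Python as follows. The helper `build_terrier_command` is hypothetical and only assembles the flags shown above (`--input`, `--output-fasta`, `--output-csv`, `--image-dir`); it does not run anything itself.

```python
import shlex


def build_terrier_command(input_path, output_fasta=None, output_csv=None, image_dir=None):
    """Assemble a terrier command line from the flags documented above."""
    command = ["terrier", "--input", input_path]
    if output_fasta:
        command += ["--output-fasta", output_fasta]
    if output_csv:
        command += ["--output-csv", output_csv]
    if image_dir:
        command += ["--image-dir", image_dir]
    return command


command = build_terrier_command(
    "INPUT.fa",
    output_fasta="OUTPUT.fa",
    output_csv="OUTPUT.csv",
    image_dir="OUTPUT-IMAGES/",
)
# Show the equivalent shell command.
print(shlex.join(command))
```

With Terrier installed, the list can be passed to `subprocess.run(command, check=True)` to execute it.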

To see the options to train the model, run:

terrier-tools --help

Programmatic Usage

You can also use the model programmatically:

from terrier import Terrier

terrier = Terrier()
terrier(file="INPUT.fa", output_fasta="OUTPUT.fa")

Potential Use Case

A potential workflow is to use RepeatModeler first to generate a repeat library. Then you can use Terrier to attempt to classify the remaining unknown repeats. If you only want highly confident classifications from Terrier, you can set the threshold to 0.9 or higher. If you wish to have more coverage, then you can set the threshold lower (or keep it at the default value of 0.7). The modified repeat library can then be used with RepeatMasker to mask the repeats in your genome assembly.
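The trade-off between confidence and coverage can be illustrated with a small post-hoc filter. This is a sketch only, not Terrier's internal logic: the function `split_by_threshold`, the sequence names, and the probabilities are all made up, and whether Terrier itself treats the threshold as inclusive is an assumption here.

```python
def split_by_threshold(predictions, threshold=0.7):
    """Split predictions into accepted labels and still-unclassified IDs.

    predictions maps a sequence ID to (label, probability).
    Assumption: a prediction is accepted when probability >= threshold.
    """
    accepted = {}
    unclassified = []
    for sequence_id, (label, probability) in predictions.items():
        if probability >= threshold:
            accepted[sequence_id] = label
        else:
            unclassified.append(sequence_id)
    return accepted, unclassified


# Hypothetical predictions for three unknown repeats.
predictions = {
    "family-1": ("LTR", 0.95),
    "family-2": ("DNA", 0.75),
    "family-3": ("LINE", 0.40),
}

default_accepted, default_rest = split_by_threshold(predictions)  # 0.7
strict_accepted, strict_rest = split_by_threshold(predictions, threshold=0.9)

print(len(default_accepted), "accepted at 0.7;", len(strict_accepted), "accepted at 0.9")
```

In this toy example the stricter threshold keeps only the most confident classification and leaves more repeats unclassified, which is the trade-off described above.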

Credits

Terrier was developed by:

If you use this software, please cite the following preprint:

Robert Turnbull, Neil D. Young, Edoardo Tescari, Lee F. Skerratt, and Tiffany A. Kosch. (2025). ‘Terrier: A Deep Learning Repeat Classifier’. arXiv:2503.09312.

Wytamma Wirth set up Terrier as a tool at the Centre for Pathogen Genomics Portal at the University of Melbourne.

This command will generate a bibliography for the Terrier project.

terrier --bibliography

Here it is in BibTeX format:

@article{terier,
    author = {Turnbull, Robert and Young, Neil D and Tescari, Edoardo and Skerratt, Lee F and Kosch, Tiffany A},
    title = {Terrier: a deep learning repeat classifier},
    journal = {Briefings in Bioinformatics},
    volume = {26},
    number = {4},
    pages = {bbaf442},
    year = {2025},
    month = {08},
    abstract = {Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Terrier is a deep learning model designed to overcome these challenges by classifying repetitive DNA sequences using a publicly available, curated repeat sequence library trained under the RepeatMasker schema. Poor representation of taxa within repeat databases often limits the classification accuracy and reproducibility of current repeat annotation methods, limiting our understanding of repeat evolution and function. Terrier overcomes these challenges by leveraging deep learning for improved accuracy. Trained on Repbase, which includes over 100,000 repeat families—four times more than Dfam—Terrier maps 97.1\% of Repbase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice, fruit flies, humans, and mice), Terrier achieved superior accuracy while classifying a broader range of sequences. Further validation in non-model amphibian, flatworm, and Northern krill genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.},
    issn = {1477-4054},
    doi = {10.1093/bib/bbaf442},
    url = {https://doi.org/10.1093/bib/bbaf442},
    eprint = {https://academic.oup.com/bib/article-pdf/26/4/bbaf442/64143069/bbaf442.pdf},
}

Run the following command to get the latest BibTeX entry:

terrier --bibtex

This will be updated with the final publication details when available.

Created using torchapp (https://github.com/rbturnbull/torchapp).
