Transposable Element Repeat Result classifIER
Terrier is a neural network model for classifying transposable element sequences.
It is based on ‘corgi’, which was trained to perform hierarchical taxonomic classification of DNA sequences.
Terrier was trained on the Repbase library of repetitive DNA elements to perform hierarchical classification according to the RepeatMasker schema.
Installation
Install using pip:
pip install bio-terrier
Or install the latest version from GitHub:
pip install git+https://github.com/rbturnbull/terrier.git
Google Colab Version
Follow this link to launch a Google Colab notebook where you can run the model on your own data:
Usage
To run inference on a FASTA file, run this command:
terrier --file INPUT.fa --output-fasta OUTPUT.fa
This adds the classification after the sequence ID of each record in the OUTPUT.fa FASTA file.
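To check how the classification appears in the output, it can help to list the header lines of OUTPUT.fa. The following is a minimal sketch using only the Python standard library. The function name fasta_headers is hypothetical and not part of Terrier, and the sketch makes no assumption about how Terrier separates the sequence ID from the classification; it only prints each header as written.

```python
import io
from typing import Iterable, Iterator


def fasta_headers(handle: Iterable[str]) -> Iterator[str]:
    """Yield each FASTA header line without the leading '>' or trailing newline."""
    for line in handle:
        if line.startswith(">"):
            yield line[1:].rstrip("\r\n")


# Small in-memory demonstration with made-up records (not real Terrier output).
demo = io.StringIO(">seq1 example annotation\nACGT\n>seq2\nGGCC\n")
for header in fasta_headers(demo):
    print(header)
```

For a real file, pass an open file handle instead, for example with open("OUTPUT.fa") as handle, and iterate over fasta_headers(handle) in the same way.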
To save the probabilities for all classes, run:
terrier --file INPUT.fa --output-csv OUTPUT.csv
Each row corresponds to a sequence in INPUT.fa, and each column gives the probability of one classification.
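A CSV laid out this way can be summarized with the standard library. The sketch below finds the highest-probability class for each row. It assumes that the first column holds the sequence identifier and that any other column whose value parses as a number is a class probability; the exact column layout written by Terrier may differ, and the name top_predictions and the column names in the demonstration are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Tuple


def top_predictions(handle: Iterable[str], id_column: int = 0) -> List[Tuple[str, str, float]]:
    """Return (sequence_id, best_class, probability) for each CSV row.

    Assumption: column `id_column` identifies the sequence and every other
    column with a numeric value is a class probability.
    """
    reader = csv.reader(handle)
    header = next(reader)
    results = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        best_name, best_prob = None, float("-inf")
        for index, (name, value) in enumerate(zip(header, row)):
            if index == id_column:
                continue
            try:
                prob = float(value)
            except ValueError:
                continue  # ignore non-numeric columns
            if prob > best_prob:
                best_name, best_prob = name, prob
        if best_name is not None:
            results.append((row[id_column], best_name, best_prob))
    return results


# In-memory demonstration with made-up values (not real Terrier output).
demo = io.StringIO("id,LTR,LINE,DNA\nseq1,0.8,0.15,0.05\nseq2,0.1,0.3,0.6\n")
for seq_id, label, prob in top_predictions(demo):
    print(seq_id, label, prob)
```

For a real file, open OUTPUT.csv with newline="" as recommended by the csv module and pass the handle to top_predictions.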
To output a visualization of the prediction probabilities, run:
terrier --file INPUT.fa --image-dir OUTPUT-IMAGES/
These outputs can be combined in a single command. For more options, run:
terrier --help
To see the options to train the model, run:
terrier-tools --help
Programmatic Usage
You can also use the model programmatically:
from terrier import Terrier
terrier = Terrier()
terrier(file="INPUT.fa", output_fasta="OUTPUT.fa")
Potential Use Case
A potential workflow is to use RepeatModeler first to generate a repeat library. Then you can use Terrier to attempt to classify the remaining unknown repeats. If you only want highly confident classifications from Terrier, you can set the threshold to 0.9 or higher. If you wish to have more coverage, then you can set the threshold lower (or keep it at the default value of 0.7). The modified repeat library can then be used with RepeatMasker to mask the repeats in your genome assembly.
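The trade-off between confidence and coverage can be illustrated with a small sketch. This is not Terrier's implementation: it is a flat simplification that ignores the hierarchical structure of the classification, and the function name, the fallback label "Unknown", and the probabilities are all hypothetical. It only shows how the same prediction can be accepted at the default threshold of 0.7 and rejected at 0.9.

```python
from typing import Dict, Tuple


def call_with_threshold(
    probabilities: Dict[str, float],
    threshold: float = 0.7,
    fallback: str = "Unknown",
) -> Tuple[str, float]:
    """Return the most probable label if it meets the threshold, else the fallback.

    Simplified, non-hierarchical illustration of thresholding.
    """
    if not probabilities:
        return fallback, 0.0
    label, prob = max(probabilities.items(), key=lambda item: item[1])
    if prob >= threshold:
        return label, prob
    return fallback, prob


# Made-up probabilities for a single repeat sequence.
example = {"LTR/Gypsy": 0.82, "LINE/L1": 0.10, "DNA/hAT": 0.08}
print(call_with_threshold(example, threshold=0.7))  # accepted at the default threshold
print(call_with_threshold(example, threshold=0.9))  # left unclassified at a stricter threshold
```

A stricter threshold leaves more repeats unclassified but makes each retained label more trustworthy, which matters when the modified library is passed on to RepeatMasker.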
Credits
Terrier was developed by:
If you use this software, please cite the following preprint:
Robert Turnbull, Neil D. Young, Edoardo Tescari, Lee F. Skerratt, and Tiffany A. Kosch. (2025). ‘Terrier: A Deep Learning Repeat Classifier’. arXiv:2503.09312.
This command generates a bibliography for the Terrier project:
terrier --bibliography
Here it is in BibTeX format:
@article{terrier,
title = {{Terrier: A Deep Learning Repeat Classifier}},
author = {Turnbull, Robert and Young, Neil D. and Tescari, Edoardo and Skerratt, Lee F. and Kosch, Tiffany A.},
year = {2025},
journal = {arXiv},
url = {https://arxiv.org/abs/2503.09312},
doi = {10.48550/arXiv.2503.09312}
}
Run the following command to get the latest BibTeX entry:
terrier --bibtex
This will be updated with the final publication details when available.
Created using torchapp (https://github.com/rbturnbull/torchapp).