Transformers for Transcripts
TRISTAN (TRanslational Identification Suite using Transformer Networks for ANalysis) is a suite of tools for detecting translated Open Reading Frames (ORFs) in organisms through the analysis of sequence context and/or ribosome profiling (Ribo-seq) data.
📚 Documentation
For complete installation instructions, user guides, and tutorials, please see our full documentation on ReadTheDocs.
👋 About the Project
TRISTAN tools are built on current advances and best practices in machine learning, stepping away from manually curated data-processing rules and instead letting the optimization process learn them from the data. The tools are designed with flexibility and modularity in mind.
Key design principles:
- Unbiased Data Utilization: Leverages the full transcriptome, allowing models to learn complex translational patterns directly from biological data without pre-imposed biases.
- Robust Model Validation: Separates training, validation, and test sets by chromosome to prevent information leakage and provide a more accurate assessment of performance.
- Data-Driven Decision Making: Machine learning models learn nuances intrinsically, avoiding hardcoded rules for data alteration or prediction adjustments.
- Seamless Downstream Integration: Generates various output file formats (CSV, GTF) designed for easy integration with common downstream analysis tools.
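The chromosome-level separation described above can be sketched as follows. This is an illustrative sketch only: the function name, data layout, and chromosome assignments are assumptions for demonstration, not TRISTAN's internals.

```python
# Illustrative sketch of a chromosome-level data split (not TRISTAN's actual code).
# Assigning whole chromosomes to each partition ensures no transcript from a
# training chromosome can leak information into validation or testing.

def split_by_chromosome(transcripts, val_chroms, test_chroms):
    """Partition transcripts into train/val/test sets by chromosome name.

    transcripts: iterable of (chromosome, transcript_id) pairs.
    """
    train, val, test = [], [], []
    for chrom, tx_id in transcripts:
        if chrom in test_chroms:
            test.append(tx_id)
        elif chrom in val_chroms:
            val.append(tx_id)
        else:
            train.append(tx_id)
    return train, val, test

transcripts = [("chr1", "tx_a"), ("chr2", "tx_b"), ("chr3", "tx_c"), ("chr1", "tx_d")]
train, val, test = split_by_chromosome(transcripts, val_chroms={"chr2"}, test_chroms={"chr3"})
# train == ["tx_a", "tx_d"], val == ["tx_b"], test == ["tx_c"]
```

Splitting by chromosome rather than by random transcript sampling avoids placing overlapping or homologous loci on both sides of the split.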
The transcript-transformer package incorporates the functionality of TIS Transformer and RiboTIE, using the Performer architecture to process transcripts at single-nucleotide resolution.
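Processing transcripts at single-nucleotide resolution means each base becomes one input token for the model. A minimal sketch of such an encoding (the vocabulary and token ids below are illustrative assumptions, not the package's actual mapping):

```python
# Illustrative sketch: encoding a transcript at single-nucleotide resolution,
# one token per base, as a transformer would consume it. The vocabulary and
# integer ids are assumptions for illustration only.

NUC_VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}

def encode_transcript(seq: str) -> list:
    """Map each nucleotide to an integer token id (one token per base)."""
    return [NUC_VOCAB[base] for base in seq.upper()]

print(encode_transcript("ATGC"))  # [0, 3, 2, 1]
```

Because every base is its own token, transcript-length sequences can be very long, which is one motivation for using an efficient-attention architecture such as the Performer.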
🛠️ Installation
First, install PyTorch with GPU support by following the instructions in the PyTorch installation guide.
Then, install TRISTAN:
```shell
pip install transcript_transformer
```
📖 Quick Start
TRISTAN can detect translated ORFs from an input FASTA file alone; see the User Documentation for more information. In general, TRISTAN is run from the command line, using a YAML configuration file to specify inputs and parameters.
1. Create a configuration file (`config.yml`):

   ```yaml
   # Path to genome annotation and sequence
   gtf_path : path/to/gtf_file.gtf
   fa_path : path/to/fa_file.fa
   # Path for the HDF5 database
   h5_path : my_experiment.h5
   # Prefix for output files
   out_prefix: out/
   # (Optional) Add ribosome profiling data for RiboTIE
   ribo_paths :
     SRR000001 : path/to/mapped/sample1.bam
     SRR000002 : path/to/mapped/sample2.bam
   ```
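Before launching a run, it can help to sanity-check the configuration values. The sketch below mirrors the keys from the config above; the validation logic is illustrative, not something TRISTAN performs in this form.

```python
# Sketch: minimal sanity check of TRISTAN-style configuration values.
# Keys mirror the config.yml example; the checks are illustrative only.

def check_config(cfg: dict) -> list:
    """Return a list of problems found in a TRISTAN-style config dict."""
    problems = []
    # These four keys are required by the example config above.
    for key in ("gtf_path", "fa_path", "h5_path", "out_prefix"):
        if key not in cfg:
            problems.append(f"missing required key: {key}")
    # Ribo-seq samples, when present, should point at mapped BAM files.
    for sample, bam in cfg.get("ribo_paths", {}).items():
        if not str(bam).endswith(".bam"):
            problems.append(f"{sample}: expected a .bam file, got {bam}")
    return problems

cfg = {
    "gtf_path": "path/to/gtf_file.gtf",
    "fa_path": "path/to/fa_file.fa",
    "h5_path": "my_experiment.h5",
    "out_prefix": "out/",
    "ribo_paths": {"SRR000001": "path/to/mapped/sample1.bam"},
}
print(check_config(cfg))  # [] — no problems found
```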
2. Run the tools:

   - TIS Transformer: Detect ORFs based on sequence context. Pre-trained models for human and mouse are available.

     ```shell
     # Use a pre-trained model for human
     tis_transformer config.yml --model human
     ```

   - RiboTIE: Detect actively translated ORFs from Ribo-seq data.

     ```shell
     # Fine-tune and predict from Ribo-seq samples
     ribotie config.yml
     ```
For more advanced usage, including training models from scratch and parallel processing, please refer to the full documentation.
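Once a run finishes, the CSV output can be consumed with standard tooling. A minimal sketch using only the standard library; the column names `tx_id` and `orf_score` are hypothetical placeholders, so check the real output headers before adapting this.

```python
# Sketch: filtering a TRISTAN-style CSV of ORF calls with the standard library.
# The column names ("tx_id", "orf_score") are hypothetical placeholders.
import csv
import io

sample = """tx_id,orf_score
tx_a,0.93
tx_b,0.12
"""

def high_confidence(csv_text: str, threshold: float = 0.5) -> list:
    """Return transcript ids whose ORF score exceeds the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["tx_id"] for row in reader if float(row["orf_score"]) > threshold]

print(high_confidence(sample))  # ['tx_a']
```

The GTF output can likewise be loaded directly by common genome browsers and annotation toolkits.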
🖊️ How to Cite
If you use TRISTAN in your research, please cite the relevant papers:
TIS Transformer:
Clauwaert, J., McVey, Z., Gupta, R., & Menschaert, G. (2023). TIS Transformer: remapping the human proteome using deep learning. NAR Genomics and Bioinformatics, 5(1), lqad021. https://doi.org/10.1093/nargab/lqad021
```bibtex
@article{10.1093/nargab/lqad021,
    author  = {Clauwaert, Jim and McVey, Zahra and Gupta, Ramneek and Menschaert, Gerben},
    title   = "{TIS Transformer: remapping the human proteome using deep learning}",
    journal = {NAR Genomics and Bioinformatics},
    volume  = {5},
    number  = {1},
    year    = {2023},
    month   = {03},
    issn    = {2631-9268},
    doi     = {10.1093/nargab/lqad021}
}
```
RiboTIE:
Clauwaert, J., et al. (2025). Deep learning to decode sites of RNA translation in normal and cancerous tissues. Nature Communications, 16(1), 1275. https://doi.org/10.1038/s41467-025-56543-0
```bibtex
@article{clauwaert2025deep,
    title     = {Deep learning to decode sites of RNA translation in normal and cancerous tissues},
    author    = {Clauwaert, Jim and McVey, Zahra and Gupta, Ramneek and Yannuzzi, Ian and Basrur, Venkatesha and Nesvizhskii, Alexey I and Menschaert, Gerben and Prensner, John R},
    journal   = {Nature Communications},
    volume    = {16},
    number    = {1},
    pages     = {1275},
    year      = {2025},
    publisher = {Nature Publishing Group UK London}
}
```
🤝 Contributing
Contributions are welcome! Please feel free to open an issue or submit a pull request.
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file transcript_transformer-1.1.1.tar.gz.
File metadata
- Download URL: transcript_transformer-1.1.1.tar.gz
- Upload date:
- Size: 128.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `fe910569755b32810db5f07fe0f255846c16c4ee8046f91568ebeb9e77f57b7b` |
| MD5 | `ec1aa389db909481a150390007773925` |
| BLAKE2b-256 | `60a234ffd44b8c9e650754964cb05e6a866e535d82f910bdd83079d5e6134898` |
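Published digests like the SHA256 value above can be checked against a downloaded archive with the standard library:

```python
# Sketch: verifying a downloaded distribution against its published SHA256 digest.
import hashlib

def sha256_of(path: str) -> str:
    """Compute the hex SHA256 digest of a file, streaming it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the value published in the table above, e.g.:
# assert sha256_of("transcript_transformer-1.1.1.tar.gz") == "<published digest>"
```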
File details
Details for the file transcript_transformer-1.1.1-py3-none-any.whl.
File metadata
- Download URL: transcript_transformer-1.1.1-py3-none-any.whl
- Upload date:
- Size: 128.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a25d60300432760778a52fcdf882f3ce65cd4584307394d47076989388eb3cef` |
| MD5 | `32ac563962046584a2e3bf44bd8eca25` |
| BLAKE2b-256 | `50463ff4c34d1a077b2ef1d4e22e66434ae92b3a0678451b1f2c34265d01afa1` |