
Transformers for Transcripts


TRISTAN

Driving coding sequence discovery since 2023


TRISTAN (TRanslational Identification Suite using Transformer Networks for ANalysis) is a suite of tools for detecting translated Open Reading Frames (ORFs) in organisms through the analysis of sequence context and/or ribosome profiling (Ribo-seq) data.


📚 Documentation

For complete installation instructions, user guides, and tutorials, please see our full documentation on ReadTheDocs.


👋 About the Project

TRISTAN tools are built on recent advances and best practices in machine learning, stepping away from manually curated data-processing rules and instead letting the models learn them during optimization. The tools are designed with flexibility and modularity in mind.

Key design principles:

  • Unbiased Data Utilization: Leverages the full transcriptome, allowing models to learn complex translational patterns directly from biological data without pre-imposed biases.
  • Robust Model Validation: Separates training, validation, and test sets by chromosome to prevent information leakage and provide a more accurate assessment of performance.
  • Data-Driven Decision Making: Machine learning models learn nuances intrinsically, avoiding hardcoded rules for data alteration or prediction adjustments.
  • Seamless Downstream Integration: Generates various output file formats (CSV, GTF) designed for easy integration with common downstream analysis tools.

The transcript-transformer package incorporates the functionality of TIS Transformer and RiboTIE, using the Performer architecture to process transcripts at single-nucleotide resolution.

🛠️ Installation

First, install PyTorch with GPU support by following the instructions in the official PyTorch installation guide.

Then, install TRISTAN:

pip install transcript_transformer

📖 Quick Start

TRISTAN detects translated ORFs from an input FASTA file, optionally combined with ribosome profiling data; see the User Documentation for more information.

TRISTAN is run from the command line, using a YAML configuration file to specify inputs and parameters.

  1. Create a configuration file (config.yml):

    # Path to genome annotation and sequence
    gtf_path: path/to/gtf_file.gtf
    fa_path: path/to/fa_file.fa
    
    # Path for the HDF5 database
    h5_path: my_experiment.h5
    
    # Prefix for output files
    out_prefix: out/
    
    # (Optional) Add ribosome profiling data for RiboTIE
    ribo_paths:
      SRR000001: path/to/mapped/sample1.bam
      SRR000002: path/to/mapped/sample2.bam
    
  2. Run the tools:

    • TIS Transformer: Detect ORFs based on sequence context. Pre-trained models for human and mouse are available.

      # Use a pre-trained model for human
      tis_transformer config.yml --model human
      
    • RiboTIE: Detect actively translated ORFs from Ribo-seq data.

      # Fine-tune and predict from Ribo-seq samples
      ribotie config.yml
      

    For more advanced usage, including training models from scratch and parallel processing, please refer to the full documentation.
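For pipelines that process many samples, the configuration file can also be generated programmatically. The sketch below writes a minimal config.yml using only the standard library (no PyYAML dependency); all paths are placeholders to adjust for your data.

```python
from pathlib import Path

# Hypothetical paths; replace with your own annotation, genome, and BAM files.
config = {
    "gtf_path": "path/to/gtf_file.gtf",
    "fa_path": "path/to/fa_file.fa",
    "h5_path": "my_experiment.h5",
    "out_prefix": "out/",
}
ribo_paths = {  # optional: Ribo-seq samples for RiboTIE
    "SRR000001": "path/to/mapped/sample1.bam",
}

lines = [f"{key}: {value}" for key, value in config.items()]
if ribo_paths:
    lines.append("ribo_paths:")
    lines += [f"  {name}: {bam}" for name, bam in ribo_paths.items()]
text = "\n".join(lines) + "\n"
Path("config.yml").write_text(text)
```

The resulting file can then be passed to either tool, e.g. `tis_transformer config.yml --model human` or `ribotie config.yml`.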

🖊️ How to Cite

If you use TRISTAN in your research, please cite the relevant papers:

TIS Transformer:

Clauwaert, J., McVey, Z., Gupta, R., & Menschaert, G. (2023). TIS Transformer: remapping the human proteome using deep learning. NAR Genomics and Bioinformatics, 5(1), lqad021. https://doi.org/10.1093/nargab/lqad021

@article {10.1093/nargab/lqad021,
    author = {Clauwaert, Jim and McVey, Zahra and Gupta, Ramneek and Menschaert, Gerben},
    title = "{TIS Transformer: remapping the human proteome using deep learning}",
    journal = {NAR Genomics and Bioinformatics},
    volume = {5},
    number = {1},
    year = {2023},
    month = {03},
    issn = {2631-9268},
    doi = {10.1093/nargab/lqad021}
}

RiboTIE:

Clauwaert, J., et al. (2025). Deep learning to decode sites of RNA translation in normal and cancerous tissues. Nature Communications, 16(1), 1275. https://doi.org/10.1038/s41467-025-56543-0

@article{clauwaert2025deep,
  title={Deep learning to decode sites of RNA translation in normal and cancerous tissues},
  author={Clauwaert, Jim and McVey, Zahra and Gupta, Ramneek and Yannuzzi, Ian and Basrur, Venkatesha and Nesvizhskii, Alexey I and Menschaert, Gerben and Prensner, John R},
  journal={Nature Communications},
  volume={16},
  number={1},
  pages={1275},
  year={2025},
  publisher={Nature Publishing Group UK London}
}

🤝 Contributing

Contributions are welcome! Please feel free to open an issue or submit a pull request.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.
