Transformer of Epigenetics to Chromatin Structural AnnotationS
Documentation | Tutorials | Installation
Overview
TECSAS (Transformer of Epigenetics to Chromatin Structural AnnotationS) is a deep learning model based on the Transformer architecture designed to predict chromatin subcompartment annotations directly from epigenomic data. TECSAS leverages information from histone modifications, transcription factor binding profiles, and RNA-seq data to decode the relationship between the biochemical composition of chromatin and its 3D structural behavior.
Chromatin within the nucleus adopts complex three-dimensional structures that are crucial for gene regulation and cellular function. Recent studies have revealed the presence of distinct chromatin subcompartments beyond the traditional A/B compartments (eu- and hetero-chromatin), each exhibiting unique structural and functional properties. TECSAS achieves high accuracy in predicting subcompartment annotations and reveals the influence of long-range epigenomic context on chromatin organization.
The framework enables:
- Chromatin subcompartment prediction: Classification of genomic regions into subcompartments (A1, A2, B1, B2, B3) at 25-50kb resolution
- Nuclear body association prediction: Identification of lamina-associated domains (LADs), nucleolus-associated domains (NADs), and nuclear speckle-associated domains (SPADs)
- Transfer learning: Pre-trained encoder on reference cell lines (e.g., GM12878) can be fine-tuned for target cell lines
TECSAS processes epigenomic signal tracks at specified genomic resolution (default 50kb bins), normalizes signals using z-score standardization, and uses sliding window context (default ±14 neighboring bins) to capture spatial dependencies. Unlike methods that rely on Hi-C contact maps, TECSAS predicts 3D genome organization directly from the epigenome, enabling analysis across diverse cell types without requiring proximity ligation experiments.
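As a rough sketch of this preprocessing, the snippet below z-scores each signal track and builds the flattened context window for every bin. It is illustrative only: `build_context_windows` is a hypothetical helper written for this sketch, not part of the TECSAS API.

```python
import numpy as np

def build_context_windows(signals, n_neighbors=14):
    """Illustrative TECSAS-style preprocessing (not the package API).

    signals: array of shape (n_bins, n_experiments) of binned epigenomic tracks.
    Returns shape (n_bins, (2*n_neighbors + 1) * n_experiments), where each row
    concatenates a bin's signals with those of its +/- n_neighbors context bins.
    """
    # Z-score standardization per experiment (column-wise)
    z = (signals - signals.mean(axis=0)) / (signals.std(axis=0) + 1e-8)

    # Pad the chromosome ends so edge bins still get a full window
    padded = np.pad(z, ((n_neighbors, n_neighbors), (0, 0)), mode="edge")

    # One flattened feature vector per bin
    window = 2 * n_neighbors + 1
    return np.stack([padded[i:i + window].ravel() for i in range(z.shape[0])])

# 1000 bins x 155 experiments -> 1000 x (29 * 155) = 1000 x 4495 features
features = build_context_windows(np.random.rand(1000, 155), n_neighbors=14)
```

This matches the default configuration (155 experiments, ±14 neighbors) used by the pre-trained GM12878 model below.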
Usage
For complete examples, see the Tutorials directory.
Resources
- Tutorials: Step-by-step notebooks in the Tutorials/ directory
  - Load_model_GM12878_155exp_50kbp.ipynb: Load and use the pre-trained GM12878 subcompartment model
  - Test_GM12878_155exp_50kbp.ipynb: Evaluate the pre-trained GM12878 model with per-class accuracy and confusion matrices
  - Load_model_K562_124_exp_25kbp.ipynb: Load the K562 model at 25kb resolution
  - train_and_predict_HistMod_example.ipynb: Training workflow using histone modifications
  - train_and_predict_XADS_HistMod_RNASeq.ipynb: Complete workflow for nuclear body association (LADs/NADs/SPADs) prediction using transfer learning
- Pre-trained models: Model weights in TECSAS/share/models/
bv_GM12878_155.pt: GM12878 model trained with 155 experiments at 50kbp resolution (75.8% overall accuracy)
- Reference data: Subcompartment annotations and nuclear body association labels (LADs, NADs, SPADs) in TECSAS/share/
Installation
Requirements
TECSAS requires Python 3.6+ and the following dependencies:
- PyTorch (>=1.7.0)
- NumPy (>=1.18)
- pyBigWig
- requests
- joblib
- tqdm
- urllib3
Install from PyPI
pip install TECSAS
Install from source
Clone the repository and install:
git clone https://github.com/ed29rice/TECSAS.git
cd TECSAS
pip install -e .
Install dependencies
pip install torch numpy pyBigWig requests joblib tqdm urllib3
Note: For GPU acceleration, ensure you have a CUDA-enabled PyTorch build installed.
Quick Start
Option A: Use pre-trained weights
Pre-trained model weights for GM12878 (155 experiments, 50kbp resolution) are included in TECSAS/share/models/. You can load and use them directly without retraining:
import torch
from TECSAS import TECSAS
# Model configuration matching the pre-trained weights
n_neighbors = 14 # Neighboring bins on each side (context window)
n_predict = 3 # Number of loci to predict
NEXP = 155 # Number of experiments in GM12878
nfeatures = NEXP * (2 * n_neighbors + 1) # 155 * 29 = 4495
model = TECSAS(n_predict, emsize=128, nhead=8, d_hid=64, nlayers=2,
nfeatures=nfeatures, ostates=5, dropout=0.01)
# Load pre-trained weights (keys have a 'module.' prefix from DataParallel)
state = torch.load('TECSAS/share/models/bv_GM12878_155.pt', map_location='cpu')
model.load_state_dict({'.'.join(k.split('.')[1:]): v for k, v in state.items()})
model.eval()
See Tutorials/Load_model_GM12878_155exp_50kbp.ipynb for a complete evaluation example.
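The dictionary comprehension used when loading the weights strips the `module.` prefix that `torch.nn.DataParallel` prepends to parameter names when a wrapped model is saved. A stand-alone illustration with dummy values:

```python
# DataParallel checkpoints store keys like 'module.encoder.weight';
# loading into a bare (non-wrapped) model requires the unprefixed names.
state = {"module.encoder.weight": 1, "module.decoder.bias": 2}  # dummy values
clean = {".".join(k.split(".")[1:]): v for k, v in state.items()}
print(clean)  # {'encoder.weight': 1, 'decoder.bias': 2}
```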
Option B: Train from scratch
If you want to retrain the model on your own data or a different cell line:
1. Import TECSAS:

   from TECSAS import data_process, TECSAS

2. Download and process epigenomic data from ENCODE:

   dp = data_process(cell_line='GM12878', assembly='hg19', histones=True, tf=True)
   dp.download_and_process_cell_line_data(nproc=10)
   dp.download_and_process_ref_data(nproc=10)

3. Generate training data:

   train, val, test, averages, indices = dp.training_data(n_neigbors=14, train_per=0.8)

4. Initialize and train the model:

   model = TECSAS(n_predict=3, emsize=128, nhead=8, d_hid=64, nlayers=2,
                  nfeatures=NEXP*(2*14+1), ostates=5, dropout=0.01)
   # ... training loop (see Tutorials/train_and_predict_HistMod_example.ipynb)

5. Make predictions on a target cell line:

   test_data = dp.test_set(chr=1)
   predictions = model(test_data, None)[0].argmax(dim=-1)
See the Tutorials/ directory for complete training and prediction workflows.
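For orientation, the elided training loop can be sketched as below. A plain linear classifier stands in for the TECSAS model so the snippet runs on its own, and all tensors are random stand-ins for real training data; with the actual package, swap in the `TECSAS(...)` instance and batches from `dp.training_data(...)`.

```python
import torch
from torch import nn

# Stand-in configuration matching the GM12878 setup: 155 experiments,
# +/-14-bin context window, 5 subcompartment classes (A1, A2, B1, B2, B3).
NEXP, n_neighbors, n_classes = 155, 14, 5
nfeatures = NEXP * (2 * n_neighbors + 1)  # 4495 input features per locus

model = nn.Linear(nfeatures, n_classes)   # stand-in for the TECSAS model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy batch: random features and random subcompartment labels (0..4)
x = torch.randn(32, nfeatures)
y = torch.randint(0, n_classes, (32,))

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # cross-entropy on class logits
    loss.backward()
    optimizer.step()
```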
Citation
If you use TECSAS in your research, please cite:
Dodero-Rojas, E., Mendieta, A., Fehlis, Y., Mayala, N., Contessoto, V. G., & Onuchic, J. N. (2025). Epigenetics is all you need: A transformer to decode chromatin structural compartments from the epigenome. PLOS Computational Biology, 21(12), e1012326. https://doi.org/10.1371/journal.pcbi.1012326
License
TECSAS is released under the MIT License. See LICENSE for details.
Acknowledgments
This research was supported by the Center for Theoretical Biological Physics, sponsored by the NSF (Grants PHY-2019745 and PHY-2210291) and by the Welch Foundation (Grant C-1792). We thank AMD (Advanced Micro Devices, Inc.) for the donation of critical hardware and support resources from its HPC Fund that made this work possible.
Contact
For questions, issues, or collaborations, please open an issue on GitHub or contact the developers.
Copyright (c) 2020-2025 The Center for Theoretical Biological Physics (CTBP) - Rice University
Project details
File details
Details for the file tecsas-1.0.1.tar.gz.
File metadata
- Download URL: tecsas-1.0.1.tar.gz
- Upload date:
- Size: 64.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | a0c9c69227a78c00f97aca595a3b94eb1235df8f9285664d56b4c98206d25668 |
| MD5 | 837edf9158c0e199bef01d47c8ead718 |
| BLAKE2b-256 | f9841d7afb3ba1239f2370727e16c2acdf9a51700a227f68d7a1118f20bbfddf |
File details
Details for the file tecsas-1.0.1-py3-none-any.whl.
File metadata
- Download URL: tecsas-1.0.1-py3-none-any.whl
- Upload date:
- Size: 65.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | abb12b2c42ace302a0d48e0375fdf3134fe11a0d6d1b46891b6251ebaf038144 |
| MD5 | 32cb52761bdb175fb374e4db6943d066 |
| BLAKE2b-256 | 9f01069bdf0c05234caba984db82abec662e1af7a009d9500049fdfc2c0c6270 |