😊 TPTT
Transforming Pretrained Transformers into Titans
TPTT is a modular Python library for injecting efficient linearized attention (LiZA) mechanisms, such as Memory as Gate (described in Titans), into pretrained transformers 🤗.
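Conceptually, Memory as Gate blends the output of the base softmax attention with the output of an injected linear-attention memory branch. Below is a minimal sketch of that blending, assuming a simple convex combination (the `mag_weight` knob also appears in the training CLI further down; the exact gating used by LiZA may differ, e.g. it may be learned):

```python
import torch

def mag_combine(attn_out: torch.Tensor, memory_out: torch.Tensor,
                mag_weight: float = 0.5) -> torch.Tensor:
    """Blend the base attention output with the linearized-memory output.

    Conceptual sketch only; the exact gate inside LiZA may be learned
    rather than a fixed scalar.
    """
    return (1.0 - mag_weight) * attn_out + mag_weight * memory_out
```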
Features
- Flexible Attention Injection: Seamlessly wrap and augment standard Transformer attention layers with linearized attention variants for latent memory.
- Support for Linear Attention: Includes implementations of DeltaNet and DeltaProduct with optional recurrent nonlinearity between chunks (see the delta-rule sketch after this list).
- Modular Design: Easily extend or customize operators and integration strategies.
- Compatibility: Designed to integrate with Hugging Face Transformers and similar PyTorch models.
- Low-Compute Alignment: Requires only lightweight fine-tuning after injection, enabling efficient memory integration without heavy retraining.
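For intuition, the delta rule behind DeltaNet maintains a fast-weight memory S that is updated once per token, S_t = S_{t-1} + beta_t (v_t - S_{t-1} k_t) k_t^T, and read out as y_t = S_t q_t. A minimal per-head sketch of this textbook recurrence (TPTT's chunked kernels are more involved):

```python
import torch

def delta_rule_step(S, q, k, v, beta):
    """One delta-rule update of the fast-weight memory (illustration only).

    S: (d_v, d_k) state; q, k: (d_k,); v: (d_v,); beta: scalar in (0, 1].
    """
    v_old = S @ k                             # value currently stored under key k
    S = S + beta * torch.outer(v - v_old, k)  # erase the old value, write the new one
    y = S @ q                                 # read the memory with the query
    return S, y
```

DeltaProduct generalizes this by applying several such rank-one updates per token, which is where the order-2 variant noted below gets its extra expressivity.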
> [!IMPORTANT]
> After injecting the LiZA module, the model requires fine-tuning to properly align and effectively utilize the memory mechanism.
Note: The order-2 DeltaProduct attention mechanism is as expressive as Titans.
Installation and Usage
```bash
pip install tptt
```
Titanesque Documentation
- TPTT-LiZA_Training: Instructions for training TPTT-based models with LoRA and advanced memory management.
- TPTT_LiZA_Evaluation: Guide for evaluating language models with LightEval and Hugging Face Transformers.
- TPTT_LiZA_FromScratch: Integrating the LinearAttention module into PyTorch deep learning projects.
Basic usage:

##### Transforming into Titans (Tptt)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
from torch import nn

import tptt
from tptt import save_tptt_safetensors, get_tptt_model, load_tptt_safetensors

base_model_name = "Qwen/Qwen2.5-1.5B"
base_config = AutoConfig.from_pretrained(base_model_name)

tptt_config = tptt.TpttConfig(
    base_model_config=base_config,
    base_model_name=base_model_name,
    # lora_config=lora_config,
)
model = tptt.TpttModel(tptt_config)

# Manual local save (path and name are your target directory and file name)
save_tptt_safetensors(model, path, name)
```
##### Pretrained Titans from Transformer

```python
repo_id = "ffurfaro/Titans-Llama-3.2-1B"
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
```
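Once loaded, the model behaves like any Transformers causal LM. A quick generation check (standard Hugging Face API; the prompt is illustrative):

```python
tokenizer = AutoTokenizer.from_pretrained(repo_id)
inputs = tokenizer("Linearized attention lets pretrained transformers", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```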
##### More custom for other models (BERT, ViT, etc.)

```python
model, linear_cache = get_tptt_model(model, tptt_config)  # Bidirectional mode can be activated
model = load_tptt_safetensors(repo_or_path, model)  # load from saved LoRA weights only
```
##### Using LinearAttention from scratch

```python
num_layers = 4  # example depth
layers = nn.ModuleList(
    [tptt.LinearAttention(hidden_dim=64, num_heads=4) for _ in range(num_layers)]
)
```
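The stacked layers can then be used in a plain PyTorch forward pass. The call signature below is an assumption (a single (batch, seq_len, hidden_dim) tensor); see the TPTT_LiZA_FromScratch notebook for the actual interface:

```python
import torch

x = torch.randn(2, 128, 64)  # (batch, seq_len, hidden_dim); shapes assumed
for layer in layers:
    x = layer(x)  # assumed forward signature; check TPTT_LiZA_FromScratch
```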
Some scripts are available here
Development
- Code is organized into modular components under the `src/tptt` directory.
- Use `pytest` for testing and `sphinx` for documentation; see the documentation link 🔥
- Contributions and feature requests are welcome!
Requirements
- Python 3.11+
- PyTorch
- einops
- Transformers
- Peft
See requirements.txt for the full list.
Docker Usage
Build and run TPTT with Docker:
```bash
# Build the image
docker build -t tptt .

# Run training (with GPU support)
docker run -it --gpus all \
    -v $(pwd)/data:/data \
    -v $(pwd)/outputs:/outputs \
    tptt python -m train \
        --model_name "meta-llama/Llama-3.2-1B" \
        --method delta_rule \
        --mag_weight 0.5
```
For more details, see the Dockerfile.
Acknowledgements
Discovering the OpenSparseLLMs/Linearization project (🚀 linear-flash-attention-based) inspired this work and motivated the creation of a fully modular, delta-rule-style PyTorch version.
Citation
If you use TPTT in your academic work, please cite:
```bibtex
@article{furfaro2025tptt,
  title={TPTT: Transforming Pretrained Transformers into Titans},
  author={Furfaro, Fabien},
  journal={arXiv preprint arXiv:2506.17671},
  year={2025}
}
```
Contact
For questions or support, please open an issue on the GitHub repository or contact the maintainer.