LongDLLM
🚀 Plug-and-play long context adaptation for diffusion language models
LongDLLM enables seamless extension of diffusion language models to handle long-context inputs (up to 128k tokens) with minimal code changes and a unified interface.
✨ Features
- 🎯 Drop-in compatibility: Works with existing code - just add one function call
- 🧠 Memory efficient: Handle 128k+ tokens on a single A6000 GPU (48GB VRAM)
- ⚡ Long Context Performance: Provide pre-tuned rescale factors for context extension
- 🔧 Unified interface: Same API for all supported models
🤖 Supported Models
- Apple DiffuCoder-7B-Instruct - Code generation with long context
- GSAI-ML LLaDA-8B-Instruct - General instruction following with extended context
📦 Installation
Basic Installation
```bash
pip install longdllm
```
Installing FlashAttention is highly recommended; you can install it separately:
```bash
pip install flash-attn --no-build-isolation
```
🚀 Quick Start
DiffuCoder Example
```python
import torch
from transformers import AutoModel, AutoTokenizer

from longdllm import adapt_for_long_context

# 1. Load your model as usual
model = AutoModel.from_pretrained(
    "apple/DiffuCoder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# 2. Adapt for long context (128k tokens)
model = adapt_for_long_context(model, target_length=131072)

# 3. Generate with long sequences
tokenizer = AutoTokenizer.from_pretrained("apple/DiffuCoder-7B-Instruct")
inputs = tokenizer("Your long prompt here...", return_tensors="pt")
output = model.diffusion_generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=256,
    steps=32,  # Diffusion steps
    temperature=0.3,
    top_p=0.95,
    alg="entropy",
)
```
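To inspect the completion, decode the returned IDs past the prompt length. This is a minimal sketch, under the assumption that `diffusion_generate` returns generated token IDs (some model revisions wrap them in an object with a `.sequences` field):

```python
# A minimal decoding sketch; `output.sequences` is an assumption based on
# Dream-style generation APIs. Adjust if your revision returns raw IDs.
sequences = output.sequences if hasattr(output, "sequences") else output
completion = tokenizer.decode(
    sequences[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(completion)
```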
LLaDA Example
⚠️ LLaDA Note: Patched methods ignore `attention_bias` for memory efficiency. This is safe per LLaDA issue #90.
```python
from transformers import AutoModel, AutoTokenizer

from longdllm import adapt_for_long_context

# 1. Load and adapt LLaDA model
model = AutoModel.from_pretrained("GSAI-ML/LLaDA-8B-Instruct", trust_remote_code=True)
model = adapt_for_long_context(model, target_length=131072)

# 2. Use unified diffusion_generate interface
tokenizer = AutoTokenizer.from_pretrained("GSAI-ML/LLaDA-8B-Instruct")
inputs = tokenizer("Your instruction here...", return_tensors="pt")
outputs = model.diffusion_generate(
    input_ids=inputs.input_ids,
    max_new_tokens=512,
    temperature=0.0,
    steps=128,
    block_length=128,
    remasking='low_confidence',
)
```
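LLaDA-8B-Instruct was trained on a chat format, so for instruction-style prompts it can help to wrap the input with the tokenizer's chat template before tokenizing. A sketch, assuming the tokenizer ships a chat template as the upstream release does:

```python
# Optional: format the instruction with LLaDA's chat template (assumes the
# upstream tokenizer provides one; skip this step if yours does not).
messages = [{"role": "user", "content": "Your instruction here..."}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
inputs = tokenizer(prompt, return_tensors="pt")
```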
💡 Examples
Check out our example scripts to see LongDLLM in action:
- `examples/test_diffucoder.py` - DiffuCoder passkey retrieval test
- `examples/test_llada.py` - LLaDA passkey retrieval test
Running Examples
```bash
# Test DiffuCoder with 128k context
cd examples && python test_diffucoder.py

# Test LLaDA with 128k context
cd examples && python test_llada.py
```
Both examples demonstrate passkey retrieval - finding a hidden number in long documents, a common benchmark for long-context capabilities.
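If you want to build a quick passkey test of your own, here is a hypothetical sketch (the bundled scripts construct their own variant; the filler text and phrasing below are illustrative only):

```python
import random

# Hypothetical passkey prompt builder, for illustration only: bury a random
# 5-digit number inside long distractor text, then ask for it back.
passkey = random.randint(10000, 99999)
filler = "The grass is green. The sky is blue. The sun is yellow. " * 4000
prompt = (
    f"{filler}\nThe passkey is {passkey}. Remember it.\n{filler}\n"
    "What is the passkey? The passkey is"
)
```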
⚙️ Advanced Configuration
Custom Rescale Factors
Want to experiment? You can provide custom factors:
```python
# Example: Exponential rescale factors (approximating optimized values)
import numpy as np

custom_factors = (
    list(np.logspace(0, 1.5, 34))        # 1.0 to ~31.6, exponentially spaced
    + list(np.linspace(16.3, 31.3, 30))  # Linear spacing for higher frequencies
)

model = adapt_for_long_context(
    model,
    target_length=65536,  # Custom length
    scaling_method='longrope',
    rescale_factors=custom_factors,
)
```
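One caveat worth stating: rescale factors are applied per rotary frequency, so the list length should match the model's `head_dim // 2` (64 in the sketch above, matching the 128-dim attention heads of both supported models). This is an assumption about the adapter's contract, so a quick sanity check is cheap:

```python
# Sanity check (assumption: one rescale factor per RoPE frequency,
# i.e. head_dim // 2 = 64 entries for 128-dim attention heads).
assert len(custom_factors) == 64, len(custom_factors)
```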
License
MIT
Citation
If you use LongDLLM in your research, please cite:
```bibtex
@misc{ge2025longcontext,
  title   = {Long-Context Extension for Language Diffusion Models up to 128k Tokens},
  author  = {Ge, Albert and Singh, Chandan and Zhang, Dinghuai and Peng, Letian and Zhuang, Yufan and Shang, Ning and Zhang, Li Lyna and Liu, Liyuan and Gao, Jianfeng},
  url     = {https://albertge.notion.site/longcontext},
  journal = {Albert Ge's Notion},
  year    = {2025},
  month   = sep,
}
```
🤝 Support & Contributing
🐛 Issues & Questions
- GitHub Issues: Report bugs or ask questions
- Email: Albert Ge for direct support