# LongDLLM

🚀 Plug-and-play long-context adaptation for diffusion language models
LongDLLM adapts diffusion language models to support long-context inputs with minimal code changes and a unified interface. It currently supports:
- 🤖 Apple DiffuCoder-7B-Instruct
- 🤖 GSAI-ML LLaDA-8B-Instruct
## Installation

```bash
pip install longdllm
```

Installing FlashAttention is recommended but not required; you can install it separately via `pip install flash-attn --no-build-isolation`.
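If you are unsure whether FlashAttention is available in your environment, a quick check like the following can help (an illustrative sketch, not part of the longdllm API; `flash_attn` is the module the FlashAttention package installs):

```python
# Illustrative check: pick an attention implementation depending on whether
# FlashAttention is importable. The result can be passed as
# attn_implementation=... to transformers' from_pretrained().
try:
    import flash_attn  # noqa: F401
    attn_implementation = "flash_attention_2"
except ImportError:
    attn_implementation = "eager"  # fall back to standard attention
```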
## Quick Start

### DiffuCoder Usage
```python
import torch
from transformers import AutoModel, AutoTokenizer
from longdllm import adapt_for_long_context

# Load your model and tokenizer as usual
model = AutoModel.from_pretrained(
    "apple/DiffuCoder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "apple/DiffuCoder-7B-Instruct", trust_remote_code=True
)

# Adapt for long context (modifies the model in-place and returns it)
model = adapt_for_long_context(model, target_length=131072)

# Use the adapted model with long sequences
inputs = tokenizer("Your long prompt here", return_tensors="pt")
output = model.diffusion_generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=256,
    output_history=True,
    return_dict_in_generate=True,
    steps=256 // 8,  # max_new_tokens // TOKEN_PER_STEP
    temperature=0.3,
    top_p=0.95,
    alg="entropy",
    alg_temp=0.0,
)
```
### LLaDA Usage

LLaDA exposes the same interface, for consistency:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from longdllm import adapt_for_long_context

# Load and adapt the LLaDA model
model = AutoModelForCausalLM.from_pretrained(
    "GSAI-ML/LLaDA-8B-Instruct", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("GSAI-ML/LLaDA-8B-Instruct")

# Adapt for long context (patches forward methods and adds the
# diffusion_generate interface)
model = adapt_for_long_context(model, target_length=131072)

# Use the same diffusion_generate interface as DiffuCoder
inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.diffusion_generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=512,
    temperature=0.0,  # Gumbel noise temperature
    steps=128,
    block_length=128,
    cfg_scale=0.0,
    remasking='low_confidence',
)
```
## Examples

### Optimized Rescale Factors
LongDLLM includes optimized rescale factors based on LongRoPE2 for each supported model:
- DiffuCoder: Factors optimized for 131k context length through evolutionary search
- LLaDA: Factors optimized for 131k context length through evolutionary search
These factors are selected automatically based on model detection; no manual configuration is needed. However, if you want to try different rescale factors, you can pass in your own list, like so:
### Custom Configuration
```python
# Custom rescale factors for LongRoPE: one factor per rotary frequency,
# i.e. head_dim / 2 entries (64 here, matching 128-dim attention heads)
custom_factors = [1.0] * 32 + [0.8] * 16 + [0.6] * 16
model = adapt_for_long_context(
    model,
    target_length=32768,
    scaling_method='longrope',
    rescale_factors=custom_factors,
)
```
## Memory-Efficient Generation

LongDLLM automatically patches the generation code of both DiffuCoder and LLaDA to be memory-efficient during long-context generation. It can handle more than 128k input tokens with less than 50 GB of GPU memory.
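The main saving comes from computing logits only for the positions a diffusion step actually needs. A minimal sketch of the idea (illustrative only, not the library's actual implementation):

```python
def sparse_logits(hidden_states, lm_head, masked_positions):
    """Project only the masked positions through the LM head.

    Full logits would be (batch, seq_len, vocab_size); at 128k tokens and a
    large vocabulary, that tensor alone can dominate peak GPU memory.
    Gathering just the positions the current step will unmask keeps the
    logits tensor small regardless of context length.
    """
    selected = hidden_states[:, masked_positions, :]  # (batch, n_masked, hidden)
    return lm_head(selected)                          # (batch, n_masked, vocab)
```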
## API Reference

### `adapt_for_long_context(model, **kwargs)`
Adapts a diffusion language model for long-context inputs by replacing RoPE embeddings.
⚠️ **LLaDA note:** The patched forward methods ignore `attention_bias` for memory efficiency. This is safe according to LLaDA issue #90.
#### Parameters

- `model`: The model to adapt (must be DiffuCoder or LLaDA)
- `target_length` (int, optional): Target sequence length
- `scaling_method` (str, optional): RoPE scaling method (`'longrope'` or `'ntk'`). Default: `'longrope'`
- `rescale_factors` (list, optional): Custom rescale factors for LongRoPE. Uses optimized defaults if `None`
- `magnitude_scaling` (str, optional): Magnitude scaling policy (`'su'` or `'yarn'`). Default: `'su'`
#### Returns
- The same model instance (modified in-place) for method chaining
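For example, to use NTK-aware scaling instead of the default LongRoPE factors (a sketch using only the parameters documented above):

```python
# NTK-aware RoPE scaling instead of the default LongRoPE factors
model = adapt_for_long_context(model, target_length=65536, scaling_method='ntk')
```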
## Technical Details
LongDLLM uses the LongRoPE technique to extend the context length of pre-trained diffusion language models. It works by:
- Auto-detecting the model architecture (DiffuCoder or LLaDA)
- Replacing the RoPE embeddings in each transformer layer with scaled versions
- Applying rescale factors optimized for long sequences (sketched after this list)
- Preserving all other model functionality
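Concretely, LongRoPE-style scaling amounts to dividing each RoPE inverse frequency by its rescale factor, as in the LongRoPE paper (whether longdllm divides or multiplies internally is an implementation detail; this is an illustrative sketch, not the library's code):

```python
import torch

def longrope_inv_freq(head_dim, rescale_factors, base=10000.0):
    # Standard RoPE inverse frequencies: one per pair of head dimensions
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Per-frequency rescaling: dividing by a factor stretches that
    # frequency's period so it covers longer sequences
    return inv_freq / torch.tensor(rescale_factors, dtype=torch.float32)

# e.g. with the 64 custom factors from the example above (128-dim heads)
inv_freq = longrope_inv_freq(128, [1.0] * 32 + [0.8] * 16 + [0.6] * 16)
```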
For memory-efficient generation, LongDLLM:
- Uses sparse logits computation (only computes logits for necessary positions)
- Reduces peak GPU memory usage during generation
- Maintains full generation quality and compatibility
- Preserves the exact `diffusion_generate()` interface
- Works automatically, with no additional configuration needed
## License
MIT
## Citation
If you use LongDLLM in your research, please cite:
```bibtex
@misc{ge2025longcontext,
  title   = {Long-Context Extension for Language Diffusion Models up to 128k Tokens},
  author  = {Ge, Albert and Singh, Chandan and Zhang, Dinghuai and Peng, Letian and Shang, Ning and Zhang, Li Lyna and Liu, Liyuan and Gao, Jianfeng},
  journal = {Albert Ge's Notion},
  url     = {https://albertge.notion.site/longcontext},
  year    = {2025},
  month   = sep,
}
```
## Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
## Support
For questions and issues, please open an issue on GitHub, or reach out to me (Albert Ge).