LongDLLM

🚀 Plug-and-play long context adaptation for diffusion language models

LongDLLM enables easy adaptation of diffusion language models to support long-context inputs with minimal code changes and a unified interface. Currently supports:

  • 🤖 Apple DiffuCoder-7B-Instruct
  • 🤖 GSAI-ML LLaDA-8B-Instruct

Installation

pip install longdllm

Installing FlashAttention is recommended but not required; you can install it separately via pip install flash-attn --no-build-isolation.

Quick Start

DiffuCoder Usage

import torch
from transformers import AutoModel, AutoTokenizer
from longdllm import adapt_for_long_context

# Load the model and tokenizer as usual
model_path = "apple/DiffuCoder-7B-Instruct"
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Adapt for long context (modifies the model in-place and returns it)
model = adapt_for_long_context(model, target_length=131072)

# Tokenize a (potentially very long) prompt
inputs = tokenizer("Your long prompt here", return_tensors="pt")

# Use the adapted model with long sequences
output = model.diffusion_generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=256,
    output_history=True,
    return_dict_in_generate=True,
    steps=256 // 8,  # 8 tokens unmasked per step
    temperature=0.3,
    top_p=0.95,
    alg="entropy",
    alg_temp=0.,
)
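
With return_dict_in_generate=True, the generated tokens are carried on the returned object. A minimal decoding follow-up, assuming the output exposes a .sequences field as in DiffuCoder's reference examples:

# Decode only the newly generated tokens (everything after the prompt)
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(output.sequences[0, prompt_len:], skip_special_tokens=True))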

LLaDA Usage (same interface for consistency):

from transformers import AutoTokenizer, AutoModelForCausalLM
from longdllm import adapt_for_long_context

# Load and adapt LLaDA model
model = AutoModelForCausalLM.from_pretrained("GSAI-ML/LLaDA-8B-Instruct", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("GSAI-ML/LLaDA-8B-Instruct")

# Adapt for long context (patches forward methods and adds diffusion_generate interface)
model = adapt_for_long_context(model, target_length=131072)

# Use the same diffusion_generate interface as DiffuCoder
inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.diffusion_generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=512,
    temperature=0.0,  # Gumbel noise temperature
    steps=128,
    block_length=128,
    cfg_scale=0.0,
    remasking='low_confidence'
)
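
For the Instruct checkpoint you will typically want to wrap the prompt in the chat template first. A minimal sketch using the standard transformers chat-template API; the decode step assumes diffusion_generate returns the full prompt-plus-generation token tensor, as LLaDA's reference generate() does:

# Build the prompt with the tokenizer's chat template
messages = [{"role": "user", "content": "Summarize the document above."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.diffusion_generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=512,
    steps=128,
    block_length=128,
)

# Decode only the generated portion
print(tokenizer.decode(outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))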

Examples

Optimized Rescale Factors

LongDLLM includes optimized rescale factors based on LongRoPE2 for each supported model:

  • DiffuCoder: Factors optimized for 131k context length through evolutionary search
  • LLaDA: Factors optimized for 131k context length through evolutionary search

These factors are selected automatically based on model detection, so no manual configuration is needed. If you wish to try different rescale factors, pass in your own list, as in the example below:

Custom Configuration

# Custom rescale factors for LongRoPE
custom_factors = [1.0] * 32 + [0.8] * 16 + [0.6] * 16  # 64 factors: one per RoPE frequency pair (head_dim // 2)
model = adapt_for_long_context(
    model,
    target_length=32768,
    scaling_method='longrope',
    rescale_factors=custom_factors
)
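
If you would rather not hand-tune a factor list, the 'ntk' scaling method (see the API Reference below) takes no rescale_factors at all:

# NTK-aware scaling: no per-frequency factor list needed
model = adapt_for_long_context(model, target_length=32768, scaling_method='ntk')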

Memory-Efficient Generation

LongDLLM automatically patches the generation code of both DiffuCoder and LLaDA to be memory-efficient during long-context generation: it can handle more than 128k input tokens in under 50 GB of GPU memory.
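
The core trick behind this is sparse logits computation (see Technical Details below): at each denoising step only the logits at still-masked positions are needed, so the vocabulary-sized LM head is applied to just those hidden states instead of the full sequence. A minimal sketch of the idea, with hypothetical names rather than LongDLLM's internal API:

def sparse_logits(hidden_states, lm_head, masked_positions):
    # hidden_states: (batch, seq_len, hidden); masked_positions: 1-D index tensor
    # Projecting only the masked positions avoids materializing a
    # (batch, seq_len, vocab) logits tensor at 128k+ tokens.
    selected = hidden_states[:, masked_positions, :]  # (batch, num_masked, hidden)
    return lm_head(selected)                          # (batch, num_masked, vocab)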

API Reference

adapt_for_long_context(model, **kwargs)

Adapts a diffusion language model for long-context inputs by replacing RoPE embeddings.

⚠️ LLaDA Note: The patched forward methods ignore attention_bias for memory efficiency. This is safe according to LLaDA issue #90.

Parameters

  • model: The model to adapt (must be DiffuCoder or LLaDA)
  • target_length (int, optional): Target sequence length.
  • scaling_method (str, optional): RoPE scaling method ('longrope' or 'ntk'). Default: 'longrope'
  • rescale_factors (list, optional): Custom rescale factors for LongRoPE. Uses optimized defaults if None
  • magnitude_scaling (str, optional): Magnitude scaling policy ('su' or 'yarn'). Default: 'su'

Returns

  • The same model instance (modified in-place) for method chaining
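
Putting the parameters together (values here are illustrative, not recommendations):

model = adapt_for_long_context(
    model,
    target_length=65536,        # extend to 64k tokens
    scaling_method='longrope',  # or 'ntk'
    rescale_factors=None,       # None -> per-model optimized defaults
    magnitude_scaling='su',     # or 'yarn'
)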

Technical Details

LongDLLM uses the LongRoPE technique to extend the context length of pre-trained diffusion language models. It works by:

  1. Auto-detecting the model architecture (DiffuCoder or LLaDA)
  2. Replacing RoPE embeddings in each transformer layer with scaled versions (sketched after this list)
  3. Applying rescale factors optimized for long sequences
  4. Preserving all other model functionality
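
Conceptually, step 2 amounts to dividing each RoPE inverse frequency by its rescale factor before building the cos/sin tables, with a magnitude correction applied on top. A minimal sketch of the idea, assuming a standard RoPE cache layout (not LongDLLM's exact internals):

import torch

def longrope_cos_sin(head_dim, positions, rescale_factors, base=10000.0, mscale=1.0):
    # Standard RoPE inverse frequencies: one per rotary pair (head_dim // 2 of them)
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # LongRoPE: slow each frequency down by its optimized rescale factor
    inv_freq = inv_freq / torch.tensor(rescale_factors, dtype=torch.float32)
    angles = torch.outer(positions.float(), inv_freq)  # (seq_len, head_dim // 2)
    emb = torch.cat((angles, angles), dim=-1)          # (seq_len, head_dim)
    # 'su' / 'yarn' style magnitude scaling rescales the embedding amplitude
    return emb.cos() * mscale, emb.sin() * mscale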

For memory-efficient generation, LongDLLM:

  • Uses sparse logits computation (only computes logits for necessary positions)
  • Reduces peak GPU memory usage during generation
  • Maintains full generation quality and compatibility
  • Preserves the exact diffusion_generate() interface
  • Works automatically - no additional configuration needed

License

MIT

Citation

If you use LongDLLM in your research, please cite:

@misc{ge2025longcontext,
  title = {Long-Context Extension for Language Diffusion Models up to 128k Tokens},
  author = {Ge, Albert and Singh, Chandan and Zhang, Dinghuai and Peng, Letian and Shang, Ning and Zhang, Li Lyna and Liu, Liyuan and Gao, Jianfeng},
  howpublished = {Albert Ge's Notion},
  url = {https://albertge.notion.site/longcontext},
  year = {2025},
  month = sep,
}

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Support

For questions and issues, please open an issue on GitHub, or reach out to me (Albert Ge).

