# LongDLLM

🚀 Plug-and-play long-context adaptation for diffusion language models
LongDLLM adapts diffusion language models to support long-context inputs with minimal code changes and a unified interface. It currently supports:
- 🤖 Apple DiffuCoder-7B-Instruct
- 🤖 GSAI-ML LLaDA-8B-Instruct
## Installation

```bash
pip install longdllm
```

Installing FlashAttention is recommended but not required; you can install it separately via `pip install flash-attn --no-build-isolation`.
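If you are unsure whether FlashAttention is available in your environment, a quick check like the following can help (an illustrative sketch, not part of the longdllm API; `flash_attn` is the module the FlashAttention package installs):

```python
# Illustrative check: pick an attention implementation depending on whether
# FlashAttention is importable. The result can be passed as
# attn_implementation=... to transformers' from_pretrained().
try:
    import flash_attn  # noqa: F401
    attn_implementation = "flash_attention_2"
except ImportError:
    attn_implementation = "eager"  # fall back to standard attention
```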
## Quick Start

### DiffuCoder Usage
```python
import torch
from transformers import AutoModel, AutoTokenizer
from longdllm import adapt_for_long_context

# Load your model and tokenizer as usual
model = AutoModel.from_pretrained(
    "apple/DiffuCoder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "apple/DiffuCoder-7B-Instruct", trust_remote_code=True
)

# Adapt for long context (modifies the model in-place and returns it)
model = adapt_for_long_context(model, target_length=131072)

# Use the adapted model with long sequences
inputs = tokenizer("Your long prompt here", return_tensors="pt")
output = model.diffusion_generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=256,
    output_history=True,
    return_dict_in_generate=True,
    steps=256 // 8,  # max_new_tokens // TOKEN_PER_STEP
    temperature=0.3,
    top_p=0.95,
    alg="entropy",
    alg_temp=0.0,
)
```
### LLaDA Usage

LLaDA exposes the same interface, for consistency:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from longdllm import adapt_for_long_context

# Load and adapt the LLaDA model
model = AutoModelForCausalLM.from_pretrained(
    "GSAI-ML/LLaDA-8B-Instruct", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("GSAI-ML/LLaDA-8B-Instruct")

# Adapt for long context (patches forward methods and adds the
# diffusion_generate interface)
model = adapt_for_long_context(model, target_length=131072)

# Use the same diffusion_generate interface as DiffuCoder
inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.diffusion_generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=512,
    temperature=0.0,  # Gumbel noise temperature
    steps=128,
    block_length=128,
    cfg_scale=0.0,
    remasking='low_confidence',
)
```
## Examples

### Optimized Rescale Factors
LongDLLM includes optimized rescale factors based on LongRoPE2 for each supported model:
- DiffuCoder: Factors optimized for 131k context length through evolutionary search
- LLaDA: Factors optimized for 131k context length through evolutionary search
These factors are selected automatically based on model detection; no manual configuration is needed. However, if you want to try different rescale factors, you can pass in your own list, like so:
### Custom Configuration
```python
# Custom rescale factors for LongRoPE: one factor per rotary frequency,
# i.e. head_dim / 2 entries (64 here, matching 128-dim attention heads)
custom_factors = [1.0] * 32 + [0.8] * 16 + [0.6] * 16
model = adapt_for_long_context(
    model,
    target_length=32768,
    scaling_method='longrope',
    rescale_factors=custom_factors,
)
```
## Memory-Efficient Generation

LongDLLM automatically patches the generation code of both DiffuCoder and LLaDA to be memory-efficient during long-context generation. It can handle more than 128k input tokens with less than 50 GB of GPU memory.
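The main saving comes from computing logits only for the positions a diffusion step actually needs. A minimal sketch of the idea (illustrative only, not the library's actual implementation):

```python
def sparse_logits(hidden_states, lm_head, masked_positions):
    """Project only the masked positions through the LM head.

    Full logits would be (batch, seq_len, vocab_size); at 128k tokens and a
    large vocabulary, that tensor alone can dominate peak GPU memory.
    Gathering just the positions the current step will unmask keeps the
    logits tensor small regardless of context length.
    """
    selected = hidden_states[:, masked_positions, :]  # (batch, n_masked, hidden)
    return lm_head(selected)                          # (batch, n_masked, vocab)
```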
## API Reference

### `adapt_for_long_context(model, **kwargs)`
Adapts a diffusion language model for long-context inputs by replacing RoPE embeddings.
⚠️ **LLaDA note:** The patched forward methods ignore `attention_bias` for memory efficiency. This is safe according to LLaDA issue #90.
#### Parameters

- `model`: The model to adapt (must be DiffuCoder or LLaDA)
- `target_length` (int, optional): Target sequence length
- `scaling_method` (str, optional): RoPE scaling method (`'longrope'` or `'ntk'`). Default: `'longrope'`
- `rescale_factors` (list, optional): Custom rescale factors for LongRoPE. Uses optimized defaults if `None`
- `magnitude_scaling` (str, optional): Magnitude scaling policy (`'su'` or `'yarn'`). Default: `'su'`
#### Returns
- The same model instance (modified in-place) for method chaining
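For example, to use NTK-aware scaling instead of the default LongRoPE factors (a sketch using only the parameters documented above):

```python
# NTK-aware RoPE scaling instead of the default LongRoPE factors
model = adapt_for_long_context(model, target_length=65536, scaling_method='ntk')
```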
## Technical Details
LongDLLM uses the LongRoPE technique to extend the context length of pre-trained diffusion language models. It works by:
- Auto-detecting the model architecture (DiffuCoder or LLaDA)
- Replacing the RoPE embeddings in each transformer layer with scaled versions
- Applying rescale factors optimized for long sequences (sketched after this list)
- Preserving all other model functionality
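Concretely, LongRoPE-style scaling amounts to dividing each RoPE inverse frequency by its rescale factor, as in the LongRoPE paper (whether longdllm divides or multiplies internally is an implementation detail; this is an illustrative sketch, not the library's code):

```python
import torch

def longrope_inv_freq(head_dim, rescale_factors, base=10000.0):
    # Standard RoPE inverse frequencies: one per pair of head dimensions
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Per-frequency rescaling: dividing by a factor stretches that
    # frequency's period so it covers longer sequences
    return inv_freq / torch.tensor(rescale_factors, dtype=torch.float32)

# e.g. with the 64 custom factors from the example above (128-dim heads)
inv_freq = longrope_inv_freq(128, [1.0] * 32 + [0.8] * 16 + [0.6] * 16)
```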
For memory-efficient generation, LongDLLM:
- Uses sparse logits computation (only computes logits for necessary positions)
- Reduces peak GPU memory usage during generation
- Maintains full generation quality and compatibility
- Preserves the exact `diffusion_generate()` interface
- Works automatically, with no additional configuration needed
## License
MIT
## Citation
If you use LongDLLM in your research, please cite:
```bibtex
@misc{ge2025longcontext,
  title   = {Long-Context Extension for Language Diffusion Models up to 128k Tokens},
  author  = {Ge, Albert and Singh, Chandan and Zhang, Dinghuai and Peng, Letian and Shang, Ning and Zhang, Li Lyna and Liu, Liyuan and Gao, Jianfeng},
  journal = {Albert Ge's Notion},
  url     = {https://albertge.notion.site/longcontext},
  year    = {2025},
  month   = sep,
}
```
## Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
## Support
For questions and issues, please open an issue on GitHub, or reach out to me (Albert Ge).