Skip to main content

ShuffleLM: Parallel Token Generation Language Models with Intelligent Reordering

Project description

๐ŸŽฏ ShuffleLM

License: MIT Python 3.9+ PyPI version

ShuffleLM is an innovative language model architecture that implements parallel token generation with intelligent reordering. Unlike traditional autoregressive generation, ShuffleLM generates multiple tokens simultaneously and then intelligently reorders and filters them for faster and more efficient text generation.

๐Ÿ”ฌ Academic Background

Foundation Research for Parallel Generation

Non-Autoregressive Neural Machine Translation:

  • Gu et al. (2018) - "Non-Autoregressive Neural Machine Translation" - Introduced fertility-based parallel decoding
  • Lee et al. (2018) - "Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement" - Iterative refinement approach
  • Ghazvininejad et al. (2019) - "Mask-Predict: Parallel Decoding of Conditional Masked Language Models" - BERT-style masking with iterative prediction

Latent Variable Models:

  • Kaiser et al. (2018) - "Fast Decoding in Sequence Models using Discrete Latent Variables" - Discrete latent variable compression
  • Ma et al. (2019) - "FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow" - Normalizing flow for latent modeling

MLP-Mixer and Position Encoding:

  • Tolstikhin et al. (2021) - "MLP-Mixer: An all-MLP Architecture for Vision" - Original MLP-Mixer architecture
  • Su et al. (2021) - "RoFormer: Enhanced Transformer with Rotary Position Embedding" - Rotary Position Embedding (RoPE)

Non-Autoregressive Advances (2020-2022):

  • Zhou et al. (2020) - "Understanding Knowledge Distillation in Non-autoregressive Machine Translation" - Knowledge distillation for NAT
  • Qian et al. (2021) - "Glancing Transformer for Non-Autoregressive Neural Machine Translation" - Semi-autoregressive approaches
  • Ding et al. (2022) - "StraighTformer: Decoupled Attention with Linear Complexity for Fast Non-Autoregressive Generation"

Speculative Decoding and Parallel Generation (2023-2024):

  • Leviathan et al. (2023) - "Fast Inference from Transformers via Speculative Decoding" - Draft-then-verify approach for acceleration
  • Cai et al. (2024) - "Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads" - Multiple draft heads for parallel speculation
  • Spector & Re (2023) - "Accelerating Large Language Model Decoding with Speculative Sampling" - Probability-based speculative sampling
  • Sun et al. (2024) - "SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference"

Model Evolution Paradigms

The evolution of language models can be broadly categorized into three paradigms:

Causal Language Models (Autoregressive Language Models)

  • GPT Series: Sequential token generation from left to right
  • Advantages: Stable and consistent generation
  • Disadvantages: Sequential processing leads to speed limitations

Diffusion Language Models

  • BERT-based: Gradually restore masked tokens through iterative refinement
  • Advantages: Bidirectional context utilization
  • Disadvantages: Complex noise scheduling and multi-step processing

Shuffle Language Models โญ New

  • ShuffleLM: Parallel generation followed by intelligent reordering
  • Advantages: Fast parallel processing + dynamic length determination
  • Key Feature: Token order optimization for improved quality

๐ŸŒŸ ShuffleLM Overview

๐Ÿš€ Architecture

Sealed just for now. Stay tuned for updates!


๐Ÿ“š Documentation

For detailed documentation and examples, visit our GitHub repository.

๐Ÿ› ๏ธ Installation

# Install with uv (recommended)
uv add shufflers
# Or install from source
git clone https://github.com/thisisthepy/ShuffleLM.git
cd ShuffleLM
uv sync

๐ŸŽฏ Quick Start

Basic Usage

import torch
from shufflers import FasterDecodeMixer
from transformers import AutoTokenizer

# Load model and tokenizer
model_id = "thisisthepy/FasterDecodeMixer-Q3-8B"
model = FasterDecodeMixer.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate text
prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")

# Parallel generation with shuffling
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_parallel_tokens=30,  # Maximum parallel tokens to generate
        shuffle_strategy="rotary", # Shuffle strategy
        temperature=0.7
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Advanced Usage

from shufflers import ShuffleLM, ShuffleConfig

# Custom configuration
config = ShuffleConfig(
    vocab_size=50257,
    hidden_size=768,
    num_layers=12,
    num_heads=12,
    max_parallel_tokens=50,
    rotary_dim=64,
    shuffle_temperature=0.8
)

# Initialize model
model = ShuffleLM(config)

# Training mode
model.train()
outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    labels=labels  # Required only during training
)

loss = outputs.loss
logits = outputs.logits
shuffle_scores = outputs.shuffle_scores  # Position reordering scores

Shuffle Visualization

from shufflers.utils import visualize_shuffle

# Visualize generation process
visualization = visualize_shuffle(
    model=model,
    tokenizer=tokenizer,
    prompt="Hello, I am",
    save_animation=True,
    output_path="shuffle_animation.gif"
)

# Check step-by-step process
for step, tokens in visualization.steps:
    print(f"Step {step}: {tokens}")

๐Ÿ—๏ธ Project Structure

shufflers/
โ”œโ”€โ”€ models/
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ shufflelm.py          # Main model class
โ”‚   โ””โ”€โ”€ fasterdecodemixer/
โ”‚       โ”œโ”€โ”€ __init__.py
โ”‚       โ”œโ”€โ”€ model.py          # FasterDecodeMixer implementation
โ”‚       โ”œโ”€โ”€ mixer.py          # MLP-Mixer components
โ”‚       โ””โ”€โ”€ rotary.py         # Rotary Regression implementation
โ”œโ”€โ”€ utils/
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ config.py             # Configuration classes
โ”‚   โ”œโ”€โ”€ generation.py         # Generation utilities
โ”‚   โ””โ”€โ”€ visualization.py      # Visualization tools
โ””โ”€โ”€ __init__.py

๐Ÿ”ง Development Setup

# Clone repository
git clone https://github.com/thisisthepy/ShuffleLM.git
cd ShuffleLM

# Install development dependencies with uv
uv sync --dev

Code Style

# Format code with uv
uv run black shufflers/
uv run isort shufflers/

# Lint code
uv run flake8 shufflers/
uv run mypy shufflers/

๐Ÿงช Testing

# Run all tests
uv run pytest

# Run specific tests
uv run pytest tests/test_model.py

# Run tests with coverage
uv run pytest --cov=shufflers --cov-report=html

๐Ÿ“ˆ Performance Benchmarks

Model Speed (tokens/sec) BLEU Rouge-L Memory (GB)
Llama3-8B 42 24.8 46.1 2.1
Qwen2.5-7B 38 25.3 47.2 1.9
FasterDecodeMixer 89 24.7 46.9 1.1

GPU: NVIDIA RTX 4090, Batch Size: 1


๐Ÿค Contributing

  1. Create an issue to propose improvements
  2. Fork and create a feature branch
  3. Make changes and add tests
  4. Create a Pull Request

๐Ÿ“„ License

This project is distributed under the MIT License. See LICENSE file for details.

๐Ÿ™ Citation

If you use ShuffleLM in your research or projects, please cite as follows:

@software{shufflelm2025,
  title={ShuffleLM: Parallel Token Generation with Intelligent Reordering},
  author={thisisthepy},
  year={2025},
  url={https://github.com/thisisthepy/ShuffleLM}
}

๐Ÿ“ž Contact


๐ŸŽฏ ShuffleLM: Shuffle Tokens for Faster and Smarter Generation

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

shufflers-0.1.0.tar.gz (33.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

shufflers-0.1.0-py3-none-any.whl (27.1 kB view details)

Uploaded Python 3

File details

Details for the file shufflers-0.1.0.tar.gz.

File metadata

  • Download URL: shufflers-0.1.0.tar.gz
  • Upload date:
  • Size: 33.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.32.3

File hashes

Hashes for shufflers-0.1.0.tar.gz
Algorithm Hash digest
SHA256 e60ab8243b2e25ff0424d31dbc9e97e52ac9da53a276ba397d9836414c990b2a
MD5 13af01e6b409234725452f040a6954b5
BLAKE2b-256 2637ccad704543d0e7483125a7cd7d7197d103c6ce34bae4b976d26d79cd35d1

See more details on using hashes here.

File details

Details for the file shufflers-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: shufflers-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 27.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.32.3

File hashes

Hashes for shufflers-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 54b04f2b5bcc64c80588a3048610165dde894009a69bb5ad9de60d228767a574
MD5 7e4d3c5901527a22202eeca7df88fd26
BLAKE2b-256 39bf8f7a4b6a6f777ed1a68874a0e19875fba42903e3db9da69babc8cdb26d29

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page