
A Llama-style decoder architecture with explicit latent plans and conditional VAE training


Free Transformer

PyPI version | Python 3.11+ | License: MIT | Documentation | Tests | Code style: black

Free Transformer: A Llama-style decoder architecture with explicit latent plans, conditional VAE training, and benchmark comparisons against standard Transformers.

Designed for efficient PyTorch training on modern GPUs, with FSDP support, mixed precision, and gradient checkpointing.

📖 Complete Documentation | 🚀 Quick Start Guide | 🏗️ Architecture Details


What Is the Free Transformer?

Traditional autoregressive Transformers generate each token by conditioning only on the sequence so far ("reactive" behavior). Free Transformer introduces a latent planning mechanism: it first samples a stochastic abstract plan (Z), then generates tokens conditioned on that plan.
This conditional VAE architecture is designed to maintain high-level coherence, make generation more controllable (by manipulating the plan), and support richer sequence modeling.
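To make the plan-then-generate idea concrete, here is a minimal, framework-free sketch of the first step only. It assumes the plan Z is a vector of independent binary bits drawn from a uniform prior and then packed into a single integer plan index. The names sample_plan_bits and bits_to_index, and the choice of 8 bits (mirroring latent_dim=8 in the Quick Start), are illustrative assumptions and not the package's API.

```python
import random


def sample_plan_bits(num_bits: int, rng: random.Random) -> list[int]:
    """Draw each plan bit independently from a uniform Bernoulli(0.5) prior."""
    return [rng.randint(0, 1) for _ in range(num_bits)]


def bits_to_index(bits: list[int]) -> int:
    """Pack bits (most significant first) into one integer plan index."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


rng = random.Random(0)  # fixed seed so the sketch is reproducible
plan_bits = sample_plan_bits(8, rng)   # hypothetical 8-bit plan
plan_index = bits_to_index(plan_bits)  # one of 2**8 = 256 possible plans
print(plan_bits, plan_index)
```

In the actual model the decoder would then be conditioned on this plan while generating tokens; that conditioning step is omitted here.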

Architecture Overview

[Figure: Free Transformer architecture diagram]


Features

🏗️ Architecture

  • Llama-style backbone: RMSNorm, SwiGLU, RoPE, Grouped-Query Attention (GQA)
  • Latent Planning: Explicit plan variable Z with differentiable binary coding
  • Conditional VAE: Reconstruction + KL loss with free bits regularization
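As a worked illustration of the conditional VAE objective, the sketch below computes the KL divergence (in nats) between a per-bit Bernoulli posterior and a uniform Bernoulli(0.5) prior, then applies a free-bits budget before adding the result to the reconstruction loss. The hinge form max(0, KL - free_bits), the weight beta, and all function names are assumptions chosen for illustration; the package's losses.py may use a different formulation.

```python
import math


def bernoulli_kl_to_uniform(q: float) -> float:
    """KL(Bernoulli(q) || Bernoulli(0.5)) in nats for a single plan bit."""
    kl = math.log(2.0)  # the -log(0.5) terms sum to log 2
    for p in (q, 1.0 - q):
        if p > 0.0:  # use the convention 0 * log(0) = 0
            kl += p * math.log(p)
    return kl


def free_bits_kl(posterior_probs: list[float], free_bits: float) -> float:
    """Total KL over all bits, penalized only above the free-bits budget."""
    total_kl = sum(bernoulli_kl_to_uniform(q) for q in posterior_probs)
    return max(0.0, total_kl - free_bits)


def cvae_loss(reconstruction: float, posterior_probs: list[float],
              free_bits: float, beta: float = 1.0) -> float:
    """Reconstruction loss plus the weighted, free-bits-adjusted KL term."""
    return reconstruction + beta * free_bits_kl(posterior_probs, free_bits)


# Near-uniform posterior: KL stays under the budget, so no penalty applies.
print(cvae_loss(2.0, [0.5, 0.55, 0.45], free_bits=0.1))
# Confident posterior: KL exceeds the budget and is penalized.
print(cvae_loss(2.0, [0.99, 0.01, 0.99], free_bits=0.1))
```

The budget lets the posterior carry a small amount of information about the plan at no cost, which is the usual motivation for free bits: it discourages the latent from collapsing to the prior.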

⚡ Performance & Scaling

  • FSDP Support: Multi-GPU training with PyTorch Fully Sharded Data Parallel
  • Mixed Precision: Automatic Mixed Precision (AMP) with gradient scaling
  • Memory Efficient: Gradient checkpointing and optimized attention patterns
  • Modern Optimizations: bfloat16, efficient parameter sharding

🔧 Development & Training

  • Flexible Training: Switchable inference/training flows with mode selection
  • Synthetic + Real Data: Fast prototyping with built-in synthetic data generation
  • Comprehensive Testing: Unit/integration tests, benchmark comparisons
  • Quality Assurance: Type checking, linting, formatting, CI-ready

📦 Usability

  • Extensible API: Modular classes, CLI scripts, YAML configuration
  • Docker Support: Containerized demos and development environment
  • Documentation: API references, architecture guides, examples

Installation

From PyPI (Recommended)

pip install free-transformer

From Source

Using UV (recommended):

# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install
git clone https://github.com/udapy/free-transformer.git
cd free-transformer
uv venv --python 3.12
source .venv/bin/activate
uv pip install -e ".[dev]"

Using pip:

git clone https://github.com/udapy/free-transformer.git
cd free-transformer
pip install -e ".[dev]"

📋 Detailed installation instructions: Installation Guide


Quick Start

🐳 Docker (Fastest)

The fastest way to try Free Transformer:

git clone https://github.com/udapy/free-transformer.git
cd free-transformer
docker-compose up free-transformer-demo

🐍 Python API

import torch

from free_transformer import FreeTransformer, ModelConfig

# Build a small model (sizes chosen for a quick smoke test)
config = ModelConfig(vocab_size=1000, hidden_dim=128, num_layers=6, latent_dim=8)
model = FreeTransformer(config)

# Training-mode forward pass on a random batch: 2 sequences of 128 token ids
tokens = torch.randint(0, 1000, (2, 128))
logits, z_logits = model(tokens, mode='training')

# Generation: continue from the first 10 tokens of each sequence
generated = model.generate(tokens[:, :10], max_new_tokens=20)

🚀 Command Line

# Generate synthetic data and run demo
make demo

# Train models separately
make train-baseline  # Standard Transformer
make train-free      # Free Transformer
make compare         # Compare results

🎯 Complete tutorial: Quick Start Guide


Manual Installation & Quick Start Demo

  1. Generate Small Synthetic Data

    make generate-data-small
    
  2. Train Baseline Transformer

    make train-baseline
    
  3. Train Free Transformer

    make train-free
    
  4. Run Model Comparison

    make compare
    

Or run the full pipeline:

make demo

Check results in:

  • checkpoints/baseline/
  • checkpoints/free/
  • results/comparison/results.json
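Once the pipeline finishes, the comparison output can be inspected from Python. The sketch below assumes only that results/comparison/results.json is valid JSON with an object at the top level. Its schema is not documented here, so the helper (a hypothetical summarize_results) lists top-level keys and value types rather than guessing metric names.

```python
import json
from pathlib import Path


def summarize_results(path: str) -> dict[str, str]:
    """Map each top-level key in a JSON results file to its value's type name."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object at the top level of {path}")
    return {key: type(value).__name__ for key, value in data.items()}


if __name__ == "__main__":
    results_path = Path("results/comparison/results.json")
    if results_path.exists():
        for key, type_name in summarize_results(str(results_path)).items():
            print(f"{key}: {type_name}")
    else:
        print(f"{results_path} not found; run 'make demo' first")
```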

Key Features Comparison

| Feature | Standard Transformer | Free Transformer |
|---|---|---|
| Generation | Reactive (token-by-token) | Plan-then-generate |
| Coherence | Local | Global + Local |
| Controllability | Limited | High (via plan manipulation) |
| Training | Cross-entropy loss | Conditional VAE loss |
| Memory | Baseline | 10-15% higher (inference) |
| Speed | Baseline | 5-10% slower (inference) |

🔬 Detailed comparison: Architecture Overview


Repository Structure

free-transformer/
├── src/free_transformer/
│   ├── model.py
│   ├── baseline.py
│   ├── encoder.py
│   ├── latent.py
│   ├── injection.py
│   ├── losses.py
│   ├── synthetic_data.py
│   ├── train_utils.py
│   └── config.py
├── examples/
│   ├── train_baseline.py
│   ├── train_free.py
│   ├── eval_compare.py
│   └── generate_data.py
├── configs/
│   ├── baseline.yaml
│   └── free_transformer.yaml
├── docker/
│   ├── demo.sh
│   └── README.md
├── tests/
│   ├── unit/
│   ├── integration/
│   └── test_comparison.py
├── docs/
├── Dockerfile
├── Dockerfile.cpu
├── docker-compose.yml
├── Makefile
├── pyproject.toml
├── .python-version
├── LICENSE
└── README.md

Testing & Quality

Run all tests:

make test

Quality checks:

make quality

Advanced Features

🚀 Multi-GPU Training

# FSDP training with automatic GPU detection
make train-free-fsdp

# Custom distributed training
torchrun --nproc_per_node=auto examples/train_free.py --use-fsdp

📊 Flexible Data

  • HuggingFace datasets integration
  • Built-in synthetic data generation
  • Custom data loading pipelines

🔧 Extensible Architecture

  • Modular components for easy customization
  • Custom loss functions and training schedules
  • Plugin system for new features

📚 Learn more: Training Guide | Multi-GPU Setup


Documentation

📖 Complete Documentation

Local Documentation

# Serve documentation locally
make docs-serve
# Open http://127.0.0.1:8000

License

MIT License (see LICENSE)


Contributing

We welcome contributions! Please see our Contributing Guide for details.

Quick Development Setup

git clone https://github.com/udapy/free-transformer.git
cd free-transformer
make install-all  # Install with all dependencies
make test         # Run tests
make quality      # Check code quality

Before Submitting

  • ✅ Tests pass: make test
  • ✅ Code quality: make quality
  • ✅ Documentation builds: make docs-build

📋 Full guidelines: Contributing Guide


FAQ

Can I use this for real-world (non-synthetic) data?
Yes! Edit configs and use HuggingFace datasets.

How do I run distributed training?
Run make train-free-fsdp, or launch examples/train_free.py with torchrun and the --use-fsdp flag. See the docs and the Makefile for further options.

How do I change architecture parameters?
Edit YAML config files for layer size, latent dim, number of blocks, etc.
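As an illustration of overriding architecture parameters programmatically, the sketch below applies a dictionary of overrides (such as one parsed from a YAML file) to a stand-in dataclass. ModelConfigSketch is hypothetical: it mirrors only the four fields shown in the Quick Start (vocab_size, hidden_dim, num_layers, latent_dim), and the real ModelConfig and the YAML key names may differ.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ModelConfigSketch:
    """Hypothetical stand-in for the package's ModelConfig."""
    vocab_size: int = 1000
    hidden_dim: int = 128
    num_layers: int = 6
    latent_dim: int = 8


def apply_overrides(config: ModelConfigSketch, overrides: dict) -> ModelConfigSketch:
    """Return a copy of config with overrides applied, rejecting unknown keys."""
    known = {f.name for f in fields(config)}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return replace(config, **overrides)


base = ModelConfigSketch()
larger = apply_overrides(base, {"hidden_dim": 256, "num_layers": 12})
print(larger)
```

Rejecting unknown keys up front catches typos in a config file before a long training run starts.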

Can I run this without installing dependencies locally?
Yes! Use Docker: docker-compose up free-transformer-demo for a complete demo.

What if I don't have a GPU?
Use the CPU Docker image: make docker-build-cpu && make docker-run-cpu


Citation

If you use Free Transformer in your research, please cite:

@software{free_transformer,
  title={Free Transformer: Explicit Latent Planning for Autoregressive Generation},
  author={Phalak, Uday},
  year={2024},
  url={https://github.com/udapy/free-transformer},
  version={0.1.0}
}


Free Transformer - Bringing explicit planning to autoregressive generation

Documentation • PyPI • GitHub
