OpenLanguageModel (OLM): a modular PyTorch LLM library for building, training, teaching, and researching transformer language models.

OpenLanguageModel (OLM)

OpenLanguageModel is a PyTorch-native library for building, training, teaching, and researching transformer language models. It is designed for people who want the model architecture to stay visible while the training stack stays manageable.

OLM gives you:

  • readable transformer components in olm.nn
  • implemented model families in olm.models
  • local and Hugging Face dataset streams in olm.data
  • single-device and single-node multi-GPU (DDP/FSDP) training, AMP, checkpointing, callbacks, and automatic trainer selection in olm.train

Why OLM

Most language-model libraries either hide the architecture behind configuration or make you rebuild the whole training path from scratch. OLM sits in the middle: every block is an ordinary torch.nn.Module, while data loading, optimization, mixed precision, single-node multi-GPU training, checkpointing, and logging are already wired together.

That makes it useful for:

  • students learning how language models are assembled and trained
  • researchers running ablations on attention, norms, feed-forward layers, and residual structure
  • practitioners who want existing PyTorch workflows without a hidden runtime

Llama 3 Block In OLM

Model code in OLM is meant to read like the architecture it represents. For example, the Llama 3 block is built from RMSNorm, grouped-query attention, SwiGLU, and explicit residual structure:

from olm.nn.structure import Block
from olm.nn.structure.combinators import Residual
from olm.nn.attention import GroupedQueryAttention
from olm.nn.feedforward import SwiGLUFFN
from olm.nn.norms import RMSNorm


class Llama3Block(Block):
    def __init__(
        self,
        embed_dim: int,
        intermediate_size: int,
        num_heads: int,
        num_kv_heads: int,
        max_seq_len: int,
        dropout: float,
        rope_theta: float,
    ):
        super().__init__([
            Residual(Block([
                RMSNorm(embed_dim, eps=1e-5),
                GroupedQueryAttention(
                    embed_dim,
                    num_heads,
                    num_kv_heads,
                    max_seq_len,
                    dropout=dropout,
                    rope_theta=rope_theta,
                    use_bias=False,
                ),
            ])),
            Residual(Block([
                RMSNorm(embed_dim, eps=1e-5),
                SwiGLUFFN(
                    embed_dim,
                    hidden_dim=intermediate_size,
                    dropout=dropout,
                    bias=False,
                ),
            ])),
        ])

Source: src/olm/models/meta/llama3.py
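
To make the "explicit residual structure" concrete, here is a minimal, dependency-free sketch of what one pre-norm residual branch computes. It is not OLM code: rms_norm, residual, and pre_norm_branch are hypothetical helpers, the stand-in sublayer simply doubles its input, and it assumes that Residual adds its input to the output of the module it wraps.

```python
import math
from typing import Callable, List

Vector = List[float]


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-5) -> Vector:
    """Scale x by the reciprocal of its root-mean-square, then by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def residual(fn: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
    """Wrap fn so the result is x + fn(x) (assumed semantics of a residual combinator)."""
    def wrapped(x: Vector) -> Vector:
        return [a + b for a, b in zip(x, fn(x))]
    return wrapped


def pre_norm_branch(sublayer: Callable[[Vector], Vector], weight: Vector):
    """Normalize first, then apply the sublayer, then add the skip connection."""
    return residual(lambda x: sublayer(rms_norm(x, weight)))


def double(x: Vector) -> Vector:
    """Hypothetical stand-in for attention or the feed-forward layer."""
    return [2.0 * v for v in x]


branch = pre_norm_branch(double, weight=[1.0, 1.0, 1.0, 1.0])
out = branch([1.0, -1.0, 1.0, -1.0])
print(out)
```

The input here already has a root-mean-square of 1, so normalization leaves it almost unchanged and each output feature is close to x + 2x. In the Llama 3 block, two such branches run in sequence: one around attention and one around the SwiGLU feed-forward layer.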

Train With The Stack Connected

You can keep the model and optimizer as normal PyTorch objects while OLM handles the training loop details:

import torch

from olm.data.datasets import DataLoader, FineWebEduDataset
from olm.data.tokenization import HFTokenizer
from olm.models.openai import GPT2Model
from olm.train import AutoTrainer
from olm.train.optim import AdamW

tokenizer = HFTokenizer("gpt2")
dataset = FineWebEduDataset(tokenizer, context_length=1024, streaming=True)
loader = DataLoader(dataset, batch_size=8, num_workers=4)

model = GPT2Model(
    vocab_size=tokenizer.vocab_size,
    embed_dim=768,
    num_layers=12,
    num_heads=12,
    max_seq_len=1024,
)

trainer = AutoTrainer(
    model,
    AdamW,
    loader,
    device="auto",
    context_length=1024,
    learning_rate=3e-4,
    grad_accum_steps=8,
)
trainer.train(epochs=1, max_steps=1000)

AutoTrainer chooses between CPU, single-GPU, and single-node multi-GPU DDP/FSDP paths based on the hardware and model. You can still use Trainer, DDPTrainer, or FSDPTrainer directly when you want explicit control.
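
As a rough guide to what the configuration above consumes, the arithmetic can be sketched as follows. This is an estimate under stated assumptions, not OLM behavior: it assumes grad_accum_steps is the number of micro-batches accumulated per optimizer step, that max_steps counts optimizer steps, that every sequence is a full context_length tokens, and that world_size (a hypothetical parameter here) is the number of data-parallel processes.

```python
def tokens_per_optimizer_step(
    batch_size: int,
    context_length: int,
    grad_accum_steps: int,
    world_size: int = 1,
) -> int:
    """Tokens that contribute to one weight update under the assumptions above."""
    return batch_size * context_length * grad_accum_steps * world_size


# Values taken from the training example: batch_size=8, context_length=1024,
# grad_accum_steps=8, on a single device.
per_step = tokens_per_optimizer_step(8, 1024, 8)
total = per_step * 1000  # max_steps=1000, assumed to count optimizer steps

print(per_step)  # 65536
print(total)     # 65536000
```

If the run is spread across several GPUs with DDP or FSDP and the per-device batch size stays at 8, the per-step token count scales with the number of processes, so the learning rate may need revisiting.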

Implemented Model Families

OLM includes named presets and configurable base classes for common transformer families:

Family                 Source
GPT-2                  src/olm/models/openai/gpt2.py
Llama 2                src/olm/models/meta/llama2.py
Llama 3 / 3.1 / 3.2    src/olm/models/meta/llama3.py
Qwen 2.5               src/olm/models/alibaba/qwen2.py
Phi-3 / Phi-3.5        src/olm/models/microsoft/phi3.py
Phi-4                  src/olm/models/microsoft/phi4.py
Gemma 2                src/olm/models/google/gemma2.py
OLMo                   src/olm/models/allenai/olmo.py
OPT                    src/olm/models/facebook/opt.py

See docs/api.md for the generated API reference and examples/ for training scripts.

Installation

Use Python 3.10, 3.11, or 3.12.

git clone https://github.com/openlanguagemodel/openlanguagemodel.git
cd openlanguagemodel
pip install -e .

For development:

pip install -e ".[dev]"
pytest tests

Optional extras:

pip install -e ".[wandb]"  # Weights & Biases logging
pip install -e ".[docs]"   # documentation tooling

See docs/installation.md for dependency and release-build details.

Project Status

OLM v2.2 is the stabilization and release-readiness pass: tied output embeddings by default, model-family smoke coverage, AutoTrainer, streaming datasets, AMP, checkpointing, single-node DDP/FSDP paths, clearer installation docs, and a stronger generated API reference. Multi-node training remains a v4 roadmap item.
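
"Tied output embeddings" means the output projection reuses the token embedding matrix instead of learning a separate one. The sketch below illustrates the idea in plain Python; TiedEmbedding is a hypothetical class written for this example and says nothing about how OLM implements tying internally.

```python
from typing import List

Matrix = List[List[float]]


class TiedEmbedding:
    """One weight matrix (vocab_size x embed_dim) used for both lookup and output logits."""

    def __init__(self, weight: Matrix):
        self.weight = weight  # shared by embed() and logits()

    def embed(self, token_id: int) -> List[float]:
        """Input side: look up the row for a token."""
        return self.weight[token_id]

    def logits(self, hidden: List[float]) -> List[float]:
        """Output side: dot the hidden state with every row of the same matrix."""
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.weight]


table = TiedEmbedding([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
hidden = table.embed(2)
scores = table.logits(hidden)
print(scores)  # [1.0, 1.0, 2.0]
```

Because both directions read the same matrix, tying removes one vocab_size x embed_dim parameter block from the model, which matters most for small models with large vocabularies.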

Citation

@software{openlanguagemodel2026,
  title = {OpenLanguageModel},
  author = {Tavish Mankash and Vardhaman Kalloli and Keshava Prasad},
  year = {2026},
  url = {https://github.com/openlanguagemodel/openlanguagemodel}
}

License

MIT. See LICENSE.
