A clean, modular framework for training large language models with modern PyTorch features

Maintainers

alex_art

Optimus-DL

Optimus-DL is a modular, high-performance research framework for training Large Language Models (LLMs) and other deep learning models. It leverages modern PyTorch features (AMP, DDP, Compile) and a flexible, composition-based architecture.

Key Features

Modular "Recipe" Architecture: Clean separation between model definitions, data pipelines, and training logic.
Hydra-based Configuration: Hierarchical, type-safe, and easily overridable configurations.
Universal Metrics System: Lazy evaluation and automatic distributed aggregation of metrics.
Unified Logging: First-class support for Weights & Biases, MLflow (with async and system metrics), and JSONL.
Modern PyTorch: Built-in support for Mixed Precision (AMP), FSDP2, Tensor Parallelism, Sequence Parallelism, and torch.compile.
Efficient Kernels: Integrated support for Liger-Kernel for memory-efficient and fast RMSNorm, SwiGLU, and CrossEntropy.
Registry System: Easy dependency injection and component swapping via a centralized registry.

Everything is modular and replaceable so that research experiments are easy to implement cleanly.

Supported Models

Optimus-DL includes highly optimized implementations of:

Llama 2 / 3: Full support for GQA, RoPE, and various sharding strategies.
Qwen: Support for Qwen-style attention (Q/K Norm) and architectures.
GPT-2: Classic architecture for baselining.

Quick Start

Installation

# Clone the repository
git clone <repository-url>
cd optimus-dl

# Install in editable mode with dependencies
pip install -e .

Training

Training is orchestrated via scripts/train.py using Hydra configs.

# Run with default configuration
python scripts/train.py

# Override specific parameters
python scripts/train.py model=gpt2 optimization.batch_size=64 common.use_gpu=true

# Use your own config
python scripts/train.py --config-name=train_llama

Writing Train Configs

This project uses Hydra and OmegaConf for configuration management. Configurations are hierarchical and composable, allowing you to mix and match models, datasets, and training strategies.

Structure & Interpolation

Configs are located in configs/train/. A typical training config composes defaults (model, optimizer, scheduler) and then overrides specific parameters.

We use a special args section as a "scratch space" for high-level variables. These are referenced throughout the config using OmegaConf's interpolation syntax ${...}. This ensures consistency (e.g., setting seq_len in one place updates both the model and the data pipeline).

_name: base
args:
  name: my-experiment
  batch_size: 64
  seq_len: 1024
  iterations: 10000 # example value; referenced below as ${args.iterations}

# ... later in the config ...
optimization:
  iterations: ${args.iterations}

data:
  scratch:
    base_transforms:
      _name: compose
      transforms:
        # ...
        - _name: flat_batcher
          batch_size: ${args.batch_size} # Interpolated from args
          seq_len: ${args.seq_len}
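To make the effect of interpolation concrete, here is a toy resolver written with the standard library only. It mimics how a `${args.seq_len}` reference is replaced by the value it points to; it is an illustration of the idea, not OmegaConf's implementation, and the `model.max_seq_len` and `data.batcher` keys are hypothetical.

```python
import re

# Toy resolver for "${a.b}"-style references. It only mimics the idea of
# OmegaConf interpolation for illustration purposes.
_PATTERN = re.compile(r"\$\{([^}]+)\}")


def lookup(root, dotted_path):
    """Follow a dotted path such as 'args.seq_len' through nested dicts."""
    node = root
    for key in dotted_path.split("."):
        node = node[key]
    return node


def resolve(node, root):
    """Return a copy of `node` with every ${...} reference replaced."""
    if isinstance(node, dict):
        return {k: resolve(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, root) for v in node]
    if isinstance(node, str):
        whole = _PATTERN.fullmatch(node)
        if whole:  # the value is a single reference: keep the target's type
            return resolve(lookup(root, whole.group(1)), root)
        # otherwise substitute references inside a larger string
        return _PATTERN.sub(
            lambda m: str(resolve(lookup(root, m.group(1)), root)), node
        )
    return node


config = {
    "args": {"batch_size": 64, "seq_len": 1024},
    "model": {"max_seq_len": "${args.seq_len}"},  # hypothetical model field
    "data": {
        "batcher": {  # hypothetical, flattened version of the batcher config
            "batch_size": "${args.batch_size}",
            "seq_len": "${args.seq_len}",
        }
    },
}

resolved = resolve(config, config)
print(resolved["model"], resolved["data"]["batcher"])
```

Changing `args.seq_len` in one place changes both the model and the batcher values after resolution, which is the consistency property the args section is meant to provide.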

Data Pipelines & `data.scratch`

The data section typically defines train_datasets and eval_datasets. To avoid repeating complex transform chains, we define them in data.scratch and reference them via interpolation.

data:
  scratch:
    # Define the transform chain once
    my_transform:
      _name: compose
      transforms:
        - _name: tokenize
          tokenizer_config: {_name: tiktoken, name: gpt2}
        - _name: to_device

  train_datasets:
    source:
      _name: loop
      inner: {_name: preset_dataset, split: train}
    # Reference the transform
    transform: ${data.scratch.my_transform}

Hydra & OmegaConf Extra Quick Guide

Here are some power-user features you'll likely use:

Overriding Defaults: You can swap out entire components from the command line.

# Switch the model to GPT-2 and optimizer to SGD
python scripts/train.py model=gpt2 optimization/optimizer=sgd

Multirun (-m): Run multiple experiments sequentially with a sweep.

# Run 3 experiments with different learning rates
python scripts/train.py -m optimization.optimizer.lr=1e-3,1e-4,1e-5
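As a rough mental model, a sweep over comma-separated values expands into one job per combination of values. The sketch below reproduces that expansion with the standard library; `expand_sweep` is a hypothetical helper, and it handles only plain `key=v1,v2` overrides, not Hydra's full override grammar.

```python
from itertools import product


def expand_sweep(overrides):
    """Expand 'key=v1,v2' overrides into one override list per job."""
    choices = []
    for item in overrides:
        key, _, values = item.partition("=")
        choices.append([f"{key}={value}" for value in values.split(",")])
    # One job per element of the Cartesian product of all value lists.
    return [list(combo) for combo in product(*choices)]


jobs = expand_sweep(
    ["optimization.optimizer.lr=1e-3,1e-4,1e-5", "optimization.batch_size=32,64"]
)
for index, job in enumerate(jobs):
    print(index, " ".join(job))
```

Sweeping two parameters with three and two values respectively therefore yields six runs, so sweep sizes grow quickly as more parameters are added.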

Interpolation: Reference other config values dynamically.
- ${layout.param}: Standard interpolation.
- ${oc.env:VAR_NAME}: Read from environment variable VAR_NAME.
- ${.relative_param}: Relative path interpolation.
- ${eval:expression}: Evaluate a Python expression. For example, ${eval:"'string' + '_suffix'"} or ${eval:"int(100 * 0.5)"}. This is defined in optimus_dl/core/omegaconf.py.
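
Debugging: Print the composed configuration without running the training code. Hydra's --resolve flag, used together with --cfg (short form -c), also resolves interpolations before printing.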

# Print the full config structure
python scripts/train.py --config-name=train_llama -c job

Framework Internals

Understanding these core components is crucial for advanced usage and research extensions.

Registry System

The framework relies heavily on a registry pattern to decouple configuration from implementation. This allows you to swap components (models, optimizers, schedulers) just by changing the _name field in the config.

Location: optimus_dl/core/registry.py
Usage:

# RegistryConfig is assumed to be importable from the same module as make_registry
from optimus_dl.core.registry import RegistryConfig, make_registry

# Create a new registry
registry, register, build = make_registry("my_component")

@register("my_impl")
class MyImplementation:
    def __init__(self, param): ...

# Build from config
obj = build(RegistryConfig(_name="my_impl", param=1))
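For readers unfamiliar with the pattern, the following is a minimal, standalone sketch of how a `make_registry`-style factory could work. It is not the code in optimus_dl/core/registry.py: the function body, the plain-dict config, and the `SGDSettings` class are all assumptions made for illustration.

```python
from dataclasses import dataclass


def make_registry(kind):
    """Return (registry, register, build) for one kind of component."""
    registry = {}

    def register(name):
        def decorator(cls):
            if name in registry:
                raise ValueError(f"{kind} '{name}' is already registered")
            registry[name] = cls
            return cls

        return decorator

    def build(config, **extra):
        # The "_name" entry selects the implementation; every other entry
        # is passed to its constructor as a keyword argument.
        params = dict(config)
        name = params.pop("_name")
        if name not in registry:
            raise KeyError(f"unknown {kind} '{name}'; known: {sorted(registry)}")
        return registry[name](**params, **extra)

    return registry, register, build


registry, register, build = make_registry("optimizer")


@register("sgd")
@dataclass
class SGDSettings:  # hypothetical component used only for this example
    lr: float
    momentum: float = 0.0


opt = build({"_name": "sgd", "lr": 0.1})
print(opt)
```

Because the config only carries a name and parameters, swapping an implementation means changing `_name` rather than editing the code that calls `build`.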

Data Pipeline

Data loading is split into Sources and Transforms.

Source: Yields raw items (e.g., text, examples).
Transforms: A chain of operations (Tokenize -> Chunk -> Shuffle -> Batch -> ToDevice).

This design allows for highly reusable data processing pipelines. Complex transform chains are often defined in data.scratch and referenced in dataset configs.
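The source/transform split can be sketched with plain Python generators. Everything below is a simplified stand-in: the whitespace tokenizer, the function names, and the `compose` helper are hypothetical, and the Shuffle and ToDevice stages are omitted for brevity.

```python
from itertools import islice


def text_source(lines):
    """Source: yields raw items one at a time."""
    yield from lines


def tokenize(items):
    """Transform: whitespace 'tokenizer' standing in for a real one."""
    for text in items:
        yield text.split()


def chunk(token_lists, seq_len):
    """Transform: flatten token lists and cut fixed-length sequences."""
    buffer = []
    for tokens in token_lists:
        buffer.extend(tokens)
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]


def batch(sequences, batch_size):
    """Transform: group sequences into lists of `batch_size`."""
    iterator = iter(sequences)
    while True:
        group = list(islice(iterator, batch_size))
        if len(group) < batch_size:
            return  # drop the incomplete final batch
        yield group


def compose(source, *transforms):
    """Chain transforms so each consumes the previous stage's output."""
    stream = source
    for transform in transforms:
        stream = transform(stream)
    return stream


pipeline = compose(
    text_source(["a b c d", "e f", "g h i j k l"]),
    tokenize,
    lambda s: chunk(s, seq_len=2),
    lambda s: batch(s, batch_size=2),
)
batches = list(pipeline)
print(batches)
```

Each stage is lazy, so items flow through the chain one at a time and the same transform chain can be attached to a different source without modification.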

Checkpointing

We use PyTorch's Distributed Checkpoint (DCP) API for efficient, sharded saving/loading of large models.

Structure: Checkpoints are directories containing sharded tensor data and a metadata file.
Manager: CheckpointManager handles the complexity of saving model, optimizer, scheduler, and dataloader states.
Auto-Resume: The training loop automatically detects the latest checkpoint in the output directory and resumes from it.
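One way auto-resume detection could work is to scan the output directory for the checkpoint with the highest iteration number. The sketch below assumes directories named like `checkpoint_00010000` (the form used in the evaluation example); `find_latest_checkpoint` is a hypothetical helper, not the CheckpointManager API.

```python
import re
import tempfile
from pathlib import Path

CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)$")  # assumed naming scheme


def find_latest_checkpoint(output_dir):
    """Return the checkpoint directory with the highest iteration, or None."""
    best_iteration, best_path = -1, None
    for child in Path(output_dir).iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_iteration:
            best_iteration, best_path = int(match.group(1)), child
    return best_path


# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("checkpoint_00000500", "checkpoint_00010000", "logs"):
        (Path(tmp) / name).mkdir()
    latest_name = find_latest_checkpoint(tmp).name
    empty_result = find_latest_checkpoint(Path(tmp) / "logs")

print(latest_name, empty_result)
```

Returning `None` when nothing matches lets the caller fall back to a fresh start instead of failing.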

LoadStrategy: For fine-tuning or experiments, you might want to load only parts of a checkpoint. The LoadStrategy class (optimus_dl/modules/checkpoint/load_strategy.py) controls this.

load_model (bool): Load model weights.
load_optimizer (bool): Load optimizer state.
load_scheduler (bool): Load learning rate scheduler state.
load_data_sources (bool): Load data source state (e.g., reader position).
load_dataloaders (bool): Load full dataloader state.
load_metrics (bool): Load accumulated metrics.
load_iteration (bool): Resume iteration count.
extra_ignore_keys (list): Specific keys to ignore in the checkpoint state dict.
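To illustrate how such flags can drive partial loading, here is a small standalone sketch. `LoadStrategySketch` is a hypothetical stand-in with a subset of the flags, the defaults are assumed, and the top-level checkpoint keys ("model", "optimizer", and so on) are invented for the example; the real class in load_strategy.py may differ on all of these points.

```python
from dataclasses import dataclass, field


@dataclass
class LoadStrategySketch:  # hypothetical stand-in, not the library class
    load_model: bool = True
    load_optimizer: bool = True
    load_scheduler: bool = True
    load_iteration: bool = True
    extra_ignore_keys: list = field(default_factory=list)


def select_state(checkpoint, strategy):
    """Keep only the checkpoint entries the strategy asks for."""
    wanted = {
        "model": strategy.load_model,
        "optimizer": strategy.load_optimizer,
        "scheduler": strategy.load_scheduler,
        "iteration": strategy.load_iteration,
    }
    return {
        key: value
        for key, value in checkpoint.items()
        if wanted.get(key, False) and key not in strategy.extra_ignore_keys
    }


checkpoint = {
    "model": {"w": 1.0},
    "optimizer": {"step": 500},
    "scheduler": {"last_step": 500},
    "iteration": 500,
}

# Fine-tuning: reuse the weights, start everything else from scratch.
finetune = LoadStrategySketch(
    load_optimizer=False, load_scheduler=False, load_iteration=False
)
state = select_state(checkpoint, finetune)
print(state)
```

A full resume would keep every flag enabled, while a fine-tuning run typically loads only the model weights, as in the example.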

Advanced Usage

Model Transforms

Optimus-DL applies transformations to the model after initialization but before training. This is where distributed wrappers and compilation happen.

Config: model_transforms list in train.yaml.
Common Transforms:
- ddp: Standard DistributedDataParallel.
- fully_shard: PyTorch FSDP2 (Fully Sharded Data Parallel). Supports mixed precision, CPU offloading, and mesh sharding.
- compile: torch.compile for graph optimization.

model_transforms:
  - _name: fully_shard
    mixed_precision:
      param_dtype: bfloat16
      reduce_dtype: float32
  - _name: compile

Evaluation with `lm_eval`

The framework integrates with the Language Model Evaluation Harness for standardized benchmarks.

Script: scripts/eval.py
Config: configs/eval/default.yaml

# Evaluate a checkpoint on Hellaswag and MMLU
python scripts/eval.py \
    common.checkpoint_path=outputs/my-run/checkpoint_00010000 \
    lm_eval.tasks=[hellaswag,mmlu] \
    lm_eval.batch_size=8

More advanced:

python scripts/eval.py --config-name quick_pretrained \
          common.checkpoint_path=null ++common.model._name=preset_hfllama2 ++common.model.hf_model_name=TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
          lm_eval.tasks=[hellaswag,mmlu] \
          lm_eval.batch_size=4

Serving Models

Optimus-DL provides a simple serving script for deploying trained models as an OpenAI-compatible API endpoint.

Script: scripts/serve.py
Config: configs/serve/

# Serve a TinyLlama model
python scripts/serve.py --config-name=tinyllama

Make requests:

curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-d '{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "How many helicopters can a human eat in one sitting?"}], "max_tokens": 100, "temperature": 0.01}'

curl -X POST http://localhost:8000/v1/completions -d '{"prompt": "All:", "max_tokens": 50, "temperature": 0.01}'
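The same chat request can be issued from Python with the standard library. This is a sketch under assumptions: `build_chat_request` is a hypothetical helper, the JSON content-type header is added as a precaution, and the response is assumed to follow the OpenAI chat completions shape.

```python
import json
import urllib.request


def build_chat_request(base_url, messages, max_tokens=100, temperature=0.01):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://127.0.0.1:8000",
    [{"role": "user", "content": "Say hello."}],
)
print(request.full_url)

# Uncomment with the server running. The response layout below is assumed
# to match the OpenAI chat completions format.
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

Building the request separately from sending it makes the payload easy to inspect before a server is available.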

Project Structure

optimus_dl/: Main package source code.
- core/: Fundamental utilities (logging, registry, device management).
- modules/: Pluggable components (models, optimizers, data loaders).
- recipe/: Orchestration logic (training loops, evaluation).
configs/: Hierarchical Hydra configuration files.
scripts/: Entry points.

Development

The project enforces strict code quality standards.

# Run tests
pytest

# Format code
black .
isort .
ruff check --fix .

License

MIT License.

Release history

- 0.2.5: May 21, 2026
- 0.2.4: May 20, 2026
- 0.2.3: May 20, 2026
- 0.2.0: Mar 25, 2026
- 0.1.5: Mar 25, 2026 (this version)
- 0.1.4: Mar 21, 2026
- 0.1.3: Mar 20, 2026
- 0.1.2: Mar 18, 2026
- 0.1.1: Mar 12, 2026
- 0.1.0: Mar 1, 2026
- 0.0.7: Dec 28, 2025
- 0.0.6: Dec 26, 2025
- 0.0.5: Dec 26, 2025
- 0.0.4: Dec 25, 2025
- 0.0.3: Dec 22, 2025
- 0.0.2: Dec 20, 2025

Download files

Source Distribution

optimus_dl-0.1.5.tar.gz (323.1 kB view details)

Uploaded Mar 25, 2026 Source

Built Distribution

optimus_dl-0.1.5-py3-none-any.whl (253.8 kB view details)

Uploaded Mar 25, 2026 Python 3

File details

Details for the file optimus_dl-0.1.5.tar.gz.

File metadata

Download URL: optimus_dl-0.1.5.tar.gz
Upload date: Mar 25, 2026
Size: 323.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for optimus_dl-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`a6cd4e71090b6d510fd4c62bad0222210d8f17213df0f80e024f8e4dc748c0af`
MD5	`fd11e42b243a16ae667d8c4d8d2bc6ac`
BLAKE2b-256	`c344c4c36351ea7ef9a6865f05100993e00d20afe591e217ebda183c4a16e7a2`

Provenance

The following attestation bundles were made for optimus_dl-0.1.5.tar.gz:

Publisher: publish.yml on alexdremov/optimus-dl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: optimus_dl-0.1.5.tar.gz
- Subject digest: a6cd4e71090b6d510fd4c62bad0222210d8f17213df0f80e024f8e4dc748c0af
- Sigstore transparency entry: 1180953949
- Sigstore integration time: Mar 25, 2026
Source repository:
- Permalink: alexdremov/optimus-dl@aa55958ba674e9d0666bb3273c06eb25b5eebe0a
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/alexdremov
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@aa55958ba674e9d0666bb3273c06eb25b5eebe0a
- Trigger Event: release

File details

Details for the file optimus_dl-0.1.5-py3-none-any.whl.

File metadata

Download URL: optimus_dl-0.1.5-py3-none-any.whl
Upload date: Mar 25, 2026
Size: 253.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for optimus_dl-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`830e423c9a173b4ad6c1b813e468e8a15c004c4358aeee7018c2a8c6528eccd4`
MD5	`b71571699f0eb092f6b674432d5020b7`
BLAKE2b-256	`d36b3cde0eac193ecae60207fa3a95e6c5ad03dc54a1c39d1a0892fe0c064828`

Provenance

The following attestation bundles were made for optimus_dl-0.1.5-py3-none-any.whl:

Publisher: publish.yml on alexdremov/optimus-dl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: optimus_dl-0.1.5-py3-none-any.whl
- Subject digest: 830e423c9a173b4ad6c1b813e468e8a15c004c4358aeee7018c2a8c6528eccd4
- Sigstore transparency entry: 1180953957
- Sigstore integration time: Mar 25, 2026
Source repository:
- Permalink: alexdremov/optimus-dl@aa55958ba674e9d0666bb3273c06eb25b5eebe0a
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/alexdremov
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@aa55958ba674e9d0666bb3273c06eb25b5eebe0a
- Trigger Event: release
