sarasa

A minimal LLM training framework built on pure PyTorch, designed for simplicity and extensibility.

[!CAUTION] sarasa is developed by an error-prone human and thus may contain many bugs. Use it at your own risk.

Installation

uv sync [--extra cpu|cu128|cu130] [--extra flash_attn]

or

uv add sarasa[cpu|cu128|cu130]

Features

  • Pure PyTorch implementation

  • Flexible configuration system with command-line overrides

  • Scales from a single GPU to multiple GPUs (plain DDP and FSDP for now)

  • Selective activation checkpointing (SAC) for memory efficiency

  • Async distributed checkpoint saving / loading

  • Profiling

  • FP8 training

  • Post-training

Usage

sarasa is (almost) ready to use. First, set up a tokenizer, for example:

mkdir tokenizer
cd tokenizer
uvx hf download --local-dir . --include "tokenizer*" "meta-llama/Llama-3.1-8B"

Then, the following command starts training a GPT model on FineWeb-Edu, using one or more GPUs.

uv run torchrun --nproc_per_node="gpu" main.py \
--config-file configs/example.py \
[--train.local-batch-size 8 ...] # override config options as needed

For details, run

uv run torchrun --nproc_per_node="gpu" main.py --help
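When overriding --train.local-batch-size, it helps to know how the per-GPU batch size usually relates to the global batch size. The sketch below assumes the common convention that the global batch equals local batch × number of GPUs × gradient-accumulation steps. sarasa's actual rule is not documented here, and the function name grad_accum_steps is hypothetical.

```python
def grad_accum_steps(global_batch_size: int, local_batch_size: int, num_gpus: int) -> int:
    """Micro-batches each GPU accumulates per optimizer step.

    Assumed convention (not confirmed for sarasa):
        global_batch_size = local_batch_size * num_gpus * accumulation_steps
    """
    # Sequences processed by one forward/backward pass across all GPUs.
    per_pass = local_batch_size * num_gpus
    if global_batch_size % per_pass != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not a multiple of "
            f"local batch size x GPUs = {per_pass}"
        )
    return global_batch_size // per_pass


# 8 GPUs, 8 sequences per GPU, 256 sequences per optimizer step.
steps_on_8_gpus = grad_accum_steps(256, 8, 8)
# A single GPU has to accumulate many more micro-batches for the same global batch.
steps_on_1_gpu = grad_accum_steps(256, 8, 1)
```

Under this assumption, lowering the local batch size to fit in memory leaves the global batch unchanged and only raises the number of accumulation steps.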

Extending sarasa with Custom Components

Extending sarasa is as simple as defining your own configuration dataclasses with create methods. Users can define custom configurations for models, optimizers, learning-rate schedulers, and datasets. Here's an example of using a custom optimizer:

from dataclasses import dataclass

import torch

from sarasa import Trainer, Config
from custom_optim import CustomOptimizer, CustomOptimizer2

@dataclass
class CustomOptim:
    lr: float = ...

    def create(self,
               model: torch.nn.Module
    ) -> torch.optim.Optimizer:
        return CustomOptimizer(model.parameters(), lr=self.lr, ...)

@dataclass
class CustomOptim2:
    lr: float = ...

    def create(self,
               model: torch.nn.Module
    ) -> torch.optim.Optimizer:
        return CustomOptimizer2(model.parameters(), lr=self.lr, ...)


if __name__ == "__main__":
    config = Config.from_cli(optim_type=CustomOptim | CustomOptim2)
    trainer = Trainer(config)
    trainer.train()
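The pattern itself does not depend on PyTorch. Below is a dependency-free sketch of why a dataclass with a create method is enough: the caller only invokes create and never names the concrete class. ToyModel, ToySGD, ToyOptim and build_optimizer are hypothetical stand-ins for illustration, not part of sarasa.

```python
from dataclasses import dataclass


class ToyModel:
    """Stand-in for torch.nn.Module; only exposes parameters()."""

    def __init__(self, params: list[float]) -> None:
        self._params = params

    def parameters(self) -> list[float]:
        return self._params


class ToySGD:
    """Stand-in for a torch.optim.Optimizer."""

    def __init__(self, params: list[float], lr: float) -> None:
        self.params = params
        self.lr = lr


@dataclass
class ToyOptim:
    """Config dataclass: plain fields plus a create() factory method."""

    lr: float = 0.01

    def create(self, model: ToyModel) -> ToySGD:
        return ToySGD(model.parameters(), lr=self.lr)


def build_optimizer(optim_config, model: ToyModel):
    """What a trainer might do (assumption): delegate construction to the config."""
    return optim_config.create(model)


model = ToyModel([0.0, 1.0])
optimizer = build_optimizer(ToyOptim(lr=0.001), model)
```

Because the trainer only relies on create, swapping in another optimizer means passing a different config object, with no change to the training loop.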

Thanks to tyro's type support, sarasa can automatically recognize multiple custom optimizer types. From the command line, you can specify which custom optimizer to use:

python script.py optim:custom_optim --optim.lr 0.001 ...
# or
python script.py optim:custom_optim2 --optim.lr 0.002 ...

(Because tyro automatically converts config class names from CamelCase to snake_case, we recommend not adding a Config suffix to config class names; otherwise the suffix ends up in the command-line name.)
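To illustrate the naming advice, here is a rough approximation of a CamelCase to snake_case conversion. It is not tyro's actual implementation, only a sketch of the kind of mapping described above, and camel_to_snake is a hypothetical helper. It does not handle acronyms such as a leading "LR".

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert '_' before each capital that follows a lowercase letter or digit, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


short_name = camel_to_snake("CustomOptim")
numbered_name = camel_to_snake("CustomOptim2")
# A Config suffix would be carried into the command-line name.
suffixed_name = camel_to_snake("CustomOptimConfig")
```

With a Config suffix, the subcommand would become the longer optim:custom_optim_config instead of optim:custom_optim.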

Config File Example

It's very simple. IDE autocompletion will help you.

from sarasa import Config, Data, LRScheduler, Model, Train
from custom_optim import CustomOptim

# only one Config instance should be defined in each config file
config = Config.create(
    model=Model(num_layers=12),
    train=Train(
        local_batch_size=16,
        global_batch_size=256,
        dtype="bfloat16",
    ),
    optim=CustomOptim(lr=0.001),
    lr_scheduler=LRScheduler(
        decay_type="linear",
        warmup_steps=1000,
        total_steps=100000,
    ),
    data=Data(tokenizer_path="./tokenizer"),
    seed=12,
)
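As a rough guide to what the lr_scheduler values above could mean, the sketch below computes a learning-rate multiplier with linear warmup followed by linear decay. It assumes the decay goes to zero at total_steps; sarasa may use a different floor or a different warmup formula, and lr_factor is a hypothetical name.

```python
def lr_factor(step: int, warmup_steps: int = 1000, total_steps: int = 100_000) -> float:
    """Multiplier applied to the base learning rate at a given step.

    Assumed shape: linear ramp from 0 to 1 over warmup_steps,
    then linear decay from 1 to 0 at total_steps.
    """
    if step < warmup_steps:
        return step / warmup_steps
    remaining = total_steps - step
    decay_span = total_steps - warmup_steps
    # Clamp at zero so steps past total_steps do not yield a negative rate.
    return max(0.0, remaining / decay_span)


base_lr = 0.001
# Effective learning rate at the start, mid-warmup, at the peak, and at the end.
schedule = [base_lr * lr_factor(s) for s in (0, 500, 1000, 100_000)]
```

Under these assumptions the rate peaks at the base value of 0.001 at step 1000 and reaches zero at step 100000.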

Acknowledgements

This project is heavily inspired by and borrows code from torchtitan.
