ALTAModel SFT — instruction-tuned Kinyarwanda language models from YaliLabs.

alta-models-sft (internal)

Monorepo for the ALTA SFT runtime package and its training pipeline. This README is for internal use by anyone with repo access. The public-facing PyPI README is PYPI_README.md and ships with the wheel.

Confidential. Training data, internal benchmarks, and unpublished checkpoints should never be checked in (the training scripts under training/ are tracked, but never ship to PyPI). See .gitignore for excluded paths.

What's in this repo
First-time setup
Day-to-day workflows
Architecture: how training and the package share code
Versioning policy
Operations

What's in this repo

alta-models-sft/
├── src/alta_models_sft/          ← Runtime package (the only thing shipped to PyPI)
│   ├── modeling/                 ← Model architecture (RoPE, GQA, SwiGLU, blocks)
│   ├── inference/                ← ALTAChat, ChatML, sampling, masking
│   ├── hub.py                    ← Local + Hub model resolution
│   ├── cli.py                    ← `alta-sft` CLI
│   └── server.py                 ← FastAPI server (extra dep)
│
├── training/                     ← Training pipeline (stays in repo)
│   ├── train.py                  ← Main training entry point
│   ├── config.py                 ← All hyperparameters
│   ├── dataset.py                ← SFT dataset + ChatML masking + collator
│   ├── builder.py                ← Wraps ALTAModel for training
│   ├── checkpoint.py             ← TopK manager, save/load
│   ├── distributed.py            ← DDP setup
│   ├── deduplicate.py            ← MinHash + LSH dedup
│   ├── build_multiturn.py        ← Multi-turn synthesis from single-turn data
│   ├── resource_monitor.py       ← GPU/CPU/RAM telemetry
│   └── ...
│
├── scripts/                      ← Operational tools (never shipped)
│   ├── test_inference.py         ← 8-subcommand model tester
│   ├── export_for_release.py     ← Training checkpoint → release directory
│   └── upload_to_hub.sh          ← Safe Hub upload with validation
│
├── tests/                        ← pytest suite
├── .github/workflows/            ← CI: tests + PyPI release
├── pyproject.toml                ← Package metadata (controls what ships)
├── README.md                     ← THIS file (internal)
└── PYPI_README.md                ← Public README (gets bundled into the wheel)

Important: The wheel only includes src/alta_models_sft/. The [tool.hatch.build.targets.wheel] section in pyproject.toml enforces this, and CI fails if training/, scripts/, or tests/ leak in.
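
The same leak check can be sketched in Python, for example to run from a local pre-push hook. This is a minimal sketch, not the CI implementation: the helper names `leaked_paths` and `check_wheel` are hypothetical, and only the three directory names come from this README.

```python
import zipfile

# Top-level directories that must never appear inside the wheel (from this README).
FORBIDDEN_PREFIXES = ("training/", "scripts/", "tests/")


def leaked_paths(names, forbidden=FORBIDDEN_PREFIXES):
    """Return archive member names that start with a forbidden directory."""
    return [name for name in names if name.startswith(forbidden)]


def check_wheel(wheel_path):
    """Open a built wheel (a zip archive) and list members that should not have shipped."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return leaked_paths(wheel.namelist())
```

An empty list from `check_wheel("dist/<wheel file>")` means nothing leaked; any entry points at a packaging rule to fix in pyproject.toml.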

First-time setup

git clone git@github.com:yalilabs/alta-models-sft.git
cd alta-models-sft

python -m venv .venv
source .venv/bin/activate
pip install -e ".[all]"

Verify everything works:

pytest                                  # all tests should pass
ruff check src tests                    # lint should be clean
alta-sft --version                      # CLI installed
python -m training.train --help         # training importable

You also need:

huggingface-cli login                   # for Hub uploads
# Optional: set HF_TOKEN in your shell for non-interactive use

Day-to-day workflows

Training a new SFT model

1. Prepare data

Training data goes in ./data/ (gitignored). Supported per-sample formats — any mix works in one JSONL:

{"question": "...", "answer": "..."}
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"instruction": "...", "input": "...", "output": "..."}
{"document": "...", "summary": "..."}

Recommended preprocessing pipeline:

# 1. Deduplicate (writes <output>.jsonl + .report.txt + .duplicates.jsonl + .stats.json)
python -m training.deduplicate \
    --input ./data/raw.jsonl \
    --output ./data/clean \
    --threshold 0.85

# 2. Synthesize multi-turn samples (helps with conversational coherence)
python -m training.build_multiturn \
    --input ./data/clean.jsonl \
    --output ./data/training.jsonl \
    --multiturn_ratio 0.3 \
    --max_chain_length 3

# 3. Hold out a validation split (shuffle once so the two splits cannot overlap)
shuf ./data/training.jsonl > ./data/shuffled.jsonl
head -n 1000 ./data/shuffled.jsonl > ./data/testing.jsonl
tail -n +1001 ./data/shuffled.jsonl > ./data/training_split.jsonl
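
For intuition about what the `--threshold 0.85` in step 1 compares, here is a minimal MinHash sketch. It is an illustration under assumptions, not the code in training/deduplicate.py: the shingle size, the number of hash functions, and the hash itself are made up for the example, and the LSH bucketing step is left out so pairs are compared directly.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (the whole text if it is shorter)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(item, seed):
    """Deterministic 64-bit hash of a string, varied by an integer seed."""
    data = seed.to_bytes(4, "big") + item.encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_perm=64):
    """One minimum hash value per seeded hash function."""
    return [min(_seeded_hash(item, seed) for item in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Two samples whose estimated similarity reaches the threshold would be treated as near-duplicates. A lower threshold removes more aggressively.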

2. Run training

Single GPU:

python -m training.train \
    --pretrained_dir ./pretrained/alta_base \
    --train_data ./data/training_split.jsonl \
    --val_data ./data/testing.jsonl \
    --output_dir ./sft_output

Multi-GPU (DDP via torchrun):

torchrun --nproc_per_node=4 -m training.train \
    --pretrained_dir ./pretrained/alta_base \
    --train_data ./data/training_split.jsonl \
    --val_data ./data/testing.jsonl \
    --output_dir ./sft_output

Resume from a previous checkpoint:

python -m training.train \
    --resume ./sft_output/checkpoints/alta_epoch003_step1500_loss1.8234.pt \
    --train_data ./data/training_split.jsonl \
    --val_data ./data/testing.jsonl \
    --output_dir ./sft_output
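
The checkpoint name in the resume example appears to encode epoch, step, and loss. Assuming every checkpoint follows that pattern, a small helper can pick the best one to resume from. The regular expression is inferred from that single example filename, and `parse_checkpoint_name` and `best_checkpoints` are hypothetical helpers, not part of training/checkpoint.py.

```python
import re
from typing import NamedTuple

# Pattern inferred from the example name alta_epoch003_step1500_loss1.8234.pt
CKPT_PATTERN = re.compile(r"alta_epoch(\d+)_step(\d+)_loss(\d+(?:\.\d+)?)\.pt$")


class CheckpointInfo(NamedTuple):
    epoch: int
    step: int
    loss: float
    name: str


def parse_checkpoint_name(name):
    """Return CheckpointInfo for a matching filename, or None otherwise."""
    match = CKPT_PATTERN.search(name)
    if match is None:
        return None
    epoch, step, loss = match.groups()
    return CheckpointInfo(int(epoch), int(step), float(loss), name)


def best_checkpoints(names, k=3):
    """Return the k lowest-loss checkpoints, lowest first."""
    parsed = [info for info in map(parse_checkpoint_name, names) if info is not None]
    return sorted(parsed, key=lambda info: info.loss)[:k]
```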

Hyperparameters live in training/config.py. Common ones can be overridden via CLI:

python -m training.train ... \
    --epochs 5 \
    --batch_size 16 \
    --target_lr 1e-5 \
    --max_seq_len 2048 \
    --grad_accum_steps 4
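
When changing `--batch_size` or `--grad_accum_steps`, it helps to check the effective batch size, since that is what interacts with the learning rate. The arithmetic below assumes `--batch_size` is per process, which is the usual DDP convention but is not confirmed here. Both function names are hypothetical.

```python
def effective_batch_size(batch_size, grad_accum_steps, world_size=1):
    """Samples contributing to one optimizer step across all processes."""
    return batch_size * grad_accum_steps * world_size


def optimizer_steps_per_epoch(num_samples, batch_size, grad_accum_steps, world_size=1):
    """Full optimizer steps in one pass over the data (a partial final step is not counted)."""
    return num_samples // effective_batch_size(batch_size, grad_accum_steps, world_size)
```

With the values above on 4 GPUs, one optimizer step would see 16 × 4 × 4 = 256 samples under that assumption.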

3. Monitor

tensorboard --logdir ./sft_output/tensorboard
tail -f ./sft_output/logs/train_rank0.log

Watch for: val_loss decreasing each epoch, train_loss not diverging from val_loss, sample generations becoming coherent. The expected val_loss range after epoch 1 is in config.py (expected_val_loss_at_epoch_1).
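
The two loss checks can be expressed as simple functions over per-epoch values read from TensorBoard or the log. This is a sketch only: the function names are hypothetical, and what gap counts as "diverging" is a judgment call that this README does not fix.

```python
def val_loss_regressions(val_losses):
    """Return 1-based epoch numbers whose val_loss did not improve on the previous epoch."""
    return [
        epoch
        for epoch, (prev, curr) in enumerate(zip(val_losses, val_losses[1:]), start=2)
        if curr >= prev
    ]


def generalization_gap(train_losses, val_losses):
    """Per-epoch val_loss minus train_loss; a steadily growing gap suggests overfitting."""
    return [round(val - train, 4) for train, val in zip(train_losses, val_losses)]
```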

4. When training finishes

train.py automatically calls save_pretrained() on the best model. The output is at ./sft_output/alta_sft_final/:

sft_output/alta_sft_final/
├── config.json                   # includes model_format_version
└── model.safetensors             # safetensors format, ready to distribute

This directory is already in the distribution format — you can load it immediately:

alta-sft chat --model ./sft_output/alta_sft_final

Testing a trained model

scripts/test_inference.py has 8 subcommands. Run from repo root.

# 1. Quick smoke test (3 prompts, <1 min) — ALWAYS run this first
python scripts/test_inference.py smoke --model ./sft_output/alta_sft_final

# 2. Full prompt suite (writes JSON report)
python scripts/test_inference.py suite \
    --model ./sft_output/alta_sft_final \
    --output ./results/run_$(date +%Y%m%d_%H%M).json

# 3. Interactive REPL for qualitative exploration
python scripts/test_inference.py chat \
    --model ./sft_output/alta_sft_final --stream

# 4. Single prompt
python scripts/test_inference.py single \
    --model ./sft_output/alta_sft_final \
    --prompt "Sobanura amateka y'u Rwanda" --stream

# 5. Multi-turn conversation test (catches memory bugs)
python scripts/test_inference.py multiturn --model ./sft_output/alta_sft_final

# 6. Sampling comparison (same prompt, different configs)
python scripts/test_inference.py compare \
    --model ./sft_output/alta_sft_final \
    --prompt "Mwiriwe!" \
    --configs '[{"temperature":0.3},{"temperature":0.8,"top_p":0.95}]'

# 7. Mask ablation (loads model twice — with/without non-Kinyarwanda mask)
python scripts/test_inference.py mask_ablation \
    --model ./sft_output/alta_sft_final --prompt "Bite?"

# 8. Throughput benchmark
python scripts/test_inference.py bench \
    --model ./sft_output/alta_sft_final --num_prompts 20 --device cuda --dtype bfloat16
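
To read the output of the `compare` subcommand, it helps to recall what `temperature` and `top_p` do to the next-token distribution. The sketch below shows the textbook versions in plain Python. It is not the sampling code in src/alta_models_sft/inference, which may differ in details such as tie handling.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.95):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for index in order:
        kept.append(index)
        mass += probs[index]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for index in kept:
        filtered[index] = probs[index] / mass
    return filtered
```

A config such as `{"temperature":0.3}` concentrates probability on the top token, while `{"temperature":0.8,"top_p":0.95}` keeps more variety but still removes the unlikely tail.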

Promotion criteria before releasing a checkpoint publicly:

smoke passes (no crashes, non-empty responses)
suite has zero crashes; spot-check that responses in at least 3 categories look reasonable
multiturn shows the model uses prior context (doesn't repeat introductions)
mask_ablation shows the model produces clean Kinyarwanda even without the mask (a real fluency check)
bench throughput is within expected range for the target hardware

Exporting for distribution

train.py already saves in the distribution format, so this step is only needed if you want to:

Bundle a tokenizer into the directory
Tag the export with a release version string
Convert an old .pt checkpoint to safetensors

python scripts/export_for_release.py \
    --checkpoint ./sft_output/alta_sft_final \
    --output ./release/alta-base-sft-v1.0 \
    --version v1.0 \
    --include_tokenizer \
    --tokenizer yalilabs/alta-tokenizer

Output:

release/alta-base-sft-v1.0/
├── config.json                   # with release_version + release_date metadata
├── model.safetensors
├── tokenizer.json                # bundled
├── special_tokens_map.json
├── tokenizer_config.json
└── README.md                     # auto-generated model card

Uploading weights to Hugging Face

Use upload_to_hub.sh — it validates everything (auth, repo existence, load test, tag collision) before uploading.

# Standard release
./scripts/upload_to_hub.sh \
    --model_dir ./release/alta-base-sft-v1.0 \
    --repo yalilabs/alta-base-sft \
    --version v1.0

# First-time release of a new model (creates repo if missing)
./scripts/upload_to_hub.sh \
    --model_dir ./release/alta-base-sft-v0.9 \
    --repo yalilabs/alta-base-sft \
    --version v0.9 \
    --private --create_repo

# CI-friendly (no prompts)
./scripts/upload_to_hub.sh \
    --model_dir ./release/alta-base-sft-v1.0 \
    --repo yalilabs/alta-base-sft \
    --version v1.0 --yes

# Dry-run to validate without uploading
./scripts/upload_to_hub.sh \
    --model_dir ./release/alta-base-sft-v1.0 \
    --repo yalilabs/alta-base-sft \
    --version v1.0 --dry_run

After upload, always verify by clearing the cache and loading fresh:

rm -rf ~/.cache/huggingface/hub/models--yalilabs--alta-base-sft
alta-sft chat --model yalilabs/alta-base-sft --revision v1.0

Cutting a runtime package release

The package on PyPI is versioned independently of the model weights. Bump the package version only when the runtime code changes, not when only weights change.

When to bump:

Change	Bump
Bug fix in inference / CLI / server	patch (`0.1.0` → `0.1.1`)
New CLI flag, new optional arg, new public function	minor (`0.1.0` → `0.2.0`)
Removed function, renamed class, changed default behavior	major (`0.1.0` → `1.0.0`)
Breaking change to `config.json` schema	bump `MODEL_FORMAT_MAX` in `_version.py` AND major bump
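
The bump rules in the table can be written as a small helper, which is handy in release scripts. This is a sketch assuming plain `MAJOR.MINOR.PATCH` strings with no pre-release suffix. `bump_version` is a hypothetical name, not something this repo provides.

```python
def bump_version(version, kind):
    """Return the next SemVer string for a 'patch', 'minor', or 'major' bump."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    if kind == "major":
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown bump kind: {kind!r}")
```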

Steps:

Update src/alta_models_sft/_version.py:
```
__version__ = "0.2.0"
```

Update CHANGELOG.md (top of file):

## [0.2.0] - 2026-06-15
### Added
- Stream support for `alta-sft generate`
### Fixed
- KV cache overflow on 4096-token contexts

Commit, tag, push:

git add src/alta_models_sft/_version.py CHANGELOG.md && git commit -m "Release 0.2.0"
git tag v0.2.0
git push origin main --tags

GitHub Actions takes over — .github/workflows/release.yml builds the wheel, verifies training code is excluded, and publishes to PyPI via trusted publishing.

Verify on PyPI:

pip install -U alta-models-sft
alta-sft --version          # should show 0.2.0

Architecture: how training and the package share code

The single most important design decision in this repo: the model architecture is defined exactly once, in src/alta_models_sft/modeling/model.py. Training and inference both import from there.

                       ┌──────────────────────────────────────────┐
                       │  src/alta_models_sft/modeling/model.py  │
                       │  ALTAModel — single definition           │
                       └──────────────────────┬───────────────────┘
                                              │
                  ┌───────────────────────────┼───────────────────────────┐
                  │                           │                           │
                  ▼                           ▼                           ▼
       training/train.py        src/alta_models_sft/inference        external users
       (calls init_weights,     (ALTAChat.from_pretrained)            via `pip install`
        gradient ckpt,           — no init, no training paths
        chunked CE loss)

The model class has both training capabilities (chunked CE loss, weight init, gradient checkpointing toggles) and inference paths (KV-cached generation, safetensors loading). Inference users never invoke the training methods — they're just there, unused.

Why this matters: there's zero possibility of architecture drift between training-time and inference-time code. The shape of every tensor, the order of operations, the special tokens — all guaranteed identical.

Don't add a training_model.py that re-implements parts of the architecture. Don't copy modeling code into training/. If training needs something the model doesn't have, add it to the model class with a flag and document why.

Versioning policy

Two version numbers, kept independent:

Package version (src/alta_models_sft/_version.py → __version__)
- Versions the inference runtime, CLI, server.
- Follows SemVer.
- Released to PyPI.
Model revision (Hugging Face tags: v1.0, v1.1, v2.0-instruct, etc.)
- Versions the actual weights.
- Released to Hugging Face Hub.

The runtime checks the model's model_format_version against its supported range (MODEL_FORMAT_MIN..MODEL_FORMAT_MAX). If incompatible, loading fails with a clear error pointing at the fix.
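
A compatibility check of this kind could look roughly like the sketch below. The constant values, the integer version type, and the error wording are all assumptions made for illustration; the real bounds live in `_version.py`, and `check_model_format` is a hypothetical name.

```python
MODEL_FORMAT_MIN = 1  # hypothetical value; see _version.py for the real one
MODEL_FORMAT_MAX = 2  # hypothetical value; see _version.py for the real one


def check_model_format(config, minimum=MODEL_FORMAT_MIN, maximum=MODEL_FORMAT_MAX):
    """Return the format version, or raise with a message that points at the fix."""
    version = config.get("model_format_version")
    if version is None:
        raise ValueError("config.json has no model_format_version field")
    if version > maximum:
        raise ValueError(
            f"model format {version} is newer than this runtime supports "
            f"(max {maximum}); upgrade alta-models-sft"
        )
    if version < minimum:
        raise ValueError(
            f"model format {version} is older than this runtime supports "
            f"(min {minimum}); re-export the checkpoint or pin an older runtime"
        )
    return version
```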

Rule of thumb: users in production should pin both:

pip install "alta-models-sft==0.1.0"

ALTAChat.from_pretrained("yalilabs/alta-base-sft", revision="v1.0")

Operations

Running CI locally before pushing

ruff check src tests
pytest --cov=alta_models_sft

# Build the wheel and verify training code is NOT included
python -m build
python -m zipfile -l dist/*.whl | grep -E "^(training/|scripts/|tests/)"
# ↑ Should print nothing. If anything prints, fix pyproject.toml.

Common gotchas

DDP runs need torchrun. Plain python -m training.train only uses one GPU even on multi-GPU machines.
Tokenizer/model vocab mismatch. If you change the tokenizer, you must re-pretrain — SFT can't recover from a vocab mismatch.
max_seq_len truncation drops assistant turns. Long multi-turn samples that exceed max_seq_len get truncated from the right, which may remove the supervised target. The dataset logs this; check the filter breakdown.
PyPI is forever. Never re-publish the same version number with different content. If 0.2.0 has a bug, release 0.2.1.
HF Hub tags should also be immutable in practice. Don't re-tag v1.0 — release v1.0.1.
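
The truncation gotcha is easy to reproduce on toy data. The sketch below assumes the common convention of labeling non-supervised positions with -100 and supervising assistant tokens only. The actual masking in training/dataset.py may differ, and both function names are hypothetical.

```python
IGNORE_INDEX = -100  # assumed "do not train on this position" label value


def build_labels(turns):
    """Flatten (role, token_ids) turns into input ids and labels, supervising assistant tokens only."""
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


def truncate_right(input_ids, labels, max_seq_len):
    """Keep the first max_seq_len positions and report whether any supervised token survives."""
    input_ids, labels = input_ids[:max_seq_len], labels[:max_seq_len]
    has_target = any(label != IGNORE_INDEX for label in labels)
    return input_ids, labels, has_target
```

A sample whose `has_target` comes back `False` contributes nothing to the loss, which is why such samples are worth filtering and counting.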

Where to look when things break

Symptom	First place to check
Training crashes immediately	`./sft_output/logs/train_rank0.log` — usually a data-format issue
Training loss stuck high	Tokenizer/vocab mismatch; or `mask_user_tokens` config is wrong
Sample generations are garbage	Try `mask_ablation` test; verify ChatML format matches training
PyPI upload fails	Check `_version.py` matches the git tag; check trusted publishing config
HF upload fails auth	`huggingface-cli whoami` — token may have expired
Model loads on Hub but not locally	Run `python -c "import alta_models_sft; print(alta_models_sft.__version__)"` to verify install

Contacts

Training questions: #alta-training Slack channel
Infra / Hub uploads: #ml-platform
Public releases: tag @releases in #alta-models

License

The runtime package is Apache 2.0 (see LICENSE). Training data, internal benchmarks, and unpublished checkpoints are internal only and must not be checked into this repo.

Release history

1.1.1 (May 29, 2026)
1.1.0 (May 29, 2026)
1.0.0 (May 29, 2026, this version)

Download files

Source Distribution

alta_models_sft-1.0.0.tar.gz (25.9 kB), uploaded May 29, 2026

Built Distribution

alta_models_sft-1.0.0-py3-none-any.whl (29.3 kB), uploaded May 29, 2026, Python 3

File details

Details for the file alta_models_sft-1.0.0.tar.gz.

File metadata

Download URL: alta_models_sft-1.0.0.tar.gz
Upload date: May 29, 2026
Size: 25.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for alta_models_sft-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`66848bfb1d0740e28b58adea313a88655471f2ce4d1518d751105594d420079e`
MD5	`9ab3f5c629e4c131b0f74377870ec433`
BLAKE2b-256	`c18554ac73ced0d79d73c15e135db692123d4ee7d5773227062e7907d7248f3d`

File details

Details for the file alta_models_sft-1.0.0-py3-none-any.whl.

File metadata

Download URL: alta_models_sft-1.0.0-py3-none-any.whl
Upload date: May 29, 2026
Size: 29.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for alta_models_sft-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e8107309c27aab16caebe4aca5de731208802df9c5a49663c6912ced6e924bd0`
MD5	`73e28dd27431e6b1306f3614f4da3f72`
BLAKE2b-256	`edfe0d71b5dddb066de950038c6ce87fc83d6ad59f10385e320083b336be7329`
