ALTAModel SFT — instruction-tuned Kinyarwanda language models from YaliLabs.
alta-models-sft (internal)
Monorepo for the ALTA SFT runtime package and its training pipeline. This README is for internal use — anyone with repo access. The public-facing PyPI README is PYPI_README.md and ships with the wheel.
Confidential. Training scripts, datasets, internal benchmarks, and unpublished checkpoints should never be checked in. See `.gitignore` for excluded paths.
Contents
- What's in this repo
- First-time setup
- Day-to-day workflows
- Architecture: how training and the package share code
- Versioning policy
- Operations
What's in this repo
alta-models-sft/
├── src/alta_models_sft/ ← Runtime package (the only thing shipped to PyPI)
│ ├── modeling/ ← Model architecture (RoPE, GQA, SwiGLU, blocks)
│ ├── inference/ ← ALTAChat, ChatML, sampling, masking
│ ├── hub.py ← Local + Hub model resolution
│ ├── cli.py ← `alta-sft` CLI
│ └── server.py ← FastAPI server (extra dep)
│
├── training/ ← Training pipeline (stays in repo)
│ ├── train.py ← Main training entry point
│ ├── config.py ← All hyperparameters
│ ├── dataset.py ← SFT dataset + ChatML masking + collator
│ ├── builder.py ← Wraps ALTAModel for training
│ ├── checkpoint.py ← TopK manager, save/load
│ ├── distributed.py ← DDP setup
│ ├── deduplicate.py ← MinHash + LSH dedup
│ ├── build_multiturn.py ← Multi-turn synthesis from single-turn data
│ ├── resource_monitor.py ← GPU/CPU/RAM telemetry
│ └── ...
│
├── scripts/ ← Operational tools (never shipped)
│ ├── test_inference.py ← 8-subcommand model tester
│ ├── export_for_release.py ← Training checkpoint → release directory
│ └── upload_to_hub.sh ← Safe Hub upload with validation
│
├── tests/ ← pytest suite
├── .github/workflows/ ← CI: tests + PyPI release
├── pyproject.toml ← Package metadata (controls what ships)
├── README.md ← THIS file (internal)
└── PYPI_README.md ← Public README (gets bundled into the wheel)
Important: The wheel only includes src/alta_models_sft/. The [tool.hatch.build.targets.wheel] section in pyproject.toml enforces this, and CI fails if training/, scripts/, or tests/ leak in.
First-time setup
git clone git@github.com:yalilabs/alta-models-sft.git
cd alta-models-sft
python -m venv .venv
source .venv/bin/activate
pip install -e ".[all]"
Verify everything works:
pytest # all tests should pass
ruff check src tests # lint should be clean
alta-sft --version # CLI installed
python -m training.train --help # training importable
You also need:
huggingface-cli login # for Hub uploads
# Optional: set HF_TOKEN in your shell for non-interactive use
Day-to-day workflows
Training a new SFT model
1. Prepare data
Training data goes in ./data/ (gitignored). Supported per-sample formats — any mix works in one JSONL:
{"question": "...", "answer": "..."}
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"instruction": "...", "input": "...", "output": "..."}
{"document": "...", "summary": "..."}
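As a rough illustration of how a loader might treat the four record shapes above uniformly, here is a minimal sketch that maps each one to a list of chat messages. The helper name `to_messages` is hypothetical, and the way `input` is joined to `instruction` and the choice of `document` as the user turn are assumptions; the actual logic in training/dataset.py may differ.

```python
from typing import Any


def to_messages(sample: dict[str, Any]) -> list[dict[str, str]]:
    """Map one JSONL record to a list of chat messages (hypothetical helper)."""
    if "messages" in sample:
        # Already in chat form; return a shallow copy.
        return list(sample["messages"])
    if "question" in sample:
        user, assistant = sample["question"], sample["answer"]
    elif "instruction" in sample:
        # Assumption: a non-empty "input" is appended to the instruction.
        extra = sample.get("input", "")
        user = f'{sample["instruction"]}\n\n{extra}' if extra else sample["instruction"]
        assistant = sample["output"]
    elif "document" in sample:
        # Assumption: the document is the user turn, the summary is the target.
        user, assistant = sample["document"], sample["summary"]
    else:
        raise ValueError(f"unrecognized sample keys: {sorted(sample)}")
    return [
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]
```

Normalizing early like this keeps the masking and collation code working on a single shape regardless of how the JSONL was authored.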
Recommended preprocessing pipeline:
# 1. Deduplicate (writes <output>.jsonl + .report.txt + .duplicates.jsonl + .stats.json)
python -m training.deduplicate \
--input ./data/raw.jsonl \
--output ./data/clean \
--threshold 0.85
# 2. Synthesize multi-turn samples (helps with conversational coherence)
python -m training.build_multiturn \
--input ./data/clean.jsonl \
--output ./data/training.jsonl \
--multiturn_ratio 0.3 \
--max_chain_length 3
# 3. Hold out a validation split (shuffle once so the two splits are disjoint)
shuf ./data/training.jsonl > ./data/shuffled.jsonl
head -n 1000 ./data/shuffled.jsonl > ./data/testing.jsonl
tail -n +1001 ./data/shuffled.jsonl > ./data/training_split.jsonl
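For intuition about what the `--threshold 0.85` value in the deduplication step means, the sketch below estimates Jaccard similarity between two texts with MinHash signatures. It is a generic illustration, not the code in training/deduplicate.py: the shingle size, the number of hash functions, and all function names are assumptions, and the LSH bucketing that makes this scale to large corpora is omitted.

```python
import hashlib


def shingles(text: str, k: int = 5) -> set[str]:
    """Character k-grams of the lowercased, whitespace-normalized text."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + k] for i in range(max(1, len(norm) - k + 1))}


def minhash_signature(items: set[str], num_perm: int = 128) -> list[int]:
    """For each of num_perm salted hash functions, keep the minimum hash value."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.85) -> bool:
    sig_a = minhash_signature(shingles(text_a))
    sig_b = minhash_signature(shingles(text_b))
    return estimated_jaccard(sig_a, sig_b) >= threshold
```

A higher threshold removes only very close copies; lowering it removes looser paraphrases at the cost of discarding more distinct samples.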
2. Run training
Single GPU:
python -m training.train \
--pretrained_dir ./pretrained/alta_base \
--train_data ./data/training_split.jsonl \
--val_data ./data/testing.jsonl \
--output_dir ./sft_output
Multi-GPU (DDP via torchrun):
torchrun --nproc_per_node=4 -m training.train \
--pretrained_dir ./pretrained/alta_base \
--train_data ./data/training_split.jsonl \
--val_data ./data/testing.jsonl \
--output_dir ./sft_output
Resume from a previous checkpoint:
python -m training.train \
--resume ./sft_output/checkpoints/alta_epoch003_step1500_loss1.8234.pt \
--train_data ./data/training_split.jsonl \
--val_data ./data/testing.jsonl \
--output_dir ./sft_output
Hyperparameters live in training/config.py. Common ones can be overridden via CLI:
python -m training.train ... \
--epochs 5 \
--batch_size 16 \
--target_lr 1e-5 \
--max_seq_len 2048 \
--grad_accum_steps 4
3. Monitor
tensorboard --logdir ./sft_output/tensorboard
tail -f ./sft_output/logs/train_rank0.log
Watch for: val_loss decreasing each epoch, train_loss not diverging from val_loss, sample generations becoming coherent. The expected val_loss range after epoch 1 is in config.py (expected_val_loss_at_epoch_1).
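The two loss checks above can also be applied mechanically to per-epoch histories. The following is a minimal sketch under stated assumptions: the function name and the `max_gap` tolerance are hypothetical, and the histories are plain lists you would extract from the logs yourself.

```python
def loss_warnings(train_loss, val_loss, max_gap=0.5):
    """Return human-readable warnings for per-epoch loss histories.

    max_gap is an arbitrary illustrative tolerance, not a project default.
    """
    warnings = []
    # Check 1: val_loss should decrease from one epoch to the next.
    for epoch in range(1, len(val_loss)):
        if val_loss[epoch] >= val_loss[epoch - 1]:
            warnings.append(f"epoch {epoch + 1}: val_loss did not decrease")
    # Check 2: val_loss should not pull far above train_loss (overfitting signal).
    for epoch, (tr, va) in enumerate(zip(train_loss, val_loss), start=1):
        if va - tr > max_gap:
            warnings.append(
                f"epoch {epoch}: val_loss exceeds train_loss by {va - tr:.2f}"
            )
    return warnings
```

An empty list means neither check fired; it does not replace reading the sample generations.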
4. When training finishes
train.py automatically calls save_pretrained() on the best model. The output is at ./sft_output/alta_sft_final/:
sft_output/alta_sft_final/
├── config.json # includes model_format_version
└── model.safetensors # safetensors format, ready to distribute
This directory is already in the distribution format — you can load it immediately:
alta-sft chat --model ./sft_output/alta_sft_final
Testing a trained model
scripts/test_inference.py has 8 subcommands. Run from repo root.
# 1. Quick smoke test (3 prompts, <1 min) — ALWAYS run this first
python scripts/test_inference.py smoke --model ./sft_output/alta_sft_final
# 2. Full prompt suite (writes JSON report)
python scripts/test_inference.py suite \
--model ./sft_output/alta_sft_final \
--output ./results/run_$(date +%Y%m%d_%H%M).json
# 3. Interactive REPL for qualitative exploration
python scripts/test_inference.py chat \
--model ./sft_output/alta_sft_final --stream
# 4. Single prompt
python scripts/test_inference.py single \
--model ./sft_output/alta_sft_final \
--prompt "Sobanura amateka y'u Rwanda" --stream
# 5. Multi-turn conversation test (catches memory bugs)
python scripts/test_inference.py multiturn --model ./sft_output/alta_sft_final
# 6. Sampling comparison (same prompt, different configs)
python scripts/test_inference.py compare \
--model ./sft_output/alta_sft_final \
--prompt "Mwiriwe!" \
--configs '[{"temperature":0.3},{"temperature":0.8,"top_p":0.95}]'
# 7. Mask ablation (loads model twice — with/without non-Kinyarwanda mask)
python scripts/test_inference.py mask_ablation \
--model ./sft_output/alta_sft_final --prompt "Bite?"
# 8. Throughput benchmark
python scripts/test_inference.py bench \
--model ./sft_output/alta_sft_final --num_prompts 20 --device cuda --dtype bfloat16
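To make the `temperature` and `top_p` knobs used in the `compare` subcommand concrete, here is a generic sketch of temperature scaling followed by nucleus (top-p) filtering over a list of logits. It illustrates the standard technique only; the sampler inside src/alta_models_sft/inference/ is not shown in this README and may be implemented differently. All names are hypothetical.

```python
import math
import random


def top_p_probs(logits, temperature=1.0, top_p=1.0):
    """Temperature-scaled softmax, truncated to the smallest set of tokens
    whose cumulative probability reaches top_p, then renormalized."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until the nucleus is full.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw one token index from the filtered distribution."""
    dist = top_p_probs(logits, temperature, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

Lower temperatures concentrate probability on the top token, and lower `top_p` values cut off the tail, which is why the two configs in the `compare` example tend to produce visibly different outputs for the same prompt.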
Promotion criteria before releasing a checkpoint publicly:
- `smoke` passes (no crashes, non-empty responses)
- `suite` has zero crashes; spot-check that responses in at least 3 categories look reasonable
- `multiturn` shows the model uses prior context (doesn't repeat introductions)
- `mask_ablation` shows the model produces clean Kinyarwanda even without the mask (a real fluency check)
- `bench` throughput is within the expected range for the target hardware
Exporting for distribution
train.py already saves in the distribution format, so this step is only needed if you want to:
- Bundle a tokenizer into the directory
- Tag the export with a release version string
- Convert an old `.pt` checkpoint to safetensors
python scripts/export_for_release.py \
--checkpoint ./sft_output/alta_sft_final \
--output ./release/alta-base-sft-v1.0 \
--version v1.0 \
--include_tokenizer \
--tokenizer yalilabs/alta-tokenizer
Output:
release/alta-base-sft-v1.0/
├── config.json # with release_version + release_date metadata
├── model.safetensors
├── tokenizer.json # bundled
├── special_tokens_map.json
├── tokenizer_config.json
└── README.md # auto-generated model card
Uploading weights to Hugging Face
Use upload_to_hub.sh — it validates everything (auth, repo existence, load test, tag collision) before uploading.
# Standard release
./scripts/upload_to_hub.sh \
--model_dir ./release/alta-base-sft-v1.0 \
--repo yalilabs/alta-base-sft \
--version v1.0
# First-time release of a new model (creates repo if missing)
./scripts/upload_to_hub.sh \
--model_dir ./release/alta-base-sft-v0.9 \
--repo yalilabs/alta-base-sft \
--version v0.9 \
--private --create_repo
# CI-friendly (no prompts)
./scripts/upload_to_hub.sh \
--model_dir ./release/alta-base-sft-v1.0 \
--repo yalilabs/alta-base-sft \
--version v1.0 --yes
# Dry-run to validate without uploading
./scripts/upload_to_hub.sh \
--model_dir ./release/alta-base-sft-v1.0 \
--repo yalilabs/alta-base-sft \
--version v1.0 --dry_run
After upload, always verify by clearing the cache and loading fresh:
rm -rf ~/.cache/huggingface/hub/models--yalilabs--alta-base-sft
alta-sft chat --model yalilabs/alta-base-sft --revision v1.0
Cutting a runtime package release
The package on PyPI versions independently of model weights. Bump the package version only when the runtime code changes — not when only weights change.
When to bump:
| Change | Bump |
|---|---|
| Bug fix in inference / CLI / server | patch (0.1.0 → 0.1.1) |
| New CLI flag, new optional arg, new public function | minor (0.1.0 → 0.2.0) |
| Removed function, renamed class, changed default behavior | major (0.1.0 → 1.0.0) |
| Breaking change to config.json schema | bump MODEL_FORMAT_MAX in _version.py AND major bump |
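The bump rules in the table can be sketched as a small function. This is an illustration of SemVer arithmetic only; `bump_version` is a hypothetical helper and is not part of the repo's tooling.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a 'patch', 'minor', or 'major' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # A major bump resets minor and patch.
        return f"{major + 1}.0.0"
    if change == "minor":
        # A minor bump resets patch.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For example, `bump_version("0.1.0", "minor")` gives `"0.2.0"`, matching the second row of the table.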
Steps:
1. Update `src/alta_models_sft/_version.py`:
    __version__ = "0.2.0"
2. Update `CHANGELOG.md` (top of file):
    ## [0.2.0] - 2026-06-15
    ### Added
    - Stream support for `alta-sft generate`
    ### Fixed
    - KV cache overflow on 4096-token contexts
3. Commit, tag, push:
    git add . && git commit -m "Release 0.2.0"
    git tag v0.2.0
    git push origin main --tags
4. GitHub Actions takes over: `.github/workflows/release.yml` builds the wheel, verifies training code is excluded, and publishes to PyPI via trusted publishing.
5. Verify on PyPI:
    pip install -U alta-models-sft
    alta-sft --version # should show 0.2.0
Architecture: how training and the package share code
The single most important design decision in this repo: the model architecture is defined exactly once, in src/alta_models_sft/modeling/model.py. Training and inference both import from there.
┌──────────────────────────────────────────┐
│ src/alta_models_sft/modeling/model.py │
│ ALTAModel — single definition │
└──────────────────────┬───────────────────┘
│
┌───────────────────────────┼───────────────────────────┐
│ │ │
▼ ▼ ▼
training/train.py src/alta_models_sft/inference external users
(calls init_weights, (ALTAChat.from_pretrained) via `pip install`
gradient ckpt, — no init, no training paths
chunked CE loss)
The model class has both training capabilities (chunked CE loss, weight init, gradient checkpointing toggles) and inference paths (KV-cached generation, safetensors loading). Inference users never invoke the training methods — they're just there, unused.
Why this matters: there's zero possibility of architecture drift between training-time and inference-time code. The shape of every tensor, the order of operations, the special tokens — all guaranteed identical.
Don't add a training_model.py that re-implements parts of the architecture. Don't copy modeling code into training/. If training needs something the model doesn't have, add it to the model class with a flag and document why.
Versioning policy
Two version numbers, kept independent:
- Package version (`src/alta_models_sft/_version.py` → `__version__`)
  - Versions the inference runtime, CLI, server.
  - Follows SemVer.
  - Released to PyPI.
- Model revision (Hugging Face tags: `v1.0`, `v1.1`, `v2.0-instruct`, etc.)
  - Versions the actual weights.
  - Released to Hugging Face Hub.
The runtime checks the model's model_format_version against its supported range (MODEL_FORMAT_MIN..MODEL_FORMAT_MAX). If incompatible, loading fails with a clear error pointing at the fix.
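A compatibility check of this kind might look like the sketch below. The bounds shown are placeholder values (the real ones live in `_version.py`), the function names are hypothetical, and treating `model_format_version` as an integer is an assumption.

```python
import json
from pathlib import Path

# Placeholder bounds for illustration; the real values are in _version.py.
MODEL_FORMAT_MIN = 1
MODEL_FORMAT_MAX = 1


def check_model_format(config: dict, lo: int = MODEL_FORMAT_MIN,
                       hi: int = MODEL_FORMAT_MAX) -> int:
    """Validate config['model_format_version'] against the supported range."""
    version = config.get("model_format_version")
    if not isinstance(version, int):
        raise ValueError("config.json has no integer model_format_version")
    if version < lo:
        raise ValueError(
            f"model format {version} is older than supported minimum {lo}; "
            "re-export the checkpoint with a current release"
        )
    if version > hi:
        raise ValueError(
            f"model format {version} is newer than supported maximum {hi}; "
            "upgrade the alta-models-sft package"
        )
    return version


def check_model_dir(model_dir: str) -> int:
    """Read config.json from a model directory and validate its format version."""
    text = (Path(model_dir) / "config.json").read_text(encoding="utf-8")
    return check_model_format(json.loads(text))
```

Distinguishing the "too old" and "too new" cases lets the error message point at the right fix: re-export the weights in one case, upgrade the package in the other.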
Rule of thumb: users in production should pin both:
pip install "alta-models-sft==0.1.0"
ALTAChat.from_pretrained("yalilabs/alta-base-sft", revision="v1.0")
Operations
Running CI locally before pushing
ruff check src tests
pytest --cov=alta_models_sft
# Build the wheel and verify training code is NOT included
python -m build
python -m zipfile -l dist/*.whl | grep -E "^(training/|scripts/|tests/)"
# ↑ Should print nothing. If anything prints, fix pyproject.toml.
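The same wheel check can be done from Python, which is convenient inside a test. This is a minimal sketch using the standard-library `zipfile` module; `leaked_paths` and `FORBIDDEN_PREFIXES` are hypothetical names, and the CI workflow may implement the check differently.

```python
import zipfile

# Top-level directories that must never appear inside the wheel.
FORBIDDEN_PREFIXES = ("training/", "scripts/", "tests/")


def leaked_paths(wheel):
    """Return archive members that sit under a forbidden top-level directory.

    `wheel` may be a filesystem path or an open binary file object.
    """
    with zipfile.ZipFile(wheel) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith(FORBIDDEN_PREFIXES)
        )
```

An empty list corresponds to the `grep` command printing nothing.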
Common gotchas
- DDP runs need `torchrun`. Plain `python -m training.train` only uses one GPU even on multi-GPU machines.
- Tokenizer/model vocab mismatch. If you change the tokenizer, you must re-pretrain; SFT can't recover from a vocab mismatch.
- `max_seq_len` truncation drops assistant turns. Long multi-turn samples that exceed `max_seq_len` get truncated from the right, which may remove the supervised target. The dataset logs this; check the filter breakdown.
- PyPI is forever. Never re-publish the same version number with different content. If 0.2.0 has a bug, release 0.2.1.
- HF Hub tags should also be immutable in practice. Don't re-tag `v1.0`; release `v1.0.1`.
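The truncation gotcha above is easy to see with a toy example. The sketch assumes labels use -100 for positions excluded from the loss (PyTorch's default `ignore_index` for cross-entropy); whether training/dataset.py uses the same value is an assumption, and the helper names are hypothetical.

```python
IGNORE_INDEX = -100  # assumption: PyTorch's default ignore_index


def truncate_right(input_ids, labels, max_seq_len):
    """Keep only the first max_seq_len positions of both sequences."""
    return input_ids[:max_seq_len], labels[:max_seq_len]


def supervised_tokens(labels):
    """Count positions that still contribute to the loss."""
    return sum(label != IGNORE_INDEX for label in labels)


# Toy sample: 6 prompt tokens (masked) followed by 3 assistant tokens (supervised).
input_ids = [11, 12, 13, 14, 15, 16, 21, 22, 23]
labels = [IGNORE_INDEX] * 6 + [21, 22, 23]

_, kept = truncate_right(input_ids, labels, max_seq_len=6)
remaining = supervised_tokens(kept)  # no assistant tokens survive the cut
```

A sample in this state contributes nothing to the gradient, which is why such samples are worth filtering or re-chunking rather than silently keeping.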
Where to look when things break
| Symptom | First place to check |
|---|---|
| Training crashes immediately | ./sft_output/logs/train_rank0.log — usually a data-format issue |
| Training loss stuck high | Tokenizer/vocab mismatch; or mask_user_tokens config is wrong |
| Sample generations are garbage | Try mask_ablation test; verify ChatML format matches training |
| PyPI upload fails | Check _version.py matches the git tag; check trusted publishing config |
| HF upload fails auth | huggingface-cli whoami — token may have expired |
| Model loads on Hub but not locally | Run python -c "import alta_models_sft; print(alta_models_sft.__version__)" to verify install |
Contacts
- Training questions: `#alta-training` Slack channel
- Infra / Hub uploads: `#ml-platform`
- Public releases: tag `@releases` in `#alta-models`
License
The runtime package is Apache 2.0 (see LICENSE). Training data, internal benchmarks, and unpublished checkpoints are internal only and must not be checked into this repo.