# Syntrix‑Base — A Low‑Resource (CPU‑First) Machine Learning Framework

Train and run modern small models fast on everyday CPUs — simple, transparent, and reproducible.

## Highlights

- CPU‑first ergonomics: pinned threads, deterministic seeds, dtype control
- Tiny but modern models: GPT‑mini, SSM‑mini, RNN‑mini
- Reproducible logging: JSONL logs with tokens/sec and environment
- Optional `torch.compile` with a CLI toggle and automatic validation
## Requirements

- Python >= 3.9
- Linux/macOS (Windows may work via WSL)
- PyTorch (installed automatically via `pip install -e .`)
## Quickstart

### 1) Clone and install

```shell
git clone https://github.com/paredezadrian/syntrix-base.git
cd syntrix-base
python3 -m venv venv && source venv/bin/activate
pip install --upgrade pip
pip install -e .
```

Install from PyPI (after publishing):

```shell
pip install syntrix
```
### 2) Get sample data (TinyShakespeare)

```shell
mkdir -p data
wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt -O data/tinyshakespeare.txt
```

Or download a mini text8 sample via the CLI at training time by adding `--download.text8_mini`.
### 3) Train (YAML + CLI overrides)

```shell
syntrix.train \
  --config configs/gpt-mini.yaml \
  --data.file data/tinyshakespeare.txt \
  --train_steps 300 --eval_every 100 --save_every 150 \
  --threads 4 \
  --out_dir runs/gpt-mini_base
```

Enable `torch.compile` and auto‑validate throughput (accept only if >= 5% faster):

```shell
syntrix.train \
  --config configs/gpt-mini.yaml \
  --data.file data/tinyshakespeare.txt \
  --threads 4 \
  --train_steps 300 --eval_every 100 --save_every 150 \
  --compile --compile.validate --compile.auto --compile.min_improvement 1.05 \
  --out_dir runs/gpt-mini_compile_auto
```
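The auto‑validation idea is simple: time the eager and compiled paths and keep compilation only when the measured throughput ratio clears `--compile.min_improvement` (1.05 means at least 5% faster). A pure‑Python sketch of that acceptance rule — the function and names here are hypothetical, not Syntrix internals, and the real flag benchmarks model forward passes rather than arbitrary callables:

```python
import time

def accept_compiled(eager_step, compiled_step, iters=20, min_improvement=1.05):
    """Return True only if compiled_step's throughput beats eager_step's
    by the required ratio. Hypothetical sketch of the acceptance test."""
    def throughput(step):
        start = time.perf_counter()
        for _ in range(iters):
            step()
        return iters / (time.perf_counter() - start)

    return throughput(compiled_step) / throughput(eager_step) >= min_improvement

# Stand-ins for eager vs. compiled work: the "compiled" one does far less.
slow = lambda: sum(range(200_000))
fast = lambda: sum(range(1_000))
print(accept_compiled(slow, fast))  # True: a huge speedup clears the 5% bar
```

With this rule, a compiled model that is only marginally faster (or slower, as compilation sometimes is on small CPU models) is rejected and the eager path is kept.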
Outputs:

- Checkpoints: `runs/<name>/ckpt.pt`
- Logs (JSONL): `runs/<name>/log.jsonl` with `step`, `val_bpc`, `tokens_per_s`, `lr`, and an initial `env` record (Python, PyTorch, threads, dtype, compiled flag)
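Because the log is plain JSONL, it is easy to post‑process without the framework. A small sketch that summarizes a log with the fields listed above — the sample records are made‑up values for illustration, not real output:

```python
import json

def summarize_log(lines):
    """Split a Syntrix-style JSONL log into the initial env record and
    the step records, and compute mean throughput."""
    env, steps = None, []
    for line in lines:
        rec = json.loads(line)
        if env is None and "env" in rec:
            env = rec["env"]
        elif "step" in rec:
            steps.append(rec)
    mean_tps = sum(r["tokens_per_s"] for r in steps) / max(len(steps), 1)
    return env, steps, mean_tps

# Toy log matching the documented fields (hypothetical values):
log = [
    '{"env": {"python": "3.11", "torch": "2.3", "threads": 4, "dtype": "float32", "compiled": false}}',
    '{"step": 100, "val_bpc": 1.92, "tokens_per_s": 51000.0, "lr": 0.003}',
    '{"step": 200, "val_bpc": 1.74, "tokens_per_s": 49000.0, "lr": 0.003}',
]
env, steps, mean_tps = summarize_log(log)
print(env["threads"], len(steps), mean_tps)  # 4 2 50000.0
```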
### 4) Sample from a checkpoint

```shell
syntrix.sample \
  --ckpt runs/gpt-mini_base/ckpt.pt \
  --data.file data/tinyshakespeare.txt \
  --max_new_tokens 200 --temp 0.9
```
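For intuition on `--temp`: temperature sampling divides the logits by the temperature before the softmax, so values below 1.0 sharpen the distribution and values above 1.0 flatten it. A pure‑Python sketch of that step (illustrative only; the real sampler operates on model logits with torch):

```python
import math
import random

def sample_next(logits, temp=0.9, rng=random.Random(0)):
    """Sample a token index from raw logits after temperature scaling."""
    scaled = [x / temp for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    r = rng.random() * sum(exps)          # inverse-CDF sampling
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r < acc:
            return i
    return len(logits) - 1

# A strongly peaked distribution almost always yields index 0:
print(sample_next([8.0, 0.0, 0.0], temp=0.9))
```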
## Configuration

- Base configs live in `configs/` (e.g., `configs/gpt-mini.yaml`).
- You can override most settings via CLI flags. Some use dot notation (e.g., `--data.file`, `--download.text8_mini`).
- Precision: switch the default dtype with `--dtype float32|float64`. Numeric tests use dtype‑aware tolerances.
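Dot notation maps directly onto the nested YAML structure: each flag names a path into the config dict. A hypothetical helper showing the idea (not Syntrix's actual override code):

```python
def apply_override(config, dotted_key, value):
    """Apply one dot-notation CLI override (e.g. "data.file") onto a
    nested config dict, layering it over the YAML base config."""
    node = config
    parts = dotted_key.split(".")
    for part in parts[:-1]:
        node = node.setdefault(part, {})  # descend, creating levels as needed
    node[parts[-1]] = value
    return config

cfg = {"data": {"file": "data/text8.txt"}, "train_steps": 100}
apply_override(cfg, "data.file", "data/tinyshakespeare.txt")
apply_override(cfg, "train_steps", 300)
print(cfg["data"]["file"], cfg["train_steps"])
```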
## CLI Reference

List options:

```shell
python -m syntrix.cli_train --help
python -m syntrix.cli_sample --help
```

Notable flags:

- `--threads <int>`: sets `torch.set_num_threads` and pins MKL/OMP threads
- `--compile`: enables `torch.compile` if available
- `--compile.validate --compile.auto --compile.min_improvement 1.05`: benchmark forward throughput and auto‑enable compile only if faster
- `--tokenizer <char|bpe>` and `--bpe_vocab_size <int>`
- `--use_mmap`: use a memory‑mapped data loader for large files
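For context on `--threads`: CPU throughput on small models often degrades when the BLAS/OpenMP pools oversubscribe cores, so the flag pins them to one count. A stdlib‑only sketch of the environment side of that pinning (the real flag also calls `torch.set_num_threads`, per the list above; the helper name is illustrative):

```python
import os

def pin_threads(n):
    """Pin the OpenMP/MKL thread pools via environment variables.
    These must be set before the math libraries initialize, so a real
    CLI does this at startup, before importing torch."""
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(n)
    return {var: os.environ[var] for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS")}

print(pin_threads(4))
```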
## Reproducibility & Determinism

- Seeds and threads are initialized consistently via `syntrix.utils.seed`
- Logs record environment details (threads, Python, PyTorch, dtype)
- Tests cover determinism and tolerance‑aware numeric checks
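The seeding pattern is worth spelling out: derive every RNG the program uses from one integer, so two runs draw identical streams. A minimal stdlib demonstration (a real seeding utility such as `syntrix.utils.seed` would additionally seed numpy and torch via `torch.manual_seed`; this sketch shows only Python's `random`):

```python
import random

def seeded_draws(seed, n=3):
    """Re-seed and draw: identical seeds must give identical streams."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

print(seeded_draws(42) == seeded_draws(42))  # True: runs are reproducible
print(seeded_draws(42) == seeded_draws(43))  # False: different seed, different stream
```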
## Benchmarks

For reproducible commands and example results tables, see `docs/benchmarks.md`.
## Contributing

We welcome contributions of all kinds: bug fixes, features, docs, and benchmarks.

- Please read `CONTRIBUTING.md` for our contribution process, standards, and PR guidelines
- All participants are expected to follow our `CODE_OF_CONDUCT.md`
## Governance & Support
- Issues: Use GitHub Issues for bug reports and feature requests. Include OS, Python, and PyTorch versions, steps to reproduce, and expected vs. actual behavior.
- CI: Pull requests must pass GitHub Actions (pytest on Python 3.10/3.11/3.12).
## License

MIT — see `LICENSE`.
## Packaging & Publishing

### Versioning Policy
We follow Semantic Versioning (SemVer): MAJOR.MINOR.PATCH.
- Increase MAJOR for incompatible API changes.
- Increase MINOR for added functionality in a backward-compatible manner.
- Increase PATCH for backward-compatible bug fixes.
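The bump rules above can be encoded mechanically; a hypothetical helper (not a Syntrix utility) that applies them:

```python
def bump(version, part):
    """Bump a SemVer MAJOR.MINOR.PATCH string: MAJOR resets MINOR and
    PATCH, MINOR resets PATCH, PATCH increments alone."""
    major, minor, patch = map(int, version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")

print(bump("0.2.0", "minor"))  # 0.3.0
```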
### Release Checklist

- Ensure all tests pass locally and in CI.
- Update `CHANGELOG.md` with a new section for the release.
- Bump the version in `pyproject.toml`.
- Commit changes and tag the release:

  ```shell
  git commit -m "chore(release): bump version to X.Y.Z"
  git tag vX.Y.Z && git push origin main --tags
  ```

- Build and upload to PyPI:

  ```shell
  pip install build twine
  python -m build
  twine upload dist/*
  ```
- Create a GitHub Release referencing the tag and the corresponding changelog notes.
- Verify README badges (CI, CodeQL, PyPI) render correctly.
### Installing from PyPI (post‑publish)

```shell
pip install syntrix

# verify CLI entry points
syntrix.train --help
syntrix.sample --help
```