
Syntrix‑Base — A Low‑Resource (CPU‑First) Machine Learning Framework


Train and run modern small models fast on everyday CPUs — simple, transparent, and reproducible.

Highlights

  • CPU‑first ergonomics: pinned threads, deterministic seeds, dtype control
  • Tiny but modern models: GPT‑mini, SSM‑mini, RNN‑mini
  • Reproducible logging: JSONL logs with tokens/sec and environment
  • Optional torch.compile with CLI toggle and auto validation

Requirements

  • Python >= 3.9
  • Linux/macOS (Windows may work via WSL)
  • PyTorch (installed automatically via pip install -e .)

Quickstart

1) Clone and install

git clone https://github.com/paredezadrian/syntrix-base.git
cd syntrix-base
python3 -m venv venv && source venv/bin/activate
pip install --upgrade pip
pip install -e .

Or install from PyPI (once published):

pip install syntrix

2) Get sample data (TinyShakespeare)

mkdir -p data
wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt -O data/tinyshakespeare.txt

Or download a mini text8 sample via CLI when training: add --download.text8_mini.
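If wget is unavailable, the same file can be fetched with the Python standard library. The URL is the one from the step above; `fetch_dataset` is an illustrative helper, not part of Syntrix:

```python
import urllib.request
from pathlib import Path

DATA_URL = ("https://raw.githubusercontent.com/karpathy/char-rnn/"
            "master/data/tinyshakespeare/input.txt")

def fetch_dataset(url: str, dest: Path) -> Path:
    """Download a text dataset to dest, creating parent dirs as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():  # skip re-download on repeated runs
        urllib.request.urlretrieve(url, dest)
    return dest

# fetch_dataset(DATA_URL, Path("data/tinyshakespeare.txt"))
```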

3) Train (YAML + CLI overrides)

syntrix.train \
  --config configs/gpt-mini.yaml \
  --data.file data/tinyshakespeare.txt \
  --train_steps 300 --eval_every 100 --save_every 150 \
  --threads 4 \
  --out_dir runs/gpt-mini_base

Enable torch.compile and auto‑validate throughput (accept only if >= 5% faster):

syntrix.train \
  --config configs/gpt-mini.yaml \
  --data.file data/tinyshakespeare.txt \
  --threads 4 \
  --train_steps 300 --eval_every 100 --save_every 150 \
  --compile --compile.validate --compile.auto --compile.min_improvement 1.05 \
  --out_dir runs/gpt-mini_compile_auto

Outputs:

  • Checkpoints: runs/<name>/ckpt.pt
  • Logs (JSONL): runs/<name>/log.jsonl with step, val_bpc, tokens_per_s, lr, and an initial env record (Python, PyTorch, threads, dtype, compiled flag).
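The JSONL format makes the logs easy to post-process. A minimal sketch that averages tokens_per_s across steps, assuming one JSON object per line with the fields listed above (`summarize_log` is illustrative, not a Syntrix API):

```python
import json
from pathlib import Path

def summarize_log(path: Path) -> dict:
    """Collect tokens/sec across steps from a JSONL training log.

    Records without a 'tokens_per_s' field (e.g. the initial env
    record) are skipped.
    """
    rates = []
    for line in path.read_text().splitlines():
        rec = json.loads(line)
        if "tokens_per_s" in rec:
            rates.append(rec["tokens_per_s"])
    return {
        "steps_logged": len(rates),
        "mean_tokens_per_s": sum(rates) / len(rates) if rates else 0.0,
    }
```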

4) Sample from a checkpoint

syntrix.sample \
  --ckpt runs/gpt-mini_base/ckpt.pt \
  --data.file data/tinyshakespeare.txt \
  --max_new_tokens 200 --temp 0.9
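Conceptually, --temp rescales the model's logits before the softmax: values below 1.0 sharpen the distribution toward the argmax, values above 1.0 flatten it. A dependency-free sketch of the idea (not Syntrix's actual sampler):

```python
import math
import random

def sample_with_temperature(logits, temperature=0.9, rng=random):
    """Sample a token index from logits after temperature scaling + softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()          # inverse-CDF sampling over the categories
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```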

Configuration

  • Base configs live in configs/ (e.g., configs/gpt-mini.yaml).
  • You can override most settings via CLI flags. Some use dot notation (e.g., --data.file, --download.text8_mini).
  • Precision: switch default dtype with --dtype float32|float64. Numeric tests use dtype‑aware tolerances.
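Dot-notation flags map naturally onto nested YAML keys. A hypothetical sketch of how an override like --data.file could be merged into a config dict (the real CLI parser may differ):

```python
def apply_override(config: dict, dotted_key: str, value) -> dict:
    """Set a nested key: apply_override(cfg, "data.file", v) writes cfg["data"]["file"]."""
    node = config
    parts = dotted_key.split(".")
    for part in parts[:-1]:
        node = node.setdefault(part, {})  # create intermediate dicts on demand
    node[parts[-1]] = value
    return config
```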

CLI Reference

List options:

python -m syntrix.cli_train --help
python -m syntrix.cli_sample --help

Notable flags:

  • --threads <int>: sets torch.set_num_threads and pins MKL/OMP threads
  • --compile: enable torch.compile if available
  • --compile.validate --compile.auto --compile.min_improvement 1.05: benchmark forward throughput and auto‑enable compile only if faster
  • --tokenizer <char|bpe> and --bpe_vocab_size <int>
  • --use_mmap: use memory‑mapped data loader for large files
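The accept-only-if-faster logic behind --compile.validate / --compile.auto can be sketched generically: benchmark both variants and keep the candidate only when its speedup clears the threshold. A stdlib illustration (`pick_faster` is hypothetical; Syntrix's own validation benchmarks forward throughput in tokens/sec):

```python
import time

def pick_faster(baseline, candidate, min_improvement=1.05, iters=50):
    """Return candidate if it runs >= min_improvement x faster, else baseline."""
    def bench(fn):
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        return iters / (time.perf_counter() - start)  # calls per second
    base_rate = bench(baseline)
    cand_rate = bench(candidate)
    return candidate if cand_rate >= min_improvement * base_rate else baseline
```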

Reproducibility & Determinism

  • Seeds and threads are initialized consistently via syntrix.utils.seed
  • Logs record environment details (threads, Python, PyTorch, dtype)
  • Tests cover determinism and tolerance‑aware numeric checks
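Conceptually, the seeding-and-pinning setup looks like the following stdlib-only sketch (the real syntrix.utils.seed presumably also calls torch.manual_seed and torch.set_num_threads; torch is omitted here to keep the example dependency-free):

```python
import os
import random
from typing import Optional

def seed_everything(seed: int, threads: Optional[int] = None) -> None:
    """Seed the stdlib RNG and pin BLAS/OpenMP thread counts via env vars."""
    random.seed(seed)
    # Recorded for child processes; hash randomization in *this* process
    # is fixed at interpreter startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    if threads is not None:
        for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
            os.environ[var] = str(threads)
```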

Benchmarks

For reproducible commands and example results tables, see docs/benchmarks.md.

Contributing

We welcome contributions of all kinds: bug fixes, features, docs, and benchmarks.

  • Please read CONTRIBUTING.md for our contribution process, standards, and PR guidelines
  • All participants are expected to follow our CODE_OF_CONDUCT.md

Governance & Support

  • Issues: Use GitHub Issues for bug reports and feature requests. Include OS, Python, and PyTorch versions, steps to reproduce, and expected vs. actual behavior.
  • CI: Pull requests must pass GitHub Actions (pytest on Python 3.10/3.11/3.12).

License

MIT — see LICENSE.

Packaging & Publishing

Versioning Policy

We follow Semantic Versioning (SemVer): MAJOR.MINOR.PATCH.

  • Increase MAJOR for incompatible API changes.
  • Increase MINOR for added functionality in a backward-compatible manner.
  • Increase PATCH for backward-compatible bug fixes.
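The bump rules above are mechanical enough to script; an illustrative helper:

```python
def bump(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version string per SemVer rules."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"   # incompatible API change
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # backward-compatible feature
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # backward-compatible fix
    raise ValueError(f"unknown part: {part}")
```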

Release Checklist

  1. Ensure all tests pass locally and in CI.
  2. Update CHANGELOG.md with a new section for the release.
  3. Bump the version in pyproject.toml.
  4. Commit changes and tag the release:
    • git commit -m "chore(release): bump version to X.Y.Z"
    • git tag vX.Y.Z && git push origin main --tags
  5. Build and upload to PyPI:
    • pip install build twine
    • python -m build
    • twine upload dist/*
  6. Create a GitHub Release referencing the tag and the corresponding changelog notes.
  7. Verify README badges (CI, CodeQL, PyPI) render correctly.

Installing from PyPI (post‑publish)

pip install syntrix
# verify CLI entry points
syntrix.train --help
syntrix.sample --help

Download files

Source Distribution

syntrix-0.2.0.tar.gz (27.4 kB)

Built Distribution

syntrix-0.2.0-py3-none-any.whl (29.1 kB)

File details

Details for the file syntrix-0.2.0.tar.gz.

File metadata

  • Download URL: syntrix-0.2.0.tar.gz
  • Upload date:
  • Size: 27.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for syntrix-0.2.0.tar.gz:

  • SHA256: 08d00539230cdb6deaa39675ce49a8eacf1a61785b32962924f96035a6646761
  • MD5: e1c24134901e004b3bcf21a7d8d360aa
  • BLAKE2b-256: 15011592e09a0cd1901e2c1bcf1121030b97c2a44e5bca0766ab0c6c3acfad5a
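To verify a downloaded artifact against the digests above, hash it locally; a small stdlib helper:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# sha256_of(Path("syntrix-0.2.0.tar.gz")) should match the SHA256 digest above
```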


File details

Details for the file syntrix-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: syntrix-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 29.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for syntrix-0.2.0-py3-none-any.whl:

  • SHA256: e06219c1c5f07b8bb0a2f1da2f6a91af2331ee5ac42bf21867d3c0de5e705eb1
  • MD5: f1a3832418a9e9e9d5ff4c57a716c066
  • BLAKE2b-256: 6007311c717d2d8fc79cdfdb094ab9f3705fbd84d59f4b6485de17b05d0b0cd6

