
Syntrix‑Base — A Low‑Resource (CPU‑First) Machine Learning Framework


Train and run modern small models fast on everyday CPUs — simple, transparent, and reproducible.

Table of Contents

  • Why Syntrix‑Base?
  • Highlights
  • Requirements
  • Quickstart
  • Configuration
  • CLI Reference
  • Reproducibility & Determinism
  • Benchmarks
  • Troubleshooting
  • Contributing
  • Governance & Support
  • License

Why Syntrix‑Base?

Syntrix‑Base is a CPU‑first, deterministic learning toolkit for tiny but modern models. It emphasizes clarity over complexity: clean PyTorch code, reproducible logs, and practical CLIs that work well on everyday hardware.

Highlights

  • CPU‑first ergonomics: pinned threads, deterministic seeds, dtype control
  • Tiny but modern models: GPT‑mini, SSM‑mini, RNN‑mini
  • Reproducible logging: JSONL logs with tokens/sec and environment
  • Optional torch.compile with CLI toggle and auto validation

Requirements

  • Python >= 3.9
  • Linux/macOS (Windows may work via WSL)
  • PyTorch (installed automatically as a dependency when you install syntrix)

Quickstart

1) Install (from PyPI)

pip install syntrix

Alternative: From source (dev install)

git clone https://github.com/paredezadrian/syntrix-base.git
cd syntrix-base
python3 -m venv venv && source venv/bin/activate
pip install --upgrade pip
pip install -e .

2) Get sample data (TinyShakespeare)

mkdir -p data
wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt -O data/tinyshakespeare.txt

Or download a mini text8 sample via CLI when training: add --download.text8_mini.

3) Train (YAML + CLI overrides)

syntrix.train \
  --config configs/gpt-mini.yaml \
  --data.file data/tinyshakespeare.txt \
  --train_steps 300 --eval_every 100 --save_every 150 \
  --threads 4 \
  --out_dir runs/gpt-mini_base

Enable torch.compile and auto‑validate throughput (accept only if >= 5% faster):

syntrix.train \
  --config configs/gpt-mini.yaml \
  --data.file data/tinyshakespeare.txt \
  --threads 4 \
  --train_steps 300 --eval_every 100 --save_every 150 \
  --compile --compile.validate --compile.auto --compile.min_improvement 1.05 \
  --out_dir runs/gpt-mini_compile_auto
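Conceptually, the auto-validation step benchmarks eager vs. compiled forward throughput and keeps the compiled model only if it clears the `min_improvement` ratio. A minimal sketch of that decision, in plain Python (the helper names and timing harness here are illustrative, not Syntrix internals):

```python
import time

def throughput(fn, iters=50):
    """Crude tokens/sec-style benchmark: calls of fn() per second."""
    fn(); fn()                       # warm-up, excluded from timing
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return iters / (time.perf_counter() - start)

def accept_compiled(eager_tps, compiled_tps, min_improvement=1.05):
    """Keep the compiled variant only if it clears the threshold."""
    return compiled_tps >= min_improvement * eager_tps

# With --compile.min_improvement 1.05, a 4% speedup is rejected:
print(accept_compiled(1000.0, 1040.0))  # False
print(accept_compiled(1000.0, 1100.0))  # True
```

With `--compile.auto`, a rejected compile falls back to eager execution, so enabling the flags is safe even on hardware where `torch.compile` gives no benefit.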

Outputs:

  • Checkpoints: runs/<name>/ckpt.pt
  • Logs (JSONL): runs/<name>/log.jsonl with step, val_bpc, tokens_per_s, lr, and an initial env record (Python, PyTorch, threads, dtype, compiled flag).
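Because the log is JSONL (one JSON object per line), it is easy to post-process with a few lines of Python. A self-contained sketch that pulls the tokens_per_s series out of a run log (field names taken from the list above; the snippet writes a tiny fake log so it runs anywhere, and the exact record layout is an assumption):

```python
import json, os, tempfile

# A tiny stand-in log: one env record, then per-step records.
records = [
    {"python": "3.12", "torch": "2.3", "threads": 4, "dtype": "float32", "compiled": False},
    {"step": 100, "val_bpc": 2.10, "tokens_per_s": 15000.0, "lr": 3e-4},
    {"step": 200, "val_bpc": 1.95, "tokens_per_s": 15400.0, "lr": 3e-4},
]
path = os.path.join(tempfile.mkdtemp(), "log.jsonl")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Read it back: parse each line, skip records without training fields.
with open(path) as f:
    parsed = [json.loads(line) for line in f]
tps = [r["tokens_per_s"] for r in parsed if "tokens_per_s" in r]
print(tps)  # [15000.0, 15400.0]
```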

4) Sample from a checkpoint

syntrix.sample \
  --ckpt runs/gpt-mini_base/ckpt.pt \
  --data.file data/tinyshakespeare.txt \
  --max_new_tokens 200 --temp 0.9
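`--temp` scales the logits before sampling: values below 1 sharpen the distribution toward the most likely token, values above 1 flatten it. A minimal sketch of this standard technique (not Syntrix's actual sampler):

```python
import math, random

def sample_from_logits(logits, temperature=0.9, rng=random):
    """Softmax over logits / temperature, then draw one token id."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                             # inverse-CDF sampling
    acc = 0.0
    for token_id, p in enumerate(probs):
        acc += p
        if r <= acc:
            return token_id
    return len(probs) - 1

print(sample_from_logits([2.0, 0.5, -1.0], temperature=0.9))
```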

5) Evaluate or validate config

# Evaluate a checkpoint (reports validation BPC)
syntrix.eval --data.file data/tinyshakespeare.txt --ckpt runs/gpt-mini_base/ckpt.pt

# Validate and inspect a YAML config
syntrix.config --config configs/gpt-mini.yaml

Configuration

  • Base configs live in configs/ (e.g., configs/gpt-mini.yaml).
  • You can override most settings via CLI flags. Some use dot notation (e.g., --data.file, --download.text8_mini).
  • Examples:
# Increase layers and reduce batch using dot-notation overrides
syntrix.train --config configs/gpt-mini.yaml --data.file data/tinyshakespeare.txt \
  --model.n_layer 6 --train.batch_size 16
  • Precision: switch default dtype with --dtype float32|float64. Numeric tests use dtype‑aware tolerances.
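Dot-notation flags map naturally onto nested YAML keys: `--model.n_layer 6` sets the `n_layer` key inside the `model` section. A sketch of that convention (the merge helper and base config shown here are illustrative, not Syntrix's actual loader):

```python
def apply_override(config: dict, dotted_key: str, value):
    """Walk dotted_key into the nested config and set the leaf value."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

# Hypothetical base config after loading a YAML file:
config = {"model": {"n_layer": 4, "n_head": 4}, "train": {"batch_size": 32}}

# Equivalent of --model.n_layer 6 --train.batch_size 16:
apply_override(config, "model.n_layer", 6)
apply_override(config, "train.batch_size", 16)
print(config["model"]["n_layer"], config["train"]["batch_size"])  # 6 16
```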

CLI Reference

List options:

python -m syntrix.cli_train --help
python -m syntrix.cli_sample --help

Notable flags:

  • --threads <int>: sets torch.set_num_threads and pins MKL/OMP threads
  • --compile: enable torch.compile if available
  • --compile.validate --compile.auto --compile.min_improvement 1.05: benchmark forward throughput and auto‑enable compile only if faster
  • --tokenizer <char|bpe> and --bpe_vocab_size <int>
  • --use_mmap: use memory‑mapped data loader for large files

Reproducibility & Determinism

  • Seeds and threads are initialized consistently via syntrix.utils.seed
  • Logs record environment details (threads, Python, PyTorch, dtype)
  • Tests cover determinism and tolerance‑aware numeric checks
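Conceptually, a seeding helper like syntrix.utils.seed pins every source of randomness and the CPU thread counts before training starts. A hedged sketch of what such a helper typically does (the torch calls are standard PyTorch APIs, guarded so the snippet also runs without torch installed; this is not the actual Syntrix implementation):

```python
import os
import random

def seed_everything(seed: int, threads: int) -> None:
    """Pin RNG seeds and CPU thread counts for reproducible runs."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    os.environ["OMP_NUM_THREADS"] = str(threads)
    os.environ["MKL_NUM_THREADS"] = str(threads)
    try:
        import torch
        torch.manual_seed(seed)
        torch.set_num_threads(threads)
        torch.use_deterministic_algorithms(True)
    except ImportError:
        pass  # torch not installed; stdlib seeding still applies

seed_everything(1337, threads=4)
print(os.environ["OMP_NUM_THREADS"])  # 4
```

Note that `PYTHONHASHSEED` only takes effect for child processes; the other settings apply immediately.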

Benchmarks

For reproducible benchmark commands and example results tables, see docs/benchmarks.md; architecture notes and the FAQ are in docs/architecture.md.

Troubleshooting

  • Non‑deterministic results:
    • Ensure --seed and --threads are set; check OMP_NUM_THREADS and MKL_NUM_THREADS.
  • Slow throughput:
    • Use smaller --block_size, small --microbatch with higher --grad_accum, and try --compile --compile.validate --compile.auto.
  • Memory constraints:
    • Use --data.use_mmap for memory‑mapped random block sampling on large files.
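The idea behind the memory-mapped loader is that random block sampling only touches the pages it reads, so a multi-gigabyte corpus never has to fit in RAM. A stdlib sketch of that pattern (file contents and block size are illustrative):

```python
import mmap
import os
import random
import tempfile

# Create a small corpus file to map (stand-in for a large dataset).
path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
with open(path, "wb") as f:
    f.write(b"hello world, " * 1000)

block_size = 32
rng = random.Random(0)

with open(path, "rb") as f, \
        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    # Pick a random window; only the touched pages are paged in.
    start = rng.randrange(len(mm) - block_size)
    block = mm[start:start + block_size]

print(len(block))  # 32
```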

Contributing

We welcome contributions of all kinds: bug fixes, features, docs, and benchmarks.

  • Please read CONTRIBUTING.md for our contribution process, standards, and PR guidelines
  • All participants are expected to follow our CODE_OF_CONDUCT.md

Governance & Support

  • Issues: Use GitHub Issues for bug reports and feature requests. Include OS, Python, and PyTorch versions, steps to reproduce, and expected vs. actual behavior.
  • CI: Pull requests must pass GitHub Actions (pytest on Python 3.10/3.11/3.12).

License

MIT with Commons Clause (non‑commercial). See LICENSE for details.
