
Zakuro - Distributed computing made simple


Zakuro is a context-aware distributed-ML runtime. You decorate a Python function, declare a pool of workers, and the framework routes each call to the worker with the lowest expected time-to-serve — learning from every dispatch, reacting to node failures and performance drift, and never making training wait on things that can be decoupled.

See PRD.md for the product vision and PLAN.md for the measured engineering progress (every number observed, nothing simulated).

Quick start

Run a function on a remote worker — three lines

import zakuro as zk

@zk.fn
def square(x: int) -> int:
    return x * x

# Run on a remote worker (QUIC transport, fast).
result = square.to(zk.Compute(uri="quic://worker:4433"))(7)  # → 49

No worker? Zakuro falls back to running the function in-process so your code keeps working:

# No `uri`, no `host` → standalone fallback; the call runs locally.
result = square.to(zk.Compute())(7)  # → 49

Adapt across a pool — Adam-style allocator

import zakuro as zk

workers = [zk.Worker.spawn(name=f"w{i}") for i in range(3)]
adaptive = zk.AdaptiveCompute(
    workers=[w.compute(verify=False) for w in workers],
    beta1=0.9,
    softmax_temperature=0.02,  # exploration; set 0 for greedy argmin
)
adaptive.warmup(rounds=3)                # auto-calibrate per-worker priors
adaptive.start_health_probes(interval=0.5, max_strikes=2)  # detect drop-outs

@zk.fn
def expensive(x): ...

# The allocator picks the worker with the lowest expected time-to-serve,
# tracks per-worker latency EMA + variance, soft-demotes drifted workers,
# suspends failed ones.
result = expensive.to(adaptive)(42)

Installation

From source (recommended while the 0.3 series is pre-release)

git clone https://github.com/zakuro-ai/zakuro
cd zakuro
uv sync --extra worker         # pulls FastAPI + uvicorn + aioquic for the worker CLI

From PyPI

# Core only (enough to be a client):
pip install zakuro-ai

# Core + worker (needed to run `zakuro-worker`):
pip install 'zakuro-ai[worker]'

Optional extras

| extra | adds |
| --- | --- |
| [worker] | FastAPI, uvicorn, aioquic, psutil — needed to run a worker |
| [ray] | Ray processor |
| [dask] | Dask processor |
| [spark] | PySpark processor |
| [dev] | pytest, ruff, mypy |

You do not need [worker] just to call into a running worker — import zakuro stays lean on purpose.

zc CLI (optional)

zc is the Rust broker + CLI at zakuro-ai/zc. It's useful for multi-node meshes but not required for a single-process or small-cluster setup.

# macOS / Linux one-liner — installs to /usr/local/bin
curl -sSL https://raw.githubusercontent.com/zakuro-ai/zc/master/scripts/install.sh | bash

Concepts

Compute

A target for remote execution.

| How it's constructed | Where the call runs |
| --- | --- |
| zk.Compute() | Standalone — in-process fallback. No network, no workers required. |
| zk.Compute(uri="quic://host:4433") | QUIC worker. Probed at construction; raises ConnectionError if unreachable. |
| zk.Compute(uri="zakuro://host:3960") | HTTP worker. Same probe behaviour. |
| zk.Compute(uri="ray://head:10001") | Ray cluster (requires [ray]). |

Resource hints: cpus=, gpus=, memory=. cpus= and gpus= are advisory in standalone mode; an explicit memory= refuses to run standalone, because the in-process fallback cannot enforce a memory limit.
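The memory= refusal can be illustrated with a standalone check. This is a sketch of the assumed semantics, with a hypothetical helper name — not Zakuro's internal validation code:

```python
def check_hints(standalone: bool, cpus=None, gpus=None, memory=None) -> None:
    """Mirror the rule above (assumed semantics): cpus=/gpus= are advisory,
    but an explicit memory= cannot be honoured by the in-process fallback,
    so standalone execution refuses rather than silently ignoring it."""
    if standalone and memory is not None:
        raise RuntimeError("memory= cannot be enforced in standalone mode")
```

Advisory hints always pass; only the unenforceable one is rejected, and only where it is unenforceable.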

@zk.fn / @zk.cls

Decorators that turn a function or class into something remotely callable. Under the hood the callable is cloudpickled and shipped to the worker on each call; zk.cls keeps the instance alive on a specific worker for subsequent method calls.

zk.Worker.spawn()

Convenience wrapper around the zakuro-worker CLI. Takes name=, transport="http"|"quic", port= (ephemeral if omitted). Polls /health until ready, registers an atexit hook so stray subprocesses die with the calling Python.
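The spawn-then-poll pattern can be sketched in isolation. These are hypothetical helpers, not Worker.spawn's actual implementation: poll a readiness probe until it succeeds or a deadline passes, and register cleanup with atexit so the subprocess dies with the interpreter.

```python
import atexit
import subprocess
import time

def wait_until_ready(probe, timeout=10.0, interval=0.1):
    """Poll `probe()` until it returns truthy; False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

def spawn_with_cleanup(popen_args, probe, timeout=10.0):
    """Start a subprocess, wait for its health probe, kill it at interpreter exit."""
    proc = subprocess.Popen(popen_args)
    atexit.register(proc.kill)  # stray workers die with the calling Python
    if not wait_until_ready(probe, timeout=timeout):
        proc.kill()
        raise TimeoutError("worker never became healthy")
    return proc
```

In the real API the probe would be an HTTP GET against the worker's /health endpoint.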

Transports

| scheme | default port | transport | when to use |
| --- | --- | --- | --- |
| zakuro:// | 3960 | HTTP (FastAPI + httpx) | interop with any load-balancer / reverse proxy |
| quic:// | 4433 | QUIC (aioquic) | fastest; connection-level resilience built in |
| ray:// | 10001 | Ray | existing Ray cluster |
| dask:// / tcp:// | 8786 | Dask | existing Dask scheduler |
| spark:// | 7077 | Spark | existing Spark master |

QUIC is the default for new work. It handles worker bounces natively (retry + 5 s idle timeout instead of aioquic's 30–60 s default — measured 12× faster detection).
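The scheme → default-port mapping in the table can be expressed as a small resolver. This is our own sketch (the names `DEFAULT_PORTS` and `resolve_endpoint` are not Zakuro API):

```python
from urllib.parse import urlsplit

# Default ports from the transports table above.
DEFAULT_PORTS = {
    "zakuro": 3960,   # HTTP
    "quic": 4433,     # QUIC
    "ray": 10001,
    "dask": 8786,
    "tcp": 8786,      # Dask schedulers are also addressed as tcp://
    "spark": 7077,
}

def resolve_endpoint(uri: str) -> tuple[str, str, int]:
    """Split a worker URI into (scheme, host, port), filling in the default port."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unknown transport scheme: {parts.scheme!r}")
    return parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme]
```

So `zk.Compute(uri="zakuro://host")` and `zk.Compute(uri="zakuro://host:3960")` would name the same endpoint.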

Adaptive compute

zk.AdaptiveCompute is the core differentiator. It tracks per-worker latency with Adam-style EMAs (fast + slow baseline), variance, queue depth, health-probe outcomes, and failure counts — and picks the worker with the lowest expected time-to-serve for every dispatch.
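This style of scoring can be sketched in a few lines. The class and method names here are ours, and the scoring formula is an illustrative assumption, not the library's internals: each worker keeps a fast latency EMA, a slow baseline, and a variance estimate, and the allocator greedily picks the lowest expected time-to-serve.

```python
class WorkerStats:
    """Per-worker latency tracker: fast/slow EMAs (Adam-style) plus variance."""

    def __init__(self, prior=0.01, beta_fast=0.9, beta_slow=0.99):
        self.beta_fast, self.beta_slow = beta_fast, beta_slow
        self.fast = prior   # fast-moving latency EMA
        self.slow = prior   # slow-moving baseline (for drift comparison)
        self.var = 0.0      # EMA of squared deviation from the fast EMA

    def observe(self, latency):
        # Update variance against the pre-update fast EMA, then both EMAs.
        self.var = self.beta_fast * self.var + (1 - self.beta_fast) * (latency - self.fast) ** 2
        self.fast = self.beta_fast * self.fast + (1 - self.beta_fast) * latency
        self.slow = self.beta_slow * self.slow + (1 - self.beta_slow) * latency

    def expected_time_to_serve(self, queue_depth=0):
        # Penalise volatile workers and queued work; the real formula is not public.
        return self.fast + self.var ** 0.5 + queue_depth * self.fast

def pick_greedy(stats):
    """Greedy argmin over expected time-to-serve."""
    return min(stats, key=lambda name: stats[name].expected_time_to_serve())
```

A worker whose fast EMA pulls away from its slow baseline is the "drifted" case the runtime soft-demotes.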

What it does, observed

All numbers come from real subprocess workers running on the Mac (see scripts/bench_*.py):

| Feature | Measured behaviour |
| --- | --- |
| Warmup | Auto-derives backpressure_threshold = 1.5 × max(worker p95) ≈ 29 ms on the 3-worker Mac mesh. No manual tuning. |
| Greedy vs softmax routing | Greedy commits to the 3-ms-faster worker (100 % / 0 %). Softmax τ=0.02 keeps all three workers utilised (172/152/176). |
| Add / remove workers at runtime | Removed worker drops to 0 picks within the batch; readmitted worker starts at mesh-median prior and earns traffic immediately. |
| Health probes | Background thread detects SIGKILL in 18 ms (tight config) / 743 ms (loose). Worker suspended; traffic reroutes. |
| Drift detection | Injected 250 ms/call slowdown ⇒ 95 % traffic diversion at t + 0.48 s. Recovery via softmax + health-probe latencies. |
| QUIC retry | In-flight request during worker SIGKILL surfaces ConnectionError in 5 s (vs aioquic's 30–60 s default). |

Full numbers in PLAN.md.
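The greedy-vs-softmax contrast above can be sketched as follows: with temperature τ > 0 the allocator samples a worker with probability proportional to exp(−score/τ), so near-tied workers all stay warm, and as τ → 0 this collapses to a greedy argmin. A standalone sketch under those assumptions, not the library's routing code:

```python
import math
import random

def route(scores, temperature=0.02, rng=random):
    """Pick a worker index: greedy argmin when temperature == 0,
    otherwise sample from a softmax over the negated scores."""
    if temperature == 0:
        return min(range(len(scores)), key=scores.__getitem__)
    weights = [math.exp(-s / temperature) for s in scores]
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(scores) - 1  # numerical-edge fallback
```

With a 3 ms spread between workers, greedy sends 100 % of traffic to the fastest, while τ=0.02 spreads picks across all of them — matching the 172/152/176 pattern above in spirit.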

Notebooks

Each one runs end-to-end on a laptop and prints observed numbers — no faked data, no simulated failures. Verified by jupyter nbconvert --execute.

| notebook | path | what it shows |
| --- | --- | --- |
| Standalone | notebooks/standalone_mode.ipynb | Compute() without a URI → in-process fallback; advisory resource hints; memory-enforcement refusal |
| Two workers | notebooks/two_worker_demo.ipynb | Spawn two workers via zk.Worker.spawn(), chain calls between them |
| Mesh adaptation tour | notebooks/mesh_adaptation_tour.ipynb | Every AdaptiveCompute knob — warmup, soft/greedy, add/remove, health, drift |
| QUIC resilience | notebooks/quic_resilience.ipynb | Baseline → SIGKILL → respawn; 5 s detection; post-respawn recovery |

The sakura repo adds bert_demo/hf_async_features.ipynb — every SakuraHFCallback knob on a real BERT fine-tune.

Benchmarks

All benchmarks live under scripts/ and each takes a handful of seconds to a minute. Output is JSON-dumpable via --log <path> where supported.

| script | scenario | headline |
| --- | --- | --- |
| bench_mesh_adaptation.py | 2-worker warmup → dispatch → remove → readmit | 29 ms auto-bp, rebalance within one batch |
| bench_health_detection.py | SIGKILL worker mid-run | 18–743 ms detection depending on probe cadence |
| bench_drift_detection.py | Induce sleep(0.25) on one worker | drift detected at t + 0.48 s, 95 % traffic diverted |
| bench_quic_retry.py | SIGKILL + respawn on same QUIC port | 60 s → 5 s dead-connection detection |
| bench_all.py | Runs the four above in one go | consolidated summary table |

uv run python scripts/bench_all.py --log /tmp/zakuro-bench.json

Related projects

  • sakura — ML framework integrations (PyTorch Lightning, HuggingFace Trainer, TensorFlow/Keras) that use Zakuro under the hood to hide eval latency behind training
  • zc — Rust broker with QUIC transport, credit-based billing, P2P mesh

Development

# Tests
uv run pytest tests/

# Specific benchmark
uv run python scripts/bench_mesh_adaptation.py --n-workers 3

# Build a wheel
task build:wheel

# Full CI set
task ci:all

Python 3.10+ is required. uv is the recommended package manager (pip install uv).

License

BSD-3-Clause. See LICENSE.
