Python client for shuflr — HTTP NDJSON and shuflr-wire/1 binary transports

shuflr-client

Python client for shuflr — streaming shuffled JSONL records over the network for LLM training and analytics.

Rust core with pyo3 bindings, shipped as a single maturin-built abi3 wheel (Python 3.9+). No tokio, no async — blocking sockets compose cleanly with PyTorch's multiprocessing DataLoader workers.

pip install shuflr-client

Usage

import orjson          # third-party JSON parser used below; install separately
import shuflr_client

ds = shuflr_client.Dataset(
    "http://127.0.0.1:9000/v1/streams/corpus",
    seed=42,
    shuffle="chunk-shuffled",   # or "index-perm" for provably uniform
    epochs=0,                   # 0 = infinite
    sample=None,                # or N to cap
    rank=0, world_size=4,       # distributed partitioning, no coordinator
    auth_token=None,            # bearer for protected servers
    tls_ca_cert=None,           # path to PEM bundle for private CAs
)
for record_bytes in ds:
    record = orjson.loads(record_bytes)
    ...

Each __next__ call returns one record as bytes, with no trailing newline. The stream opens lazily on the first iter() call and closes when the server runs out of records or the sample= cap is reached.
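Because the dataset yields plain bytes records, ordinary iterator tools apply to it. The sketch below is illustrative only: `decode_records` is a hypothetical helper, not part of shuflr_client, and a list of bytes stands in for a live Dataset so the example runs without a server. It uses the standard-library json module, which accepts bytes directly.

```python
import itertools
import json
from typing import Any, Iterable, Iterator, Optional


def decode_records(stream: Iterable[bytes], limit: Optional[int] = None) -> Iterator[Any]:
    """Decode JSON records one at a time, stopping after `limit` if given."""
    # islice(stream, None) passes every record through unchanged.
    for raw in itertools.islice(stream, limit):
        yield json.loads(raw)


# Stand-in for a Dataset: any iterable of bytes records behaves the same way.
fake_stream = [b'{"id": 1}', b'{"id": 2}', b'{"id": 3}']
first_two = list(decode_records(fake_stream, limit=2))
```

With a real stream, passing the Dataset in place of `fake_stream` should behave the same way, since only the iterator protocol is used.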

PyTorch

from shuflr_client import IterableDataset
import torch

ds = IterableDataset(
    "http://localhost:9000/v1/streams/training",
    seed=42, shuffle="index-perm",
)
loader = torch.utils.data.DataLoader(ds, batch_size=128, num_workers=4)

When invoked from inside a DataLoader worker, IterableDataset reads torch.utils.data.get_worker_info() and auto-fills rank + world_size so each worker reads a disjoint slice of the shuffled stream — no per-worker config required.
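One way to picture the auto-fill is as a small mapping from worker info to a (rank, world_size) pair. This is a sketch under assumptions, not the library's code: `effective_partition` and `belongs_to` are hypothetical names, the way a process-level rank is combined with the worker id is a guess, and round-robin assignment is used only to make the disjoint slices visible. A SimpleNamespace stands in for the object returned by torch.utils.data.get_worker_info(), which exposes `id` and `num_workers`.

```python
from types import SimpleNamespace
from typing import Tuple


def effective_partition(worker_info, rank: int = 0, world_size: int = 1) -> Tuple[int, int]:
    """Fold DataLoader worker info into a (rank, world_size) pair.

    `worker_info` is None in the main process; inside a worker it is an
    object with `.id` and `.num_workers`.
    """
    if worker_info is None:
        return rank, world_size
    # Assumed scheme: each process-level rank owns a contiguous block of worker ranks.
    return (rank * worker_info.num_workers + worker_info.id,
            world_size * worker_info.num_workers)


def belongs_to(index: int, part_rank: int, part_world: int) -> bool:
    """Round-robin membership test (an illustrative partitioning rule)."""
    return index % part_world == part_rank


# Four workers in a single process, eight records numbered 0..7.
slices = {
    w: [i for i in range(8)
        if belongs_to(i, *effective_partition(SimpleNamespace(id=w, num_workers=4)))]
    for w in range(4)
}
```

Under these assumptions every record index lands in exactly one worker's slice, which is the property the paragraph above describes.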

A typical training data layer opens many Dataset instances at once (one per source corpus) and weights them with a wrapper layer; in production we sustain ~150 concurrent shuflr streams from a single training process this way.
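A weighting wrapper of this kind can be sketched as a generator that picks the next source at random in proportion to its weight. The code below is a minimal illustration, not the wrapper used in production: `mix_sources` is a hypothetical name, the source labels and weights are made up, and lists of bytes stand in for Dataset instances.

```python
import random
from typing import Dict, Iterable, Iterator, Tuple


def mix_sources(
    sources: Dict[str, Tuple[Iterable[bytes], float]],
    seed: int = 0,
) -> Iterator[Tuple[str, bytes]]:
    """Interleave several record streams, sampling sources by weight.

    Weights must be positive. A source is dropped once it is exhausted,
    and mixing continues over the remaining sources.
    """
    rng = random.Random(seed)
    live = {name: (iter(stream), weight) for name, (stream, weight) in sources.items()}
    while live:
        names = list(live)
        weights = [live[name][1] for name in names]
        name = rng.choices(names, weights=weights, k=1)[0]
        try:
            yield name, next(live[name][0])
        except StopIteration:
            del live[name]


# Stand-ins for two Dataset instances with a 3:1 sampling weight.
sources = {
    "web": ([b"w1", b"w2", b"w3"], 3.0),
    "code": ([b"c1"], 1.0),
}
mixed = list(mix_sources(sources, seed=42))
```

Seeding the selector keeps the interleaving reproducible across runs, which matters when the underlying streams are themselves seeded.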

Transports

| Scheme | Transport | Status |
|---|---|---|
| http:// | Plain HTTP/1.1 chunked NDJSON | shipped |
| https:// | HTTPS (ureq TLS) | shipped; tls_ca_cert= for private CAs |
| shuflr:// | shuflr-wire/1 over plain TCP | shipped |
| shuflrs:// | shuflr-wire/1 over TLS | parses, lands next |
| shuflr+unix:// | shuflr-wire/1 over UDS | parses, lands next |
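The table can be read as a lookup keyed by URL scheme. The following sketch mirrors it with the standard library's urlsplit; `describe_transport` and the `TRANSPORTS` mapping are hypothetical names for illustration and say nothing about how the client dispatches internally. The shuflr:// and shuflr+unix:// URLs, including the port and socket path, are made-up examples.

```python
from typing import Tuple
from urllib.parse import urlsplit

# scheme -> (transport description, shipped?), copied from the table above.
TRANSPORTS = {
    "http": ("Plain HTTP/1.1 chunked NDJSON", True),
    "https": ("HTTPS", True),
    "shuflr": ("shuflr-wire/1 over plain TCP", True),
    "shuflrs": ("shuflr-wire/1 over TLS", False),
    "shuflr+unix": ("shuflr-wire/1 over UDS", False),
}


def describe_transport(url: str) -> Tuple[str, bool]:
    """Return (description, shipped) for a dataset URL, or raise ValueError."""
    scheme = urlsplit(url).scheme  # urlsplit lowercases the scheme
    try:
        return TRANSPORTS[scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {scheme!r}") from None


http_info = describe_transport("http://127.0.0.1:9000/v1/streams/corpus")
uds_info = describe_transport("shuflr+unix:///run/shuflr.sock")  # hypothetical path
```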

HTTP is the universal fallback: it works through ordinary proxies, firewalls, and load balancers. The shuflr-wire/1 transport adds explicit message framing on top of TCP's ordered byte stream, plus raw-frame passthrough for chunk-shuffled mode: the server hands the client compressed seekable-zstd frames, which the client then re-shuffles locally with the server-derived seed. Wire size shrinks to roughly on-disk size (about 3.6× smaller than NDJSON) at no cost to shuffle quality.
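The local re-shuffle relies on one property: a permutation driven by a shared seed is reproducible on any machine. The sketch below illustrates that property with the standard library only. It does not reproduce shuflr's PRNG, its seed derivation, or its frame format; `reshuffle_chunk` and the "seed:chunk_index" keying are assumptions made for the example.

```python
import random
from typing import List, Sequence


def reshuffle_chunk(records: Sequence[bytes], seed: int, chunk_index: int) -> List[bytes]:
    """Return the records of one chunk in a seed-determined order."""
    # Keying the generator by (seed, chunk_index) gives each chunk its own
    # permutation while staying reproducible for a given seed.
    rng = random.Random(f"{seed}:{chunk_index}")
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled


chunk = [b'{"id": 0}', b'{"id": 1}', b'{"id": 2}', b'{"id": 3}']
first_pass = reshuffle_chunk(chunk, seed=42, chunk_index=0)
second_pass = reshuffle_chunk(chunk, seed=42, chunk_index=0)
```

Both passes yield the same order, so a client that receives the seed can reconstruct the intended permutation without the server sending records in shuffled form.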

Server

A shuflr serve instance hosts one or more named datasets:

shuflr serve --http 127.0.0.1:9000 \
    --dataset corpus=/data/corpus.jsonl.zst \
    --dataset pairs=/data/pairs.jsonl.zst

TLS, bearer / mTLS auth, and reloadable token files are all supported on the same listener. See the shuflr README for the full server-side surface.

Development

cd crates/shuflr-client
python3 -m venv .venv && source .venv/bin/activate
pip install maturin pytest
cargo build -p shuflr-cli --features serve   # tests spawn a real server
maturin develop --release
pytest tests/

Or run the bundled script: crates/shuflr-client/scripts/test.sh.

License

MIT OR Apache-2.0.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

shuflr_client-0.1.0-cp39-abi3-win_amd64.whl (1.1 MB): CPython 3.9+, Windows x86-64

shuflr_client-0.1.0-cp39-abi3-manylinux_2_28_x86_64.whl (1.4 MB): CPython 3.9+, manylinux (glibc 2.28+), x86-64

shuflr_client-0.1.0-cp39-abi3-manylinux_2_28_aarch64.whl (1.3 MB): CPython 3.9+, manylinux (glibc 2.28+), ARM64

shuflr_client-0.1.0-cp39-abi3-macosx_11_0_arm64.whl (1.2 MB): CPython 3.9+, macOS 11.0+, ARM64

File details

Details for the file shuflr_client-0.1.0-cp39-abi3-win_amd64.whl.

File hashes

Hashes for shuflr_client-0.1.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 b6b06ce72965a5dffe6fb8e9375044719779146f45f644111db1f1c7b1fcae5e
MD5 8c2769091a63887b22eed7d9874da78a
BLAKE2b-256 b1ca0cbd67846f062106b8d5d707d1f8027ea34484635cc3cbe5003eaa556d3d
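A downloaded wheel can be checked against a published SHA256 digest with the standard library alone. This is a generic sketch, not a shuflr tool: `sha256_matches` is a hypothetical helper, and the wheel path in the comment is wherever the file was saved locally.

```python
import hashlib
import hmac


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Hash a file in chunks and compare with an expected hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size blocks so large wheels never load fully into memory.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.strip().lower())


# Example call (the local path is an assumption):
# sha256_matches("shuflr_client-0.1.0-cp39-abi3-win_amd64.whl",
#                "b6b06ce72965a5dffe6fb8e9375044719779146f45f644111db1f1c7b1fcae5e")
```

pip can also enforce digests at install time through its hash-checking mode (`--require-hashes` with hashes pinned in a requirements file).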

Provenance

The following attestation bundles were made for shuflr_client-0.1.0-cp39-abi3-win_amd64.whl:

Publisher: release-pypi.yml on mjbommar/shuflr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file shuflr_client-0.1.0-cp39-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for shuflr_client-0.1.0-cp39-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 2b53f054d41acf067498a84855d208ea1d502d12dc881b8a9d820bb12322fe18
MD5 68754d44918f9ebab4b1d43928bb441e
BLAKE2b-256 f2bace981c53e888b96b9052b1c58e5fbc5b6692301f4baa075fc7736717ea50

Provenance

The following attestation bundles were made for shuflr_client-0.1.0-cp39-abi3-manylinux_2_28_x86_64.whl:

Publisher: release-pypi.yml on mjbommar/shuflr

File details

Details for the file shuflr_client-0.1.0-cp39-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for shuflr_client-0.1.0-cp39-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 9541d47b6cdb0629adac61ab1bcc7c642204d8ac375a259d8b601ce2dd165916
MD5 862b729040d5bb98e6efc12beb0a31ff
BLAKE2b-256 2acf7ee935ad09593e00a5e96860d8adf69bfe444255543cfee585ac546fc392

Provenance

The following attestation bundles were made for shuflr_client-0.1.0-cp39-abi3-manylinux_2_28_aarch64.whl:

Publisher: release-pypi.yml on mjbommar/shuflr

File details

Details for the file shuflr_client-0.1.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for shuflr_client-0.1.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6ccfc0f565533e55259d9291dbebd3a3ac8c717ed16402e79513ec7b77dad7bc
MD5 55d8a7e9bba510848b6cf0950af4fb37
BLAKE2b-256 da7a72a463f1efc43532539e2eb411acd939425d3cc7eda64b4de68421af3616

Provenance

The following attestation bundles were made for shuflr_client-0.1.0-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: release-pypi.yml on mjbommar/shuflr
