
Python client for shuflr — HTTP NDJSON and shuflr-wire/1 binary transports


shuflr-client

Python client for shuflr — streaming shuffled JSONL records over the network for LLM training and analytics.

Rust core with pyo3 bindings, shipped as maturin-built abi3 wheels (one per platform, each covering CPython 3.9+). No tokio, no async: blocking sockets compose cleanly with PyTorch's multiprocessing DataLoader workers.

pip install shuflr-client

Usage

import orjson  # third-party JSON parser used below; stdlib json works too
import shuflr_client

ds = shuflr_client.Dataset(
    "http://127.0.0.1:9000/v1/streams/corpus",
    seed=42,
    shuffle="chunk-shuffled",   # or "index-perm" for provably uniform
    epochs=0,                   # 0 = infinite
    sample=None,                # or N to cap
    rank=0, world_size=4,       # distributed partitioning, no coordinator
    auth_token=None,            # bearer for protected servers
    tls_ca_cert=None,           # path to PEM bundle for private CAs
)
for record_bytes in ds:
    record = orjson.loads(record_bytes)
    ...

Each __next__ call returns one record as bytes, with no trailing newline. The stream opens lazily on the first iter() call and closes when the server runs out of records or the sample= cap is reached.
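With epochs=0 the stream never ends, so a consumer that only wants a bounded number of records has to stop on its own. The sketch below does that with itertools.islice and parses each record with the standard-library json module instead of orjson. decode_records is a hypothetical helper, not part of shuflr_client; it accepts any iterable of bytes records, so a plain list stands in for a Dataset here.

```python
import itertools
import json
from typing import Any, Iterable, Iterator, Optional


def decode_records(records: Iterable[bytes], limit: Optional[int] = None) -> Iterator[Any]:
    """Parse newline-free JSON byte records, stopping after `limit` if given.

    Hypothetical helper: works with any iterable of bytes, e.g. a Dataset.
    """
    # islice with stop=None passes everything through (forever, if epochs=0).
    for raw in itertools.islice(records, limit):
        # json.loads accepts bytes directly on Python 3.6+.
        yield json.loads(raw)


# Stand-in for a shuflr_client.Dataset: a list of bytes records.
fake_stream = [b'{"id": 1}', b'{"id": 2}', b'{"id": 3}']
print([r["id"] for r in decode_records(fake_stream, limit=2)])  # [1, 2]
```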

PyTorch

from shuflr_client import IterableDataset
import torch

ds = IterableDataset(
    "http://localhost:9000/v1/streams/training",
    seed=42, shuffle="index-perm",
)
loader = torch.utils.data.DataLoader(ds, batch_size=128, num_workers=4)

When invoked from inside a DataLoader worker, IterableDataset reads torch.utils.data.get_worker_info() and auto-fills rank and world_size so that each worker reads a disjoint slice of the shuffled stream; no per-worker configuration is required.
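The README does not say how the slice is computed, or how a worker's ID combines with an explicit rank/world_size when several training processes each run several workers. The sketch below assumes one common scheme: fold each process's num_workers workers into a larger virtual world, then stride through the stream so reader k takes every world_size-th record. effective_partition and strided_slice are hypothetical helpers, not shuflr_client APIs, and shuflr may partition differently (for example, server-side).

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def effective_partition(rank: int, world_size: int,
                        worker_id: int, num_workers: int) -> Tuple[int, int]:
    """Fold DataLoader workers into a single (rank, world_size) pair.

    Assumption: each of `world_size` processes runs `num_workers` workers,
    giving world_size * num_workers disjoint readers in total.
    """
    return rank * num_workers + worker_id, world_size * num_workers


def strided_slice(stream: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Yield every world_size-th item, starting at offset `rank`."""
    for i, item in enumerate(stream):
        if i % world_size == rank:
            yield item


# Demo: 2 training processes x 2 DataLoader workers over 8 records.
slices = [
    list(strided_slice(range(8), *effective_partition(r, 2, w, 2)))
    for r in range(2)
    for w in range(2)
]
print(slices)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Striding needs no coordination between readers: each one computes its slice from (rank, world_size) alone, which fits the "no coordinator" partitioning described in the Usage example.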

A typical training data layer opens many Dataset instances at once (one per source corpus) and weights them with a wrapper layer; in production we sustain ~150 concurrent shuflr streams from a single training process this way.
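The weighting wrapper itself is not part of shuflr_client. As a minimal sketch under that assumption, the hypothetical weighted_mix below interleaves several record streams, choosing the source of each next record with a seeded random.Random so the mixture is reproducible. In real use each source would be a Dataset; plain lists stand in here.

```python
import random
from typing import Dict, Iterable, Iterator, TypeVar

T = TypeVar("T")


def weighted_mix(sources: Dict[str, Iterable[T]],
                 weights: Dict[str, float], seed: int = 0) -> Iterator[T]:
    """Interleave record streams, picking each next source by weight.

    Hypothetical wrapper. A source that runs out is dropped and mixing
    continues over the remaining ones until all are exhausted.
    """
    rng = random.Random(seed)  # seeded so the mixture order is reproducible
    iters = {name: iter(src) for name, src in sources.items()}
    while iters:
        names = list(iters)
        name = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        try:
            yield next(iters[name])
        except StopIteration:
            del iters[name]  # this source is done; keep mixing the rest


web = [b"w1", b"w2", b"w3"]
code = [b"c1", b"c2"]
mixed = list(weighted_mix({"web": web, "code": code}, {"web": 0.7, "code": 0.3}, seed=42))
print(len(mixed))  # 5
```

With infinite sources (epochs=0) the weights become the long-run proportions of the mixture. With finite sources the ratio drifts once the smaller source is exhausted, since only the remaining streams are left to draw from.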

Transports

| Scheme | Transport | Status |
|---|---|---|
| http:// | Plain HTTP/1.1, chunked NDJSON | shipped |
| https:// | HTTPS (ureq TLS) | shipped; pass tls_ca_cert= for private CAs |
| shuflr:// | shuflr-wire/1 over plain TCP | shipped |
| shuflrs:// | shuflr-wire/1 over TLS | URL parses; transport lands next |
| shuflr+unix:// | shuflr-wire/1 over a Unix domain socket | URL parses; transport lands next |
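To check which transport a URL would select before handing it to a client, the transport table can be mirrored in a small lookup. This is a hypothetical helper built only from that table, not a shuflr_client API; the paths in non-HTTP URLs are placeholders, since their exact format is not documented here.

```python
from urllib.parse import urlsplit

# Mirrors the transport table (hypothetical helper, not a shuflr_client API).
# Value: (transport description, shipped in this release?)
TRANSPORTS = {
    "http": ("HTTP/1.1 chunked NDJSON", True),
    "https": ("HTTPS NDJSON (ureq TLS)", True),
    "shuflr": ("shuflr-wire/1 over TCP", True),
    "shuflrs": ("shuflr-wire/1 over TLS", False),
    "shuflr+unix": ("shuflr-wire/1 over Unix domain socket", False),
}


def transport_for(url: str) -> str:
    """Return the transport for `url`; raise if unknown or not yet shipped."""
    scheme = urlsplit(url).scheme  # urlsplit lowercases the scheme
    if scheme not in TRANSPORTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    name, shipped = TRANSPORTS[scheme]
    if not shipped:
        raise NotImplementedError(f"{name} is not shipped yet")
    return name


print(transport_for("http://127.0.0.1:9000/v1/streams/corpus"))  # HTTP/1.1 chunked NDJSON
```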

HTTP is the universal fallback: it works through standard HTTP proxies, firewalls, and load balancers. The shuflr-wire/1 transport adds explicit record framing on top of TCP's ordered byte stream, plus raw-frame passthrough for chunk-shuffled mode: the server sends the client compressed seekable-zstd frames, which the client then re-shuffles locally using the server-derived seed. Wire size shrinks to roughly the on-disk size (~3.6× smaller than NDJSON) at no cost to shuffle quality.
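Local re-shuffling works because a seeded pseudo-random generator is deterministic: given the same seed and the same frames, client and server compute the same order. The sketch below is a generic two-level chunk shuffle (permute chunk order, then records within each chunk) using random.Random; it is an illustration only, since this README does not describe shuflr's actual chunk-shuffled algorithm or how the seed is derived. Frames are plain lists here, with zstd decompression omitted, and chunk_shuffle is a hypothetical name.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_shuffle(chunks: Sequence[Sequence[T]], seed: int) -> List[T]:
    """Two-level shuffle: permute chunk order, then records within each chunk.

    Generic illustration; not shuflr's documented algorithm.
    """
    rng = random.Random(seed)  # same seed -> same sequence of shuffles
    order = list(range(len(chunks)))
    rng.shuffle(order)  # level 1: which chunk is emitted first
    out: List[T] = []
    for idx in order:
        records = list(chunks[idx])
        rng.shuffle(records)  # level 2: record order inside the chunk
        out.extend(records)
    return out


frames = [[b"a1", b"a2", b"a3"], [b"b1", b"b2"], [b"c1"]]
print(chunk_shuffle(frames, seed=7) == chunk_shuffle(frames, seed=7))  # True
```

In this sketch, records from one chunk stay adjacent in the output, which is why a chunk-level shuffle is weaker than a full permutation such as the index-perm mode.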

Server

A shuflr serve instance hosts one or more named datasets:

shuflr serve --http 127.0.0.1:9000 \
    --dataset corpus=/data/corpus.jsonl.zst \
    --dataset pairs=/data/pairs.jsonl.zst
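Each named dataset is then addressed by its own stream URL. The /v1/streams/NAME pattern is an assumption inferred from the Usage example, where the dataset named corpus is read from http://127.0.0.1:9000/v1/streams/corpus. The hypothetical helper below turns the --dataset NAME=PATH specs into one URL per dataset; each URL could then be passed to shuflr_client.Dataset.

```python
from typing import Dict, Iterable


def stream_urls(base: str, dataset_specs: Iterable[str]) -> Dict[str, str]:
    """Map each NAME=PATH --dataset spec to a stream URL.

    Hypothetical helper; assumes the /v1/streams/NAME layout seen in Usage.
    """
    root = base.rstrip("/")
    urls: Dict[str, str] = {}
    for spec in dataset_specs:
        name, sep, path = spec.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected NAME=PATH, got {spec!r}")
        urls[name] = f"{root}/v1/streams/{name}"
    return urls


urls = stream_urls("http://127.0.0.1:9000",
                   ["corpus=/data/corpus.jsonl.zst", "pairs=/data/pairs.jsonl.zst"])
print(urls["pairs"])  # http://127.0.0.1:9000/v1/streams/pairs
```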

TLS, bearer-token and mTLS authentication, and reloadable token files are all supported on the same listener. See the shuflr README for the full server-side surface.

Development

cd crates/shuflr-client
python3 -m venv .venv && source .venv/bin/activate
pip install maturin pytest
cargo build -p shuflr-cli --features serve   # tests spawn a real server
maturin develop --release
pytest tests/

Or run the bundled script: crates/shuflr-client/scripts/test.sh.

License

MIT OR Apache-2.0.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


shuflr_client-0.1.1-cp39-abi3-win_amd64.whl (1.1 MB): CPython 3.9+, Windows x86-64

shuflr_client-0.1.1-cp39-abi3-manylinux_2_28_x86_64.whl (1.4 MB): CPython 3.9+, manylinux (glibc 2.28+) x86-64

shuflr_client-0.1.1-cp39-abi3-manylinux_2_28_aarch64.whl (1.3 MB): CPython 3.9+, manylinux (glibc 2.28+) ARM64

shuflr_client-0.1.1-cp39-abi3-macosx_11_0_arm64.whl (1.2 MB): CPython 3.9+, macOS 11.0+ ARM64

File details

Details for the file shuflr_client-0.1.1-cp39-abi3-win_amd64.whl.


File hashes

Hashes for shuflr_client-0.1.1-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 f2cfb231d3d5659ae252dc543727c33b6720c6a6e66a7d3ed2cb7e5b0ceb7858
MD5 0dff669f3df04308657697fd3e861a23
BLAKE2b-256 fd30dfd9b28bbfbb6ccf2dc3f288518ee8d75718314c8e6808ceee94a441a3df


Provenance

The following attestation bundles were made for shuflr_client-0.1.1-cp39-abi3-win_amd64.whl:

Publisher: release-pypi.yml on mjbommar/shuflr


File details

Details for the file shuflr_client-0.1.1-cp39-abi3-manylinux_2_28_x86_64.whl.


File hashes

Hashes for shuflr_client-0.1.1-cp39-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 204ecbc751018da5c7f07f327e3733244447ff5323ae6ee99833a869f3bea589
MD5 7c6dddbbfbe7a5656522f317d8264f79
BLAKE2b-256 00744e8324d2970e0f249853ff1fffa20106d5e7a6aeff0070d96d1efb7431f5


Provenance

The following attestation bundles were made for shuflr_client-0.1.1-cp39-abi3-manylinux_2_28_x86_64.whl:

Publisher: release-pypi.yml on mjbommar/shuflr


File details

Details for the file shuflr_client-0.1.1-cp39-abi3-manylinux_2_28_aarch64.whl.


File hashes

Hashes for shuflr_client-0.1.1-cp39-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 677ae2004d6dd968e7888951697b305be48193c6be5b8b0bb5b48f4daa9e2525
MD5 499493dec71bef996efcb0685dcf84ac
BLAKE2b-256 8f1475046828cf9c3afc7c4ea0b9129c805a510e6ed5c5fa1b0d6c75c5a3481a


Provenance

The following attestation bundles were made for shuflr_client-0.1.1-cp39-abi3-manylinux_2_28_aarch64.whl:

Publisher: release-pypi.yml on mjbommar/shuflr


File details

Details for the file shuflr_client-0.1.1-cp39-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for shuflr_client-0.1.1-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 720ed96b9838f22250594508e3d5f9d545d8364850006213592bacd5c479ed92
MD5 86ac7506074d41c2c65f0b692241510c
BLAKE2b-256 e31cd74d695247d22e4696be14d81bea2b47299316d1f2228bfeaa321f753d05


Provenance

The following attestation bundles were made for shuflr_client-0.1.1-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: release-pypi.yml on mjbommar/shuflr

