Python client for shuflr — HTTP NDJSON and shuflr-wire/1 binary transports
shuflr-client
Python client for shuflr — streaming shuffled JSONL records over the network for LLM training and analytics.
Rust core with pyo3 bindings, shipped as a single maturin-built abi3 wheel (Python 3.9+). No tokio, no async — blocking sockets compose cleanly with PyTorch's multiprocessing DataLoader workers.
pip install shuflr-client
Usage
import orjson  # used below to decode each record
import shuflr_client

ds = shuflr_client.Dataset(
    "http://127.0.0.1:9000/v1/streams/corpus",
    seed=42,
    shuffle="chunk-shuffled",  # or "index-perm" for provably uniform
    epochs=0,                  # 0 = infinite
    sample=None,               # or N to cap
    rank=0, world_size=4,      # distributed partitioning, no coordinator
    auth_token=None,           # bearer for protected servers
    tls_ca_cert=None,          # path to PEM bundle for private CAs
)

for record_bytes in ds:
    record = orjson.loads(record_bytes)
    ...
Each __next__ call returns one record as bytes (no trailing newline). The
stream opens lazily on the first iter() and closes when the server runs
out of records or the sample= cap is reached.
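Because each record arrives as raw bytes, decoding and batching can live in plain generator helpers. The sketch below is illustrative only: `decode_records` and `batched` are hypothetical helper names (not part of shuflr_client), the standard-library `json` module stands in for orjson, and a list of bytes stands in for a live Dataset.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def decode_records(stream: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Parse each raw record (one JSON document per bytes object)."""
    for record_bytes in stream:
        # json.loads accepts bytes directly and detects the UTF encoding.
        yield json.loads(record_bytes)


def batched(records: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Group records into lists of `size`; the last batch may be shorter."""
    batch: List[Any] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Stand-in for a Dataset: any iterable of bytes records works the same way.
fake_stream = [b'{"id": 1}', b'{"id": 2}', b'{"id": 3}']
batches = list(batched(decode_records(fake_stream), 2))
```

With a real stream, `fake_stream` would be replaced by the Dataset object itself, since it is iterated the same way.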
PyTorch
import torch
from shuflr_client import IterableDataset

ds = IterableDataset(
    "http://localhost:9000/v1/streams/training",
    seed=42, shuffle="index-perm",
)
loader = torch.utils.data.DataLoader(ds, batch_size=128, num_workers=4)
When invoked from inside a DataLoader worker, IterableDataset reads
torch.utils.data.get_worker_info() and auto-fills rank +
world_size so each worker reads a disjoint slice of the shuffled
stream — no per-worker config required.
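To make the rank / world_size idea concrete, here is a minimal sketch of disjoint partitioning. It rests on assumptions: the README does not say how shuflr assigns records to ranks, so round-robin by position is used purely for illustration, and `flatten_rank` and `take_shard` are hypothetical names. `flatten_rank` shows one common way to combine a multi-node rank with a DataLoader worker id when both levels of parallelism are in play.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def flatten_rank(node_rank: int, node_world: int,
                 worker_id: int, num_workers: int) -> Tuple[int, int]:
    """Map (node, worker) to a single (rank, world_size) pair."""
    return node_rank * num_workers + worker_id, node_world * num_workers


def take_shard(stream: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Yield every world_size-th item starting at offset `rank`.

    Assumption: round-robin assignment, chosen only to illustrate
    that the slices are disjoint and together cover the stream.
    """
    for position, item in enumerate(stream):
        if position % world_size == rank:
            yield item


# Four workers reading one ten-record stream see disjoint slices.
shards = [list(take_shard(range(10), rank, 4)) for rank in range(4)]
```

Every worker derives its slice from the same two integers, which is why no coordinator process is needed.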
A typical training data layer opens many Dataset instances at once
(one per source corpus) and weights them with a wrapper layer; in
production we sustain ~150 concurrent shuflr streams from a single
training process this way.
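The wrapper layer mentioned above is not part of this package, so the following is only a sketch of what such a layer might look like. `mix_streams` is a hypothetical name, and in-memory lists stand in for Dataset instances. It draws the next source at random in proportion to its weight and drops a source once it is exhausted.

```python
import random
from typing import Dict, Iterable, Iterator, Tuple


def mix_streams(streams: Dict[str, Iterable[bytes]],
                weights: Dict[str, float],
                seed: int) -> Iterator[Tuple[str, bytes]]:
    """Interleave several record streams, sampling sources by weight."""
    rng = random.Random(seed)  # seeded so the interleaving is reproducible
    iterators = {name: iter(stream) for name, stream in streams.items()}
    while iterators:
        names = list(iterators)
        name = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        try:
            yield name, next(iterators[name])
        except StopIteration:
            # This source is finished; keep sampling from the rest.
            del iterators[name]


sources = {"a": [b"a0", b"a1", b"a2"], "b": [b"b0", b"b1"]}
source_weights = {"a": 3.0, "b": 1.0}
mixed = list(mix_streams(sources, source_weights, seed=42))
```

With epochs=0 the underlying streams never end, so the exhaustion branch only matters for finite or sampled sources.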
Transports
| Scheme | Transport | Status |
|---|---|---|
| http:// | Plain HTTP/1.1 chunked NDJSON | shipped |
| https:// | HTTPS (ureq TLS) | shipped; tls_ca_cert= for private CAs |
| shuflr:// | shuflr-wire/1 over plain TCP | shipped |
| shuflrs:// | shuflr-wire/1 over TLS | parses, lands next |
| shuflr+unix:// | shuflr-wire/1 over UDS | parses, lands next |
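The scheme table can be mirrored in client code as a simple lookup, for example to fail early when a configuration names a transport that has not landed yet. This is a sketch rather than the library's own dispatch logic: `TRANSPORTS` and `describe_transport` are hypothetical names, and the non-HTTP URLs used below are made-up examples.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit

# scheme -> (transport description, shipped?), copied from the table above
TRANSPORTS: Dict[str, Tuple[str, bool]] = {
    "http": ("Plain HTTP/1.1 chunked NDJSON", True),
    "https": ("HTTPS (ureq TLS)", True),
    "shuflr": ("shuflr-wire/1 over plain TCP", True),
    "shuflrs": ("shuflr-wire/1 over TLS", False),
    "shuflr+unix": ("shuflr-wire/1 over UDS", False),
}


def describe_transport(url: str) -> Tuple[str, bool]:
    """Return (description, shipped) for a stream URL's scheme."""
    scheme = urlsplit(url).scheme  # lowercased; '+' is a valid scheme character
    if scheme not in TRANSPORTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return TRANSPORTS[scheme]


http_info = describe_transport("http://127.0.0.1:9000/v1/streams/corpus")
uds_info = describe_transport("shuflr+unix:///tmp/shuflr.sock")  # hypothetical path
```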
HTTP is the universal fallback: it passes through ordinary proxies,
firewalls, and load balancers. The shuflr-wire/1 transport adds explicit
record framing on top of TCP's ordered byte stream, plus
raw-frame passthrough for chunk-shuffled mode: the server hands
the client compressed seekable-zstd frames, which the client then
re-shuffles locally with the server-derived seed. Wire size shrinks to
roughly the on-disk size (~3.6× smaller than NDJSON) at no quality cost.
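The reason a shared seed makes local re-shuffling safe can be shown with a toy model. The sketch below is not shuflr's actual algorithm, which this README does not describe: it assumes frames are already decompressed into lists of records, and `chunk_shuffle` is a hypothetical name. It shuffles the frame order and then the records inside each frame, all driven by one seeded generator.

```python
import random
from typing import Iterator, Sequence


def chunk_shuffle(frames: Sequence[Sequence[bytes]], seed: int) -> Iterator[bytes]:
    """Toy chunk shuffle: permute frames, then permute records per frame."""
    rng = random.Random(seed)  # same seed -> same output order everywhere
    order = list(range(len(frames)))
    rng.shuffle(order)
    for frame_index in order:
        records = list(frames[frame_index])
        rng.shuffle(records)
        yield from records


frames = [[b"a1", b"a2", b"a3"], [b"b1", b"b2"], [b"c1", b"c2", b"c3"]]
first_pass = list(chunk_shuffle(frames, seed=42))
second_pass = list(chunk_shuffle(frames, seed=42))
```

In this toy model, records from one frame always stay adjacent in the output, which illustrates why a chunk-level shuffle is cheaper but less uniform than a full index permutation.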
Server
A shuflr serve instance hosts one or more named datasets:
shuflr serve --http 127.0.0.1:9000 \
    --dataset corpus=/data/corpus.jsonl.zst \
    --dataset pairs=/data/pairs.jsonl.zst
TLS, bearer / mTLS auth, and reloadable token files are all supported on the same listener. See the shuflr README for the full server-side surface.
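Judging from the examples in this README, a dataset registered as `corpus` is reached at `/v1/streams/corpus`. Assuming that pattern holds for every named dataset (the shuflr README is the authority here), client URLs can be built with a small helper. `stream_url` is a hypothetical name, not a function exported by shuflr_client.

```python
from urllib.parse import quote


def stream_url(host: str, port: int, dataset: str, scheme: str = "http") -> str:
    """Build a stream URL for a named dataset.

    Assumption: the /v1/streams/<name> path layout seen in the
    usage examples applies to all datasets.
    """
    # Percent-encode the name so unusual characters cannot alter the path.
    return f"{scheme}://{host}:{port}/v1/streams/{quote(dataset, safe='')}"


corpus_url = stream_url("127.0.0.1", 9000, "corpus")
pairs_url = stream_url("127.0.0.1", 9000, "pairs")
```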
Development
cd crates/shuflr-client
python3 -m venv .venv && source .venv/bin/activate
pip install maturin pytest
cargo build -p shuflr-cli --features serve # tests spawn a real server
maturin develop --release
pytest tests/
Or run the bundled script: crates/shuflr-client/scripts/test.sh.
License
MIT OR Apache-2.0.