Manifest-backed real-data ingestion and OpenML materialization for tabular workflows
tab-realdata-hub
tab-realdata-hub materializes external tabular data sources into the
manifest-backed packed-shard contract consumed by tab-foundry.

tab-realdata-hub is the sole owner of that manifest contract. The parquet
manifest is the stable index layer; richer, still-evolving dataset and
provenance fields live in metadata.ndjson. Downstream consumers are expected
to read through this package rather than reimplement compatibility shims.
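
To make the split between the two layers concrete, here is a minimal sketch of what a newline-delimited JSON sidecar such as metadata.ndjson looks like to a reader: one JSON object per line, parsed independently. It illustrates the file format only. The helper name `iter_ndjson` and the `dataset_id` field in the usage note are hypothetical, the real field set is defined by tab-realdata-hub, and production code should go through the package as described above.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_ndjson(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a newline-delimited JSON file.

    Hypothetical helper for illustration; not part of tab-realdata-hub.
    """
    with path.open("r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{line_number}: expected a JSON object")
            yield record
```

Because each line parses on its own, a record can gain new keys without invalidating older readers, which is presumably why the evolving fields live here rather than in the parquet schema. A caller might collect a hypothetical `dataset_id` key with `[r.get("dataset_id") for r in iter_ndjson(Path("metadata.ndjson"))]`.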
Install from the upstream git tag with:

    python -m pip install "tab-realdata-hub @ git+https://github.com/bensonlee5/tab-realdata-hub.git@v0.1.0"

For repo-local development:

    uv sync

The v1 surface is OpenML-first:
- build pinned OpenML bundle JSON from known task pools or live discovery
- materialize bundle tasks into packed shards plus manifest parquet
- inspect manifest-backed datasets through a stable library and CLI surface
Example:

    uv sync

    tab-realdata-hub bundle build-openml \
      --out-path bundles/many_class_v1.json \
      --bundle-name many_class_v1 \
      --version 1 \
      --task-source tabarena_v0_1 \
      --max-features 10 \
      --max-classes 10 \
      --max-missing-pct 10.0

    tab-realdata-hub materialize openml-bundle \
      --bundle-path bundles/many_class_v1.json \
      --out-root outputs/openml/many_class_v1

    tab-realdata-hub manifest inspect \
      --manifest outputs/openml/many_class_v1/manifest.parquet
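
The three commands above share paths: the bundle JSON written by the first is read by the second, and the manifest written by the second is inspected by the third. The sketch below scripts that chain from Python so the paths are derived in one place. It uses only the subcommands and flags shown in the example. The helper names `build_commands` and `run_pipeline` are hypothetical and not part of tab-realdata-hub, and the script assumes the `tab-realdata-hub` executable is on `PATH`.

```python
import subprocess
from pathlib import Path
from typing import List

CLI = "tab-realdata-hub"  # assumed to be on PATH after installation


def build_commands(bundle_name: str, bundle_dir: Path, out_root: Path) -> List[List[str]]:
    """Return argv lists for build -> materialize -> inspect (hypothetical helper)."""
    bundle_path = (bundle_dir / f"{bundle_name}.json").as_posix()
    dataset_root = out_root / bundle_name
    return [
        # 1. Build a pinned OpenML bundle, with the same filters as the example.
        [CLI, "bundle", "build-openml",
         "--out-path", bundle_path,
         "--bundle-name", bundle_name,
         "--version", "1",
         "--task-source", "tabarena_v0_1",
         "--max-features", "10",
         "--max-classes", "10",
         "--max-missing-pct", "10.0"],
        # 2. Materialize the bundle into packed shards plus manifest parquet.
        [CLI, "materialize", "openml-bundle",
         "--bundle-path", bundle_path,
         "--out-root", dataset_root.as_posix()],
        # 3. Inspect the resulting manifest.
        [CLI, "manifest", "inspect",
         "--manifest", (dataset_root / "manifest.parquet").as_posix()],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first non-zero exit code."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands("many_class_v1", Path("bundles"), Path("outputs/openml")))
```

Passing argv lists rather than a shell string avoids quoting problems in paths, and `check=True` keeps a failed bundle build from being followed by a materialize step against a missing file.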