Generative Retrieval ID Semantic Transforms on top of Google Grain.
GRIST 🌾
Generative Retrieval ID Semantic Transforms for reproducible data pipelines.
GRIST is a focused Python library for bridging raw research datasets and generative retrieval models. It enriches datasets with Semantic Identifiers, guarantees deterministic preprocessing, and provides helpers for publishing results to public hubs like HuggingFace and Kaggle. It is designed to work smoothly with existing data pipeline tooling, including Grain.
Why GRIST
In Generative Retrieval (GR) research, reproducibility is essential. GRIST treats a dataset not as a static file but as a deterministic factory: every transformation, from text cleaning to model-based ID generation, is designed to produce the same output when rerun on the same input.
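As a rough illustration of the "deterministic factory" idea, the sketch below runs a tiny pipeline twice and compares a SHA-256 fingerprint of the output. It uses only the Python standard library; `clean_text`, `run_pipeline`, and `fingerprint` are hypothetical names chosen for this example and are not part of GRIST's API.

```python
import hashlib
import json
import random


def clean_text(record: dict) -> dict:
    """Lowercase and collapse whitespace; a pure function of its input."""
    return {**record, "text": " ".join(record["text"].lower().split())}


def run_pipeline(records: list[dict], seed: int) -> list[dict]:
    """Clean every record, then shuffle with a locally seeded RNG."""
    cleaned = [clean_text(r) for r in records]
    rng = random.Random(seed)  # local RNG: no dependence on global random state
    rng.shuffle(cleaned)
    return cleaned


def fingerprint(records: list[dict]) -> str:
    """Hash a canonical JSON serialization of the pipeline output."""
    payload = json.dumps(records, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


raw = [
    {"doc_id": 1, "text": "  Generative   Retrieval "},
    {"doc_id": 2, "text": "Semantic IDs"},
    {"doc_id": 3, "text": "Deterministic\tpipelines"},
]

first = fingerprint(run_pipeline(raw, seed=0))
second = fingerprint(run_pipeline(raw, seed=0))
assert first == second  # same input and seed give the same fingerprint
```

Storing such a fingerprint next to a published dataset is one way to let others check that they rebuilt the same data.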
Features
- Pipeline-native: Fits into existing data pipeline tooling without new paradigms to learn.
- Semantic ID injection: Built-in MapTransform classes for UUIDs, hashes, or model-generated codes.
- Inference-ready: Wrap any pre-trained model (HuggingFace, JAX, PyTorch) as an ID generator.
- Publishing helpers: Tools to facilitate uploads to HuggingFace or Kaggle.
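To make the Semantic ID injection feature more concrete, here is a minimal sketch of two ID transforms, one hash-based and one UUID-based. The classes mirror the single-method `map(element)` shape that Grain's `MapTransform` uses, but they are plain Python so the example runs without Grain installed. The class names, the `semantic_id` key, and the `grist.example` namespace are assumptions made for illustration, not GRIST's actual API.

```python
import hashlib
import uuid


class HashSemanticId:
    """Hypothetical transform: adds a content-derived hex ID to each sample."""

    def __init__(self, text_key: str = "text", num_hex_chars: int = 16):
        self.text_key = text_key
        self.num_hex_chars = num_hex_chars

    def map(self, element: dict) -> dict:
        digest = hashlib.sha256(element[self.text_key].encode("utf-8")).hexdigest()
        # Return a new dict so the input sample is never mutated.
        return {**element, "semantic_id": digest[: self.num_hex_chars]}


class Uuid5SemanticId:
    """Hypothetical transform: adds a name-based (version 5) UUID."""

    # Assumed namespace for this example; any fixed UUID would work.
    NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "grist.example")

    def __init__(self, text_key: str = "text"):
        self.text_key = text_key

    def map(self, element: dict) -> dict:
        # uuid5 is deterministic, unlike the random uuid4.
        value = uuid.uuid5(self.NAMESPACE, element[self.text_key])
        return {**element, "semantic_id": str(value)}


sample = {"doc_id": 1, "text": "generative retrieval"}
hashed = HashSemanticId().map(sample)
named = Uuid5SemanticId().map(sample)
```

Both transforms derive the ID from the sample content alone, so rerunning them on the same data yields the same IDs.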
Installation
uv add grist
Quick Start
TODO: Quick start example for the planned public API.
Concepts
- Semantic Identifiers: Stable, model-aware IDs that augment dataset samples for generative retrieval.
- Deterministic pipelines: Transform semantics guarantee repeatable preprocessing.
- Dataset configs: Optional, reusable configuration files for well-known datasets.
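One possible reading of "model-aware IDs" is sketched below: a frozen encoder embeds each sample, and the embedding is turned into a short tuple of discrete codes. The source does not say which coding scheme GRIST uses; this example swaps in plain residual quantization against fixed codebooks as an assumption. `ModelSemanticId`, `toy_embed`, and the codebook values are all hypothetical, and `toy_embed` is only a stand-in for a real pre-trained model.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], vec: Vector) -> int:
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


class ModelSemanticId:
    """Hypothetical wrapper: embed the text, then residual-quantize it."""

    def __init__(
        self,
        embed: Callable[[str], Vector],
        codebooks: Sequence[Sequence[Vector]],
    ):
        self.embed = embed
        self.codebooks = codebooks

    def map(self, element: dict) -> dict:
        residual = list(self.embed(element["text"]))
        codes = []
        for codebook in self.codebooks:
            idx = nearest(codebook, residual)
            codes.append(idx)
            # Subtract the chosen entry; the next level encodes what is left.
            residual = [r - c for r, c in zip(residual, codebook[idx])]
        return {**element, "semantic_id": tuple(codes)}


def toy_embed(text: str) -> Vector:
    """Toy stand-in for a frozen encoder: (length, number of spaces)."""
    return [float(len(text)), float(text.count(" "))]


codebooks = [
    [[0.0, 0.0], [10.0, 2.0]],              # coarse level
    [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]],   # fine level
]

transform = ModelSemanticId(toy_embed, codebooks)
sample = transform.map({"text": "semantic ids"})
print(sample["semantic_id"])  # → (1, 1)
```

The IDs stay stable only as long as the encoder weights and the codebooks are fixed, so both would need to be versioned alongside the dataset.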
Why the Name
In milling, grist is the grain separated from its chaff and ready to be ground. This library turns your "raw grain" (datasets) into a refined form ready for the "mill" of generative retrieval models.
File details
Details for the file grist-0.1.0.dev0-py3-none-any.whl.
File metadata
- Download URL: grist-0.1.0.dev0-py3-none-any.whl
- Upload date:
- Size: 2.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5cd832d7c0908b0b2885d3ad41b241af137310c955a4c8abcfeb6ba16a32b44 |
| MD5 | 09d83c300b6e2beb31c65f5c624b05e4 |
| BLAKE2b-256 | 0e90f7355867df0045e7b5e491f96bc8709c38db7e7594418508259d108b6f17 |