Ondine - The LLM Dataset Engine. SDK for processing tabular datasets using LLMs with reliability, observability, and cost control
Ondine
A prompt is a column. A new DataFrame primitive for LLMs, with five dimensions of production support.
ondine.dev · Docs · PyPI
Ondine makes LLM calls a first-class DataFrame operation. Define a column with natural language. Ondine computes it at production scale.
from ondine import PipelineBuilder

df = (
    PipelineBuilder.create()
    .from_dataframe(df, input_columns=["review"], output_columns=["sentiment"])
    .with_prompt("Classify the tone of: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .build()
    .execute().data
)
The LLM stops being a service you call from your pipeline. It becomes a column function inside it.
Everything else in this README is how Ondine makes that primitive production-true across five dimensions: richer inputs (KB/RAG/OCR), constrained outputs (schemas, grounding), reliable execution (checkpoints, budget caps, adaptive concurrency), full observability, and any LLM backend.
Install
pip install ondine
Python 3.10+. Works with any LLM through LiteLLM: OpenAI, Anthropic, Groq, Mistral, Cerebras, Ollama, MLX, vLLM, SGLang, 100+ others.
30-second quickstart
from ondine import PipelineBuilder

pipeline = (
    PipelineBuilder.create()
    .from_csv("reviews.csv",
              input_columns=["review"],
              output_columns=["sentiment", "topic"])
    .with_prompt("Classify sentiment and extract the key topic from: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .with_max_budget(5.00)
    .build()
)

result = pipeline.execute()
print(f"Processed {result.metrics.processed_rows} rows · ${result.costs.total_cost:.2f}")
One builder chain: input columns, prompt, model, budget cap. Multi-column outputs get a JSON parser; schema enforcement, checkpointing, and cost tracking are on by default.
Prefer a one-liner? QuickPipeline.create(...) wraps the same builder with sensible defaults (see examples/).
The 5 dimensions
1. INPUTS: make the prompt richer
Feed the LLM more than raw column text. Pull context from documents, images, and prior runs.
- Knowledge Base (RAG): ingest PDFs, Markdown, HTML, images via OCR. Hybrid BM25 + dense search with optional cross-encoder reranker. HyDE / multi-query / step-back query transforms.
- OCR: three pluggable backends: multimodal Vision LLM, Tesseract (offline), DocTR.
- Multi-column placeholders: use any number of input columns in one prompt ({col_a}, {col_b}).
- Jinja2 templates + system prompts for richer prompt shaping.
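To make the multi-column placeholder idea concrete, here is a minimal standard-library sketch of filling `{column}` placeholders from one row. It is not Ondine's implementation: `render_prompt` and the sample rows are hypothetical names made up for illustration.

```python
import string

def render_prompt(template: str, row: dict) -> str:
    """Fill {column} placeholders in `template` from one row of data."""
    # Collect the placeholder names the template actually uses.
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = fields - set(row)
    if missing:
        raise KeyError(f"row is missing columns: {sorted(missing)}")
    return template.format(**{name: row[name] for name in fields})

# Hypothetical sample data: two input columns feed one prompt.
rows = [
    {"product": "kettle", "review": "Boils fast, lid feels cheap."},
    {"product": "lamp", "review": "Warm light, arrived broken."},
]
template = "Product: {product}\nReview: {review}\nClassify the tone."
prompts = [render_prompt(template, row) for row in rows]
print(prompts[0])
```

Checking for missing columns before formatting gives a clear error up front instead of a failure partway through a large run.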
2. OUTPUTS: constrain what comes back
Stop parsing strings. Get typed columns, validated against your schema, verified against your evidence.
- Pydantic structured output: define a model, get typed columns back. Malformed JSON auto-retries up to 3x.
- Multi-column parsing: one prompt → N typed columns.
- Grounding verification (Context Store): each LLM answer checked against an evidence graph built from your dataset. Rust + SQLite + FTS5 backend. Contradictions flagged, not silently returned.
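The validate-and-retry idea behind structured output can be sketched with the standard library alone. This is an illustration under stated assumptions, not Ondine's code: it uses a dataclass where Ondine uses Pydantic, a stub in place of a real model, and `parse_analysis` and `call_with_retries` are hypothetical names. Reading "up to 3x" as three attempts in total is also an assumption.

```python
import json
from dataclasses import dataclass

@dataclass
class ReviewAnalysis:
    sentiment: str
    score: int

def parse_analysis(raw: str) -> ReviewAnalysis:
    """Parse and type-check one model reply; raise ValueError if it does not fit."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("sentiment"), str) or not isinstance(data.get("score"), int):
        raise ValueError("fields missing or of the wrong type")
    return ReviewAnalysis(sentiment=data["sentiment"], score=data["score"])

def call_with_retries(ask, prompt: str, max_attempts: int = 3) -> ReviewAnalysis:
    """Call `ask(prompt)` until the reply validates, at most `max_attempts` times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_analysis(ask(prompt))
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid reply after {max_attempts} attempts") from last_error

# Stub standing in for an LLM: the first reply is truncated JSON, the second is valid.
replies = iter([
    '{"sentiment": "positive", "score":',
    '{"sentiment": "positive", "score": 8}',
])
result = call_with_retries(lambda prompt: next(replies), "Analyze: great kettle")
print(result)
```

The caller only ever sees a typed object or an explicit failure, never a half-parsed string.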
3. EXECUTION: run N rows reliably
Production plumbing that df.apply() doesn't give you.
- Checkpointing to Parquet after every batch. Durable SQLite response cache for crash-atomic resume (A4, #144).
- Hard budget caps: pre-run cost estimation, live tracking, halts the pipeline at your USD limit.
- Multi-row batching: pack N rows per API call. 200 calls instead of 10,000 at batch_size=50.
- Prefix caching: system prompt cached across batches. 40–50% token savings.
- Adaptive concurrency: Netflix Gradient2 algorithm. Shrinks on 429, grows on saturation.
- Retry-After parsing across 5 header shapes (OpenAI / Anthropic / Groq / RFC 7231 / ms-delta).
- Distributed rate limiting via Redis (atomic Lua token bucket, cluster-aware).
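The batching arithmetic and the budget cap are easy to check by hand. The sketch below is plain Python, not Ondine's internals: `batches`, `run_with_budget`, and the flat cost of $0.04 per call are assumptions chosen for illustration.

```python
from decimal import Decimal
from math import ceil

def batches(rows, batch_size):
    """Yield consecutive slices of `rows`, each holding at most `batch_size` items."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

def run_with_budget(rows, batch_size, cost_per_call, max_budget):
    """Process batches until the next call would exceed `max_budget`."""
    spent = Decimal("0")
    processed = 0
    for batch in batches(rows, batch_size):
        if spent + cost_per_call > max_budget:
            break  # halt before overspending; remaining rows stay unprocessed
        spent += cost_per_call  # a real pipeline would call the LLM here
        processed += len(batch)
    return processed, spent

rows = list(range(10_000))
calls_needed = ceil(len(rows) / 50)
print(calls_needed)  # 200 calls at batch_size=50, versus 10,000 at one row per call

# Hypothetical pricing: $0.04 per call against a $5.00 cap.
processed, spent = run_with_budget(rows, 50, Decimal("0.04"), Decimal("5.00"))
print(processed, spent)  # 6250 5.00
```

With these numbers the cap is reached after 125 calls, so the run stops at 6,250 rows rather than spending past the limit.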
4. OBSERVATION: see what happened
On by default. Integrates with the observability stack you already run.
- ProgressBar + Logging + CostTracking observers active on every run.
- Langfuse for LLM trace logging.
- OpenTelemetry for distributed tracing.
- Prometheus metrics export (request count, duration histogram, cost gauge).
- Decimal precision for cost tracking (no floating-point surprises).
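Why Decimal matters for cost totals shows up with only three additions. This sketch uses only the standard library, and the per-call cost is a made-up figure.

```python
from decimal import Decimal

per_call = "0.1"  # hypothetical cost of one call, in USD
calls = 3

# Binary floats cannot represent 0.1 exactly, so the error accumulates.
float_total = sum(float(per_call) for _ in range(calls))

# Decimal keeps base-10 values exact.
decimal_total = sum((Decimal(per_call) for _ in range(calls)), Decimal("0"))

print(float_total)    # 0.30000000000000004
print(decimal_total)  # 0.3
```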
5. PROVIDERS: any LLM backend
- 100+ providers via LiteLLM. Swap with a string.
- Router with latency-based failover and automatic provider selection.
- Local inference: Ollama, MLX (Apple Silicon), vLLM, SGLang.
- Azure Managed Identity with 3 auth patterns (MI, API key, pre-fetched token).
- Custom endpoints: any OpenAI-compatible API.
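Latency-based failover can be pictured as trying providers from fastest to slowest and moving on when one fails. The sketch below is a simplified illustration, not Ondine's or LiteLLM's router: `call_with_failover`, the provider names, and the latency figures are all hypothetical, and the providers are stub functions rather than real API clients.

```python
def call_with_failover(providers, latencies, prompt):
    """Try providers from lowest to highest recorded latency; return the first success."""
    errors = {}
    # Providers with no recorded latency are tried last.
    ranked = sorted(providers, key=lambda name: latencies.get(name, float("inf")))
    for name in ranked:
        try:
            return name, providers[name](prompt)
        except Exception as exc:
            errors[name] = repr(exc)
    raise RuntimeError(f"all providers failed: {errors}")

def flaky(prompt):
    # Stub: the fastest provider is down.
    raise TimeoutError("simulated outage")

def steady(prompt):
    # Stub: a slower provider that answers.
    return f"echo: {prompt}"

providers = {"provider_a": flaky, "provider_b": steady}
latencies = {"provider_a": 0.2, "provider_b": 0.9}  # seconds, made-up values

used, reply = call_with_failover(providers, latencies, "Classify: great kettle")
print(used, reply)
```

A production router would also update the latency table after each call; this sketch keeps it static to stay short.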
Beyond the quickstart
from ondine import PipelineBuilder
from ondine.knowledge import KnowledgeStore
from ondine.context import RustContextStore
from pydantic import BaseModel

class ReviewAnalysis(BaseModel):
    sentiment: str
    score: int
    topic: str

kb = KnowledgeStore("knowledge.db")
kb.ingest("docs/")  # PDFs, MD, HTML, images via OCR

pipeline = (
    PipelineBuilder.create()
    .from_csv("reviews.csv",
              input_columns=["review"],
              output_columns=["sentiment", "score", "topic"])
    .with_knowledge_base(kb, top_k=5, rerank=True, query_transform="hyde")
    .with_prompt("Context:\n{_kb_context}\n\nAnalyze: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .with_structured_output(ReviewAnalysis)
    .with_context_store(RustContextStore("evidence.db"))
    .with_grounding(threshold=0.3)
    .with_batch_size(50)
    .with_max_budget(25.00)
    .with_checkpoint_interval(100)
    .with_disk_cache(".cache")
    .with_router(strategy="latency")
    .with_observer("langfuse")
    .build()
)

result = pipeline.execute()
Every chained method maps to one of the five dimensions. See docs.ondine.dev for the full reference.
What "a prompt is a column" unlocks
Same primitive. The use case lives in the prompt.
| Transform | Prompt pattern |
|---|---|
| Classification | "Classify {text} into one of {labels}" |
| Extraction | "Extract name, date, amount from: {document}" |
| Scoring | "Score {item} against {criteria} on 1–10" |
| Comparison | "Is {a} equivalent to {b}? Return yes/no + reason." |
| Translation | "Translate {text} from {src_lang} to {tgt_lang}" |
| Summarization | "Summarize {document} in 3 bullets" |
One abstraction. Any transform.
Compared to alternatives
| Tool | Primitive | Why pick Ondine |
|---|---|---|
| Instructor | f(prompt) → Pydantic (one call) | Ondine applies that pattern to N rows, with the 5 dimensions |
| Pandas-AI | df.chat("question") | Different primitive (query vs. compute) |
| LangChain batch | chain.batch([...]) | No budget cap, no grounding, no observability defaults |
| OpenAI/Anthropic Batch API | Provider-specific batch | No multi-provider, no grounding, no crash-safety, 24-hour turnaround |
| Airflow/Prefect/Dagster | Workflow orchestrators | Heavy setup, no LLM-specific features. Ondine ships integrations for them. |
| Ondine | Prompt(columns) → new_columns | A primitive, not a wrapper |
Local inference
from ondine import QuickPipeline

# Ollama
pipeline = QuickPipeline.create(
    data="reviews.csv",
    prompt="Classify sentiment: {review}",
    output_columns=["sentiment"],
    model="ollama/qwen3.5",
)

# MLX (Apple Silicon, native; no server process)
pipeline = QuickPipeline.create(
    data="reviews.csv",
    prompt="Classify sentiment: {review}",
    output_columns=["sentiment"],
    model="mlx/mlx-community/Llama-4-Scout-Instruct-4bit",
)
No API keys. No telemetry. Fully offline.
Documentation
- ondine.dev: landing page + examples
- docs.ondine.dev: full reference, Context Store internals, Builder API, Airflow/Prefect integrations
- examples/: 27 runnable scripts covering every major use case
- CHANGELOG.md: release notes
Contributing
PRs welcome. See CONTRIBUTING.md. Code style: Black + Ruff. Tests required for new features.
License
MIT. See LICENSE.
Acknowledgments
- LiteLLM: provider routing layer
- Instructor: the single-call pattern Ondine applies at DataFrame scale
- The Pydantic team: validation backbone
Support
- Issues: https://github.com/ptimizeroracle/ondine/issues
- Discussions: https://github.com/ptimizeroracle/ondine/discussions
- Website: https://ondine.dev