Ondine - The LLM Dataset Engine. SDK for processing tabular datasets using LLMs with reliability, observability, and cost control

Ondine

A prompt is a column. A new DataFrame primitive for LLMs, with five dimensions of production support.

ondine.dev · Docs · PyPI

Ondine makes LLM calls a first-class DataFrame operation. Define a column with natural language. Ondine computes it at production scale.

from ondine import PipelineBuilder

# Assumes `df` is an existing DataFrame with a "review" column.
df = (
    PipelineBuilder.create()
    .from_dataframe(df, input_columns=["review"], output_columns=["sentiment"])
    .with_prompt("Classify the tone of: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .build()
    .execute().data
)

The LLM stops being a service you call from your pipeline. It becomes a column function inside it.

The rest of this README covers how Ondine makes that primitive production-ready across five dimensions: richer inputs (KB/RAG/OCR), constrained outputs (schemas, grounding), reliable execution (checkpoints, budget caps, adaptive concurrency), full observability, and any LLM backend.

Install

pip install ondine

Python 3.10+. Works with any LLM through LiteLLM: OpenAI, Anthropic, Groq, Mistral, Cerebras, Ollama, MLX, vLLM, SGLang, 100+ others.

30-second quickstart

from ondine import PipelineBuilder

pipeline = (
    PipelineBuilder.create()
    .from_csv("reviews.csv",
              input_columns=["review"],
              output_columns=["sentiment", "topic"])
    .with_prompt("Classify sentiment and extract the key topic from: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .with_max_budget(5.00)
    .build()
)

result = pipeline.execute()
print(f"Processed {result.metrics.processed_rows} rows · ${result.costs.total_cost:.2f}")

One builder chain: input columns, prompt, model, budget cap. Multi-column outputs get a JSON parser; schema enforcement, checkpointing, and cost tracking are on by default.

Prefer a one-liner? QuickPipeline.create(...) wraps the same builder with sensible defaults (see examples/).

The 5 dimensions

1. INPUTS: make the prompt richer

Feed the LLM more than raw column text. Pull context from documents, images, and prior runs.

  • Knowledge Base (RAG): ingest PDFs, Markdown, HTML, images via OCR. Hybrid BM25 + dense search with optional cross-encoder reranker. HyDE / multi-query / step-back query transforms.
  • OCR: three pluggable backends: multimodal Vision LLM, Tesseract (offline), DocTR.
  • Multi-column placeholders: use any number of input columns in one prompt ({col_a}, {col_b}).
  • Jinja2 templates + system prompts for richer prompt shaping.
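
To make the multi-column placeholder idea concrete, here is a minimal sketch of how one row could be rendered into a prompt with plain `str.format`. It does not use Ondine's internals; `placeholders` and `render_prompt` are hypothetical helper names, and the sample row is invented for illustration.

```python
import string


def placeholders(template: str) -> list[str]:
    """Return the field names used in a str.format-style template."""
    return [name for _, name, _, _ in string.Formatter().parse(template) if name]


def render_prompt(template: str, row: dict) -> str:
    """Fill the template from one row, failing loudly on a missing column."""
    missing = [name for name in placeholders(template) if name not in row]
    if missing:
        raise KeyError(f"row is missing input columns: {missing}")
    return template.format(**row)


# Hypothetical row: extra columns are ignored, only referenced ones are used.
template = "Compare {col_a} with {col_b}."
row = {"col_a": "red apples", "col_b": "green apples", "unused": 1}
prompt = render_prompt(template, row)
print(prompt)
```

Checking for missing columns before formatting turns a cryptic `KeyError` on one field into a message that lists every absent column at once.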

2. OUTPUTS: constrain what comes back

Stop parsing strings. Get typed columns, validated against your schema, verified against your evidence.

  • Pydantic structured output: define a model, get typed columns back. Malformed JSON auto-retries up to 3x.
  • Multi-column parsing: one prompt → N typed columns.
  • Grounding verification (Context Store): each LLM answer checked against an evidence graph built from your dataset. Rust + SQLite + FTS5 backend. Contradictions flagged, not silently returned.
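
The retry-on-malformed-JSON behavior can be pictured with a small standard-library sketch. This is not Ondine's implementation and it deliberately swaps Pydantic for a hand-written type check so the example stays dependency-free; `REQUIRED`, `parse_row`, `call_with_retries`, and the canned replies are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical schema: output column name -> expected Python type.
REQUIRED = {"sentiment": str, "score": int}


def parse_row(raw: str) -> dict:
    """Parse one model reply and check it against REQUIRED."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    for field, expected in REQUIRED.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field {field!r} missing or not {expected.__name__}")
    return {field: data[field] for field in REQUIRED}


def call_with_retries(ask: Callable[[], str], max_retries: int = 3) -> dict:
    """Call `ask` once, then retry up to `max_retries` times on invalid output."""
    last_error = None
    for _ in range(1 + max_retries):
        try:
            return parse_row(ask())
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid reply after {max_retries} retries") from last_error


# Stand-in for an LLM: the first reply is truncated JSON, the second is valid.
replies = iter(['{"sentiment": "positive"', '{"sentiment": "positive", "score": 9}'])
result = call_with_retries(lambda: next(replies))
print(result)
```

With a real Pydantic model, `parse_row` would typically be replaced by the model's own validation call, with the retry loop unchanged.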

3. EXECUTION: run N rows reliably

Production plumbing that df.apply() doesn't give you.

  • Checkpointing to Parquet after every batch. Durable SQLite response cache for crash-atomic resume (A4, #144).
  • Hard budget caps: pre-run cost estimation, live tracking, halts the pipeline at your USD limit.
  • Multi-row batching: pack N rows per API call. 200 calls instead of 10,000 at batch_size=50.
  • Prefix caching: system prompt cached across batches. 40–50% token savings.
  • Adaptive concurrency: Netflix Gradient2 algorithm. Shrinks on 429, grows on saturation.
  • Retry-After parsing across 5 header shapes (OpenAI / Anthropic / Groq / RFC 7231 / ms-delta).
  • Distributed rate limiting via Redis (atomic Lua token bucket, cluster-aware).
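
The multi-row batching arithmetic above is easy to check by hand. The sketch below is only an illustration of the call-count math, not Ondine's batching code; `batches` and `api_calls` are hypothetical helper names.

```python
import math
from typing import Iterator, Sequence


def batches(rows: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def api_calls(n_rows: int, batch_size: int) -> int:
    """Number of API calls needed when each call carries `batch_size` rows."""
    return math.ceil(n_rows / batch_size)


rows = [f"review {i}" for i in range(10_000)]
chunks = list(batches(rows, 50))
print(len(chunks), api_calls(10_000, 50))
```

The last batch is simply shorter when the row count is not a multiple of the batch size, so 101 rows at `batch_size=50` need three calls.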

4. OBSERVATION: see what happened

On by default. Integrates with the observability stack you already run.

  • ProgressBar + Logging + CostTracking observers active on every run.
  • Langfuse for LLM trace logging.
  • OpenTelemetry for distributed tracing.
  • Prometheus metrics export (request count, duration histogram, cost gauge).
  • Decimal precision for cost tracking (no floating-point surprises).
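
Why Decimal matters for cost tracking, and how a hard budget cap can sit on top of it, is sketched below. This is an assumption-laden illustration rather than Ondine's tracker: `CostTracker`, `BudgetExceeded`, and the per-call costs are hypothetical.

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when accumulated cost goes past the configured limit."""


class CostTracker:
    def __init__(self, max_budget: str) -> None:
        # Build Decimals from strings so no binary float rounding sneaks in.
        self.max_budget = Decimal(max_budget)
        self.total = Decimal("0")

    def add(self, cost: str) -> None:
        """Record one call's cost and halt once the total exceeds the budget."""
        self.total += Decimal(cost)
        if self.total > self.max_budget:
            raise BudgetExceeded(f"spent {self.total} of {self.max_budget} USD")


# Floats drift: three calls at 0.10 do not sum to exactly 0.30.
float_total = sum([0.1] * 3)
print(float_total == 0.3)

# Decimals do not: the same three calls land exactly on the budget.
tracker = CostTracker("0.30")
for _ in range(3):
    tracker.add("0.10")
print(tracker.total == Decimal("0.30"))
```

With floats, a run that should land exactly on the limit can trip or miss the cap by a rounding error; with Decimal, the comparison is exact.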

5. PROVIDERS: any LLM backend

  • 100+ providers via LiteLLM. Swap with a string.
  • Router with latency-based failover and automatic provider selection.
  • Local inference: Ollama, MLX (Apple Silicon), vLLM, SGLang.
  • Azure Managed Identity with 3 auth patterns (MI, API key, pre-fetched token).
  • Custom endpoints: any OpenAI-compatible API.
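
Latency-based failover can be sketched in a few lines: try providers in order of observed latency and fall through to the next one on error. This is a conceptual sketch only, not Ondine's router or LiteLLM's API; `call_with_failover`, the provider names, and the latency figures are hypothetical.

```python
from typing import Callable


def call_with_failover(
    providers: dict[str, Callable[[str], str]],
    latencies: dict[str, float],
    prompt: str,
) -> tuple[str, str]:
    """Try providers from lowest to highest latency; return (name, answer)."""
    errors: dict[str, Exception] = {}
    # Providers with no latency measurement are tried last.
    ordered = sorted(providers, key=lambda name: latencies.get(name, float("inf")))
    for name in ordered:
        try:
            return name, providers[name](prompt)
        except Exception as exc:  # a real router would catch narrower errors
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")


# Stand-ins for two backends: the faster one is down, the slower one answers.
def flaky(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def steady(prompt: str) -> str:
    return f"ok: {prompt}"


used, answer = call_with_failover(
    {"fast": flaky, "slow": steady},
    {"fast": 0.2, "slow": 0.9},
    "hi",
)
print(used, answer)
```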

Beyond the quickstart

from ondine import PipelineBuilder
from ondine.knowledge import KnowledgeStore
from ondine.context import RustContextStore
from pydantic import BaseModel

class ReviewAnalysis(BaseModel):
    sentiment: str
    score: int
    topic: str

kb = KnowledgeStore("knowledge.db")
kb.ingest("docs/")   # PDFs, MD, HTML, images via OCR

pipeline = (
    PipelineBuilder.create()
    .from_csv("reviews.csv",
              input_columns=["review"],
              output_columns=["sentiment", "score", "topic"])
    .with_knowledge_base(kb, top_k=5, rerank=True, query_transform="hyde")
    .with_prompt("Context:\n{_kb_context}\n\nAnalyze: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .with_structured_output(ReviewAnalysis)
    .with_context_store(RustContextStore("evidence.db"))
    .with_grounding(threshold=0.3)
    .with_batch_size(50)
    .with_max_budget(25.00)
    .with_checkpoint_interval(100)
    .with_disk_cache(".cache")
    .with_router(strategy="latency")
    .with_observer("langfuse")
    .build()
)

result = pipeline.execute()

Every chained method maps to one of the five dimensions. See docs.ondine.dev for the full reference.

What "a prompt is a column" unlocks

Same primitive. The use case lives in the prompt.

| Transform | Prompt pattern |
| --- | --- |
| Classification | "Classify {text} into one of {labels}" |
| Extraction | "Extract name, date, amount from: {document}" |
| Scoring | "Score {item} against {criteria} on 1–10" |
| Comparison | "Is {a} equivalent to {b}? Return yes/no + reason." |
| Translation | "Translate {text} from {src_lang} to {tgt_lang}" |
| Summarization | "Summarize {document} in 3 bullets" |

One abstraction. Any transform.

Compared to alternatives

| Tool | Primitive | Why pick Ondine |
| --- | --- | --- |
| Instructor | f(prompt) → Pydantic (one call) | Ondine applies that pattern to N rows, with the 5 dimensions |
| Pandas-AI | df.chat("question") | Different primitive (query vs. compute) |
| LangChain batch | chain.batch([...]) | No budget cap, no grounding, no observability defaults |
| OpenAI/Anthropic Batch API | Provider-specific batch | No multi-provider, no grounding, no crash-safety, 24-hour turnaround |
| Airflow/Prefect/Dagster | Workflow orchestrators | Heavy setup, no LLM-specific features. Ondine ships integrations for them. |
| Ondine | Prompt(columns) → new_columns | A primitive, not a wrapper |

Local inference

from ondine import QuickPipeline

# Ollama
pipeline = QuickPipeline.create(
    data="reviews.csv",
    prompt="Classify sentiment: {review}",
    output_columns=["sentiment"],
    model="ollama/qwen3.5",
)

# MLX (Apple Silicon, native; no server process)
pipeline = QuickPipeline.create(
    data="reviews.csv",
    prompt="Classify sentiment: {review}",
    output_columns=["sentiment"],
    model="mlx/mlx-community/Llama-4-Scout-Instruct-4bit",
)

No API keys. No telemetry. Fully offline.

Contributing

PRs welcome. See CONTRIBUTING.md. Code style: Black + Ruff. Tests required for new features.

License

MIT. See LICENSE.

Acknowledgments

  • LiteLLM: provider routing layer
  • Instructor: the single-call pattern Ondine applies at DataFrame scale
  • The Pydantic team: validation backbone
