Ondine - The LLM Dataset Engine. SDK for processing tabular datasets using LLMs with reliability, observability, and cost control

Ondine

A prompt is a column. A new DataFrame primitive for LLMs, with five dimensions of production support.

ondine.dev · Docs · PyPI

Ondine makes LLM calls a first-class DataFrame operation. Define a column with natural language. Ondine computes it at production scale.

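import pandas as pd
from ondine import PipelineBuilder

# Illustrative input; any DataFrame with a "review" column works.
df = pd.DataFrame({"review": ["Great battery life", "Stopped working after a week"]})

# Compute a new "sentiment" column from the "review" column.
df = (
    PipelineBuilder.create()
    .from_dataframe(df, input_columns=["review"], output_columns=["sentiment"])
    .with_prompt("Classify the tone of: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .build()
    .execute().data
)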

The LLM stops being a service you call from your pipeline. It becomes a column function inside it.
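
To make the idea concrete without Ondine itself, here is a minimal sketch of "a prompt is a column" in plain Python. The helper `add_prompt_column` and the stand-in `fake_llm` are hypothetical names invented for this illustration, not part of Ondine's API; a real run would call a model instead of the keyword check.

```python
def add_prompt_column(rows, prompt_template, output_column, llm):
    """Return copies of rows with output_column computed by llm(prompt)."""
    out = []
    for row in rows:
        prompt = prompt_template.format(**row)  # fill {review} from the row
        out.append({**row, output_column: llm(prompt)})
    return out


def fake_llm(prompt):
    # Stand-in for a real model call (assumption for this sketch only).
    return "positive" if "great" in prompt.lower() else "negative"


rows = [
    {"review": "Great battery life"},
    {"review": "Stopped working after a week"},
]
result = add_prompt_column(
    rows, "Classify the tone of: {review}", "sentiment", fake_llm
)
for row in result:
    print(row)
```

The prompt plays the role of the column definition: the input rows are untouched and each output row gains one derived field.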

The rest of this README covers how Ondine makes that primitive production-ready across five dimensions: richer inputs (KB/RAG/OCR), constrained outputs (schemas, grounding), reliable execution (checkpoints, budget caps, adaptive concurrency), full observability, and any LLM backend.

Install

pip install ondine

Python 3.10+. Works with any LLM through LiteLLM: OpenAI, Anthropic, Groq, Mistral, Cerebras, Ollama, MLX, vLLM, SGLang, 100+ others.

30-second quickstart

from ondine import PipelineBuilder

pipeline = (
    PipelineBuilder.create()
    .from_csv("reviews.csv",
              input_columns=["review"],
              output_columns=["sentiment", "topic"])
    .with_prompt("Classify sentiment and extract the key topic from: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .with_max_budget(5.00)
    .build()
)

result = pipeline.execute()
print(f"Processed {result.metrics.processed_rows} rows · ${result.costs.total_cost:.2f}")

One builder chain: input columns, prompt, model, budget cap. Multi-column outputs get a JSON parser; schema enforcement, checkpointing, and cost tracking are on by default.

Prefer a one-liner? QuickPipeline.create(...) wraps the same builder with sensible defaults (see examples/).

The 5 dimensions

1. INPUTS: make the prompt richer

Feed the LLM more than raw column text. Pull context from documents, images, and prior runs.

  • Knowledge Base (RAG): ingest PDFs, Markdown, HTML, images via OCR. Hybrid BM25 + dense search with optional cross-encoder reranker. HyDE / multi-query / step-back query transforms.
  • OCR: three pluggable backends: multimodal Vision LLM, Tesseract (offline), DocTR.
  • Multi-column placeholders: use any number of input columns in one prompt ({col_a}, {col_b}).
  • Jinja2 templates + system prompts for richer prompt shaping.
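
Multi-column placeholders can be pictured with the standard library alone. The sketch below is not Ondine's template engine; `placeholders` and `render_prompt` are hypothetical helpers that list the `{column}` names in a template and fail early when a row lacks one of them.

```python
from string import Formatter


def placeholders(template):
    """Return the placeholder names used in a str.format-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def render_prompt(template, row):
    """Fill the template from a row, failing early on missing columns."""
    missing = [name for name in placeholders(template) if name not in row]
    if missing:
        raise KeyError(f"row is missing input columns: {missing}")
    return template.format(**row)


template = "Is {col_a} equivalent to {col_b}?"
row = {"col_a": "USB-C cable", "col_b": "Type-C cord", "id": 7}

names = placeholders(template)
prompt = render_prompt(template, row)
print(names)
print(prompt)
```

Extra columns in the row (here `id`) are ignored; only the names that appear in the template are required.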

2. OUTPUTS: constrain what comes back

Stop parsing strings. Get typed columns, validated against your schema, verified against your evidence.

  • Pydantic structured output: define a model, get typed columns back. Malformed JSON auto-retries up to 3x.
  • Multi-column parsing: one prompt → N typed columns.
  • Grounding verification (Context Store): each LLM answer checked against an evidence graph built from your dataset. Rust + SQLite + FTS5 backend. Contradictions flagged, not silently returned.
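
The validate-then-retry idea behind structured output can be sketched as follows. This is a standard-library illustration, not Ondine's implementation: a dataclass stands in for the Pydantic model, `call_with_retries` and `flaky_llm` are hypothetical names, and reading "up to 3x" as one initial attempt plus three retries is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class ReviewAnalysis:
    sentiment: str
    score: int
    topic: str


def parse_analysis(raw):
    """Parse one JSON response into typed fields; raises on bad input."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    return ReviewAnalysis(
        sentiment=str(data["sentiment"]),
        score=int(data["score"]),
        topic=str(data["topic"]),
    )


def call_with_retries(llm, prompt, max_retries=3):
    """Call llm until the response parses, up to max_retries extra attempts."""
    last_error = None
    for _ in range(1 + max_retries):
        raw = llm(prompt)
        try:
            return parse_analysis(raw)
        except (ValueError, KeyError, TypeError) as exc:
            last_error = exc  # malformed output: ask again
    raise ValueError(f"no valid output after {max_retries} retries") from last_error


# Stand-in model: first answer is malformed, second is valid JSON.
responses = iter([
    "not json",
    '{"sentiment": "positive", "score": 8, "topic": "battery"}',
])


def flaky_llm(prompt):
    return next(responses)


analysis = call_with_retries(flaky_llm, "Analyze: Great battery life")
print(analysis)
```

One response yields three typed fields, which is the "one prompt → N typed columns" shape described above.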

3. EXECUTION: run N rows reliably

Production plumbing that df.apply() doesn't give you.

  • Checkpointing to Parquet after every batch. Durable SQLite response cache for crash-atomic resume (A4, #144).
  • Hard budget caps: pre-run cost estimation, live tracking, halts the pipeline at your USD limit.
  • Multi-row batching: pack N rows per API call. 200 calls instead of 10,000 at batch_size=50.
  • Prefix caching: system prompt cached across batches. 40–50% token savings.
  • Adaptive concurrency: Netflix Gradient2 algorithm. Shrinks on 429, grows on saturation.
  • Retry-After parsing across 5 header shapes (OpenAI / Anthropic / Groq / RFC 7231 / ms-delta).
  • Distributed rate limiting via Redis (atomic Lua token bucket, cluster-aware).
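
Two of the points above reduce to simple arithmetic: batching 10,000 rows at `batch_size=50` gives 200 calls, and a hard budget cap stops the run before a call would exceed the limit. The sketch below illustrates both under stated assumptions: a flat, made-up cost per call, a stub in place of the model, and hypothetical names (`batches`, `run_with_budget`). It is not how Ondine estimates or tracks cost.

```python
import math
from decimal import Decimal


def batches(rows, batch_size):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def run_with_budget(rows, batch_size, cost_per_call, max_budget, call):
    """Process batches until the next call would exceed max_budget."""
    spent = Decimal("0")
    results = []
    halted = False
    for batch in batches(rows, batch_size):
        if spent + cost_per_call > max_budget:
            halted = True  # stop before overspending
            break
        results.extend(call(batch))
        spent += cost_per_call
    return results, spent, halted


rows = list(range(10_000))
n_calls = math.ceil(len(rows) / 50)  # number of API calls at batch_size=50


def stub_call(batch):
    # Stand-in for one batched LLM request: one label per row.
    return ["ok"] * len(batch)


results, spent, halted = run_with_budget(
    rows,
    batch_size=50,
    cost_per_call=Decimal("0.01"),  # illustrative price, not a real rate
    max_budget=Decimal("1.00"),
    call=stub_call,
)
print(n_calls, len(results), spent, halted)
```

With these illustrative numbers the budget covers 100 of the 200 calls, so the run halts after 5,000 rows with exactly the budget spent.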

4. OBSERVATION: see what happened

On by default. Integrates with the observability stack you already run.

  • ProgressBar + Logging + CostTracking observers active on every run.
  • Langfuse for LLM trace logging.
  • OpenTelemetry for distributed tracing.
  • Prometheus metrics export (request count, duration histogram, cost gauge).
  • Decimal precision for cost tracking (no floating-point surprises).
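
The reason for Decimal-based cost tracking shows up in a three-line comparison. This is a generic Python illustration with a made-up per-call price, not Ondine's cost tracker.

```python
from decimal import Decimal

per_call_cost = "0.1"  # illustrative price in USD

# Binary floats accumulate representation error.
float_total = sum(float(per_call_cost) for _ in range(3))

# Decimal keeps exact base-10 arithmetic.
decimal_total = sum((Decimal(per_call_cost) for _ in range(3)), Decimal("0"))

print(float_total == 0.3)
print(decimal_total == Decimal("0.3"))
```

Over thousands of rows, the float drift can push a running total just above or below a budget threshold, which is the kind of surprise exact decimal arithmetic avoids.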

5. PROVIDERS: any LLM backend

  • 100+ providers via LiteLLM. Swap with a string.
  • Router with latency-based failover and automatic provider selection.
  • Local inference: Ollama, MLX (Apple Silicon), vLLM, SGLang.
  • Azure Managed Identity with 3 auth patterns (MI, API key, pre-fetched token).
  • Custom endpoints: any OpenAI-compatible API.
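
Latency-based failover can be pictured as "try the fastest provider first, fall through on error". The sketch below is a simplified illustration, not Ondine's or LiteLLM's router: `route`, the provider callables, and the latency figures are all hypothetical.

```python
def route(providers, latencies, prompt):
    """Try providers from lowest to highest observed latency."""
    errors = {}
    ordered = sorted(providers, key=lambda name: latencies.get(name, float("inf")))
    for name in ordered:
        try:
            return name, providers[name](prompt)
        except Exception as exc:  # any failure triggers failover
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")


def failing_provider(prompt):
    # Stand-in for a provider that is currently unavailable.
    raise TimeoutError("provider unavailable")


def healthy_provider(prompt):
    # Stand-in for a provider that answers normally.
    return f"ok: {prompt}"


providers = {"fast": failing_provider, "slow": healthy_provider}
latencies = {"fast": 0.2, "slow": 0.9}  # seconds, illustrative

chosen, answer = route(providers, latencies, "hi")
print(chosen, answer)
```

The fastest provider is attempted first and fails, so the call falls through to the slower one.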

Beyond the quickstart

from ondine import PipelineBuilder
from ondine.knowledge import KnowledgeStore
from ondine.context import RustContextStore
from pydantic import BaseModel

class ReviewAnalysis(BaseModel):
    sentiment: str
    score: int
    topic: str

kb = KnowledgeStore("knowledge.db")
kb.ingest("docs/")   # PDFs, MD, HTML, images via OCR

pipeline = (
    PipelineBuilder.create()
    .from_csv("reviews.csv",
              input_columns=["review"],
              output_columns=["sentiment", "score", "topic"])
    .with_knowledge_base(kb, top_k=5, rerank=True, query_transform="hyde")
    .with_prompt("Context:\n{_kb_context}\n\nAnalyze: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .with_structured_output(ReviewAnalysis)
    .with_context_store(RustContextStore("evidence.db"))
    .with_grounding(threshold=0.3)
    .with_batch_size(50)
    .with_max_budget(25.00)
    .with_checkpoint_interval(100)
    .with_disk_cache(".cache")
    .with_router(strategy="latency")
    .with_observer("langfuse")
    .build()
)

result = pipeline.execute()

Every chained method maps to one of the five dimensions. See docs.ondine.dev for the full reference.

What "a prompt is a column" unlocks

Same primitive. The use case lives in the prompt.

| Transform | Prompt pattern |
|---|---|
| Classification | "Classify {text} into one of {labels}" |
| Extraction | "Extract name, date, amount from: {document}" |
| Scoring | "Score {item} against {criteria} on 1–10" |
| Comparison | "Is {a} equivalent to {b}? Return yes/no + reason." |
| Translation | "Translate {text} from {src_lang} to {tgt_lang}" |
| Summarization | "Summarize {document} in 3 bullets" |

One abstraction. Any transform.

Compared to alternatives

| Tool | Primitive | Why pick Ondine |
|---|---|---|
| Instructor | f(prompt) → Pydantic (one call) | Ondine applies that pattern to N rows, with the 5 dimensions |
| Pandas-AI | df.chat("question") | Different primitive (query vs. compute) |
| LangChain batch | chain.batch([...]) | No budget cap, no grounding, no observability defaults |
| OpenAI/Anthropic Batch API | Provider-specific batch | No multi-provider, no grounding, no crash-safety, 24-hour turnaround |
| Airflow/Prefect/Dagster | Workflow orchestrators | Heavy setup, no LLM-specific features. Ondine ships integrations for them. |
| Ondine | Prompt(columns) → new_columns | A primitive, not a wrapper |

Local inference

from ondine import QuickPipeline

# Ollama
pipeline = QuickPipeline.create(
    data="reviews.csv",
    prompt="Classify sentiment: {review}",
    output_columns=["sentiment"],
    model="ollama/qwen3.5",
)

# MLX (Apple Silicon, native; no server process)
pipeline = QuickPipeline.create(
    data="reviews.csv",
    prompt="Classify sentiment: {review}",
    output_columns=["sentiment"],
    model="mlx/mlx-community/Llama-4-Scout-Instruct-4bit",
)

No API keys. No telemetry. Fully offline.

Contributing

PRs welcome. See CONTRIBUTING.md. Code style: Black + Ruff. Tests required for new features.

License

MIT. See LICENSE.

Acknowledgments

  • LiteLLM: provider routing layer
  • Instructor: the single-call pattern Ondine applies at DataFrame scale
  • The Pydantic team: validation backbone
