Ondine - The LLM Dataset Engine. SDK for processing tabular datasets using LLMs with reliability, observability, and cost control

Ondine

A prompt is a column. A new DataFrame primitive for LLMs, with five dimensions of production support.

ondine.dev · Docs · PyPI

Ondine makes LLM calls a first-class DataFrame operation. Define a column with natural language. Ondine computes it at production scale.

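import pandas as pd
from ondine import PipelineBuilder

# Any DataFrame with a "review" column works; this one is a placeholder.
df = pd.DataFrame({"review": ["Great battery life.", "Arrived broken."]})

df = (
    PipelineBuilder.create()
    .from_dataframe(df, input_columns=["review"], output_columns=["sentiment"])
    .with_prompt("Classify the tone of: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .build()
    .execute().data  # the input columns plus the new "sentiment" column
)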

The LLM stops being a service you call from your pipeline. It becomes a column function inside it.

Everything else in this README is how Ondine makes that primitive production-true across five dimensions: richer inputs (KB/RAG/OCR), constrained outputs (schemas, grounding), reliable execution (checkpoints, budget caps, adaptive concurrency), full observability, and any LLM backend.

Install

pip install ondine

Python 3.10+. Works with any LLM through LiteLLM: OpenAI, Anthropic, Groq, Mistral, Cerebras, Ollama, MLX, vLLM, SGLang, 100+ others.

30-second quickstart

from ondine import PipelineBuilder

pipeline = (
    PipelineBuilder.create()
    .from_csv("reviews.csv",
              input_columns=["review"],
              output_columns=["sentiment", "topic"])
    .with_prompt("Classify sentiment and extract the key topic from: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .with_max_budget(5.00)
    .build()
)

result = pipeline.execute()
print(f"Processed {result.metrics.processed_rows} rows · ${result.costs.total_cost:.2f}")

One builder chain: input columns, prompt, model, budget cap. Multi-column outputs get a JSON parser; schema enforcement, checkpointing, and cost tracking are on by default.

Prefer a one-liner? QuickPipeline.create(...) wraps the same builder with sensible defaults (see examples/).

The 5 dimensions

1. INPUTS: make the prompt richer

Feed the LLM more than raw column text. Pull context from documents, images, and prior runs.

  • Knowledge Base (RAG): ingest PDFs, Markdown, HTML, images via OCR. Hybrid BM25 + dense search with optional cross-encoder reranker. HyDE / multi-query / step-back query transforms.
  • OCR: three pluggable backends: multimodal Vision LLM, Tesseract (offline), DocTR.
  • Multi-column placeholders: use any number of input columns in one prompt ({col_a}, {col_b}).
  • Jinja2 templates + system prompts for richer prompt shaping.
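
To make multi-column placeholders concrete, here is a minimal sketch that fills one template from several columns of each row with plain str.format. It does not call Ondine: render_prompts and the sample rows are hypothetical, and Ondine's own templating may behave differently.

```python
from typing import Iterable


def render_prompts(template: str, rows: Iterable[dict], input_columns: list[str]) -> list[str]:
    """Fill `template` once per row, using only the declared input columns."""
    prompts = []
    for row in rows:
        missing = [c for c in input_columns if c not in row]
        if missing:
            raise KeyError(f"row is missing input columns: {missing}")
        prompts.append(template.format(**{c: row[c] for c in input_columns}))
    return prompts


# Hypothetical sample data: two rows, two input columns.
rows = [
    {"product": "kettle", "review": "Boils fast, lid is flimsy."},
    {"product": "lamp", "review": "Too dim for reading."},
]
prompts = render_prompts(
    "Product: {product}\nReview: {review}\nClassify the tone.",
    rows,
    input_columns=["product", "review"],
)
print(prompts[0])
```

Restricting the substitution to the declared input columns means a typo in a placeholder name fails loudly instead of silently pulling in an unrelated column.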

2. OUTPUTS: constrain what comes back

Stop parsing strings. Get typed columns, validated against your schema, verified against your evidence.

  • Pydantic structured output: define a model, get typed columns back. Malformed JSON auto-retries up to 3x.
  • Multi-column parsing: one prompt → N typed columns.
  • Grounding verification (Context Store): each LLM answer checked against an evidence graph built from your dataset. Rust + SQLite + FTS5 backend. Contradictions flagged, not silently returned.
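
The validate-then-retry idea behind structured output can be sketched without any dependencies. The block below swaps Pydantic for a stdlib dataclass and uses a stubbed model call, so it illustrates the pattern only, not Ondine's implementation. It also assumes "up to 3x" means three retries after the first attempt, which the text above does not spell out.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReviewAnalysis:
    sentiment: str
    score: int
    topic: str


def parse_review(raw: str) -> ReviewAnalysis:
    """Parse a JSON string and check field names and types."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    result = ReviewAnalysis(**data)  # raises TypeError on missing or extra keys
    if not isinstance(result.sentiment, str) or not isinstance(result.topic, str):
        raise ValueError("sentiment and topic must be strings")
    if isinstance(result.score, bool) or not isinstance(result.score, int):
        raise ValueError("score must be an integer")
    return result


def call_with_retries(call_llm: Callable[[], str], max_retries: int = 3) -> ReviewAnalysis:
    """Call the model, asking again when the reply fails to parse or validate."""
    last_error = None
    for _ in range(1 + max_retries):  # one initial attempt plus the retries
        try:
            return parse_review(call_llm())
        except (ValueError, TypeError) as exc:
            last_error = exc
    raise RuntimeError(f"no valid reply after {max_retries} retries") from last_error


# Hypothetical model stub: the first reply is truncated JSON, the second is valid.
replies = iter([
    '{"sentiment": "positive", "score": ',
    '{"sentiment": "positive", "score": 8, "topic": "battery"}',
])
analysis = call_with_retries(lambda: next(replies))
print(analysis)
```

With a real Pydantic model the type checks collapse into a single validation call; the retry loop stays the same.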

3. EXECUTION: run N rows reliably

Production plumbing that df.apply() doesn't give you.

  • Checkpointing to Parquet after every batch. Durable SQLite response cache for crash-atomic resume (A4, #144).
  • Hard budget caps: pre-run cost estimation, live tracking, halts the pipeline at your USD limit.
  • Multi-row batching: pack N rows per API call. 200 calls instead of 10,000 at batch_size=50.
  • Prefix caching: system prompt cached across batches. 40–50% token savings.
  • Adaptive concurrency: Netflix Gradient2 algorithm. Shrinks on 429, grows on saturation.
  • Retry-After parsing across 5 header shapes (OpenAI / Anthropic / Groq / RFC 7231 / ms-delta).
  • Distributed rate limiting via Redis (atomic Lua token bucket, cluster-aware).
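
The batching arithmetic is easy to check by hand: 10,000 rows at 50 rows per call is 10,000 / 50 = 200 calls. A small sketch, with a hypothetical batched helper standing in for whatever Ondine does internally:

```python
import math
from typing import Iterator, Sequence


def batched(rows: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `batch_size` rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


rows = list(range(10_000))
calls_unbatched = len(rows)                 # one API call per row
calls_batched = math.ceil(len(rows) / 50)   # one API call per 50-row batch
batches = list(batched(rows, 50))

print(calls_unbatched, calls_batched, len(batches))  # 10000 200 200
```

When the row count is not a multiple of the batch size, the last batch is simply shorter, which is why the call count uses a ceiling rather than integer division.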

4. OBSERVATION: see what happened

On by default. Integrates with the observability stack you already run.

  • ProgressBar + Logging + CostTracking observers active on every run.
  • Langfuse for LLM trace logging.
  • OpenTelemetry for distributed tracing.
  • Prometheus metrics export (request count, duration histogram, cost gauge).
  • Decimal precision for cost tracking (no floating-point surprises).
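
Why Decimal matters for cost tracking, as a minimal sketch: binary floats cannot represent most decimal fractions exactly, so comparisons against a budget limit can be off by a rounding error. CostTracker, BudgetExceeded, and the per-token price below are all made up for illustration; they are not Ondine's classes or real pricing.

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push recorded spend past the limit."""


class CostTracker:
    def __init__(self, max_budget: str) -> None:
        self.max_budget = Decimal(max_budget)
        self.total = Decimal("0")

    def add(self, tokens: int, usd_per_1k_tokens: str) -> Decimal:
        """Record the cost of `tokens`, refusing charges that exceed the budget."""
        cost = Decimal(tokens) / Decimal(1000) * Decimal(usd_per_1k_tokens)
        if self.total + cost > self.max_budget:
            raise BudgetExceeded(
                f"spend would reach {self.total + cost} USD, limit is {self.max_budget}"
            )
        self.total += cost
        return cost


# Floats drift; Decimals built from strings do not.
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

tracker = CostTracker(max_budget="0.05")
tracker.add(tokens=20_000, usd_per_1k_tokens="0.002")      # 0.04 USD, within budget
try:
    tracker.add(tokens=10_000, usd_per_1k_tokens="0.002")  # would bring the total to 0.06 USD
except BudgetExceeded as exc:
    print("halted:", exc)
```

Prices are passed as strings on purpose: Decimal(0.002) would inherit the float's rounding error, while Decimal("0.002") is exact.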

5. PROVIDERS: any LLM backend

  • 100+ providers via LiteLLM. Swap with a string.
  • Router with latency-based failover and automatic provider selection.
  • Local inference: Ollama, MLX (Apple Silicon), vLLM, SGLang.
  • Azure Managed Identity with 3 auth patterns (MI, API key, pre-fetched token).
  • Custom endpoints: any OpenAI-compatible API.
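
Latency-based failover can be pictured roughly as "try the fastest provider first, fall through on error". The sketch below is one plausible reading of that idea, not Ondine's router: route_by_latency, the provider callables, and the latency figures are all hypothetical.

```python
from typing import Callable


def route_by_latency(
    providers: dict[str, Callable[[str], str]],
    avg_latency_s: dict[str, float],
    prompt: str,
) -> tuple[str, str]:
    """Try providers from fastest to slowest; return (provider_name, reply)."""
    errors = {}
    # Providers with no recorded latency sort last.
    for name in sorted(providers, key=lambda n: avg_latency_s.get(n, float("inf"))):
        try:
            return name, providers[name](prompt)
        except Exception as exc:  # a real router would catch narrower error types
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")


def flaky(prompt: str) -> str:
    """Stand-in for a provider that is currently unavailable."""
    raise TimeoutError("simulated outage")


def steady(prompt: str) -> str:
    """Stand-in for a slower provider that answers."""
    return f"echo: {prompt}"


name, reply = route_by_latency(
    providers={"fast-but-down": flaky, "slow-but-up": steady},
    avg_latency_s={"fast-but-down": 0.2, "slow-but-up": 0.9},
    prompt="ping",
)
print(name, reply)
```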

Beyond the quickstart

from ondine import PipelineBuilder
from ondine.knowledge import KnowledgeStore
from ondine.context import RustContextStore
from pydantic import BaseModel

class ReviewAnalysis(BaseModel):
    sentiment: str
    score: int
    topic: str

kb = KnowledgeStore("knowledge.db")
kb.ingest("docs/")   # PDFs, MD, HTML, images via OCR

pipeline = (
    PipelineBuilder.create()
    .from_csv("reviews.csv",
              input_columns=["review"],
              output_columns=["sentiment", "score", "topic"])
    .with_knowledge_base(kb, top_k=5, rerank=True, query_transform="hyde")
    .with_prompt("Context:\n{_kb_context}\n\nAnalyze: {review}")
    .with_llm(provider="openai", model="gpt-5.4-mini")
    .with_structured_output(ReviewAnalysis)
    .with_context_store(RustContextStore("evidence.db"))
    .with_grounding(threshold=0.3)
    .with_batch_size(50)
    .with_max_budget(25.00)
    .with_checkpoint_interval(100)
    .with_disk_cache(".cache")
    .with_router(strategy="latency")
    .with_observer("langfuse")
    .build()
)

result = pipeline.execute()

Every chained method maps to one of the five dimensions. See docs.ondine.dev for the full reference.

What "a prompt is a column" unlocks

Same primitive. The use case lives in the prompt.

| Transform | Prompt pattern |
| --- | --- |
| Classification | "Classify {text} into one of {labels}" |
| Extraction | "Extract name, date, amount from: {document}" |
| Scoring | "Score {item} against {criteria} on 1–10" |
| Comparison | "Is {a} equivalent to {b}? Return yes/no + reason." |
| Translation | "Translate {text} from {src_lang} to {tgt_lang}" |
| Summarization | "Summarize {document} in 3 bullets" |

One abstraction. Any transform.

Compared to alternatives

| Tool | Primitive | Why pick Ondine |
| --- | --- | --- |
| Instructor | f(prompt) → Pydantic (one call) | Ondine applies that pattern to N rows, with the 5 dimensions |
| Pandas-AI | df.chat("question") | Different primitive (query vs. compute) |
| LangChain batch | chain.batch([...]) | No budget cap, no grounding, no observability defaults |
| OpenAI/Anthropic Batch API | Provider-specific batch | No multi-provider, no grounding, no crash-safety, 24-hour turnaround |
| Airflow/Prefect/Dagster | Workflow orchestrators | Heavy setup, no LLM-specific features. Ondine ships integrations for them. |
| Ondine | Prompt(columns) → new_columns | A primitive, not a wrapper |

Local inference

from ondine import QuickPipeline

# Ollama
pipeline = QuickPipeline.create(
    data="reviews.csv",
    prompt="Classify sentiment: {review}",
    output_columns=["sentiment"],
    model="ollama/qwen3.5",
)

# MLX (Apple Silicon, native; no server process)
pipeline = QuickPipeline.create(
    data="reviews.csv",
    prompt="Classify sentiment: {review}",
    output_columns=["sentiment"],
    model="mlx/mlx-community/Llama-4-Scout-Instruct-4bit",
)

No API keys. No telemetry. Fully offline.

Documentation

Contributing

PRs welcome. See CONTRIBUTING.md. Code style: Black + Ruff. Tests required for new features.

License

MIT. See LICENSE.

Acknowledgments

  • LiteLLM: provider routing layer
  • Instructor: the single-call pattern Ondine applies at DataFrame scale
  • The Pydantic team: validation backbone

Support
