
A library for explicit data pipelines

Project description

tacit

Pydantic-style schemas for DataFrame pipelines, built on ibis and pandera.

Every DataFrame operation makes implicit assumptions about the data — which columns exist, their types, whether nulls are allowed. Tacit makes them explicit: you define schemas as Python classes and enforce contracts on the functions that transform them. From that single definition:

  • Catch errors where they happen — pandera validates actual data at pipeline boundaries. Missing columns, wrong types, constraint violations — caught where bad data enters, not three stages downstream.
  • Catch errors before they happen — type checkers (mypy, pyright, ty, pyrefly) verify that every pipeline stage respects the contract before your code runs.
  • Make contracts self-documenting — "go to definition" on any schema shows every column, its type, and its constraints. No Slack threads, no stale wiki pages. The code has the full context — for teammates, for your future self, and for coding agents that can discover schemas without extra context files.
  • Make changes safe — rename a column in a schema and your type checker flags every function that needs updating — across teams, across repos.
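The "catch errors before they happen" point relies on ordinary static typing: a table type tagged with a schema type parameter. Below is a minimal, standard-library-only sketch of that idea. `Schema`, `Frame`, `engineer`, and `summarize` are hypothetical stand-ins rather than tacit's real classes, and rows are plain dicts instead of ibis Tables. The schema parameter exists only for the type checker (a "phantom type").

```python
from typing import Any, Generic, TypeVar


class Schema:
    """Hypothetical stand-in for a schema base class."""


class Iris(Schema):
    sepal_length: float
    sepal_width: float


class IrisFeatures(Iris):
    sepal_ratio: float


# The schema is a type parameter only the type checker looks at.
S = TypeVar("S", bound=Schema)


class Frame(Generic[S]):
    """Hypothetical table type tagged with the schema it claims to satisfy."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self.rows = rows


def engineer(df: Frame[Iris]) -> Frame[IrisFeatures]:
    # Add the derived column; the return annotation advertises the new schema.
    return Frame([{**r, "sepal_ratio": r["sepal_length"] / r["sepal_width"]} for r in df.rows])


def summarize(df: Frame[IrisFeatures]) -> int:
    return len(df.rows)


raw: Frame[Iris] = Frame([{"sepal_length": 5.1, "sepal_width": 3.5}])
features = engineer(raw)
print(summarize(features))  # accepted: Frame[IrisFeatures] is what summarize expects
# summarize(raw)  # rejected by mypy/pyright: Frame[Iris] is not Frame[IrisFeatures]
```

Note that the checker trusts the annotations; it verifies that stages are wired together consistently, while runtime validation is still needed to confirm that the data matches.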

Works across any ibis-supported backend — DuckDB, Spark, BigQuery, Snowflake, Polars, Postgres, and more.

Documentation

Install

uv add tacit

# or with pip directly
pip install tacit

Quick example

import ibis
import tacit


class Iris(tacit.Schema):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str


class IrisFeatures(Iris):
    sepal_ratio: float
    petal_ratio: float
    petal_area: float


@tacit.contract
def engineer_features(df: tacit.DataFrame[Iris]) -> tacit.DataFrame[IrisFeatures]:
    return df.mutate(
        sepal_ratio=df.sepal_length / df.sepal_width,
        petal_ratio=df.petal_length / df.petal_width,
        petal_area=df.petal_length * df.petal_width,
    )


con = ibis.duckdb.connect()
raw = con.read_csv("iris.csv")

iris = Iris.parse(raw)
features = engineer_features(iris)

Schemas are Python classes — your editor autocompletes column names from them. parse() coerces types and validates at the boundary. @contract enforces input/output schemas at runtime. DataFrame[S] is an ibis Table, so you get the full ibis expression API with no wrapping.
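This README does not show how `@contract` is implemented. As a rough mental model, here is a toy decorator that reads `Frame[Schema]` annotations and checks column names on the input and output at call time. Everything here is hypothetical: `Schema`, `Frame`, and `contract` are illustrative stand-ins, not tacit's API, and tables are dicts of lists rather than ibis Tables. Like `cast()`, this toy checks only column names and never looks at values.

```python
import functools
import inspect
from typing import Generic, TypeVar, get_args, get_origin, get_type_hints


class Schema:
    """Hypothetical schema base: the columns are the class annotations."""

    @classmethod
    def fields(cls) -> dict[str, type]:
        # get_type_hints walks the MRO, so subclasses inherit parent columns.
        return get_type_hints(cls)


S = TypeVar("S", bound=Schema)


class Frame(Generic[S]):
    """Hypothetical column-oriented table: {column name: list of values}."""

    def __init__(self, data: dict[str, list]) -> None:
        self.data = data


def _schema_of(hint):
    # Return S for an annotation of the form Frame[S], else None.
    return get_args(hint)[0] if get_origin(hint) is Frame else None


def _check_columns(frame: Frame, schema: type[Schema], where: str) -> None:
    missing = sorted(set(schema.fields()) - set(frame.data))
    if missing:
        raise TypeError(f"{where}: missing columns {missing}")


def contract(fn):
    """Toy contract: check column names of Frame[...] arguments and the return value."""
    hints = get_type_hints(fn)
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        for name, value in sig.bind(*args, **kwargs).arguments.items():
            schema = _schema_of(hints.get(name))
            if schema is not None:
                _check_columns(value, schema, f"input {name!r}")
        result = fn(*args, **kwargs)
        schema = _schema_of(hints.get("return"))
        if schema is not None:
            _check_columns(result, schema, "output")
        return result

    return wrapper


class Iris(Schema):
    petal_length: float
    petal_width: float


class IrisFeatures(Iris):
    petal_area: float


@contract
def engineer(df: Frame[Iris]) -> Frame[IrisFeatures]:
    area = [l * w for l, w in zip(df.data["petal_length"], df.data["petal_width"])]
    return Frame({**df.data, "petal_area": area})


ok = engineer(Frame({"petal_length": [2.0], "petal_width": [0.5]}))
print(ok.data["petal_area"])  # [1.0]

try:
    engineer(Frame({"petal_length": [2.0]}))
except TypeError as err:
    message = str(err)
print(message)  # input 'df': missing columns ['petal_width']
```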

What else

Constraints — go beyond column names and types with value-level checks, powered by pandera:

from typing import Annotated

import tacit

class Order(tacit.Schema):
    amount: Annotated[float, tacit.Check.ge(0)]
    status: Annotated[str, tacit.Check.isin(["pending", "shipped"])]
    notes: Annotated[str, tacit.Nullable()]
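Tacit hands these checks to pandera, which evaluates them on the backend. To show how constraint metadata can ride along in `Annotated` and be read back from a class, here is a standard-library sketch that evaluates it row by row in Python. `Check`, `Nullable`, and `failures` are hypothetical toy versions, not `tacit.Check` or pandera, and columns are treated as non-nullable unless marked.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Callable, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Check:
    """Hypothetical value check: a display name plus a per-value predicate."""

    name: str
    predicate: Callable[[Any], bool]

    @classmethod
    def ge(cls, bound):
        return cls(f"ge({bound})", lambda v: v >= bound)

    @classmethod
    def isin(cls, allowed):
        allowed = frozenset(allowed)
        return cls(f"isin({sorted(allowed)})", lambda v: v in allowed)


@dataclass(frozen=True)
class Nullable:
    """Marker: None is allowed in this column."""


class Order:
    amount: Annotated[float, Check.ge(0)]
    status: Annotated[str, Check.isin(["pending", "shipped"])]
    notes: Annotated[str, Nullable()]


def failures(schema: type, rows: list[dict[str, Any]]) -> list[str]:
    """Return one message per failing (row, column, check) combination."""
    problems = []
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    for column, hint in get_type_hints(schema, include_extras=True).items():
        metadata = get_args(hint)[1:] if get_origin(hint) is Annotated else ()
        nullable = any(isinstance(m, Nullable) for m in metadata)
        checks = [m for m in metadata if isinstance(m, Check)]
        for i, row in enumerate(rows):
            value = row.get(column)
            if value is None:
                if not nullable:
                    problems.append(f"row {i}: {column} is null")
                continue
            for check in checks:
                if not check.predicate(value):
                    problems.append(f"row {i}: {column} fails {check.name}")
    return problems


rows = [
    {"amount": 10.0, "status": "pending", "notes": None},
    {"amount": -5.0, "status": "lost", "notes": "refund"},
]
for msg in failures(Order, rows):
    print(msg)
# row 1: amount fails ge(0)
# row 1: status fails isin(['pending', 'shipped'])
```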

cast() vs parse(): parse() runs full validation, which executes queries. cast() checks column names and types only, at zero execution cost. Use it for internal pipeline steps where the data has already been validated.

validate=True: @contract uses cast() by default. Pass validate=True at pipeline entry points to run full parse() validation on inputs and outputs.
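The cost difference between the two modes comes from what each one touches. A cast-style check inspects only schema metadata, while a parse-style check has to read the data. Below is a toy model of that trade-off, with a scan counter standing in for executed queries. `Table`, `cast`, `parse`, and `check_boundary` are hypothetical illustrations, not tacit's functions. `check_boundary` mimics the default described above: cheap `cast()` unless `validate=True`.

```python
from typing import Any, Callable


class Table:
    """Hypothetical table: declared column types plus rows that cost a scan to read."""

    def __init__(self, dtypes: dict[str, type], rows: list[dict[str, Any]]) -> None:
        self.dtypes = dtypes  # schema metadata, free to inspect
        self.rows = rows
        self.scans = 0  # how many times the data itself was read

    def scan(self):
        # Stand-in for executing a query on the backend.
        self.scans += 1
        return iter(self.rows)


def cast(table: Table, columns: dict[str, type]) -> Table:
    """Check declared names and types only; never reads the rows."""
    for name, typ in columns.items():
        if table.dtypes.get(name) is not typ:
            raise TypeError(f"column {name!r}: expected {typ.__name__}")
    return table


def parse(table: Table, columns: dict[str, type], checks: dict[str, Callable[[Any], bool]]) -> Table:
    """cast() plus value-level checks, which require scanning the data."""
    cast(table, columns)
    for row in table.scan():
        for name, ok in checks.items():
            if not ok(row[name]):
                raise ValueError(f"column {name!r}: bad value {row[name]!r}")
    return table


def check_boundary(table, columns, checks, validate=False):
    # Cheap metadata check by default; full data validation only when asked.
    return parse(table, columns, checks) if validate else cast(table, columns)


orders = Table({"amount": float}, [{"amount": 3.0}, {"amount": -1.0}])
columns = {"amount": float}
checks = {"amount": lambda v: v >= 0}

check_boundary(orders, columns, checks)  # passes: the bad row is never read
print(orders.scans)  # 0

try:
    check_boundary(orders, columns, checks, validate=True)
except ValueError as err:
    message = str(err)
print(message)  # column 'amount': bad value -1.0
print(orders.scans)  # 1
```

This is why validate=True fits entry points, where untrusted data arrives, and the metadata-only path fits internal steps.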

See the documentation for the full guide, API reference, and examples.

FAQ

Does this work with pandas? — Tacit builds on ibis, which moved away from pandas as a backend. If your data currently lives in pandas DataFrames, you can use a well-supported engine like DuckDB or Polars as the execution backend — ibis reads from and converts back to pandas seamlessly, while giving you a modern query engine underneath.

Which backends are supported? — Any engine that ibis supports. Tacit delegates all query execution to ibis, so backend support is inherited automatically. See the ibis backends page for the full list.

Which checks and constraints are available? — Tacit delegates constraint validation to pandera's ibis backend. Anything in pandera's Check API that has ibis support will work. See pandera's ibis compatibility status for what's currently available.

Status

Early development. The API is not stable.


Download files

Source Distribution

tacit-0.2.0.tar.gz (6.4 kB)


Built Distribution


tacit-0.2.0-py3-none-any.whl (7.6 kB)


File details

Details for the file tacit-0.2.0.tar.gz.

File metadata

  • Download URL: tacit-0.2.0.tar.gz
  • Upload date:
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.3

File hashes

Hashes for tacit-0.2.0.tar.gz
Algorithm Hash digest
SHA256 83d0b102274e79e9134f3932e11b28817e155a751972a95e0375fd4902d0fe1f
MD5 d02e2b29370f4301faf890189c329e47
BLAKE2b-256 45a38124f5b8d5fc6ca14164f98461c8903ac55a29917fcdfd0f4d38ae765cae


File details

Details for the file tacit-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: tacit-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.3

File hashes

Hashes for tacit-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 12e744bb182158a8125147cc1b11b15aa2f8416f9913fe3cef6baa7ad19d632c
MD5 3887547f16c95c8aed54e83e0c56daf6
BLAKE2b-256 195fc0f996bb20ccb7b316e5a3ba2c4ed0262d78bb3aa9283c80accc7253ed35

