A library for explicit data pipelines
tacit
Pydantic-style schemas for DataFrame pipelines, built on ibis and pandera.
Every DataFrame operation makes implicit assumptions about the data — which columns exist, their types, whether nulls are allowed. Tacit makes them explicit: you define schemas as Python classes and enforce contracts on the functions that transform them. From that single definition:
- Catch errors where they happen — pandera validates actual data at pipeline boundaries. Missing columns, wrong types, constraint violations — caught where bad data enters, not three stages downstream.
- Catch errors before they happen — type checkers (mypy, pyright, ty, pyrefly) verify that every pipeline stage respects the contract before your code runs.
- Make contracts self-documenting — "go to definition" on any schema shows every column, its type, and its constraints. No Slack threads, no stale wiki pages. The code has the full context — for teammates, for your future self, and for coding agents that can discover schemas without extra context files.
- Make changes safe — rename a column in a schema and your type checker flags every function that needs updating — across teams, across repos.
Works across any ibis-supported backend — DuckDB, Spark, BigQuery, Snowflake, Polars, Postgres, and more.
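To make the "single definition" idea concrete, here is a minimal standard-library sketch of how a class with annotations can double as a column contract. It is not tacit's implementation: the `Schema` base class, the `columns()` helper, and `find_violations()` are hypothetical names invented for this illustration, and a table is modelled as a plain mapping from column name to Python type.

```python
from typing import get_type_hints


class Schema:
    """Hypothetical stand-in base class for illustration; NOT tacit.Schema."""

    @classmethod
    def columns(cls) -> dict[str, type]:
        # get_type_hints merges annotations along the MRO,
        # so a subclass also reports its parent's columns.
        return get_type_hints(cls)


class Iris(Schema):
    sepal_length: float
    species: str


def find_violations(schema: type[Schema], actual: dict[str, type]) -> list[str]:
    """Compare a table's column -> type mapping against the schema."""
    problems = []
    for name, expected in schema.columns().items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] is not expected:
            problems.append(
                f"wrong type for {name}: expected {expected.__name__}, "
                f"got {actual[name].__name__}"
            )
    return problems


good = {"sepal_length": float, "species": str}
bad = {"sepal_length": str}

print(find_violations(Iris, good))
print(find_violations(Iris, bad))
```

Because the columns live in ordinary class annotations, the same definition is visible to editors and static type checkers as well as to a runtime check like this one.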
Install
uv add tacit
# or with pip directly
pip install tacit
Quick example
import ibis
import tacit


# Schemas are plain Python classes; each annotation declares a column and its type.
class Iris(tacit.Schema):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str


# IrisFeatures inherits the Iris columns and adds three derived ones.
class IrisFeatures(Iris):
    sepal_ratio: float
    petal_ratio: float
    petal_area: float


# @tacit.contract enforces the input and output schemas at runtime.
@tacit.contract
def engineer_features(df: tacit.DataFrame[Iris]) -> tacit.DataFrame[IrisFeatures]:
    return df.mutate(
        sepal_ratio=df.sepal_length / df.sepal_width,
        petal_ratio=df.petal_length / df.petal_width,
        petal_area=df.petal_length * df.petal_width,
    )


con = ibis.duckdb.connect()
raw = con.read_csv("iris.csv")

# parse() coerces types and validates the data at the pipeline boundary.
iris = Iris.parse(raw)
features = engineer_features(iris)
Schemas are Python classes — your editor autocompletes column names from them.
parse() coerces types and validates at the boundary. @contract enforces
input/output schemas at runtime. DataFrame[S] is an ibis Table, so you get
the full ibis expression API with no wrapping.
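For readers curious how a decorator can enforce input and output schemas from type hints alone, the following is a rough standard-library sketch of the general technique. It is not how `@tacit.contract` is actually implemented: `Frame`, `check_columns`, and `contract` are hypothetical stand-ins, and the toy `Frame` tracks only column names and Python types, not data.

```python
import functools
import inspect
from typing import Generic, TypeVar, get_args, get_type_hints

S = TypeVar("S")


class Frame(Generic[S]):
    """Toy table holding a column -> type mapping. Hypothetical, not tacit.DataFrame."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def mutate(self, **new_columns):
        # Return a new frame with extra columns, leaving the original untouched.
        return Frame({**self.columns, **new_columns})


def check_columns(frame, schema, where):
    """Raise TypeError if a schema column is missing or has the wrong type."""
    for name, typ in get_type_hints(schema).items():
        if frame.columns.get(name) is not typ:
            raise TypeError(f"{where}: column {name!r} must be {typ.__name__}")


def contract(func):
    """Check annotated arguments and the return value against their schemas."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            if name in hints:
                # Frame[Iris] -> Iris
                check_columns(value, get_args(hints[name])[0], f"input {name!r}")
        result = func(*args, **kwargs)
        if "return" in hints:
            check_columns(result, get_args(hints["return"])[0], "output")
        return result

    return wrapper


class Iris:
    petal_length: float
    petal_width: float


class IrisFeatures(Iris):
    petal_area: float


@contract
def engineer_features(df: Frame[Iris]) -> Frame[IrisFeatures]:
    return df.mutate(petal_area=float)


@contract
def forgets_column(df: Frame[Iris]) -> Frame[IrisFeatures]:
    return df  # bug: petal_area is never added


iris = Frame({"petal_length": float, "petal_width": float})
features = engineer_features(iris)
print(sorted(features.columns))

try:
    forgets_column(iris)
except TypeError as exc:
    print(exc)
```

The second function fails at its own boundary, which is the behaviour the README describes: the error surfaces where the contract is broken rather than in a later stage.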
What else
Constraints — go beyond column names and types with value-level checks, powered by pandera:
from typing import Annotated


class Order(tacit.Schema):
    amount: Annotated[float, tacit.Check.ge(0)]
    status: Annotated[str, tacit.Check.isin(["pending", "shipped"])]
    notes: Annotated[str, tacit.Nullable()]
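As background on the `Annotated` pattern used above, this standard-library sketch shows how metadata attached to an annotation can be read back and applied to values. It only illustrates the mechanism; in tacit the checks are delegated to pandera. The `Ge` and `IsIn` classes and the `checks_for` and `failing_rows` helpers are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Ge:
    """Hypothetical check: value must be >= bound."""

    bound: float

    def __call__(self, value):
        return value >= self.bound


@dataclass(frozen=True)
class IsIn:
    """Hypothetical check: value must be one of the allowed values."""

    allowed: tuple

    def __call__(self, value):
        return value in self.allowed


class Order:
    amount: Annotated[float, Ge(0)]
    status: Annotated[str, IsIn(("pending", "shipped"))]


def checks_for(schema):
    """Map each column to the callable checks attached through Annotated."""
    result = {}
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    for name, hint in get_type_hints(schema, include_extras=True).items():
        metadata = get_args(hint)[1:] if get_origin(hint) is Annotated else ()
        result[name] = [item for item in metadata if callable(item)]
    return result


def failing_rows(schema, rows):
    """Return (row index, column) pairs for every value that fails a check."""
    failures = []
    checks = checks_for(schema)
    for index, row in enumerate(rows):
        for column, column_checks in checks.items():
            for check in column_checks:
                if not check(row[column]):
                    failures.append((index, column))
    return failures


rows = [
    {"amount": 12.5, "status": "pending"},
    {"amount": -1.0, "status": "shipped"},
    {"amount": 3.0, "status": "lost"},
]
print(failing_rows(Order, rows))  # [(1, 'amount'), (2, 'status')]
```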
cast() vs parse() — parse() runs full validation (executes queries).
cast() checks column names and types only — zero execution cost, for internal
pipeline steps where the data has already been validated.
validate=True — @contract uses cast() by default. Pass
validate=True at pipeline entry points to run full parse() validation on
inputs and outputs.
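The cost difference between the two modes can be pictured with a toy model. The sketch below is an assumption-laden illustration, not tacit's code: `LazyTable` is a hypothetical table whose declared types are free to inspect but whose rows count as a query when read, and the stand-in `cast` and `parse` functions mimic only the distinction described above (structure only versus structure plus data). The null check stands in for whatever data-level validation a real `parse()` performs.

```python
from typing import get_type_hints


class LazyTable:
    """Toy table: declared dtypes are metadata; reading rows counts as a query."""

    def __init__(self, dtypes, rows):
        self.dtypes = dtypes
        self._rows = rows
        self.queries_run = 0

    def execute(self):
        self.queries_run += 1
        return list(self._rows)


class Order:
    amount: float
    status: str


def cast(table, schema):
    """Structural check only: compares declared column names and types."""
    for name, typ in get_type_hints(schema).items():
        if table.dtypes.get(name) is not typ:
            raise TypeError(f"column {name!r} must be declared as {typ.__name__}")
    return table


def parse(table, schema):
    """Structural check plus a scan of the actual values (here: no nulls)."""
    cast(table, schema)
    for index, row in enumerate(table.execute()):
        for name in get_type_hints(schema):
            if row[name] is None:
                raise ValueError(f"row {index}: column {name!r} is null")
    return table


table = LazyTable(
    {"amount": float, "status": str},
    [
        {"amount": 5.0, "status": "pending"},
        {"amount": None, "status": "shipped"},
    ],
)

cast(table, Order)  # passes: the declared structure matches
print(table.queries_run)  # 0, no data was read

try:
    parse(table, Order)
except ValueError as exc:
    print(exc)  # row 1: column 'amount' is null
print(table.queries_run)  # 1, the data had to be scanned
```

The structural check passes on a table that still contains bad values, which is why the README reserves the cheaper mode for steps whose data has already been validated.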
See the documentation for the full guide, API reference, and examples.
FAQ
Does this work with pandas? — Tacit builds on ibis, which moved away from pandas as a backend. If your data currently lives in pandas DataFrames, you can use a well-supported engine like DuckDB or Polars as the execution backend — ibis reads from and converts back to pandas seamlessly, while giving you a modern query engine underneath.
Which backends are supported? — Any engine that ibis supports. Tacit delegates all query execution to ibis, so backend support is inherited automatically. See the ibis backends page for the full list.
Which checks and constraints are available? — Tacit delegates constraint validation to pandera's ibis backend. Anything in pandera's Check API that has ibis support will work. See pandera's ibis compatibility status for what's currently available.
Status
Early development. The API is not stable.