CentauroDB

Docs: centaurodb.dev

Pydantic-native storage for time-series and evolving data models. SQLite + PostgreSQL backends, schema evolution without migrations, built-in Polars DataFrames.

pip install centaurodb

Optional extras:

pip install "centaurodb[polars]"     # for .df / sql_select() → DataFrame
pip install "centaurodb[postgres]"   # for PostgreSQL backend

Why CentauroDB?

CentauroDB is not another ORM. It targets a different problem:

  • You define data with Pydantic models, not table schemas.
  • Models are stored as JSON blobs, so adding a field never requires a migration — just give it a default value.
  • Time-series data is a first-class citizen via TimeSeriesCollection, with one-line conversion to a Polars DataFrame.
  • Same API on SQLite (zero-config, embedded) and PostgreSQL (production) — pick the URL, the dialect adapts.

Who is this for?

Built for data and analytics pipelines, IoT/sensor ingestion, quant backtesting, and analytics-shaped backends — workloads where schemas keep evolving, where the read path ends in a DataFrame, and where the write side is a process or a small fleet of workers, not a thousand concurrent web requests.

| You want | Use |
| --- | --- |
| Web app with relational schema, FK constraints, complex joins | SQLAlchemy / SQLModel |
| High-concurrency web backend with auth / RLS / connection pooling | Supabase + SQLAlchemy |
| Document store with full-text search and replication | MongoDB |
| Pydantic models in, structured storage out, DataFrames back | CentauroDB |
| Time-series + metadata with evolving schemas | CentauroDB |

Quickstart

from centaurodb import Engine, Collection, CentauroModel

class Book(CentauroModel):
    __centauro_name__ = "book"
    title: str = ""
    author: str = ""
    rating: float = 0.0

engine = Engine("library.db")          # or "sqlite://" for in-memory
                                       # or "postgresql://user:pw@host/db"
books = Collection(engine, "library")

# Write
dune = Book(title="Dune", author="Herbert", rating=4.8)
books.write_object(dune)

# Update
dune.rating = 5.0
books.update_object(dune)

# Query (JSON fields, AND/OR conditions)
top = books.read_objects(Book.fields.rating > 4.5)
for b in top:
    print(b.title, b.rating)

# Paginate large result sets
page = books.read_objects(Book.fields.rating > 4.5, limit=20, offset=40)
total = books.count_objects(Book.fields.rating > 4.5)

# Delete
books.delete_object(dune)
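
The limit / offset calls above combine naturally into a page loop. A minimal sketch, assuming only that a fetch function accepts limit and offset keywords the way read_objects does; iter_pages and the list-backed fake_read are hypothetical helpers for illustration, not part of CentauroDB:

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_pages(fetch: Callable[..., List[T]], page_size: int = 20) -> Iterator[List[T]]:
    """Yield successive pages until the fetch function returns a short or empty page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch(limit=page_size, offset=offset)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
        offset += page_size


# Stand-in for a collection: slices a list the way LIMIT/OFFSET would.
_rows = list(range(45))


def fake_read(limit: int, offset: int) -> List[int]:
    return _rows[offset:offset + limit]


pages = list(iter_pages(fake_read, page_size=20))
print([len(p) for p in pages])  # [20, 20, 5]
```

With a real collection, fetch could be a small lambda that forwards limit and offset to books.read_objects together with the query condition. Offset-based paging in general can skip or repeat rows when the underlying data changes between calls.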

Time-series with Polars

from datetime import datetime
import polars as pl
from centaurodb import Engine, TimeSeriesCollection, CentauroModelSeries

class StockPrice(CentauroModelSeries):
    __centauro_name__ = "stock"
    ticker: str = ""
    exchange: str = "NYSE"

engine = Engine("stocks.db")
prices = TimeSeriesCollection(engine, "portfolio")

df = pl.DataFrame({
    "time":  [datetime(2026, 1, 1), datetime(2026, 1, 2)],
    "value": [185.20, 187.55],
})

apple = StockPrice(ticker="AAPL", values=df)
prices.write_object(apple)

# Read back, filtered by JSON metadata
[apple] = prices.read_objects(StockPrice.fields.ticker == "AAPL")

# .df is a Polars DataFrame with (time, value)
print(apple.df)

If your workflow today is pandas.read_csv plus a folder of Parquet files, this is the natural next step.


Schema evolution without migrations

Add a field — give it a default. Old rows still load fine:

class Book(CentauroModel):
    __centauro_name__ = "book"
    title: str = ""
    author: str = ""
    rating: float = 0.0
    pages: int | None = None      # NEW — old rows just see None
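
Why old rows keep loading can be illustrated without CentauroDB at all. A minimal sketch using only the standard library, assuming rows are stored as JSON objects; the dataclass and the load helper are hypothetical stand-ins, not CentauroDB's implementation:

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Book:
    title: str = ""
    author: str = ""
    rating: float = 0.0
    pages: Optional[int] = None  # field added after the old row was written


def load(cls, stored_json: str):
    """Build an instance from stored JSON, keeping only keys the class declares."""
    data = json.loads(stored_json)
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


# A row written before `pages` existed.
old_row = '{"title": "Dune", "author": "Herbert", "rating": 4.8}'
book = load(Book, old_row)
print(book.pages)  # None
```

Because every field has a default, a missing key is never an error, and because unknown keys are filtered out, a row written by a newer model version also loads in an older one.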

Rename a field — declare the old name as an alias:

from centaurodb import renamed_from

class Book(CentauroModel):
    __centauro_name__ = "book"
    page_count: int = renamed_from("pages", default=0)
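
The alias idea can be sketched in plain Python: on read, a value stored under the old key is moved to the new key before the object is built. This assumes nothing about how renamed_from works internally; RENAMES and upgrade_keys are hypothetical names, and the row contents are made-up sample data:

```python
import json

# Map of old stored key -> current field name (hypothetical, for illustration).
RENAMES = {"pages": "page_count"}


def upgrade_keys(stored_json: str, renames: dict) -> dict:
    """Return the stored mapping with old keys moved to their current names."""
    data = json.loads(stored_json)
    for old, new in renames.items():
        if old in data:
            value = data.pop(old)
            # If both keys are present, the value under the current name wins.
            data.setdefault(new, value)
    return data


old_row = '{"title": "Dune", "pages": 412}'
new_row = '{"title": "Dune", "page_count": 412}'

print(upgrade_keys(old_row, RENAMES))  # {'title': 'Dune', 'page_count': 412}
print(upgrade_keys(new_row, RENAMES))  # {'title': 'Dune', 'page_count': 412}
```

Rows written before and after the rename end up with the same shape, so nothing stored has to be rewritten.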

Three guardrails enforced at class-definition time keep stored data readable across versions:

  1. All fields must have a default value.
  2. extra='forbid' is rejected, because stored keys that the current model no longer declares must be dropped silently instead of raising a validation error.
  3. __centauro_name__ is mandatory — it's the stable storage identifier.

Async

AsyncCollection and AsyncTimeSeriesCollection mirror the sync API and run under asyncio:

import asyncio

from centaurodb import Engine, AsyncCollection

# Book is the CentauroModel defined in the Quickstart above.

async def main():
    engine = Engine("postgresql://localhost/mydb")
    books = AsyncCollection(engine, "library")
    results = await books.read_objects(Book.fields.rating > 4.5)

asyncio.run(main())

Status

  • 591 tests passing (610 collected). The 14 PostgreSQL integration tests skip unless the CENTAURODB_PG_URL env var is set, and run in CI against a real postgres:16 service container. ~4,000 LOC, fully type-hinted (PEP 561 py.typed).
  • v0.9.0, beta. The storage format is stable: five canonical columns (id, name, write_time, edit_time, meta) are committed per ADR-0002 (storage-format guarantees), and future system features land in meta._sys.* or sibling tables, never as new columns. The public API is stable on the surfaces marked stable.
  • See CHANGELOG.md for release history.
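
To make the five-column layout concrete, here is a rough sketch with the standard sqlite3 module. Only the column names come from the list above; the column types, the table name library, and the use of SQLite's json_extract for filtering are assumptions made for illustration, not CentauroDB's actual DDL or SQL. It also assumes a SQLite build with the JSON functions available:

```python
import json
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Column names come from the list above; the types and constraints are guesses.
conn.execute(
    """
    CREATE TABLE library (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        write_time REAL NOT NULL,
        edit_time  REAL NOT NULL,
        meta       TEXT NOT NULL  -- the model, serialized as JSON
    )
    """
)

now = time.time()
for book in (
    {"title": "Dune", "rating": 4.8},
    {"title": "Emma", "rating": 4.1},
):
    conn.execute(
        "INSERT INTO library (name, write_time, edit_time, meta) VALUES (?, ?, ?, ?)",
        ("book", now, now, json.dumps(book)),
    )

# Filter on a field inside the JSON blob; no column exists for `rating`.
rows = conn.execute(
    "SELECT json_extract(meta, '$.title') FROM library "
    "WHERE name = ? AND json_extract(meta, '$.rating') > ?",
    ("book", 4.5),
).fetchall()
print(rows)  # [('Dune',)]
```

Since model fields live inside meta, adding a field to a model changes the JSON but not the table, which is what keeps the column set fixed.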

Production status & known limits

Here is what's not in the box yet, so you can decide whether the gaps matter for your workload:

  • No connection pooling. A PostgresEngine holds a single connection. Fine for batch jobs, ETL, ingestion workers, and pipelines. Not yet sized for high-concurrency web backends.
  • AsyncCollection runs sync psycopg inside asyncio.to_thread. It works under FastAPI, but it does not unlock native-async concurrency. Treat it as ergonomic compatibility, not a perf win. A native psycopg.AsyncConnectionPool path is on the roadmap.
  • No automatic pagination. read_objects accepts limit / offset, and count_objects returns totals; use them on any list endpoint whose result set could grow unbounded.
  • No row-level security / built-in auth. This is a storage library, not a backend. Enforce auth at the application boundary.
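
The asyncio.to_thread pattern mentioned in the list above can be sketched on its own. This is a generic standard-library illustration, not CentauroDB code; blocking_query is a hypothetical stand-in for a synchronous database call, and asyncio.to_thread requires Python 3.9 or later:

```python
import asyncio
import threading
import time


def blocking_query(delay: float) -> str:
    """Stand-in for a synchronous database call; reports the thread it ran on."""
    time.sleep(delay)
    return threading.current_thread().name


async def main() -> tuple:
    loop_thread = threading.current_thread().name
    # Each call runs in a worker thread, so the event loop stays responsive,
    # but the work itself is still ordinary blocking code.
    workers = await asyncio.gather(
        asyncio.to_thread(blocking_query, 0.05),
        asyncio.to_thread(blocking_query, 0.05),
    )
    return loop_thread, workers


loop_thread, workers = asyncio.run(main())
print(loop_thread in workers)  # False
```

The blocking calls never run on the event-loop thread, which is why the pattern works under an async framework. Throughput is still bounded by the blocking calls and the thread pool, which fits the "ergonomic compatibility, not a perf win" description.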

If your use case is data pipelines, analytics, time-series, IoT, or quant research, none of these are blockers. If it's a high-concurrency multi-tenant web app, reach for SQLAlchemy + Supabase instead.

License

MIT — see LICENSE.
