
Python binding for the obj embedded document database.


obj-db (Python)

The embedded document database for Python. Dependable. Portable. Zero-infrastructure.


Part of obj — a self-contained, serverless, single-file document database with a stable file format and full ACID semantics. The wheel is at parity with the Rust surface and writes a byte-identical file format.

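The wheel is published on PyPI as obj-db and imported as obj. It is built with PyO3 against the stable ABI (abi3-py39), so one wheel per platform serves CPython 3.9 and later.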

pip install obj-db

Quickstart

Wrap a @dataclass with @obj.document for the typed, ergonomic API. The codec produces postcard bytes byte-identical to Rust's #[derive(Document)] for the same schema.

from dataclasses import dataclass
import obj

@obj.document(collection="orders", version=1)
@dataclass
class Order:
    customer_id: int
    total: float
    status: str

with obj.Db("app.obj") as db:
    doc_id = db.insert(Order(customer_id=1, total=99.5, status="pending"))
    order = db.get(Order, doc_id)
    for (oid, o) in db.all(Order):
        ...
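To make "positional" concrete, here is a minimal pure-Python sketch of postcard-style encoding for the Order above. It assumes the codec maps int to a zigzag-varint i64, float to a little-endian f64, and str to a varint length prefix followed by UTF-8 bytes, following the published postcard wire format. obj's real codec lives inside the wheel; encode_positional and the other helpers here are hypothetical names.

```python
import struct
from dataclasses import dataclass, fields

def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint: 7 payload bits per byte, high bit set if more follow."""
    if n < 0:
        raise ValueError("varint needs a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def zigzag64(n: int) -> int:
    """Zigzag-map an i64 so small magnitudes (positive or negative) stay short."""
    return (n << 1) ^ (n >> 63)

def encode_field(value) -> bytes:
    if isinstance(value, bool):            # check bool before int (bool subclasses int)
        return b"\x01" if value else b"\x00"
    if isinstance(value, int):             # assumed to be an i64 on the Rust side
        return encode_varint(zigzag64(value))
    if isinstance(value, float):           # assumed to be an f64
        return struct.pack("<d", value)
    if isinstance(value, str):             # varint length prefix, then UTF-8 bytes
        raw = value.encode("utf-8")
        return encode_varint(len(raw)) + raw
    raise TypeError(f"unsupported field type: {type(value).__name__}")

def encode_positional(doc) -> bytes:
    # Fields are emitted in declaration order with no names or tags,
    # so a reader must know the exact schema to decode them.
    return b"".join(encode_field(getattr(doc, f.name)) for f in fields(doc))

@dataclass
class Order:
    customer_id: int
    total: float
    status: str

blob = encode_positional(Order(customer_id=1, total=99.5, status="pending"))
# 1 byte (zigzag 1 -> 2) + 8 bytes (f64) + 1 length byte + 7 bytes of "pending"
```

Because nothing in blob says which bytes belong to which field, reordering customer_id and total in the class would silently misread old data. That is the hazard the per-(collection, version) schema check guards against.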

Three write surfaces

The same CRUD methods (insert / get / update / upsert / delete / all) dispatch by argument type:

  • Typed documents — pass an @obj.document instance or class. Routes through the schema-driven codec; on-disk bytes match Rust.
  • Dict-native — pass a collection str plus a dict for ad-hoc writes with no @document boilerplate: db.insert("events", {...}).
  • Raw bytes — pass a collection str plus bytes. obj does not serialise for you on this path; encode however you like (json, msgpack, postcard, pickle). Mirrors the obj C ABI contract.
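The dispatch rule can be pictured as a small type switch. This is a hedged sketch of the routing logic only, not the wheel's implementation: classify_write and the __obj_collection__ marker attribute are hypothetical. It also shows the caller-side encoding that the raw-bytes path expects.

```python
import json

class Order:
    # Hypothetical marker: we assume @obj.document tags the class somehow;
    # "__obj_collection__" is an invented name for this sketch.
    __obj_collection__ = "orders"

def classify_write(*args) -> str:
    """Report which write surface an insert(*args) call would take."""
    if len(args) == 1 and hasattr(type(args[0]), "__obj_collection__"):
        return "typed"        # schema-driven codec
    if len(args) == 2 and isinstance(args[0], str):
        payload = args[1]
        if isinstance(payload, dict):
            return "dict"     # ad-hoc, no @document declaration
        if isinstance(payload, (bytes, bytearray, memoryview)):
            return "raw"      # stored as-is; the caller chose the encoding
    raise TypeError("expected (document), (collection, dict) or (collection, bytes)")

# On the raw path, serialisation is the caller's job, e.g. compact JSON:
event = {"kind": "login", "user": 7}
payload = json.dumps(event, separators=(",", ":")).encode("utf-8")
```

Decoding is symmetric: whatever reads "events" back as raw bytes must apply json.loads (or the matching decoder) itself.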

db.update(Cls, id, fn) is an atomic read-modify-write inside one transaction (no lost-update window); a raising fn rolls it back. Per-document lazy migration mirrors Rust's Migrate trait via a history=[...] arg and a cls.migrate(doc, from_version) classmethod.
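A toy model of the update contract, assuming fn mutates the document in place (as the setattr lambda in the transaction example suggests). A plain threading.Lock stands in for obj's write transaction, and ToyStore is a hypothetical name. The point is only that the read, the modification and the write form one indivisible step, and that an exception from fn leaves the stored document untouched.

```python
import copy
import threading

class ToyStore:
    """In-memory stand-in for the semantics of db.update(Cls, id, fn)."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # plays the role of the write transaction

    def insert(self, doc_id, doc):
        with self._lock:
            self._docs[doc_id] = doc

    def get(self, doc_id):
        with self._lock:
            return copy.deepcopy(self._docs[doc_id])

    def update(self, doc_id, fn):
        # Read, modify and write all happen while holding the lock, so no other
        # writer can slip in between the read and the write (no lost update).
        with self._lock:
            working = copy.deepcopy(self._docs[doc_id])
            fn(working)                     # if fn raises, we never reach the write
            self._docs[doc_id] = working

store = ToyStore()
store.insert(1, {"status": "pending", "hits": 0})

def bump(doc):
    doc["hits"] += 1

def hammer():
    for _ in range(100):
        store.update(1, bump)

workers = [threading.Thread(target=hammer) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()

def fail_midway(doc):
    doc["status"] = "lost"
    raise RuntimeError("abort")

try:
    store.update(1, fail_midway)
except RuntimeError:
    pass  # the half-applied change was discarded
```

With a separate get followed by a put, two threads could read the same hits value and one increment would be lost; holding one lock (or one transaction) across the whole cycle rules that out.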

obj's encoding is positional (schema-driven, no field names in the bytes) and the schema registry is process-global per (collection, version). Two handles declaring the same collection with a different shape raise obj.InvalidArgumentError rather than risk a silent mis-encode — use distinct names or bump version=.


Transactions

WriteTxn batches many typed writes into a single commit / single WAL fsync, and reads see the transaction's own uncommitted writes:

with obj.Db("app.obj") as db:
    with db.transaction() as tx:
        for i in range(1000):
            tx.insert(Order(customer_id=i, total=float(i), status="new"))
        tx.update(Order, 1, lambda o: setattr(o, "status", "shipped"))
        tx.insert("audit_log", b"<raw bytes>")   # raw overload still works
        # one commit + one fsync on __exit__

tx.collection(Cls) binds a class once and exposes typed CRUD scoped to the transaction.
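The batching and read-your-own-writes behaviour can be modelled with an overlay dictionary that is merged once on a clean exit. ToyDb and ToyTxn are hypothetical names, and discarding the whole batch when the with block raises is an assumption of this sketch.

```python
class ToyDb:
    def __init__(self):
        self.docs = {}
        self.fsyncs = 0
        self._next_id = 1

    def transaction(self):
        return ToyTxn(self)

class ToyTxn:
    """Buffers writes; reads check the buffer first (read-your-own-writes)."""

    def __init__(self, db):
        self._db = db
        self._pending = {}

    def insert(self, doc):
        doc_id = self._db._next_id
        self._db._next_id += 1
        self._pending[doc_id] = doc
        return doc_id

    def get(self, doc_id):
        if doc_id in self._pending:
            return self._pending[doc_id]
        return self._db.docs.get(doc_id)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self._db.docs.update(self._pending)  # one commit for the whole batch
            self._db.fsyncs += 1                 # ...and one (simulated) fsync
        self._pending = {}                       # on error: drop everything
        return False                             # never swallow the exception

db = ToyDb()
with db.transaction() as tx:
    first = tx.insert({"n": 0})
    for i in range(1, 1000):
        tx.insert({"n": i})
    seen_inside = tx.get(first)           # visible to the transaction itself
    visible_outside = db.docs.get(first)  # not yet committed
```

The cost model is the point: 1000 inserts here produce one commit and one fsync, instead of 1000 of each.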


Secondary indexes

Declare indexes with typing.Annotated markers — obj.Index, obj.Unique, obj.Each (multi-value, on a list[...] field) — plus an indexes=[obj.Composite((...), name=...)] decorator arg. This mirrors Rust's #[obj(index ...)] attributes; the index B-trees are maintained on every write.

from typing import Annotated

@obj.document(
    collection="orders", version=1,
    indexes=[obj.Composite(("region", "status"), name="by_region_status")],
)
@dataclass
class Order:
    email:  Annotated[str, obj.Unique]      # unique index
    region: Annotated[str, obj.Index]       # standard index
    tags:   Annotated[list[str], obj.Each]  # multi-value index
    status: str = "new"

with obj.Db("app.obj") as db:
    db.insert(Order(email="a@b.com", region="us", tags=["vip"]))
    order = db.find_unique(Order, "email", "a@b.com")          # exact lookup
    for oid, o in db.index_range(Order, "region", "us", "us"): # half-open range
        ...

A duplicate Unique key raises obj.InvalidArgumentError and rolls the write back atomically.
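What each marker implies for index maintenance can be shown with plain dictionaries standing in for obj's B-trees. A Unique key maps to exactly one id, an Index key maps to a set of ids, an Each field contributes one entry per list element, and a Composite index keys on a tuple. ToyIndexedOrders is hypothetical, and it checks the unique key up front rather than rolling back a transaction.

```python
from collections import defaultdict

class InvalidArgumentError(ValueError):
    """Stand-in for obj.InvalidArgumentError."""

class ToyIndexedOrders:
    def __init__(self):
        self.docs = {}
        self.unique_email = {}                     # Unique: key -> exactly one id
        self.by_region = defaultdict(set)          # Index: key -> many ids
        self.by_tag = defaultdict(set)             # Each: one entry per list element
        self.by_region_status = defaultdict(set)   # Composite: tuple key
        self._next_id = 1

    def insert(self, doc):
        # Validate the unique key before touching any structure, so a clash
        # leaves no partial state (obj gets this from transactional rollback).
        if doc["email"] in self.unique_email:
            raise InvalidArgumentError(f"duplicate email {doc['email']!r}")
        doc_id = self._next_id
        self._next_id += 1
        self.docs[doc_id] = doc
        self.unique_email[doc["email"]] = doc_id
        self.by_region[doc["region"]].add(doc_id)
        for tag in doc["tags"]:
            self.by_tag[tag].add(doc_id)
        self.by_region_status[(doc["region"], doc["status"])].add(doc_id)
        return doc_id

    def find_unique(self, email):
        doc_id = self.unique_email.get(email)
        return None if doc_id is None else (doc_id, self.docs[doc_id])

orders = ToyIndexedOrders()
a = orders.insert({"email": "a@b.com", "region": "us", "tags": ["vip", "beta"], "status": "new"})
b = orders.insert({"email": "c@d.com", "region": "us", "tags": ["vip"], "status": "shipped"})
```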


Querying

db.query(...) returns a lazy, immutable builder (each call returns a fresh Query). Pass an @obj.document class for typed results or a collection str for dict-native results:

top = (db.query(Order)
         .filter(lambda o: o.status == "shipped")   # AND-combined predicates
         .sort_by(lambda o: o.total)
         .limit(10)
         .fetch())                                   # -> list[tuple[int, Order]]

count = db.query(Order).filter(lambda o: o.total > 100).count()  # no-decode fast path
us    = db.query(Order).index_range("region", "us", "us").fetch()

The sort buffer is bounded (obj.MAX_SORT_BUFFER, overridable per query with .sort_buffer_limit(n)); an over-cap sort raises rather than allocating without limit.
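The builder's two properties, immutability and a capped sort buffer, can be sketched with a frozen dataclass. This is an in-memory model over (id, doc) pairs, not obj's planner. The Query class here, SortBufferExceeded and the cap value are all assumptions.

```python
from dataclasses import dataclass, replace
from itertools import islice
from typing import Any, Callable, Optional, Tuple

SORT_BUFFER_CAP = 10_000  # hypothetical default; obj.MAX_SORT_BUFFER's value may differ

class SortBufferExceeded(RuntimeError):
    """Hypothetical error; the README only says an over-cap sort raises."""

@dataclass(frozen=True)
class Query:
    rows: Tuple[Tuple[int, Any], ...]
    predicates: Tuple[Callable[[Any], bool], ...] = ()
    sort_key: Optional[Callable[[Any], Any]] = None
    limit_n: Optional[int] = None
    sort_cap: int = SORT_BUFFER_CAP

    # Each builder call returns a new Query; the receiver is never mutated.
    def filter(self, pred):
        return replace(self, predicates=self.predicates + (pred,))

    def sort_by(self, key):
        return replace(self, sort_key=key)

    def limit(self, n):
        return replace(self, limit_n=n)

    def sort_buffer_limit(self, n):
        return replace(self, sort_cap=n)

    def _matching(self):
        for oid, doc in self.rows:
            if all(pred(doc) for pred in self.predicates):  # AND-combined
                yield oid, doc

    def fetch(self):
        rows = self._matching()
        if self.sort_key is not None:
            buffer = []
            for row in rows:
                buffer.append(row)
                if len(buffer) > self.sort_cap:  # fail instead of growing unbounded
                    raise SortBufferExceeded(f"sort needs more than {self.sort_cap} rows")
            buffer.sort(key=lambda row: self.sort_key(row[1]))
            rows = iter(buffer)
        return list(islice(rows, self.limit_n))  # islice(..., None) means no limit

    def count(self):
        return sum(1 for _ in self._matching())
```

Immutability is what makes partial queries safe to reuse: a base query can be shared, and each refinement branches off it without affecting the others.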


Multi-file attach

Open another .obj file's collections read-only under a namespace, addressed as "namespace.collection":

with obj.Db("app.obj") as db:
    db.attach("archive.obj", "archive")
    archived = list(db.all("archive.orders"))   # also db.get / db.query
    db.detach("archive")

Writes to a namespaced collection raise obj.InvalidArgumentError. A namespaced read needs its schema registered under the namespaced name (declare a class with collection="archive.orders", or read raw bytes); an unregistered read fails loudly rather than returning garbage.
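Name resolution for attached files amounts to splitting on the first dot and refusing writes that land in an attachment. This is a minimal sketch that assumes any dotted name is namespaced. resolve and its set-of-namespaces argument are hypothetical, and the exception classes are local stand-ins for obj's. Raising NotFoundError for an unknown namespace follows the exceptions table ("namespace absent"); the exact check order is an assumption.

```python
class InvalidArgumentError(ValueError):
    """Stand-in for obj.InvalidArgumentError."""

class NotFoundError(LookupError):
    """Stand-in for obj.NotFoundError."""

def resolve(name, attached, *, for_write):
    """Split "namespace.collection" and enforce read-only access to attachments.

    attached: set of currently attached namespaces (hypothetical representation).
    Returns (namespace or None for the main file, collection).
    """
    namespace, dot, collection = name.partition(".")
    if not dot:
        return None, name                  # plain name: the main file
    if namespace not in attached:
        raise NotFoundError(f"namespace {namespace!r} is not attached")
    if for_write:
        raise InvalidArgumentError(f"{name!r} lives in a read-only attachment")
    return namespace, collection
```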


Async

obj.AsyncDb mirrors the blocking Db for asyncio callers — a thin wrapper that offloads each call to a thread executor (the GIL is released around engine work; no new runtime dependency).

import asyncio, obj

async def main():
    adb = await obj.AsyncDb.open("app.obj")
    oid = await adb.insert(Order(email="e@f.com", region="us", tags=[]))
    async for oid, o in adb.all(Order):
        ...
    async with adb.transaction() as tx:      # commits on clean exit
        await tx.insert(Order(email="g@h.com", region="eu", tags=[]))
    await adb.close()

asyncio.run(main())

Each async transaction pins one worker thread for its lifetime (the txn handle is not Send), so async with adb.transaction() is safe to drive op-by-op.
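The threading pattern described here can be sketched with the standard library alone. ToyAsyncDb, ToyAsyncTxn and BlockingHandle are hypothetical names. The sketch assumes that plain calls may run on any executor thread (asyncio.to_thread), while a transaction gets a private single-worker ThreadPoolExecutor. That is one way to satisfy a handle that must stay on one thread.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

class BlockingHandle:
    """Stand-in for a blocking, thread-affine handle: reports the thread it ran on."""
    def insert(self, doc):
        return threading.get_ident()

class ToyAsyncTxn:
    def __init__(self, handle):
        self._handle = handle
        # A private single-thread pool pins every op of this txn to one worker.
        self._pool = ThreadPoolExecutor(max_workers=1)

    async def insert(self, doc):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._pool, self._handle.insert, doc)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # A real wrapper would commit or roll back here, on the same pinned thread.
        self._pool.shutdown(wait=True)
        return False

class ToyAsyncDb:
    def __init__(self, handle):
        self._handle = handle

    async def insert(self, doc):
        # Non-transactional calls can use any worker of the default executor.
        return await asyncio.to_thread(self._handle.insert, doc)

    def transaction(self):
        return ToyAsyncTxn(self._handle)

async def demo():
    adb = ToyAsyncDb(BlockingHandle())
    async with adb.transaction() as tx:
        txn_threads = [await tx.insert({"n": i}) for i in range(5)]
    return txn_threads

txn_threads = asyncio.run(demo())
main_thread = threading.get_ident()
```

Running the blocking call in a worker is what keeps the event loop responsive. Because the engine releases the GIL during its own work, other coroutines can make progress while a call is in flight.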


Checkpointing

Writes land in a write-ahead log (<db>.obj-wal) first; the main file stays sparse until a checkpoint folds committed WAL pages into it. A checkpoint fires automatically at ~1000 WAL frames, and a clean close() (including a with block that exits without raising) folds the WAL for you — so the .obj file is self-contained after normal shutdown.

with obj.Db("app.obj") as db:
    for note in notes:   # notes: any iterable of @obj.document instances
        db.insert(note)
    db.checkpoint()   # fold on demand (optional)

checkpoint() is a no-op when there is nothing to fold, and is deferred if a concurrent reader has pinned a snapshot below the WAL end (retry once it finishes). The close-time fold is best-effort and non-fatal: a failure never turns a clean with block into a raised error, and an exit via exception skips the fold and propagates your error unchanged.

Trade-off: every clean close ends in an fsync. Prefer a single long-lived handle when one-fsync-per-close is a hot-path bottleneck.
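The fold-and-defer rules can be modelled in a few lines. This is a toy, not obj's pager. ToyWal and its methods are hypothetical, the model treats every frame as committed, and it tracks reader snapshots as absolute WAL positions so that "pinned below the WAL end" becomes a simple comparison.

```python
CHECKPOINT_FRAMES = 1000  # "~1000 WAL frames"; the exact trigger is an assumption

class ToyWal:
    """Toy model: pages go to a WAL first and are folded into the main file later."""

    def __init__(self):
        self.main = {}          # page number -> bytes (the folded .obj file)
        self.frames = []        # appended (page number, bytes) frames
        self._folded = 0        # frames already folded (absolute WAL position)
        self._readers = []      # absolute WAL positions pinned by open snapshots

    def _end(self):
        return self._folded + len(self.frames)

    def write(self, page_no, data):
        self.frames.append((page_no, data))
        if len(self.frames) >= CHECKPOINT_FRAMES:
            self.checkpoint()   # may be deferred; frames then keep accumulating

    def begin_read(self):
        mark = self._end()
        self._readers.append(mark)
        return mark

    def end_read(self, mark):
        self._readers.remove(mark)

    def checkpoint(self):
        """Return True if the WAL is empty afterwards, False if deferred."""
        if not self.frames:
            return True                       # nothing to fold: no-op
        if any(mark < self._end() for mark in self._readers):
            return False                      # a snapshot predates the WAL end
        for page_no, data in self.frames:     # replay in order; newest wins
            self.main[page_no] = data
        self._folded += len(self.frames)
        self.frames.clear()
        return True
```

Deferral matters because folding overwrites main-file pages that an older snapshot may still need to read.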


Diagnostics

stat = db.stat()
for cs in stat.collections:
    print(cs.name, cs.doc_count, cs.file_size_bytes)
    for idx in cs.indexes:
        print(" ", idx.name, idx.kind, idx.key_paths, idx.status)

# low-level, type-erased dump of a collection's primary B-tree:
for rec in db.dump_raw("orders", max_records=1000):
    rec.id, rec.header, rec.payload   # id, DocumentHeader, raw postcard bytes
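Because stat() returns plain attribute-bearing objects, small reports are easy to build on top of it. The helper below is a hypothetical example that relies only on the attribute names used in the snippet above (name, doc_count, file_size_bytes, indexes, idx.name). It is exercised against SimpleNamespace stand-ins rather than a real database.

```python
from types import SimpleNamespace

def size_report(stat):
    """Largest collections first: (name, doc_count, bytes, bytes per doc, index names)."""
    report = []
    for cs in sorted(stat.collections, key=lambda c: c.file_size_bytes, reverse=True):
        per_doc = cs.file_size_bytes / cs.doc_count if cs.doc_count else 0.0
        report.append((cs.name, cs.doc_count, cs.file_size_bytes, per_doc,
                       [idx.name for idx in cs.indexes]))
    return report

# Stand-in objects with the attribute names shown above, so the helper runs without obj.
fake_stat = SimpleNamespace(collections=[
    SimpleNamespace(name="events", doc_count=4, file_size_bytes=4096, indexes=[]),
    SimpleNamespace(name="orders", doc_count=10, file_size_bytes=8192,
                    indexes=[SimpleNamespace(name="by_region_status")]),
])
report = size_report(fake_stat)
```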

Exceptions

Every error obj raises is an instance of obj.ObjError (the catch-all base), so a single except obj.ObjError catches them all. The subclasses narrow the diagnosis:

Exception                      When raised
obj.NotFoundError              document / collection / index / namespace absent
obj.BusyError                  lock contention (pager mutex, writer lock, cross-process)
obj.CorruptionError            on-disk format / checksum / B-tree invariant violation
obj.IntegrityError             Db.integrity_check() found at least one failure
obj.InvalidArgumentError       caller-side argument problem (encoding, range, type, schema)
obj.EncryptionError            missing / wrong / mismatched encryption key
obj.FeatureUnsupportedError    file uses a build-time feature this wheel lacks
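Since every error derives from obj.ObjError, callers can catch narrowly where they have a recovery plan and broadly at the edges. One common plan is to retry only on lock contention. The sketch below treats BusyError as transient, which is a judgement call rather than documented obj behaviour; with_busy_retry is a hypothetical helper, and the classes are local stand-ins so it runs without the wheel.

```python
import time

# Local stand-ins mirroring part of the hierarchy in the table above.
class ObjError(Exception): ...
class BusyError(ObjError): ...
class NotFoundError(ObjError): ...

def with_busy_retry(op, *, attempts=5, base_delay=0.01):
    """Retry op() on BusyError with exponential backoff; let everything else through."""
    for attempt in range(attempts):
        try:
            return op()
        except BusyError:
            if attempt == attempts - 1:
                raise                                  # out of retries: surface it
            time.sleep(base_delay * (2 ** attempt))    # 10 ms, 20 ms, 40 ms, ...
```

Corruption, integrity and encryption errors are deliberately not retried, because repeating the call cannot fix them.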

Development

python3 -m venv .venv && source .venv/bin/activate
pip install maturin pytest

cd crates/obj-py
maturin develop          # build the cdylib + install editable into the venv
pytest tests/ -v

The dev loop is "edit Rust → maturin develop → pytest". For a release wheel: maturin build --release writes target/wheels/obj-*.whl.


License

Dual-licensed under MIT or Apache 2.0, at your option.



Download files

Download the file for your platform.

Source Distribution

obj_db-1.1.2.tar.gz (731.8 kB)

Uploaded Source

Built Distribution


obj_db-1.1.2-cp39-abi3-macosx_11_0_arm64.whl (717.3 kB)

Uploaded CPython 3.9+, macOS 11.0+, ARM64

File details

Details for the file obj_db-1.1.2.tar.gz.

File metadata

  • Download URL: obj_db-1.1.2.tar.gz
  • Size: 731.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.13.3

File hashes

Hashes for obj_db-1.1.2.tar.gz
Algorithm Hash digest
SHA256 15e9509fb45274536be27cbe29b698ac9a5c736e732df311216c5c38459a51b0
MD5 a80457f1edc4ba9baadf8683b52332fc
BLAKE2b-256 29ddc97eba72720befd8b9a83fdaa00e302c70578f06c13e54be5a09575147fe


File details

Details for the file obj_db-1.1.2-cp39-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for obj_db-1.1.2-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a12d3bc617d9df3d711aa1443846c25cf270b4acb75a3a4678fb1309b139dbf9
MD5 c9e903b7e3e29d96244db4efa799df3c
BLAKE2b-256 ec9d5c10ded5d141cd794a8fc3db30cf3c715e13f8df9f27ca58c46193c78865

