Python binding for the obj embedded document database.
obj-db (Python)
The embedded document database for Python. Dependable. Portable. Zero-infrastructure.
Part of obj — a self-contained,
serverless, single-file document database with a stable file format and
full ACID semantics. The wheel is at parity with the Rust surface and
writes a byte-identical file format.
The wheel is published on PyPI as obj-db and imported as obj. Built with PyO3 (abi3-py39).
pip install obj-db
Quickstart
Wrap a @dataclass with @obj.document for the typed, ergonomic API.
The codec produces postcard bytes byte-identical to Rust's
#[derive(Document)] for the same schema.
from dataclasses import dataclass
import obj

@obj.document(collection="orders", version=1)
@dataclass
class Order:
    customer_id: int
    total: float
    status: str

with obj.Db("app.obj") as db:
    doc_id = db.insert(Order(customer_id=1, total=99.5, status="pending"))
    order = db.get(Order, doc_id)
    for (oid, o) in db.all(Order):
        ...
Three write surfaces
The same CRUD methods (insert / get / update / upsert / delete
/ all) dispatch by argument type:
- Typed documents: pass an @obj.document instance or class. Routes through the schema-driven codec; on-disk bytes match Rust.
- Dict-native: pass a collection str plus a dict for ad-hoc writes with no @document boilerplate: db.insert("events", {...}).
- Raw bytes: pass a collection str plus bytes. obj does not serialise for you on this path; encode however you like (json, msgpack, postcard, pickle). Mirrors the obj C ABI contract.
db.update(Cls, id, fn) is an atomic read-modify-write inside one
transaction (no lost-update window); a raising fn rolls it back.
Per-document lazy migration mirrors Rust's Migrate trait via a
history=[...] arg and a cls.migrate(doc, from_version) classmethod.
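As a rough illustration of what "lazy" means here, the sketch below upgrades an old record only when it is read. Everything in it is an assumption made for the example (the toy `store`, the `total_cents` field, the dict-shaped old document); only the `migrate(doc, from_version)` classmethod shape is taken from the text above, and the real `history=[...]` format is not shown.

```python
from dataclasses import dataclass

CURRENT_VERSION = 2

@dataclass
class Order:
    customer_id: int
    total_cents: int  # hypothetical v2 field; v1 stored a float `total`

    @classmethod
    def migrate(cls, doc, from_version):
        # Upgrade an older decoded document (a plain dict in this sketch).
        if from_version == 1:
            return cls(customer_id=doc["customer_id"],
                       total_cents=round(doc["total"] * 100))
        raise ValueError(f"no migration from version {from_version}")

# Toy store: each record remembers the schema version it was written with.
store = {
    1: (1, {"customer_id": 7, "total": 12.5}),       # written by v1
    2: (2, {"customer_id": 8, "total_cents": 300}),  # written by v2
}

def get(doc_id):
    version, doc = store[doc_id]
    if version == CURRENT_VERSION:
        return Order(**doc)
    return Order.migrate(doc, version)  # upgraded on read, not rewritten in bulk

old = get(1)
new = get(2)
```

In this model the stored v1 record is left as it was; whether obj rewrites a migrated document on the next write is not stated here.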
obj's encoding is positional (schema-driven, no field names in the bytes) and the schema registry is process-global per (collection, version). Two handles declaring the same collection with a different shape raise obj.InvalidArgumentError rather than risk a silent mis-encode; use distinct names or bump version=.
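To see why a positional encoding needs this guard, here is a minimal sketch that uses the standard `struct` module as a stand-in codec. obj's real encoding is postcard, and `register`, `_REGISTRY`, and the two format strings are hypothetical names for the example. The same bytes decode without error under a mismatched schema, but the values are wrong.

```python
import struct

# Two "schemas" for the same collection; only the field order differs.
SCHEMA_V1 = "<qd"   # (customer_id: int64, total: float64)
SCHEMA_ALT = "<dq"  # (total: float64, customer_id: int64)

def encode(fmt, *values):
    # Only the values are written, in schema order; no field names.
    return struct.pack(fmt, *values)

def decode(fmt, raw):
    return struct.unpack(fmt, raw)

raw = encode(SCHEMA_V1, 1, 99.5)
good = decode(SCHEMA_V1, raw)    # round-trips to the original values
wrong = decode(SCHEMA_ALT, raw)  # no exception, but the values are garbage

# A process-global registry keyed by (collection, version) that rejects a
# conflicting shape up front instead of letting the mis-decode happen.
_REGISTRY = {}

def register(collection, version, fmt):
    existing = _REGISTRY.setdefault((collection, version), fmt)
    if existing != fmt:
        raise ValueError(
            f"{collection!r} v{version} is already registered with a different shape"
        )

register("orders", 1, SCHEMA_V1)
register("orders", 1, SCHEMA_V1)      # same shape again: accepted
try:
    register("orders", 1, SCHEMA_ALT)  # different shape, same key
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
```

Failing at registration time is the design choice worth noting: the error surfaces where the conflicting declaration is made, not later when a document reads back with plausible-looking but wrong values.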
Transactions
WriteTxn batches many typed writes into a single commit / single WAL
fsync, and reads see the transaction's own uncommitted writes:
with obj.Db("app.obj") as db:
    with db.transaction() as tx:
        for i in range(1000):
            tx.insert(Order(customer_id=i, total=float(i), status="new"))
        tx.update(Order, 1, lambda o: setattr(o, "status", "shipped"))
        tx.insert("audit_log", b"<raw bytes>")  # raw overload still works
    # one commit + one fsync on __exit__
tx.collection(Cls) binds a class once and exposes typed CRUD scoped to
the transaction.
Secondary indexes
Declare indexes with typing.Annotated markers — obj.Index,
obj.Unique, obj.Each (multi-value, on a list[...] field) — plus an
indexes=[obj.Composite((...), name=...)] decorator arg. This mirrors
Rust's #[obj(index ...)] attributes; the index B-trees are maintained
on every write.
from typing import Annotated

@obj.document(
    collection="orders", version=1,
    indexes=[obj.Composite(("region", "status"), name="by_region_status")],
)
@dataclass
class Order:
    email: Annotated[str, obj.Unique]      # unique index
    region: Annotated[str, obj.Index]      # standard index
    tags: Annotated[list[str], obj.Each]   # multi-value index
    status: str = "new"

with obj.Db("app.obj") as db:
    db.insert(Order(email="a@b.com", region="us", tags=["vip"]))
    order = db.find_unique(Order, "email", "a@b.com")            # exact lookup
    for oid, o in db.index_range(Order, "region", "us", "us"):   # half-open range
        ...
A duplicate Unique key raises obj.InvalidArgumentError and rolls the
write back atomically.
Querying
db.query(...) returns a lazy, immutable builder (each call returns a
fresh Query). Pass an @obj.document class for typed results or a
collection str for dict-native results:
top = (db.query(Order)
         .filter(lambda o: o.status == "shipped")  # AND-combined predicates
         .sort_by(lambda o: o.total)
         .limit(10)
         .fetch())                                  # -> list[tuple[int, Order]]

count = db.query(Order).filter(lambda o: o.total > 100).count()  # no-decode fast path
us = db.query(Order).index_range("region", "us", "us").fetch()
The sort buffer is bounded (obj.MAX_SORT_BUFFER, overridable per query
with .sort_buffer_limit(n)); an over-cap sort raises rather than
allocating without limit.
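The idea of a capped sort buffer can be sketched in a few lines. This is an illustration under assumptions, not obj's implementation: the cap value, the `SortBufferExceeded` exception, and `bounded_sort` are hypothetical, and the exception obj actually raises is not named above.

```python
MAX_SORT_BUFFER = 4  # deliberately tiny for the demo; not obj's real default

class SortBufferExceeded(Exception):
    """Hypothetical error for a sort that would exceed the buffer cap."""

def bounded_sort(rows, key, limit=MAX_SORT_BUFFER):
    # Buffer rows one at a time and fail as soon as the cap would be
    # exceeded, instead of materialising an unbounded input first.
    buf = []
    for row in rows:
        if len(buf) >= limit:
            raise SortBufferExceeded(f"sort needs more than {limit} rows")
        buf.append(row)
    return sorted(buf, key=key)

rows = [("c", 3.0), ("a", 1.0), ("b", 2.0)]
ordered = bounded_sort(rows, key=lambda r: r[1])

try:
    bounded_sort(range(10), key=lambda x: x)  # 10 rows against a cap of 4
    overflowed = False
except SortBufferExceeded:
    overflowed = True
```

Narrowing the input with a filter or an index range before sorting is the usual way to stay under such a cap.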
Multi-file attach
Open another .obj file's collections read-only under a namespace,
addressed as "namespace.collection":
with obj.Db("app.obj") as db:
    db.attach("archive.obj", "archive")
    archived = list(db.all("archive.orders"))  # also db.get / db.query
    db.detach("archive")
Writes to a namespaced collection raise obj.InvalidArgumentError. A
namespaced read needs its schema registered under the namespaced name
(declare a class with collection="archive.orders", or read raw bytes);
an unregistered read fails loud rather than returning garbage.
Async
obj.AsyncDb mirrors the blocking Db for asyncio callers — a thin
wrapper that offloads each call to a thread executor (the GIL is released
around engine work; no new runtime dependency).
import asyncio
import obj

async def main():
    adb = await obj.AsyncDb.open("app.obj")
    oid = await adb.insert(Order(email="e@f.com", region="us", tags=[]))
    async for oid, o in adb.all(Order):
        ...
    async with adb.transaction() as tx:  # commits on clean exit
        await tx.insert(Order(email="g@h.com", region="eu", tags=[]))
    await adb.close()

asyncio.run(main())
Each async transaction pins one worker thread for its lifetime (the txn
handle is not Send), so async with adb.transaction() is safe to drive
op-by-op.
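Pinning work to one thread can be done with the standard library alone. The sketch below shows the general technique and is not a claim about how obj.AsyncDb is built; `PinnedWorker` is a hypothetical name. A single-worker executor guarantees that every awaited call lands on the same OS thread, which is what a thread-bound handle needs.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

class PinnedWorker:
    """Run every call on one dedicated thread (illustrative only)."""

    def __init__(self):
        # max_workers=1 means all submitted calls share a single thread.
        self._pool = ThreadPoolExecutor(max_workers=1)

    async def call(self, fn, *args):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._pool, fn, *args)

    def close(self):
        self._pool.shutdown(wait=True)

async def main():
    worker = PinnedWorker()
    try:
        # Three separate awaits, each reporting the thread it ran on.
        return [await worker.call(threading.get_ident) for _ in range(3)]
    finally:
        worker.close()

thread_ids = asyncio.run(main())
main_thread_id = threading.get_ident()
```

All three calls report the same thread id, and it differs from the thread running the event loop, so the loop stays free while the pinned thread does the blocking work.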
Checkpointing
Writes land in a write-ahead log (<db>.obj-wal) first; the main file
stays sparse until a checkpoint folds committed WAL pages into it. A
checkpoint fires automatically at ~1000 WAL frames, and a clean
close() (including a with block that exits without raising) folds the
WAL for you — so the .obj file is self-contained after normal shutdown.
with obj.Db("app.obj") as db:
    for note in notes:
        db.insert(note)
    db.checkpoint()  # fold on demand (optional)
checkpoint() is a no-op when there is nothing to fold, and is deferred
if a concurrent reader has pinned a snapshot below the WAL end (retry
once it finishes). The close-time fold is best-effort and non-fatal: a
failure never turns a clean with block into a raised error, and an exit
via exception skips the fold and propagates your error unchanged.
Trade-off: every clean close ends in an fsync. Prefer a single
long-lived handle when one-fsync-per-close is a hot-path bottleneck.
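A toy model may help make the WAL-then-fold flow concrete. It is a sketch under simplifying assumptions (one frame per write, no readers, no fsync, nothing on disk), and `ToyDb` is a hypothetical class; only the roughly 1000-frame threshold and the no-op behaviour come from the description above.

```python
CHECKPOINT_THRESHOLD = 1000  # the "~1000 WAL frames" figure from the text

class ToyDb:
    def __init__(self):
        self.main = {}  # stands in for the .obj file
        self.wal = []   # stands in for the .obj-wal file

    def insert(self, key, value):
        self.wal.append((key, value))  # writes land in the WAL first
        if len(self.wal) >= CHECKPOINT_THRESHOLD:
            self.checkpoint()          # automatic fold at the threshold

    def get(self, key):
        # The newest WAL entry wins; otherwise fall back to the main file.
        for k, v in reversed(self.wal):
            if k == key:
                return v
        return self.main.get(key)

    def checkpoint(self):
        """Fold WAL entries into main; return how many frames were folded."""
        folded = len(self.wal)
        self.main.update(self.wal)  # later entries overwrite earlier ones
        self.wal.clear()
        return folded               # 0 when there was nothing to fold

db = ToyDb()
for i in range(1500):
    db.insert(i, f"note-{i}")

in_main, in_wal = len(db.main), len(db.wal)  # state before a manual fold
first = db.checkpoint()                      # folds the remaining frames
second = db.checkpoint()                     # nothing left: a no-op
```

After 1500 inserts the automatic fold has moved the first 1000 records into the main store and the rest still sit in the WAL until the explicit checkpoint, which is the window in which the main file alone would be incomplete.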
Diagnostics
stat = db.stat()
for cs in stat.collections:
    print(cs.name, cs.doc_count, cs.file_size_bytes)
    for idx in cs.indexes:
        print("  ", idx.name, idx.kind, idx.key_paths, idx.status)

# low-level, type-erased dump of a collection's primary B-tree:
for rec in db.dump_raw("orders", max_records=1000):
    rec.id, rec.header, rec.payload  # id, DocumentHeader, raw postcard bytes
Exceptions
All operations raise instances of obj.ObjError (the catch-all base).
The sub-exceptions narrow the diagnosis:
| Exception | When raised |
|---|---|
| obj.NotFoundError | document / collection / index / namespace absent |
| obj.BusyError | lock contention (pager mutex, writer lock, cross-process) |
| obj.CorruptionError | on-disk format / checksum / B-tree invariant violation |
| obj.IntegrityError | Db.integrity_check() found at least one failure |
| obj.InvalidArgumentError | caller-side argument problem (encoding, range, type, schema) |
| obj.EncryptionError | missing / wrong / mismatched encryption key |
| obj.FeatureUnsupportedError | file uses a build-time feature this wheel lacks |
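Since lock contention is usually transient, one plausible way to use the hierarchy is to retry on the busy error only and let everything else propagate. The sketch below defines local stand-in classes so it runs without the wheel installed; in real code you would catch obj.BusyError and obj.ObjError instead. `with_retry` and `flaky_write` are hypothetical helpers, and the backoff values are arbitrary.

```python
import time

# Stand-ins mirroring the documented hierarchy (base class plus one subclass).
class ObjError(Exception):
    pass

class BusyError(ObjError):
    pass

def with_retry(op, attempts=5, base_delay=0.01):
    """Retry op() on BusyError with exponential backoff; re-raise on the last try."""
    for attempt in range(attempts):
        try:
            return op()
        except BusyError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}

def flaky_write():
    # Simulates a writer lock that is held for the first two attempts.
    calls["n"] += 1
    if calls["n"] < 3:
        raise BusyError("writer lock held")
    return "committed"

result = with_retry(flaky_write, base_delay=0.001)
```

Catching only the busy subclass keeps corruption or argument errors from being silently retried.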
Development
python3 -m venv .venv && source .venv/bin/activate
pip install maturin pytest
cd crates/obj-py
maturin develop # build the cdylib + install editable into the venv
pytest tests/ -v
The dev loop is "edit Rust → maturin develop → pytest". For a
release wheel: maturin build --release writes target/wheels/obj-*.whl.
License
Dual-licensed under MIT or Apache 2.0, at your option.
File details
Details for the file obj_db-1.1.2.tar.gz.
File metadata
- Download URL: obj_db-1.1.2.tar.gz
- Upload date:
- Size: 731.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15e9509fb45274536be27cbe29b698ac9a5c736e732df311216c5c38459a51b0 |
| MD5 | a80457f1edc4ba9baadf8683b52332fc |
| BLAKE2b-256 | 29ddc97eba72720befd8b9a83fdaa00e302c70578f06c13e54be5a09575147fe |
File details
Details for the file obj_db-1.1.2-cp39-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: obj_db-1.1.2-cp39-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 717.3 kB
- Tags: CPython 3.9+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a12d3bc617d9df3d711aa1443846c25cf270b4acb75a3a4678fb1309b139dbf9 |
| MD5 | c9e903b7e3e29d96244db4efa799df3c |
| BLAKE2b-256 | ec9d5c10ded5d141cd794a8fc3db30cf3c715e13f8df9f27ca58c46193c78865 |