A SQLite-backed embedded persistent dictionary store for Python
Project description
pydbdict
pydbdict is a SQLite-backed embedded persistent dictionary store for Python. It stores JSON-Schema-validated dictionaries with primary keys, secondary indexes, foreign-key constraints, ACID transactions, and expression queries.
Use pydbdict when an application needs durable local storage with stronger structure than a JSON file or cache, but does not need to run a separate database server.
Contents
- Features
- Get Started
- Project Files
- Configuration
- Defining Schemas
- Run In Docker
- Programming Guide
- Query Model
- Limitations
- License
- Development
Features
- Embedded SQLite storage: no server process to install or operate.
- JSON Schema validation for every stored dictionary.
- Single-field and composite primary keys.
- Secondary indexes for exact-match reads and expression queries.
- Foreign-key constraints between schemas.
- Foreign-key delete behavior including
abortandcascade. - ACID transactions through a Python context manager.
- Python-style filter expressions with comparison operators,
and,or, and parentheses. - Iterator mode for streaming query results without materializing the full result set.
- Indexed row-id bitmap execution for compound filters.
- Pure Python package with SQLite from the Python standard library.
Get Started
Install
Install pydbdict into your application environment:
python -m pip install pydbdict
SQLite is provided by Python's standard library. pyroaring and
jsonschema are installed as Python package dependencies.
If you are using the source code from GitHub or another downloaded copy of this repository, first open a terminal in that repository directory:
cd pydbdict
python -m venv .venv
. .venv/bin/activate
python -m pip install -r requirement.txt
On Windows PowerShell:
cd pydbdict
py -3.12 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -r requirement.txt
Verify The Install
After installing with pip, run a small smoke test:
python -m pydbdict.selftest
Expected output:
pydbdict self-test ok
The installed wheel includes the starter pydbdict/etc JSON files, the
pydbdict.selftest module, and the bundled test resources under
pydbdict/tests.
For a fuller installed-package check, run:
python -m pydbdict.tests
It should end with:
Ran 26 tests
OK
pydbdict bundled tests ok
If you are using the source repository, run the full bundled tests after installing dependencies:
python test/test_ast_parser.py
python test/test_pydbdict.py
The parser test should end with:
Ran 8 tests
OK
The backend test should end with:
Ran 26 tests
OK
test/test_pydbdict.py also prints performance timings after the test result.
test/test_pydbdict.py uses the sample schemas and generated test data under
test/. Those files are test fixtures; you do not need to copy them into your
own application project. Passing this test confirms that pydbdict can
initialize, load schemas, validate records, query data, and run transactions.
Initialize A Project
A pydbdict application should keep its runtime config and schema definitions in the application's project directory:
my_project/
|-- etc/
| |-- app_config_schema.json
| |-- dd_meta_schema.json
| |-- app_config.xnix.json
| `-- app_config.win.json
|-- schemas/
| `-- your_schema.json
`-- app.py
Create the directories:
mkdir -p etc schemas var/pydbdict
On Windows PowerShell:
mkdir etc, schemas
Copy the starter etc files into your application's etc/ directory.
If pydbdict is installed with pip, copy from the installed package's
pydbdict/etc directory. Run this once from your project root:
from pathlib import Path
import shutil
import pydbdict
source = Path(pydbdict.__file__).parent / "etc"
target = Path("etc")
target.mkdir(exist_ok=True)
for name in (
"app_config_schema.json",
"dd_meta_schema.json",
"app_config.xnix.json",
"app_config.win.json",
):
shutil.copy2(source / name, target / name)
If you are working from the source repository, you can copy directly from its
top-level etc/ directory instead:
cp /path/to/pydbdict/etc/*.json etc/
On Windows PowerShell:
copy C:\path\to\pydbdict\etc\*.json etc\
Run your application from the project root so relative paths in config and schemas resolve as expected:
cd my_project
python app.py
Minimal App
from pydbdict import init
init()
After init(), register your own schemas and use the APIs described below.
Run In Docker
pydbdict applications can run entirely inside Docker. Build a runtime image from a locally built wheel:
./release.sh 0.1.0a1
docker build \
-t pydbdict-runtime:0.1.0a1 \
-f docker/runtime/Dockerfile .
Run an application project by mounting it at /app:
mkdir -p my_project/etc my_project/schemas my_project/var/pydbdict
Copy starter etc files from the image into the mounted project:
docker run --rm \
-v "$PWD/my_project":/app -w /app \
pydbdict-runtime:0.1.0a1 \
python - <<'PY'
from pathlib import Path
import shutil
import pydbdict
source = Path(pydbdict.__file__).parent / "etc"
target = Path("etc")
target.mkdir(exist_ok=True)
for name in (
"app_config_schema.json",
"dd_meta_schema.json",
"app_config.xnix.json",
"app_config.win.json",
):
shutil.copy2(source / name, target / name)
PY
Create a minimal my_project/app.py:
cat > my_project/app.py <<'PY'
from pathlib import Path
from pydbdict import init, schema as scm
init()
print("schema_count:", scm.count())
print("db_exists:", Path("var/pydbdict/pydbdict.sqlite3").exists())
PY
Run it in the container:
docker run --rm --user "$(id -u):$(id -g)" \
-v "$PWD/my_project":/app -w /app \
pydbdict-runtime:0.1.0a1 \
python app.py
Keep etc/app_config.xnix.json inside the mounted project and set
env_home to a mounted path such as ./var/pydbdict, so the SQLite database
persists on the host. The minimal app should print:
schema_count: 0
db_exists: True
After it runs, the host project should contain
my_project/var/pydbdict/pydbdict.sqlite3.
See docker/README.md for the complete Docker workflow.
Project Files
etc/ contains two validation schemas and two platform-specific runtime config
files:
| File | Purpose | Edit? |
|---|---|---|
etc/dd_meta_schema.json |
Meta schema used to validate your data schema files such as person.json. |
No |
etc/app_config_schema.json |
Schema used to validate pydbdict runtime config. | No |
etc/app_config.xnix.json |
Runtime config for macOS and Linux. | Yes, on macOS/Linux |
etc/app_config.win.json |
Runtime config for Windows. | Yes, on Windows |
init() loads platform-specific config from the current working directory:
etc/app_config.xnix.jsonon macOS and Linuxetc/app_config.win.jsonon Windows
Do not edit dd_meta_schema.json or app_config_schema.json for normal
application use. Edit only the app config file for the platform where the
application runs.
The schema files under schemas/ define the record types your application
stores.
Configuration
macOS And Linux
On macOS and Linux, edit etc/app_config.xnix.json:
{
"$schema": "./app_config_schema.json",
"env_home": "./var/pydbdict",
"verbose": "info",
"err_file": "log.txt"
}
Windows
On Windows, edit etc/app_config.win.json:
{
"$schema": "./app_config_schema.json",
"env_home": "C:/pydbdict_data/my_project",
"verbose": "info",
"err_file": "log.txt"
}
Important fields:
env_home: directory containingpydbdict.sqlite3;init()creates it if needed.verbose: one oferror,warning,info, ordebug.err_file: log file. Relative paths are resolved underenv_home.
On Windows, prefer a real local Windows path such as C:/... for env_home.
Defining Schemas
pydbdict schemas are JSON Schema documents plus a few database-oriented
extensions. Every schema should be validated by etc/dd_meta_schema.json.
When your schema files live under schemas/, use this reference:
"$schema": "../etc/dd_meta_schema.json"
The meta schema requires:
type: currently always"object".properties: field definitions.required: fields required on every inserted record.additionalProperties: usuallyfalse, so records cannot contain undeclared fields.pk: primary-key field name or list of field names.version: positive integer used to invalidate cached validators after schema changes.
pydbdict-specific extensions include:
pk: a string for a single-field primary key, or an array for a composite primary key.fk: a foreign-key constraint on one field.flag: foreign-key behavior.abortrejects invalid changes;cascadecascades delete behavior where supported by the backend.format: numeric and semantic formats such asuint8,uint16,uint32,date,datetime, anduuid.join_indexandforeign_keys: optional advanced composite-index and composite-foreign-key definitions.
The examples below come from the repository's test fixtures:
test/nationality.json, test/person.json, and test/salary.json. They are
shown to explain schema syntax; they are not required files in your project.
nationality.json
nationality is the parent table in the example model. It has one string
primary key, abr, and a display name:
{
"$schema": "../etc/dd_meta_schema.json",
"type": "object",
"properties": {
"abr": {
"type": "string",
"pattern": "[A-Z][A-Z]"
},
"name": {
"type": "string"
}
},
"pk": ["abr"],
"required": ["abr", "name"],
"additionalProperties": false,
"version": 1
}
Notes:
abris the primary key.patternrestricts the abbreviation to two uppercase letters.additionalProperties: falserejects records with undeclared fields.
person.json
person stores people and references nationality:
{
"$schema": "../etc/dd_meta_schema.json",
"type": "object",
"properties": {
"id": {
"type": "integer",
"format": "uint32"
},
"last_name": {
"type": "string"
},
"first_name": {
"type": "string"
},
"dob": {
"type": "string",
"format": "date"
},
"gender": {
"type": "string",
"enum": ["male", "female"]
},
"nationality": {
"type": "string",
"fk": {
"constrained_by": "nationality",
"flag": "abort"
}
}
},
"required": ["id", "last_name", "first_name", "dob", "gender"],
"pk": ["id"],
"additionalProperties": false,
"version": 1
}
Notes:
idis the primary key.genderis validated against an enum.nationality.fk.constrained_bypoints to thenationalityschema.flag: "abort"means pydbdict rejects apersonrecord whose nationality does not exist.
Load nationality before person, because person has a foreign key to it.
salary.json
salary stores monthly salary records and references person:
{
"$schema": "../etc/dd_meta_schema.json",
"type": "object",
"properties": {
"emp_id": {
"type": "integer",
"format": "uint32",
"fk": {
"constrained_by": "person",
"flag": "abort"
}
},
"year": {
"type": "integer",
"format": "uint16"
},
"month": {
"type": "integer",
"format": "uint8",
"minimum": 1,
"maximum": 12
},
"salary": {
"type": "number"
}
},
"required": ["emp_id", "year", "month", "salary"],
"pk": ["emp_id", "year", "month"],
"additionalProperties": false,
"version": 1
}
Notes:
pkis composite: one employee has one salary record per year/month.emp_idreferencesperson.monthis constrained to1..12.salaryuses JSON Schema'snumbertype.
Load salary after person.
Loading Schemas
Load your schemas one by one:
from pydbdict import schema as scm
scm.add_from_file("customer", "schemas/customer.json")
scm.add_from_file("invoice", "schemas/invoice.json")
Or load a directory. pydbdict sorts schemas so dependencies are registered before schemas that reference them:
scm.import_from_dir("schemas")
Programming Guide
Imports And Initialization
from pydbdict import init, schema as scm, data, Transaction as txn
init()
Call init() once before schema or data operations. Normal pydbdict API calls
do not print to stdout. The examples below use print() only where seeing the
return value helps.
Schema API
scm.add_from_file("customer", "schemas/customer.json")
scm.add_from_dict("customer", schema_dict)
scm.get("customer")
scm.get_all()
scm.count()
scm.rm("customer")
scm.rm_all()
scm.import_from_dir("schemas")
scm.export_to_dir("schemas-backup")
Successful schema registration returns None. Inspection calls return data:
print(scm.add_from_file("nationality", "schemas/nationality.json"))
print(scm.count())
In a new database after loading that one schema:
None
1
Insert Records
Insert one dictionary:
data.add("nationality", {"abr": "CN", "name": "China"})
Successful inserts return None. Use data.count() to confirm what was
stored:
print(data.add("nationality", {"abr": "CN", "name": "China"}))
print(data.count("nationality"))
None
1
Insert many dictionaries:
data.add("person", [person1, person2, person3])
pydbdict validates records against the registered schema before writing them.
Read Records
Use data.get() for exact field matching:
[person] = data.get("person", {"id": 1})
cn_people = data.get("person", {"nationality": "CN"})
data.get() returns a list:
print(data.get("nationality", {"abr": "CN"}))
[{'abr': 'CN', 'name': 'China'}]
Use data.query() for expression filters:
rows = data.query("salary", filter="salary > 8000 and year == 2026")
rows = data.query("salary", filter="salary > 8000 and (month <= 6 or month == 12)")
data.query() also returns a list by default:
print(data.query("nationality", filter="abr == 'CN'"))
[{'abr': 'CN', 'name': 'China'}]
Iterate Results
Use mode="iterate" when you do not want to materialize all results:
with data.query("salary", filter="salary > 9000", mode="iterate") as rows:
for row in rows:
print(row)
The same mode works with exact-match reads:
with data.get("salary", {"year": 2026}, mode="iterate") as rows:
for row in rows:
print(row)
Delete And Count
data.count("person")
data.rm("person", {"id": 1})
data.rm_all("salary")
data.count() returns an integer. data.rm() returns 1 when it removes one
record, and data.rm_all() returns the number of records removed:
print(data.count("nationality"))
print(data.rm("nationality", {"abr": "CN"}))
print(data.count("nationality"))
1
1
0
data.rm() takes primary-key fields. For a composite primary key, pass all
primary-key fields:
data.rm("salary", {"emp_id": 1, "year": 2026, "month": 5})
Transactions
Use Transaction as a context manager. The recommended alias is txn:
with txn():
data.add("person", {
"id": 2,
"last_name": "Wang",
"first_name": "Wu",
"dob": "1990-01-01",
"gender": "male",
"nationality": "CN",
})
data.add("salary", {
"emp_id": 2,
"year": 2026,
"month": 5,
"salary": 15000.0,
})
If the block exits normally, the transaction commits. If an exception leaves the block, the transaction aborts. Schema and data operations on the same thread automatically join the active transaction.
Query Model
data.query() accepts Python-style boolean expressions over schema fields:
data.query("salary", filter="salary < 8000")
data.query("salary", filter="salary >= 5000 and salary <= 9000")
data.query("salary", filter="year == 2026 or month == 12")
data.query("salary", filter="salary != 0 and (year == 2025 or year == 2026)")
Supported operators include ==, !=, <, <=, >, >=, and, or, and
parentheses.
Internally, pydbdict parses the expression into disjunctive normal form, runs indexed SQLite row-id queries for each leaf predicate, combines row IDs with Roaring bitmap operations, and loads only matching JSON payloads.
Limitations
pydbdict is an embedded local store, not a client/server database. It works best when one application process owns the database files and uses pydbdict as a durable local dictionary layer.
- Writes are serialized. The SQLite backend uses one connection per process and a process-local lock around transactions. SQLite WAL mode allows readers and a writer to coexist, but SQLite still permits only one writer at a time.
- Multi-process writes are not the primary target. SQLite's file locking can coordinate separate processes, but pydbdict's transaction stack and in-memory connection state are process-local. Prefer one writer process, or add an application-level write queue if several processes need to modify the same database.
- C/S or B/S backends should use a single writer process or an
application-level write queue. Multi-worker deployments such as
gunicorn -w N,uvicorn --workers N, or multiple container replicas can create cross-process write contention because pydbdict's lock is process-local and SQLite allows only one writer at a time. - Query results are materialized in Python.
mode="iterate"streams through a pydbdict iterator interface, but the current SQLite backend still collects matching rows before yielding them. Avoid queries that intentionally return very large portions of a large database. - Compound expression queries use bitmap sets of matching row IDs before
loading JSON payloads. This is fast for selective indexed predicates, but
broad predicates such as
salary > 0orfield != valuecan approach a large scan and require memory proportional to the match count. - Records are stored as JSON payloads with generated SQLite columns for declared fields. This is convenient for dictionary data, but it is not a replacement for hand-tuned relational schemas, ad hoc SQL joins, or analytical workloads.
data.query()currently supports comparisons,and,or, and parentheses. Parameters such asselect,expand,top,skip, andorderbyare reserved for future use and are ignored today.- Schema structure changes are limited. Updating a schema version is fine for validator refreshes, but changing fields, primary keys, indexes, or foreign keys requires an explicit data migration.
- Very large databases should be tested with your own data shape. As a rough guide, pydbdict is aimed at local application datasets where indexed queries return manageable result sets, not at multi-user server workloads or terabyte-scale analytics.
License
pydbdict is distributed under the MIT License. See LICENSE for the full
license text.
Third-party dependency release notes are tracked in THIRD_PARTY_NOTICES.md.
The default wheel is pure Python: it does not bundle SQLite, jsonschema, or
pyroaring source code. SQLite is provided by Python's standard library, while
jsonschema and pyroaring are installed as PyPI dependencies.
Development
This section is for contributors and maintainers, not application users.
Test Matrix
Current SQLite-backend test coverage:
| Platform | Environment | Status |
|---|---|---|
| macOS | Native, Python 3.12, SQLite 3.49, pyroaring 1.1 | Passed |
| Windows 11 | Native, Python 3.12 | Passed |
| Debian 12 | Native, Python 3.12 | Passed |
| Ubuntu 24.04 LTS | Docker, docker/sqlite/ubuntu24.04/Dockerfile |
Passed |
| Fedora 44 | Docker, docker/sqlite/fedora44/Dockerfile |
Passed |
The Docker runner executes both parser tests and the SQLite backend test suite.
Local Tests
python -m pip install -r requirement.txt
python test/test_ast_parser.py
python test/test_pydbdict.py
Build Artifacts
The release helper updates pyproject.toml, builds both distribution files,
runs twine check, and prints the generated artifacts and SHA-256 checksums:
python -m pip install build twine
./release.sh 0.1.2
release.sh uses the active python or python3 on PATH. To force a
specific interpreter, set PYTHON:
PYTHON="$(which python)" ./release.sh 0.1.2
The generated files are written under dist/:
pydbdict-0.1.2-py3-none-any.whl
pydbdict-0.1.2.tar.gz
Build a local wheel manually:
python -m pip install setuptools wheel "packaging>=24.2"
python -m pip wheel . --no-deps --no-build-isolation -w dist/
Build a source distribution:
python setup.py sdist -d dist/
The wheel is pure Python. pip resolves the pyroaring wheel appropriate for
the user's Python version and platform.
Project Layout
.
|-- pydbdict/ Python package and public API
| |-- __init__.py Top-level imports: init, schema, data, Transaction
| |-- data.py Record insert/read/query/delete API
| |-- schemas.py Schema import/export and constraint management
| |-- transaction.py Transaction context manager
| |-- pydbdict.py Runtime initialization
| |-- sqlite_backend.py SQLite storage and bitmap query backend
| `-- pydb_loader.py Backend loader
|-- test/ Unit tests, sample schemas, and generated test data
|-- etc/ Application config and pydbdict schema metadata
|-- docs/ Documentation source
|-- docker/ SQLite test environments
|-- requirement.txt Python development dependencies
|-- setup.py Packaging entry point
`-- MANIFEST.in Packaging data-file rules
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pydbdict-0.1.0a1.tar.gz.
File metadata
- Download URL: pydbdict-0.1.0a1.tar.gz
- Upload date:
- Size: 404.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d35d01b129e3189e2469f403bbb8d7c025a013f554d8618d743330e8ac2de1ff
|
|
| MD5 |
07b034a401f5ee2f3f2d807ed71d7974
|
|
| BLAKE2b-256 |
b4b2dd160c2c00a1af3b797c334e05182d8b1ee88e296b0589b640cae6a1a2e6
|
File details
Details for the file pydbdict-0.1.0a1-py3-none-any.whl.
File metadata
- Download URL: pydbdict-0.1.0a1-py3-none-any.whl
- Upload date:
- Size: 403.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2543a6cc67c3aac8c32bfa97124061fa635d680b52a2bf05aa7d6ea06454a5cd
|
|
| MD5 |
4c3c236ff48316dc18f0ced08590192f
|
|
| BLAKE2b-256 |
06fc3947b9495c64de0ea9518b851a5c0fe1ef205a778d0ffcb604cbfe8078f3
|