Skip to main content

A SQLite-backed embedded persistent dictionary store for Python

Project description

pydbdict

pydbdict is a SQLite-backed embedded persistent dictionary store for Python. It stores JSON-Schema-validated dictionaries with primary keys, secondary indexes, foreign-key constraints, ACID transactions, and expression queries.

Use pydbdict when an application needs durable local storage with stronger structure than a JSON file or cache, but does not need to run a separate database server.

Contents

Features

  • Embedded SQLite storage: no server process to install or operate.
  • JSON Schema validation for every stored dictionary.
  • Single-field and composite primary keys.
  • Secondary indexes for exact-match reads and expression queries.
  • Foreign-key constraints between schemas.
  • Foreign-key delete behavior including abort and cascade.
  • ACID transactions through a Python context manager.
  • Python-style filter expressions with comparison operators, and, or, and parentheses.
  • Iterator mode for streaming query results without materializing the full result set.
  • Indexed row-id bitmap execution for compound filters.
  • Pure Python package with SQLite from the Python standard library.

Get Started

Install

Install pydbdict into your application environment:

python -m pip install pydbdict

SQLite is provided by Python's standard library. pyroaring and jsonschema are installed as Python package dependencies.

If you are using the source code from GitHub or another downloaded copy of this repository, first open a terminal in that repository directory:

cd pydbdict
python -m venv .venv
. .venv/bin/activate
python -m pip install -r requirement.txt

On Windows PowerShell:

cd pydbdict
py -3.12 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -r requirement.txt

Verify The Install

After installing with pip, run a small smoke test:

python -m pydbdict.selftest

Expected output:

pydbdict self-test ok

The installed wheel includes the starter pydbdict/etc JSON files, the pydbdict.selftest module, and the bundled test resources under pydbdict/tests.

For a fuller installed-package check, run:

python -m pydbdict.tests

It should end with:

Ran 26 tests

OK
pydbdict bundled tests ok

If you are using the source repository, run the full bundled tests after installing dependencies:

python test/test_ast_parser.py
python test/test_pydbdict.py

The parser test should end with:

Ran 8 tests

OK

The backend test should end with:

Ran 26 tests

OK

test/test_pydbdict.py also prints performance timings after the test result.

test/test_pydbdict.py uses the sample schemas and generated test data under test/. Those files are test fixtures; you do not need to copy them into your own application project. Passing this test confirms that pydbdict can initialize, load schemas, validate records, query data, and run transactions.

Initialize A Project

A pydbdict application should keep its runtime config and schema definitions in the application's project directory:

my_project/
|-- etc/
|   |-- app_config_schema.json
|   |-- dd_meta_schema.json
|   |-- app_config.xnix.json
|   `-- app_config.win.json
|-- schemas/
|   `-- your_schema.json
`-- app.py

Create the directories:

mkdir -p etc schemas var/pydbdict

On Windows PowerShell:

mkdir etc, schemas

Copy the starter etc files into your application's etc/ directory.

If pydbdict is installed with pip, copy from the installed package's pydbdict/etc directory. Run this once from your project root:

from pathlib import Path
import shutil
import pydbdict

source = Path(pydbdict.__file__).parent / "etc"
target = Path("etc")
target.mkdir(exist_ok=True)

for name in (
    "app_config_schema.json",
    "dd_meta_schema.json",
    "app_config.xnix.json",
    "app_config.win.json",
):
    shutil.copy2(source / name, target / name)

If you are working from the source repository, you can copy directly from its top-level etc/ directory instead:

cp /path/to/pydbdict/etc/*.json etc/

On Windows PowerShell:

copy C:\path\to\pydbdict\etc\*.json etc\

Run your application from the project root so relative paths in config and schemas resolve as expected:

cd my_project
python app.py

Minimal App

from pydbdict import init

init()

After init(), register your own schemas and use the APIs described below.

Run In Docker

pydbdict applications can run entirely inside Docker. Build a runtime image from a locally built wheel:

./release.sh 0.1.0a1

docker build \
    -t pydbdict-runtime:0.1.0a1 \
    -f docker/runtime/Dockerfile .

Run an application project by mounting it at /app:

mkdir -p my_project/etc my_project/schemas my_project/var/pydbdict

Copy starter etc files from the image into the mounted project:

docker run --rm \
    -v "$PWD/my_project":/app -w /app \
    pydbdict-runtime:0.1.0a1 \
    python - <<'PY'
from pathlib import Path
import shutil
import pydbdict

source = Path(pydbdict.__file__).parent / "etc"
target = Path("etc")
target.mkdir(exist_ok=True)

for name in (
    "app_config_schema.json",
    "dd_meta_schema.json",
    "app_config.xnix.json",
    "app_config.win.json",
):
    shutil.copy2(source / name, target / name)
PY

Create a minimal my_project/app.py:

cat > my_project/app.py <<'PY'
from pathlib import Path
from pydbdict import init, schema as scm

init()
print("schema_count:", scm.count())
print("db_exists:", Path("var/pydbdict/pydbdict.sqlite3").exists())
PY

Run it in the container:

docker run --rm --user "$(id -u):$(id -g)" \
    -v "$PWD/my_project":/app -w /app \
    pydbdict-runtime:0.1.0a1 \
    python app.py

Keep etc/app_config.xnix.json inside the mounted project and set env_home to a mounted path such as ./var/pydbdict, so the SQLite database persists on the host. The minimal app should print:

schema_count: 0
db_exists: True

After it runs, the host project should contain my_project/var/pydbdict/pydbdict.sqlite3.

See docker/README.md for the complete Docker workflow.

Project Files

etc/ contains two validation schemas and two platform-specific runtime config files:

File Purpose Edit?
etc/dd_meta_schema.json Meta schema used to validate your data schema files such as person.json. No
etc/app_config_schema.json Schema used to validate pydbdict runtime config. No
etc/app_config.xnix.json Runtime config for macOS and Linux. Yes, on macOS/Linux
etc/app_config.win.json Runtime config for Windows. Yes, on Windows

init() loads platform-specific config from the current working directory:

  • etc/app_config.xnix.json on macOS and Linux
  • etc/app_config.win.json on Windows

Do not edit dd_meta_schema.json or app_config_schema.json for normal application use. Edit only the app config file for the platform where the application runs.

The schema files under schemas/ define the record types your application stores.

Configuration

macOS And Linux

On macOS and Linux, edit etc/app_config.xnix.json:

{
    "$schema": "./app_config_schema.json",
    "env_home": "./var/pydbdict",
    "verbose": "info",
    "err_file": "log.txt"
}

Windows

On Windows, edit etc/app_config.win.json:

{
    "$schema": "./app_config_schema.json",
    "env_home": "C:/pydbdict_data/my_project",
    "verbose": "info",
    "err_file": "log.txt"
}

Important fields:

  • env_home: directory containing pydbdict.sqlite3; init() creates it if needed.
  • verbose: one of error, warning, info, or debug.
  • err_file: log file. Relative paths are resolved under env_home.

On Windows, prefer a real local Windows path such as C:/... for env_home.

Defining Schemas

pydbdict schemas are JSON Schema documents plus a few database-oriented extensions. Every schema should be validated by etc/dd_meta_schema.json. When your schema files live under schemas/, use this reference:

"$schema": "../etc/dd_meta_schema.json"

The meta schema requires:

  • type: currently always "object".
  • properties: field definitions.
  • required: fields required on every inserted record.
  • additionalProperties: usually false, so records cannot contain undeclared fields.
  • pk: primary-key field name or list of field names.
  • version: positive integer used to invalidate cached validators after schema changes.

pydbdict-specific extensions include:

  • pk: a string for a single-field primary key, or an array for a composite primary key.
  • fk: a foreign-key constraint on one field.
  • flag: foreign-key behavior. abort rejects invalid changes; cascade cascades delete behavior where supported by the backend.
  • format: numeric and semantic formats such as uint8, uint16, uint32, date, datetime, and uuid.
  • join_index and foreign_keys: optional advanced composite-index and composite-foreign-key definitions.

The examples below come from the repository's test fixtures: test/nationality.json, test/person.json, and test/salary.json. They are shown to explain schema syntax; they are not required files in your project.

nationality.json

nationality is the parent table in the example model. It has one string primary key, abr, and a display name:

{
    "$schema": "../etc/dd_meta_schema.json",
    "type": "object",
    "properties": {
        "abr": {
            "type": "string",
            "pattern": "[A-Z][A-Z]"
        },
        "name": {
            "type": "string"
        }
    },
    "pk": ["abr"],
    "required": ["abr", "name"],
    "additionalProperties": false,
    "version": 1
}

Notes:

  • abr is the primary key.
  • pattern restricts the abbreviation to two uppercase letters.
  • additionalProperties: false rejects records with undeclared fields.

person.json

person stores people and references nationality:

{
    "$schema": "../etc/dd_meta_schema.json",
    "type": "object",
    "properties": {
        "id": {
            "type": "integer",
            "format": "uint32"
        },
        "last_name": {
            "type": "string"
        },
        "first_name": {
            "type": "string"
        },
        "dob": {
            "type": "string",
            "format": "date"
        },
        "gender": {
            "type": "string",
            "enum": ["male", "female"]
        },
        "nationality": {
            "type": "string",
            "fk": {
                "constrained_by": "nationality",
                "flag": "abort"
            }
        }
    },
    "required": ["id", "last_name", "first_name", "dob", "gender"],
    "pk": ["id"],
    "additionalProperties": false,
    "version": 1
}

Notes:

  • id is the primary key.
  • gender is validated against an enum.
  • nationality.fk.constrained_by points to the nationality schema.
  • flag: "abort" means pydbdict rejects a person record whose nationality does not exist.

Load nationality before person, because person has a foreign key to it.

salary.json

salary stores monthly salary records and references person:

{
    "$schema": "../etc/dd_meta_schema.json",
    "type": "object",
    "properties": {
        "emp_id": {
            "type": "integer",
            "format": "uint32",
            "fk": {
                "constrained_by": "person",
                "flag": "abort"
            }
        },
        "year": {
            "type": "integer",
            "format": "uint16"
        },
        "month": {
            "type": "integer",
            "format": "uint8",
            "minimum": 1,
            "maximum": 12
        },
        "salary": {
            "type": "number"
        }
    },
    "required": ["emp_id", "year", "month", "salary"],
    "pk": ["emp_id", "year", "month"],
    "additionalProperties": false,
    "version": 1
}

Notes:

  • pk is composite: one employee has one salary record per year/month.
  • emp_id references person.
  • month is constrained to 1..12.
  • salary uses JSON Schema's number type.

Load salary after person.

Loading Schemas

Load your schemas one by one:

from pydbdict import schema as scm

scm.add_from_file("customer", "schemas/customer.json")
scm.add_from_file("invoice", "schemas/invoice.json")

Or load a directory. pydbdict sorts schemas so dependencies are registered before schemas that reference them:

scm.import_from_dir("schemas")

Programming Guide

Imports And Initialization

from pydbdict import init, schema as scm, data, Transaction as txn

init()

Call init() once before schema or data operations. Normal pydbdict API calls do not print to stdout. The examples below use print() only where seeing the return value helps.

Schema API

scm.add_from_file("customer", "schemas/customer.json")
scm.add_from_dict("customer", schema_dict)
scm.get("customer")
scm.get_all()
scm.count()
scm.rm("customer")
scm.rm_all()
scm.import_from_dir("schemas")
scm.export_to_dir("schemas-backup")

Successful schema registration returns None. Inspection calls return data:

print(scm.add_from_file("nationality", "schemas/nationality.json"))
print(scm.count())

In a new database after loading that one schema:

None
1

Insert Records

Insert one dictionary:

data.add("nationality", {"abr": "CN", "name": "China"})

Successful inserts return None. Use data.count() to confirm what was stored:

print(data.add("nationality", {"abr": "CN", "name": "China"}))
print(data.count("nationality"))
None
1

Insert many dictionaries:

data.add("person", [person1, person2, person3])

pydbdict validates records against the registered schema before writing them.

Read Records

Use data.get() for exact field matching:

[person] = data.get("person", {"id": 1})
cn_people = data.get("person", {"nationality": "CN"})

data.get() returns a list:

print(data.get("nationality", {"abr": "CN"}))
[{'abr': 'CN', 'name': 'China'}]

Use data.query() for expression filters:

rows = data.query("salary", filter="salary > 8000 and year == 2026")
rows = data.query("salary", filter="salary > 8000 and (month <= 6 or month == 12)")

data.query() also returns a list by default:

print(data.query("nationality", filter="abr == 'CN'"))
[{'abr': 'CN', 'name': 'China'}]

Iterate Results

Use mode="iterate" when you do not want to materialize all results:

with data.query("salary", filter="salary > 9000", mode="iterate") as rows:
    for row in rows:
        print(row)

The same mode works with exact-match reads:

with data.get("salary", {"year": 2026}, mode="iterate") as rows:
    for row in rows:
        print(row)

Delete And Count

data.count("person")
data.rm("person", {"id": 1})
data.rm_all("salary")

data.count() returns an integer. data.rm() returns 1 when it removes one record, and data.rm_all() returns the number of records removed:

print(data.count("nationality"))
print(data.rm("nationality", {"abr": "CN"}))
print(data.count("nationality"))
1
1
0

data.rm() takes primary-key fields. For a composite primary key, pass all primary-key fields:

data.rm("salary", {"emp_id": 1, "year": 2026, "month": 5})

Transactions

Use Transaction as a context manager. The recommended alias is txn:

with txn():
    data.add("person", {
        "id": 2,
        "last_name": "Wang",
        "first_name": "Wu",
        "dob": "1990-01-01",
        "gender": "male",
        "nationality": "CN",
    })
    data.add("salary", {
        "emp_id": 2,
        "year": 2026,
        "month": 5,
        "salary": 15000.0,
    })

If the block exits normally, the transaction commits. If an exception leaves the block, the transaction aborts. Schema and data operations on the same thread automatically join the active transaction.

Query Model

data.query() accepts Python-style boolean expressions over schema fields:

data.query("salary", filter="salary < 8000")
data.query("salary", filter="salary >= 5000 and salary <= 9000")
data.query("salary", filter="year == 2026 or month == 12")
data.query("salary", filter="salary != 0 and (year == 2025 or year == 2026)")

Supported operators include ==, !=, <, <=, >, >=, and, or, and parentheses.

Internally, pydbdict parses the expression into disjunctive normal form, runs indexed SQLite row-id queries for each leaf predicate, combines row IDs with Roaring bitmap operations, and loads only matching JSON payloads.

Limitations

pydbdict is an embedded local store, not a client/server database. It works best when one application process owns the database files and uses pydbdict as a durable local dictionary layer.

  • Writes are serialized. The SQLite backend uses one connection per process and a process-local lock around transactions. SQLite WAL mode allows readers and a writer to coexist, but SQLite still permits only one writer at a time.
  • Multi-process writes are not the primary target. SQLite's file locking can coordinate separate processes, but pydbdict's transaction stack and in-memory connection state are process-local. Prefer one writer process, or add an application-level write queue if several processes need to modify the same database.
  • C/S or B/S backends should use a single writer process or an application-level write queue. Multi-worker deployments such as gunicorn -w N, uvicorn --workers N, or multiple container replicas can create cross-process write contention because pydbdict's lock is process-local and SQLite allows only one writer at a time.
  • Query results are materialized in Python. mode="iterate" streams through a pydbdict iterator interface, but the current SQLite backend still collects matching rows before yielding them. Avoid queries that intentionally return very large portions of a large database.
  • Compound expression queries use bitmap sets of matching row IDs before loading JSON payloads. This is fast for selective indexed predicates, but broad predicates such as salary > 0 or field != value can approach a large scan and require memory proportional to the match count.
  • Records are stored as JSON payloads with generated SQLite columns for declared fields. This is convenient for dictionary data, but it is not a replacement for hand-tuned relational schemas, ad hoc SQL joins, or analytical workloads.
  • data.query() currently supports comparisons, and, or, and parentheses. Parameters such as select, expand, top, skip, and orderby are reserved for future use and are ignored today.
  • Schema structure changes are limited. Updating a schema version is fine for validator refreshes, but changing fields, primary keys, indexes, or foreign keys requires an explicit data migration.
  • Very large databases should be tested with your own data shape. As a rough guide, pydbdict is aimed at local application datasets where indexed queries return manageable result sets, not at multi-user server workloads or terabyte-scale analytics.

License

pydbdict is distributed under the MIT License. See LICENSE for the full license text.

Third-party dependency release notes are tracked in THIRD_PARTY_NOTICES.md. The default wheel is pure Python: it does not bundle SQLite, jsonschema, or pyroaring source code. SQLite is provided by Python's standard library, while jsonschema and pyroaring are installed as PyPI dependencies.

Development

This section is for contributors and maintainers, not application users.

Test Matrix

Current SQLite-backend test coverage:

Platform Environment Status
macOS Native, Python 3.12, SQLite 3.49, pyroaring 1.1 Passed
Windows 11 Native, Python 3.12 Passed
Debian 12 Native, Python 3.12 Passed
Ubuntu 24.04 LTS Docker, docker/sqlite/ubuntu24.04/Dockerfile Passed
Fedora 44 Docker, docker/sqlite/fedora44/Dockerfile Passed

The Docker runner executes both parser tests and the SQLite backend test suite.

Local Tests

python -m pip install -r requirement.txt
python test/test_ast_parser.py
python test/test_pydbdict.py

Build Artifacts

The release helper updates pyproject.toml, builds both distribution files, runs twine check, and prints the generated artifacts and SHA-256 checksums:

python -m pip install build twine
./release.sh 0.1.2

release.sh uses the active python or python3 on PATH. To force a specific interpreter, set PYTHON:

PYTHON="$(which python)" ./release.sh 0.1.2

The generated files are written under dist/:

pydbdict-0.1.2-py3-none-any.whl
pydbdict-0.1.2.tar.gz

Build a local wheel manually:

python -m pip install setuptools wheel "packaging>=24.2"
python -m pip wheel . --no-deps --no-build-isolation -w dist/

Build a source distribution:

python setup.py sdist -d dist/

The wheel is pure Python. pip resolves the pyroaring wheel appropriate for the user's Python version and platform.

Project Layout

.
|-- pydbdict/              Python package and public API
|   |-- __init__.py        Top-level imports: init, schema, data, Transaction
|   |-- data.py            Record insert/read/query/delete API
|   |-- schemas.py         Schema import/export and constraint management
|   |-- transaction.py     Transaction context manager
|   |-- pydbdict.py        Runtime initialization
|   |-- sqlite_backend.py  SQLite storage and bitmap query backend
|   `-- pydb_loader.py     Backend loader
|-- test/                  Unit tests, sample schemas, and generated test data
|-- etc/                   Application config and pydbdict schema metadata
|-- docs/                  Documentation source
|-- docker/                SQLite test environments
|-- requirement.txt        Python development dependencies
|-- setup.py               Packaging entry point
`-- MANIFEST.in            Packaging data-file rules

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydbdict-0.1.0a1.tar.gz (404.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pydbdict-0.1.0a1-py3-none-any.whl (403.5 kB view details)

Uploaded Python 3

File details

Details for the file pydbdict-0.1.0a1.tar.gz.

File metadata

  • Download URL: pydbdict-0.1.0a1.tar.gz
  • Upload date:
  • Size: 404.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for pydbdict-0.1.0a1.tar.gz
Algorithm Hash digest
SHA256 d35d01b129e3189e2469f403bbb8d7c025a013f554d8618d743330e8ac2de1ff
MD5 07b034a401f5ee2f3f2d807ed71d7974
BLAKE2b-256 b4b2dd160c2c00a1af3b797c334e05182d8b1ee88e296b0589b640cae6a1a2e6

See more details on using hashes here.

File details

Details for the file pydbdict-0.1.0a1-py3-none-any.whl.

File metadata

  • Download URL: pydbdict-0.1.0a1-py3-none-any.whl
  • Upload date:
  • Size: 403.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for pydbdict-0.1.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 2543a6cc67c3aac8c32bfa97124061fa635d680b52a2bf05aa7d6ea06454a5cd
MD5 4c3c236ff48316dc18f0ced08590192f
BLAKE2b-256 06fc3947b9495c64de0ea9518b851a5c0fe1ef205a778d0ffcb604cbfe8078f3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page