Skip to main content

Ephemeral SQL index over a local directory

Project description

dirsql (Python SDK)

Ephemeral SQL index over a local directory. Watches a filesystem, ingests structured files into an in-memory SQLite database, and exposes a SQL query interface. The database is purely in-memory -- the filesystem is always the source of truth.

Documentation

Installation

pip install dirsql

Requires Python >= 3.12. Ships as a native extension (Rust via PyO3) -- binary wheels are provided for common platforms.

Each wheel also bundles the dirsql HTTP-server CLI as a console script, so pip install dirsql also gives you a dirsql command on $PATH. See the CLI guide.

Publishing (maintainers)

Handled by .github/workflows/publish.yml (invoked from minor-release.yml / patch-release.yml). For each target triple the build job cargo builds the Rust CLI with --features cli, stages the binary into python/dirsql/_binary/, runs maturin build (which picks the binary up via the [tool.maturin] include rule in pyproject.toml), and the wheels + sdist are then trusted-published to PyPI.

Quick Start

import asyncio
import json
import os
import tempfile
from dirsql import DirSQL, Table

async def main():
    # Create some data files
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "comments", "abc"), exist_ok=True)
    os.makedirs(os.path.join(root, "comments", "def"), exist_ok=True)

    with open(os.path.join(root, "comments", "abc", "index.jsonl"), "w") as f:
        f.write(json.dumps({"body": "looks good", "author": "alice"}) + "\n")
        f.write(json.dumps({"body": "needs work", "author": "bob"}) + "\n")

    with open(os.path.join(root, "comments", "def", "index.jsonl"), "w") as f:
        f.write(json.dumps({"body": "agreed", "author": "carol"}) + "\n")

    # Define a table: DDL, glob pattern, and an extract function
    db = DirSQL(
        root,
        tables=[
            Table(
                ddl="CREATE TABLE comments (id TEXT, body TEXT, author TEXT)",
                glob="comments/**/index.jsonl",
                extract=lambda path, content: [
                    {
                        "id": os.path.basename(os.path.dirname(path)),
                        "body": row["body"],
                        "author": row["author"],
                    }
                    for line in content.splitlines()
                    for row in [json.loads(line)]
                ],
            ),
        ],
    )
    await db.ready()

    # Query with SQL
    results = await db.query("SELECT * FROM comments WHERE author = 'alice'")
    # [{"id": "abc", "body": "looks good", "author": "alice"}]

asyncio.run(main())

Multiple Tables and Joins

db = DirSQL(
    root,
    tables=[
        Table(
            ddl="CREATE TABLE posts (title TEXT, author_id TEXT)",
            glob="posts/*.json",
            extract=lambda path, content: [json.loads(content)],
        ),
        Table(
            ddl="CREATE TABLE authors (id TEXT, name TEXT)",
            glob="authors/*.json",
            extract=lambda path, content: [json.loads(content)],
        ),
    ],
)
await db.ready()

results = await db.query("""
    SELECT posts.title, authors.name
    FROM posts JOIN authors ON posts.author_id = authors.id
""")

Ignoring Files

Pass ignore patterns to skip files during scanning and watching:

db = DirSQL(
    root,
    ignore=["**/drafts/**", "**/.git/**"],
    tables=[...],
)

Watching for Changes

DirSQL is async by default. The watch() method returns an async iterator of row-level change events.

import asyncio
import json
from dirsql import DirSQL, Table

async def main():
    db = DirSQL(
        "/path/to/data",
        tables=[
            Table(
                ddl="CREATE TABLE items (name TEXT)",
                glob="**/*.json",
                extract=lambda path, content: [json.loads(content)],
            ),
        ],
    )
    await db.ready()

    # Query
    results = await db.query("SELECT * FROM items")

    # Watch for file changes (insert/update/delete/error events)
    async for event in db.watch():
        print(f"{event.action} on {event.table}: {event.row}")
        if event.action == "error":
            print(f"  error: {event.error}")

asyncio.run(main())

API Reference

Table(*, ddl, glob, extract)

Defines how files map to a SQL table.

  • ddl (str): A CREATE TABLE statement defining the schema.
  • glob (str): A glob pattern matched against file paths relative to root.
  • extract (Callable[[str, str], list[dict]]): A function receiving (relative_path, file_content) and returning a list of row dicts. Each dict's keys must match the DDL column names.

DirSQL(root=None, *, tables=None, ignore=None, config=None)

Creates an in-memory SQLite database indexed from the directory at root. The constructor is sync and returns immediately; scanning runs in a background thread.

At least one of root or config must be supplied. When both root and config are passed (or config declares [dirsql].root), the explicit root wins and a warning is emitted on stderr.

  • root (str | None): Path to the directory to index. Optional when config supplies one.
  • tables (list[Table] | None): Programmatic table definitions. Appended to any tables in the config file.
  • ignore (list[str] | None): Glob patterns for paths to skip. Appended to any [dirsql].ignore patterns in the config file.
  • config (str | None): Optional path to a .dirsql.toml file. Its [[table]] entries, [dirsql].ignore, and optional [dirsql].root are merged into the constructor's inputs.

await DirSQL.ready()

Wait for the initial scan to complete. Idempotent -- safe to call multiple times. Raises any exception that occurred during init.

await DirSQL.query(sql) -> list[dict]

Execute a SQL query. Returns a list of dicts keyed by column name. Internal tracking columns (_dirsql_*) are excluded from results.

DirSQL.watch() -> AsyncIterator[RowEvent]

Returns an async iterator that yields RowEvent objects as files change on disk. Starts the filesystem watcher on first iteration.

RowEvent

Emitted by watch() when a file change produces row-level diffs.

  • table (str): The affected table name.
  • action (str): One of "insert", "update", "delete", "error".
  • row (dict | None): The new row (for insert/update) or deleted row (for delete).
  • old_row (dict | None): The previous row (for update only).
  • error (str | None): Error message (for error events).
  • file_path (str | None): The relative file path that triggered the event.

How It Works

The Rust core (rusqlite + notify + walkdir) does the heavy lifting:

  1. Startup scan: Walks the directory tree, matches files to tables via glob patterns, calls the user-provided extract function for each file, and inserts rows into an in-memory SQLite database.
  2. File watching: Uses the notify crate (inotify on Linux, FSEvents on macOS) to detect file creates, modifications, and deletions.
  3. Row diffing: When a file changes, the new rows are diffed against the previous rows for that file, producing granular insert/update/delete events.
  4. Python bindings: PyO3 exposes the Rust core as a native Python extension module. The async layer runs blocking operations in a thread pool via asyncio.to_thread.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dirsql-0.3.4.tar.gz (235.6 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

dirsql-0.3.4-cp312-cp312-win_amd64.whl (2.2 MB view details)

Uploaded CPython 3.12Windows x86-64

dirsql-0.3.4-cp312-cp312-manylinux_2_34_x86_64.whl (2.6 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.34+ x86-64

dirsql-0.3.4-cp312-cp312-manylinux_2_34_aarch64.whl (2.5 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.34+ ARM64

dirsql-0.3.4-cp312-cp312-macosx_11_0_arm64.whl (2.2 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

dirsql-0.3.4-cp312-cp312-macosx_10_12_x86_64.whl (2.4 MB view details)

Uploaded CPython 3.12macOS 10.12+ x86-64

File details

Details for the file dirsql-0.3.4.tar.gz.

File metadata

  • Download URL: dirsql-0.3.4.tar.gz
  • Upload date:
  • Size: 235.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dirsql-0.3.4.tar.gz
Algorithm Hash digest
SHA256 4d86b53d88c79859d7a1dfb553cc447ed2fcee67c132f84e2a920fdcb7949007
MD5 51a73a3e15be0ab045bd3c229f73f286
BLAKE2b-256 b558f8e2d83e823c3c6ed9fbae625f5a59601d3708b7024bc41ab76d72ba0f18

See more details on using hashes here.

Provenance

The following attestation bundles were made for dirsql-0.3.4.tar.gz:

Publisher: release.yml on thekevinscott/dirsql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dirsql-0.3.4-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: dirsql-0.3.4-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 2.2 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dirsql-0.3.4-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 80ffbb39170f255b3b40bf578260b64ba1350cc397968f380850f51d840673d1
MD5 cae5d0c4d83ec34202528072ab91b994
BLAKE2b-256 2cea6432dfeea8e0841a9090ba4476b6d96285a9122d66f017fde5c30ed1bf15

See more details on using hashes here.

Provenance

The following attestation bundles were made for dirsql-0.3.4-cp312-cp312-win_amd64.whl:

Publisher: release.yml on thekevinscott/dirsql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dirsql-0.3.4-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for dirsql-0.3.4-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 e812ee1381be27a5176cab043b0257402a36d9e6ad0b4e53d856b1a6aaff8dbd
MD5 691e77803cf0ac6dc788d77fb8e65f3e
BLAKE2b-256 c3d872d8a8d18b06b1c2df240d3d55693b7fb74a3cf6e3c30ae6d6c9460ec30f

See more details on using hashes here.

Provenance

The following attestation bundles were made for dirsql-0.3.4-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: release.yml on thekevinscott/dirsql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dirsql-0.3.4-cp312-cp312-manylinux_2_34_aarch64.whl.

File metadata

File hashes

Hashes for dirsql-0.3.4-cp312-cp312-manylinux_2_34_aarch64.whl
Algorithm Hash digest
SHA256 7f058a4f8f870c9d91dbc3a4a1b89fa6c48d602b160f1c7d1d053951e2efff58
MD5 17212bc60596e8e6e7edf7f1998d5e32
BLAKE2b-256 db067f21a24450dfac662ed807c3a2a9940a96a9a727676a9b31ef19ee95a421

See more details on using hashes here.

Provenance

The following attestation bundles were made for dirsql-0.3.4-cp312-cp312-manylinux_2_34_aarch64.whl:

Publisher: release.yml on thekevinscott/dirsql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dirsql-0.3.4-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for dirsql-0.3.4-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 dcaf6c0f1a2aeaf114d42a020f6b849ed48d2b6077863beb100116a0849ec050
MD5 4caa7e450a98075559b7fa234cda5095
BLAKE2b-256 61b3e44eccbed8aa48c54dd72c862faf2b5fb3e8cfa573d72c34ca53616a996a

See more details on using hashes here.

Provenance

The following attestation bundles were made for dirsql-0.3.4-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on thekevinscott/dirsql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dirsql-0.3.4-cp312-cp312-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for dirsql-0.3.4-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 3aa7d21ebe442fee8c9a5d7aa72ad4e812a2cdcced919210cd6a1a43cb9f1f8f
MD5 da5aa78f1d4f26e2260ca62f412a8e4d
BLAKE2b-256 425c9c235308865aae81dfa25114ce56683b525baba22086898c980243ad784e

See more details on using hashes here.

Provenance

The following attestation bundles were made for dirsql-0.3.4-cp312-cp312-macosx_10_12_x86_64.whl:

Publisher: release.yml on thekevinscott/dirsql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page