dirsql (Python SDK)

Ephemeral SQL index over a local directory. Watches a filesystem, ingests structured files into an in-memory SQLite database, and exposes a SQL query interface. The database is purely in-memory -- the filesystem is always the source of truth.

Documentation

Installation

pip install dirsql

Requires Python >= 3.12. Ships as a native extension (Rust via PyO3) -- binary wheels are provided for common platforms.

Each wheel also bundles the dirsql HTTP-server CLI as a console script, so pip install dirsql also gives you a dirsql command on $PATH. See the CLI guide.

Publishing (maintainers)

Handled by .github/workflows/publish.yml (invoked from minor-release.yml / patch-release.yml). For each target triple, the build job runs cargo build with --features cli to compile the Rust CLI, stages the binary into python/dirsql/_binary/, and runs maturin build, which picks the binary up via the [tool.maturin] include rule in pyproject.toml. The wheels and sdist are then trusted-published to PyPI.

Quick Start

import asyncio
import json
import os
import tempfile
from dirsql import DirSQL, Table

async def main():
    # Create some data files
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "comments", "abc"), exist_ok=True)
    os.makedirs(os.path.join(root, "comments", "def"), exist_ok=True)

    with open(os.path.join(root, "comments", "abc", "index.jsonl"), "w") as f:
        f.write(json.dumps({"body": "looks good", "author": "alice"}) + "\n")
        f.write(json.dumps({"body": "needs work", "author": "bob"}) + "\n")

    with open(os.path.join(root, "comments", "def", "index.jsonl"), "w") as f:
        f.write(json.dumps({"body": "agreed", "author": "carol"}) + "\n")

    # Define a table: DDL, glob pattern, and an extract function
    db = DirSQL(
        root,
        tables=[
            Table(
                ddl="CREATE TABLE comments (id TEXT, body TEXT, author TEXT)",
                glob="comments/**/index.jsonl",
                extract=lambda path, content: [
                    {
                        "id": os.path.basename(os.path.dirname(path)),
                        "body": row["body"],
                        "author": row["author"],
                    }
                    for line in content.splitlines()
                    for row in [json.loads(line)]
                ],
            ),
        ],
    )
    await db.ready()

    # Query with SQL
    results = await db.query("SELECT * FROM comments WHERE author = 'alice'")
    print(results)
    # Expected rows: [{"id": "abc", "body": "looks good", "author": "alice"}]

asyncio.run(main())

Multiple Tables and Joins

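# Fragment: run inside an async function, with json, DirSQL, and Table
# imported as in the Quick Start and root pointing at the data directory.
db = DirSQL(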
    root,
    tables=[
        Table(
            ddl="CREATE TABLE posts (title TEXT, author_id TEXT)",
            glob="posts/*.json",
            extract=lambda path, content: [json.loads(content)],
        ),
        Table(
            ddl="CREATE TABLE authors (id TEXT, name TEXT)",
            glob="authors/*.json",
            extract=lambda path, content: [json.loads(content)],
        ),
    ],
)
await db.ready()

results = await db.query("""
    SELECT posts.title, authors.name
    FROM posts JOIN authors ON posts.author_id = authors.id
""")
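
The join above expects one JSON object per file, with keys that match the DDL columns. A minimal sketch of matching fixture data follows; the file names (hello.json, a1.json) and the values are made up for illustration, and write_fixture is a hypothetical helper, not part of dirsql.

```python
import json
import tempfile
from pathlib import Path


def write_fixture(root: Path) -> None:
    """Write one JSON object per file, keyed by the DDL column names."""
    (root / "posts").mkdir(parents=True, exist_ok=True)
    (root / "authors").mkdir(parents=True, exist_ok=True)
    # posts/*.json -> CREATE TABLE posts (title TEXT, author_id TEXT)
    (root / "posts" / "hello.json").write_text(
        json.dumps({"title": "Hello", "author_id": "a1"})
    )
    # authors/*.json -> CREATE TABLE authors (id TEXT, name TEXT)
    (root / "authors" / "a1.json").write_text(
        json.dumps({"id": "a1", "name": "Alice"})
    )


root = Path(tempfile.mkdtemp())
write_fixture(root)
```

Since DirSQL documents root as a str, pass str(root) when building the instance from a Path.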

Ignoring Files

Pass ignore patterns to skip files during scanning and watching:

db = DirSQL(
    root,
    ignore=["**/drafts/**", "**/.git/**"],
    tables=[...],
)

Watching for Changes

DirSQL is async by default. The watch() method returns an async iterator of row-level change events.

import asyncio
import json
from dirsql import DirSQL, Table

async def main():
    db = DirSQL(
        "/path/to/data",
        tables=[
            Table(
                ddl="CREATE TABLE items (name TEXT)",
                glob="**/*.json",
                extract=lambda path, content: [json.loads(content)],
            ),
        ],
    )
    await db.ready()

    # Query
    results = await db.query("SELECT * FROM items")

    # Watch for file changes (insert/update/delete/error events)
    async for event in db.watch():
        print(f"{event.action} on {event.table}: {event.row}")
        if event.action == "error":
            print(f"  error: {event.error}")

asyncio.run(main())
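
How It Works notes that the async layer hands blocking operations to asyncio.to_thread. The sketch below shows that general pattern with the standard library only: an async generator that drains a blocking queue.Queue. It is not dirsql's implementation; iterate_events, collect, and the None sentinel are assumptions made for this example.

```python
import asyncio
import queue


async def iterate_events(source: queue.Queue):
    """Yield items from a blocking queue without blocking the event loop."""
    while True:
        # queue.Queue.get blocks, so it runs in a worker thread.
        item = await asyncio.to_thread(source.get)
        if item is None:  # None is this sketch's end-of-stream sentinel
            return
        yield item


async def collect(source: queue.Queue) -> list:
    """Gather everything the async iterator yields into a list."""
    return [item async for item in iterate_events(source)]


events = queue.Queue()
for item in ("insert", "update", None):
    events.put(item)

collected = asyncio.run(collect(events))
```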

API Reference

Table(*, ddl, glob, extract)

Defines how files map to a SQL table.

  • ddl (str): A CREATE TABLE statement defining the schema.
  • glob (str): A glob pattern matched against file paths relative to root.
  • extract (Callable[[str, str], list[dict]]): A function receiving (relative_path, file_content) and returning a list of row dicts. Each dict's keys must match the DDL column names.
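
For anything beyond a one-liner, a named function can be easier to read and test than a lambda. The sketch below rewrites the Quick Start extract as a plain function that also skips blank lines; extract_comments is a name chosen for this example.

```python
import json
import os


def extract_comments(path: str, content: str) -> list[dict]:
    """Turn one JSONL file into rows for the comments table."""
    # The parent directory name serves as the id, as in the Quick Start.
    comment_id = os.path.basename(os.path.dirname(path))
    rows = []
    for line in content.splitlines():
        if not line.strip():
            continue  # skip blank lines instead of failing in json.loads
        record = json.loads(line)
        rows.append(
            {"id": comment_id, "body": record["body"], "author": record["author"]}
        )
    return rows


rows = extract_comments(
    "comments/abc/index.jsonl",
    '{"body": "looks good", "author": "alice"}\n\n'
    '{"body": "needs work", "author": "bob"}\n',
)
```

Because the function takes only a path and the file content, it can be exercised without a DirSQL instance, then passed as extract=extract_comments.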

DirSQL(root, *, tables, ignore=None)

Creates an in-memory SQLite database indexed from the directory at root. The constructor is sync and returns immediately; scanning runs in a background thread.

  • root (str): Path to the directory to index.
  • tables (list[Table]): Table definitions.
  • ignore (list[str] | None): Glob patterns for paths to skip.
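
To make the indexing step concrete, here is a rough standard-library analogue of a startup scan for a single table: walk the files matching a glob, call extract, and insert the rows into an in-memory SQLite database. It is a sketch only. The real work happens in the Rust core on a background thread, and the scan helper, its explicit table argument, and the use of pathlib globbing are assumptions of this example.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def scan(root: Path, ddl: str, table: str, pattern: str, extract) -> sqlite3.Connection:
    """Build an in-memory table from every file under root matching pattern."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    for file in sorted(root.glob(pattern)):
        if not file.is_file():
            continue
        # extract receives the path relative to root plus the file content.
        relative = file.relative_to(root).as_posix()
        for row in extract(relative, file.read_text()):
            columns = ", ".join(row)
            placeholders = ", ".join("?" for _ in row)
            # Column names come from trusted extract code in this sketch.
            conn.execute(
                f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
                list(row.values()),
            )
    return conn


root = Path(tempfile.mkdtemp())
(root / "items").mkdir()
(root / "items" / "a.json").write_text(json.dumps({"name": "apple"}))
(root / "items" / "b.json").write_text(json.dumps({"name": "pear"}))

conn = scan(
    root,
    ddl="CREATE TABLE items (name TEXT)",
    table="items",
    pattern="items/*.json",
    extract=lambda path, content: [json.loads(content)],
)
names = [r[0] for r in conn.execute("SELECT name FROM items ORDER BY name")]
```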

await DirSQL.ready()

Wait for the initial scan to complete. Idempotent -- safe to call multiple times. Raises any exception that occurred during init.

await DirSQL.query(sql) -> list[dict]

Execute a SQL query. Returns a list of dicts keyed by column name. Internal tracking columns (_dirsql_*) are excluded from results.
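
The result shape can be pictured with the standard sqlite3 module: one dict per row, keyed by column name, with any column whose name starts with _dirsql_ dropped. This is an illustration of the documented behavior, not dirsql's code; query_as_dicts and the _dirsql_path column are hypothetical.

```python
import sqlite3


def query_as_dicts(conn: sqlite3.Connection, sql: str) -> list[dict]:
    """Run sql and return rows as dicts, hiding _dirsql_* columns."""
    cursor = conn.execute(sql)
    if cursor.description is None:  # statement produced no result set
        return []
    columns = [col[0] for col in cursor.description]
    return [
        {
            name: value
            for name, value in zip(columns, row)
            if not name.startswith("_dirsql_")
        }
        for row in cursor.fetchall()
    ]


conn = sqlite3.connect(":memory:")
# _dirsql_path is a made-up tracking column for this sketch.
conn.execute(
    "CREATE TABLE comments (id TEXT, body TEXT, author TEXT, _dirsql_path TEXT)"
)
conn.execute(
    "INSERT INTO comments VALUES (?, ?, ?, ?)",
    ("abc", "looks good", "alice", "comments/abc/index.jsonl"),
)
results = query_as_dicts(conn, "SELECT * FROM comments WHERE author = 'alice'")
```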

DirSQL.watch() -> AsyncIterator[RowEvent]

Returns an async iterator that yields RowEvent objects as files change on disk. Starts the filesystem watcher on first iteration.

DirSQL.from_config(path) -> DirSQL

Create a DirSQL instance from a .dirsql.toml config file. Returns immediately; scanning runs in the background. Call await db.ready() before querying.

RowEvent

Emitted by watch() when a file change produces row-level diffs.

  • table (str): The affected table name.
  • action (str): One of "insert", "update", "delete", "error".
  • row (dict | None): The new row (for insert/update) or deleted row (for delete).
  • old_row (dict | None): The previous row (for update only).
  • error (str | None): Error message (for error events).
  • file_path (str | None): The relative file path that triggered the event.
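
A consumer usually branches on action and reads only the fields documented for that action. The sketch below does this with a stand-in dataclass so it runs without dirsql installed; FakeRowEvent and describe are hypothetical names, and only the field names come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeRowEvent:
    """Stand-in with the documented RowEvent fields, for illustration only."""
    table: str
    action: str
    row: Optional[dict] = None
    old_row: Optional[dict] = None
    error: Optional[str] = None
    file_path: Optional[str] = None


def describe(event) -> str:
    """Summarize an event, touching only the fields valid for its action."""
    if event.action == "insert":
        return f"+ {event.table}: {event.row}"
    if event.action == "delete":
        return f"- {event.table}: {event.row}"
    if event.action == "update":
        # Report only the columns whose value changed.
        changed = {k: v for k, v in event.row.items() if event.old_row.get(k) != v}
        return f"~ {event.table}: {changed}"
    if event.action == "error":
        return f"! {event.file_path}: {event.error}"
    raise ValueError(f"unknown action: {event.action}")


summary = describe(
    FakeRowEvent(
        table="items",
        action="update",
        row={"name": "b", "qty": 1},
        old_row={"name": "a", "qty": 1},
    )
)
```

Since describe relies only on attribute names, the same function should work inside an async for loop over db.watch().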

How It Works

The Rust core (rusqlite + notify + walkdir) does the heavy lifting:

  1. Startup scan: Walks the directory tree, matches files to tables via glob patterns, calls the user-provided extract function for each file, and inserts rows into an in-memory SQLite database.
  2. File watching: Uses the notify crate (inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows) to detect file creates, modifications, and deletions.
  3. Row diffing: When a file changes, the new rows are diffed against the previous rows for that file, producing granular insert/update/delete events.
  4. Python bindings: PyO3 exposes the Rust core as a native Python extension module. The async layer runs blocking operations in a thread pool via asyncio.to_thread.
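
Step 3 can be sketched in a few lines. The text does not say how old and new rows are paired, so this example assumes the simplest scheme, pairing them by position within the file; diff_rows is a hypothetical helper and the real Rust diff may work differently.

```python
def diff_rows(table: str, old_rows: list[dict], new_rows: list[dict]) -> list[dict]:
    """Compare a file's previous and current rows, pairing them by position."""
    events = []
    for index in range(max(len(old_rows), len(new_rows))):
        old = old_rows[index] if index < len(old_rows) else None
        new = new_rows[index] if index < len(new_rows) else None
        if old is None:
            # Row exists only in the new version of the file.
            events.append({"table": table, "action": "insert", "row": new, "old_row": None})
        elif new is None:
            # Row existed before and is gone now.
            events.append({"table": table, "action": "delete", "row": old, "old_row": None})
        elif old != new:
            events.append({"table": table, "action": "update", "row": new, "old_row": old})
        # Identical rows produce no event.
    return events


changes = diff_rows(
    "items",
    old_rows=[{"name": "a"}, {"name": "b"}],
    new_rows=[{"name": "a"}, {"name": "c"}, {"name": "d"}],
)
actions = [event["action"] for event in changes]
```

Positional pairing is cheap but reports a row inserted at the top of a file as a run of updates; a key-based diff would avoid that at the cost of needing a stable key per row.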

License

MIT


