dirsql (Python SDK)

Ephemeral SQL index over a local directory. Watches a filesystem, ingests structured files into an in-memory SQLite database, and exposes a SQL query interface. The database is purely in-memory -- the filesystem is always the source of truth.

Installation

pip install dirsql

Requires Python >= 3.12. Ships as a native extension (Rust via PyO3) -- binary wheels are provided for common platforms.

Quick Start

import asyncio
import json
import os
import tempfile
from dirsql import DirSQL, Table

async def main():
    # Create some data files
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "comments", "abc"), exist_ok=True)
    os.makedirs(os.path.join(root, "comments", "def"), exist_ok=True)

    with open(os.path.join(root, "comments", "abc", "index.jsonl"), "w") as f:
        f.write(json.dumps({"body": "looks good", "author": "alice"}) + "\n")
        f.write(json.dumps({"body": "needs work", "author": "bob"}) + "\n")

    with open(os.path.join(root, "comments", "def", "index.jsonl"), "w") as f:
        f.write(json.dumps({"body": "agreed", "author": "carol"}) + "\n")

    # Define a table: DDL, glob pattern, and an extract function
    db = DirSQL(
        root,
        tables=[
            Table(
                ddl="CREATE TABLE comments (id TEXT, body TEXT, author TEXT)",
                glob="comments/**/index.jsonl",
                extract=lambda path, content: [
                    {
                        "id": os.path.basename(os.path.dirname(path)),
                        "body": row["body"],
                        "author": row["author"],
                    }
                    for line in content.splitlines()
                    for row in [json.loads(line)]
                ],
            ),
        ],
    )
    await db.ready()

    # Query with SQL
    results = await db.query("SELECT * FROM comments WHERE author = 'alice'")
    # [{"id": "abc", "body": "looks good", "author": "alice"}]

asyncio.run(main())
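
The extract callback is ordinary Python, so it can be exercised without creating a DirSQL instance at all. The sketch below rewrites the Quick Start lambda as a named function; the name extract_comments and the blank-line handling are additions for illustration, not part of dirsql.

```python
import json
import os


def extract_comments(path: str, content: str) -> list[dict]:
    """Turn one comments/<id>/index.jsonl file into row dicts."""
    # The parent directory name doubles as the row id, as in Quick Start.
    thread_id = os.path.basename(os.path.dirname(path))
    rows = []
    for line in content.splitlines():
        if not line.strip():
            continue  # tolerate blank lines (an addition, not in the lambda)
        record = json.loads(line)
        rows.append(
            {"id": thread_id, "body": record["body"], "author": record["author"]}
        )
    return rows


sample = '{"body": "looks good", "author": "alice"}\n\n{"body": "needs work", "author": "bob"}\n'
rows = extract_comments("comments/abc/index.jsonl", sample)
print(rows)
```

A function like this can be passed as extract=extract_comments in the Table definition, in place of the lambda.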

Multiple Tables and Joins

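# Fragment: runs inside an async function such as main() in Quick Start;
# root is the data directory and json is already imported.
db = DirSQL(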
    root,
    tables=[
        Table(
            ddl="CREATE TABLE posts (title TEXT, author_id TEXT)",
            glob="posts/*.json",
            extract=lambda path, content: [json.loads(content)],
        ),
        Table(
            ddl="CREATE TABLE authors (id TEXT, name TEXT)",
            glob="authors/*.json",
            extract=lambda path, content: [json.loads(content)],
        ),
    ],
)
await db.ready()

results = await db.query("""
    SELECT posts.title, authors.name
    FROM posts JOIN authors ON posts.author_id = authors.id
""")

Ignoring Files

Pass ignore patterns to skip files during scanning and watching:

db = DirSQL(
    root,
    ignore=["**/drafts/**", "**/.git/**"],
    tables=[...],
)

Watching for Changes

DirSQL is async by default. The watch() method returns an async iterator of row-level change events.

import asyncio
import json
from dirsql import DirSQL, Table

async def main():
    db = DirSQL(
        "/path/to/data",
        tables=[
            Table(
                ddl="CREATE TABLE items (name TEXT)",
                glob="**/*.json",
                extract=lambda path, content: [json.loads(content)],
            ),
        ],
    )
    await db.ready()

    # Query
    results = await db.query("SELECT * FROM items")

    # Watch for file changes (insert/update/delete/error events)
    async for event in db.watch():
        print(f"{event.action} on {event.table}: {event.row}")
        if event.action == "error":
            print(f"  error: {event.error}")

asyncio.run(main())

API Reference

Table(*, ddl, glob, extract)

Defines how files map to a SQL table.

  • ddl (str): A CREATE TABLE statement defining the schema.
  • glob (str): A glob pattern matched against file paths relative to root.
  • extract (Callable[[str, str], list[dict]]): A function receiving (relative_path, file_content) and returning a list of row dicts. Each dict's keys must match the DDL column names.
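
Because row keys have to line up with the DDL columns, it can help to check an extract function's output before handing it to dirsql. The sketch below uses the standard library's sqlite3 module to read the column names from a CREATE TABLE statement. The helper names ddl_columns and check_rows are hypothetical, and the check assumes "match" means exactly the same set of names.

```python
import sqlite3


def ddl_columns(ddl: str) -> list[str]:
    """Return the column names declared by a single CREATE TABLE statement."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        (table,) = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchone()
        # pragma_table_info lists one row per column, in declaration order
        cursor = conn.execute("SELECT name FROM pragma_table_info(?)", (table,))
        return [name for (name,) in cursor]
    finally:
        conn.close()


def check_rows(ddl: str, rows: list[dict]) -> None:
    """Raise ValueError if any row's keys differ from the DDL columns."""
    expected = set(ddl_columns(ddl))
    for i, row in enumerate(rows):
        if set(row) != expected:
            raise ValueError(
                f"row {i}: keys {sorted(row)} != columns {sorted(expected)}"
            )


ddl = "CREATE TABLE comments (id TEXT, body TEXT, author TEXT)"
print(ddl_columns(ddl))  # ['id', 'body', 'author']
check_rows(ddl, [{"id": "abc", "body": "looks good", "author": "alice"}])
```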

DirSQL(root, *, tables, ignore=None)

Creates an in-memory SQLite database indexed from the directory at root. The constructor is synchronous and returns immediately; the initial scan runs in a background thread.

  • root (str): Path to the directory to index.
  • tables (list[Table]): Table definitions.
  • ignore (list[str] | None): Glob patterns for paths to skip.

await DirSQL.ready()

Waits for the initial scan to complete. Idempotent -- safe to call multiple times. Raises any exception that occurred during initialization.
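
The contract described here (a constructor that returns immediately, plus an awaitable, idempotent ready()) is a common pattern. The sketch below shows one way to build it with the standard library; the class name BackgroundInit is hypothetical and this is not dirsql's actual implementation.

```python
import asyncio
import threading
from concurrent.futures import Future


class BackgroundInit:
    """Run a blocking function in a thread; await ready() to get its outcome."""

    def __init__(self, work):
        self._done: Future = Future()
        # The constructor only starts the thread, so it returns immediately.
        threading.Thread(target=self._run, args=(work,), daemon=True).start()

    def _run(self, work) -> None:
        try:
            self._done.set_result(work())
        except Exception as exc:
            self._done.set_exception(exc)  # surfaced later by ready()

    async def ready(self):
        # Safe to call repeatedly: every call waits on the same Future.
        return await asyncio.wrap_future(self._done)


async def demo() -> str:
    job = BackgroundInit(lambda: "scan finished")
    await job.ready()
    return await job.ready()  # second call returns the same result


print(asyncio.run(demo()))  # scan finished
```

Storing the outcome in a single Future is what makes repeated calls cheap and lets a failure in the background thread be raised in whichever coroutine awaits ready().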

await DirSQL.query(sql) -> list[dict]

Execute a SQL query. Returns a list of dicts keyed by column name. Internal tracking columns (_dirsql_*) are excluded from results.

DirSQL.watch() -> AsyncIterator[RowEvent]

Returns an async iterator that yields RowEvent objects as files change on disk. Starts the filesystem watcher on first iteration.

DirSQL.from_config(path) -> DirSQL

Create a DirSQL instance from a .dirsql.toml config file. Returns immediately; scanning runs in the background. Call await db.ready() before querying.

RowEvent

Emitted by watch() when a file change produces row-level diffs.

  • table (str): The affected table name.
  • action (str): One of "insert", "update", "delete", "error".
  • row (dict | None): The new row (for insert/update) or deleted row (for delete).
  • old_row (dict | None): The previous row (for update only).
  • error (str | None): Error message (for error events).
  • file_path (str | None): The relative file path that triggered the event.
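
As an illustration of how these fields fit together, the sketch below applies events to a plain Python mirror of each table. FakeRowEvent is a hypothetical stand-in with the same field names as RowEvent, so the example runs without a filesystem watcher; apply_event is likewise an illustrative helper, not part of dirsql.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeRowEvent:
    """Hypothetical stand-in exposing the same fields as RowEvent."""
    table: str
    action: str
    row: Optional[dict] = None
    old_row: Optional[dict] = None
    error: Optional[str] = None
    file_path: Optional[str] = None


def apply_event(mirror: dict, event) -> None:
    """Apply one event to a dict mapping table name -> list of rows."""
    rows = mirror.setdefault(event.table, [])
    if event.action == "insert":
        rows.append(event.row)
    elif event.action == "update":
        # Swap the first row equal to old_row for the new row
        rows[rows.index(event.old_row)] = event.row
    elif event.action == "delete":
        rows.remove(event.row)
    elif event.action == "error":
        print(f"error in {event.file_path}: {event.error}")


mirror: dict = {}
apply_event(mirror, FakeRowEvent("items", "insert", row={"name": "a"}))
apply_event(
    mirror,
    FakeRowEvent("items", "update", row={"name": "b"}, old_row={"name": "a"}),
)
print(mirror)  # {'items': [{'name': 'b'}]}
```

With a real DirSQL instance, the same function could be called from inside the async for loop over db.watch().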

How It Works

The Rust core (rusqlite + notify + walkdir) does the heavy lifting:

  1. Startup scan: Walks the directory tree, matches files to tables via glob patterns, calls the user-provided extract function for each file, and inserts rows into an in-memory SQLite database.
  2. File watching: Uses the notify crate (inotify on Linux, FSEvents on macOS) to detect file creates, modifications, and deletions.
  3. Row diffing: When a file changes, the new rows are diffed against the previous rows for that file, producing granular insert/update/delete events.
  4. Python bindings: PyO3 exposes the Rust core as a native Python extension module. The async layer runs blocking operations in a thread pool via asyncio.to_thread.
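
The row diffing step can be sketched in a few lines of Python. The README does not say how old and new rows are paired, so this sketch assumes they are compared position by position within a file; the real Rust core may pair rows differently. The function name diff_rows and the dict-shaped events are illustrative only.

```python
def diff_rows(table: str, old: list[dict], new: list[dict]) -> list[dict]:
    """Compare two row lists position by position and describe the changes."""
    events = []
    for i in range(max(len(old), len(new))):
        before = old[i] if i < len(old) else None
        after = new[i] if i < len(new) else None
        if before is None:
            # Row exists only in the new version of the file
            events.append({"table": table, "action": "insert", "row": after})
        elif after is None:
            # Row exists only in the previous version of the file
            events.append({"table": table, "action": "delete", "row": before})
        elif before != after:
            events.append(
                {"table": table, "action": "update", "row": after, "old_row": before}
            )
    return events


old_rows = [{"name": "a"}, {"name": "b"}]
new_rows = [{"name": "a"}, {"name": "c"}, {"name": "d"}]
for event in diff_rows("items", old_rows, new_rows):
    print(event["action"], event["row"])
```

Under the positional assumption, the unchanged first row produces no event, the second row is reported as an update, and the extra third row as an insert.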

License

MIT
