Ephemeral SQL index over a local directory
dirsql (Python SDK)
Ephemeral SQL index over a local directory. Watches a filesystem, ingests structured files into an in-memory SQLite database, and exposes a SQL query interface. The database is purely in-memory -- the filesystem is always the source of truth.
Installation
pip install dirsql
Requires Python >= 3.12. Ships as a native extension (Rust via PyO3) -- binary wheels are provided for common platforms.
Each wheel also bundles the dirsql HTTP-server CLI as a console script, so pip install dirsql also gives you a dirsql command on $PATH. See the CLI guide.
Publishing (maintainers)
Handled by .github/workflows/publish.yml (invoked from minor-release.yml / patch-release.yml). For each target triple, the build job runs cargo build on the Rust CLI with --features cli, stages the binary into python/dirsql/_binary/, and runs maturin build, which picks the binary up via the [tool.maturin] include rule in pyproject.toml. The resulting wheels and the sdist are then published to PyPI via Trusted Publishing.
Quick Start
import asyncio
import json
import os
import tempfile

from dirsql import DirSQL, Table


async def main():
    # Create some data files
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "comments", "abc"), exist_ok=True)
    os.makedirs(os.path.join(root, "comments", "def"), exist_ok=True)
    with open(os.path.join(root, "comments", "abc", "index.jsonl"), "w") as f:
        f.write(json.dumps({"body": "looks good", "author": "alice"}) + "\n")
        f.write(json.dumps({"body": "needs work", "author": "bob"}) + "\n")
    with open(os.path.join(root, "comments", "def", "index.jsonl"), "w") as f:
        f.write(json.dumps({"body": "agreed", "author": "carol"}) + "\n")

    # Define a table: DDL, glob pattern, and an extract function
    db = DirSQL(
        root,
        tables=[
            Table(
                ddl="CREATE TABLE comments (id TEXT, body TEXT, author TEXT)",
                glob="comments/**/index.jsonl",
                extract=lambda path, content: [
                    {
                        "id": os.path.basename(os.path.dirname(path)),
                        "body": row["body"],
                        "author": row["author"],
                    }
                    for line in content.splitlines()
                    for row in [json.loads(line)]
                ],
            ),
        ],
    )
    await db.ready()

    # Query with SQL
    results = await db.query("SELECT * FROM comments WHERE author = 'alice'")
    # [{"id": "abc", "body": "looks good", "author": "alice"}]


asyncio.run(main())
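The inline lambda above calls json.loads on every line, so a blank line inside index.jsonl (for example, an extra empty line at the end of the file) would raise json.JSONDecodeError. A named extract function is easier to read and can be unit-tested without dirsql. The name extract_comments is hypothetical; it is a sketch that returns the same row dicts as the lambda, assuming the (relative_path, file_content) signature described in the API reference.

```python
import json
import os


def extract_comments(path: str, content: str) -> list[dict]:
    """Hypothetical named replacement for the Quick Start lambda."""
    # "comments/abc/index.jsonl" -> "abc"
    comment_id = os.path.basename(os.path.dirname(path))
    rows = []
    for line in content.splitlines():
        if not line.strip():
            continue  # json.loads("") would raise, so skip blank lines
        record = json.loads(line)
        rows.append({"id": comment_id, "body": record["body"], "author": record["author"]})
    return rows
```

It can then be passed as extract=extract_comments in the Table definition.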
Multiple Tables and Joins
import asyncio
import json

from dirsql import DirSQL, Table


async def main(root):
    # Each posts/*.json and authors/*.json file holds a single JSON object
    db = DirSQL(
        root,
        tables=[
            Table(
                ddl="CREATE TABLE posts (title TEXT, author_id TEXT)",
                glob="posts/*.json",
                extract=lambda path, content: [json.loads(content)],
            ),
            Table(
                ddl="CREATE TABLE authors (id TEXT, name TEXT)",
                glob="authors/*.json",
                extract=lambda path, content: [json.loads(content)],
            ),
        ],
    )
    await db.ready()

    results = await db.query("""
        SELECT posts.title, authors.name
        FROM posts JOIN authors ON posts.author_id = authors.id
    """)
    print(results)


asyncio.run(main("/path/to/data"))
Ignoring Files
Pass ignore patterns to skip files during scanning and watching:
db = DirSQL(
    root,
    ignore=["**/drafts/**", "**/.git/**"],
    tables=[...],
)
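The exact glob dialect is not spelled out here. The sketch below assumes gitignore-style semantics, where **/ matches zero or more leading directories and a single * stays inside one path segment (unlike Python's fnmatch, whose * also matches /). glob_to_regex and is_ignored are hypothetical helpers for checking which relative paths a pattern list would skip; dirsql's own matcher lives in the Rust core and may differ in edge cases.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob into a regex, assuming gitignore-style "**" rules."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("/**", i) and i + 3 == len(pattern):
            out.append("(?:/.*)?")  # trailing "/**": everything underneath
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # a single "*" never crosses a "/"
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """True if the relative path fully matches any of the ignore patterns."""
    return any(glob_to_regex(p).fullmatch(rel_path) for p in patterns)
```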
Watching for Changes
DirSQL is async by default. The watch() method returns an async iterator of row-level change events.
import asyncio
import json

from dirsql import DirSQL, Table


async def main():
    db = DirSQL(
        "/path/to/data",
        tables=[
            Table(
                ddl="CREATE TABLE items (name TEXT)",
                glob="**/*.json",
                extract=lambda path, content: [json.loads(content)],
            ),
        ],
    )
    await db.ready()

    # Query
    results = await db.query("SELECT * FROM items")

    # Watch for file changes (insert/update/delete/error events)
    async for event in db.watch():
        print(f"{event.action} on {event.table}: {event.row}")
        if event.action == "error":
            print(f"  error: {event.error}")


asyncio.run(main())
API Reference
Table(*, ddl, glob, extract)
Defines how files map to a SQL table.
- ddl (str): A CREATE TABLE statement defining the schema.
- glob (str): A glob pattern matched against file paths relative to root.
- extract (Callable[[str, str], list[dict]]): A function receiving (relative_path, file_content) and returning a list of row dicts. Each dict's keys must match the DDL column names.
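Because extract is plain Python, a table can be backed by any text format. As an illustration (the helper name extract_csv and the people/*.csv layout are hypothetical), a CSV file with a header row can be mapped to row dicts with the standard csv module. The header names then have to match the DDL columns, and every value arrives as a string.

```python
import csv
import io


def extract_csv(path: str, content: str) -> list[dict]:
    """Hypothetical extract function: one row dict per CSV record, keyed by the header."""
    reader = csv.DictReader(io.StringIO(content))
    return [dict(record) for record in reader]
```

It would be wired up as Table(ddl="CREATE TABLE people (name TEXT, email TEXT)", glob="people/*.csv", extract=extract_csv).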
DirSQL(root=None, *, tables=None, ignore=None, config=None)
Creates an in-memory SQLite database indexed from the directory at root. The constructor is sync and returns immediately; scanning runs in a background thread.
At least one of root or config must be supplied. When both root and config are passed (or config declares [dirsql].root), the explicit root wins and a warning is emitted on stderr.
- root (str | None): Path to the directory to index. Optional when config supplies one.
- tables (list[Table] | None): Programmatic table definitions. Appended to any tables in the config file.
- ignore (list[str] | None): Glob patterns for paths to skip. Appended to any [dirsql].ignore patterns in the config file.
- config (str | None): Optional path to a .dirsql.toml file. Its [[table]] entries, [dirsql].ignore, and optional [dirsql].root are merged into the constructor's inputs.
await DirSQL.ready()
Wait for the initial scan to complete. Idempotent -- safe to call multiple times. Raises any exception that occurred during init.
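One way to pair a constructor that returns immediately with an idempotent ready() that re-raises scan errors is to run the scan on a thread and have ready() join that thread off the event loop. BackgroundScan below is a hypothetical, standard-library-only sketch of that pattern; it is not dirsql's internals, which live in the Rust core.

```python
import asyncio
import threading
from collections.abc import Callable


class BackgroundScan:
    """Hypothetical sketch: sync constructor plus an idempotent async ready()."""

    def __init__(self, scan: Callable[[], None]) -> None:
        self._error: Exception | None = None
        self._thread = threading.Thread(target=self._run, args=(scan,), daemon=True)
        self._thread.start()  # returns at once; the scan continues in the background

    def _run(self, scan: Callable[[], None]) -> None:
        try:
            scan()
        except Exception as exc:  # remember the failure so ready() can re-raise it
            self._error = exc

    async def ready(self) -> None:
        # join() blocks, so run it in a worker thread; joining a finished thread is a no-op.
        await asyncio.to_thread(self._thread.join)
        if self._error is not None:
            raise self._error
```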
await DirSQL.query(sql) -> list[dict]
Execute a SQL query. Returns a list of dicts keyed by column name. Internal tracking columns (_dirsql_*) are excluded from results.
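The dict-per-row result shape, with internal columns hidden, can be reproduced with the standard sqlite3 module by reading column names from cursor.description. In this sketch the tracking column name _dirsql_file is invented for illustration; the README only says such columns share the _dirsql_ prefix.

```python
import sqlite3


def rows_as_dicts(cursor: sqlite3.Cursor) -> list[dict]:
    """Turn a cursor's rows into dicts, dropping columns whose names start with _dirsql_."""
    names = [col[0] for col in cursor.description]
    keep = [i for i, name in enumerate(names) if not name.startswith("_dirsql_")]
    return [{names[i]: row[i] for i in keep} for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
# "_dirsql_file" is a hypothetical tracking column, used only for illustration.
conn.execute("CREATE TABLE items (name TEXT, _dirsql_file TEXT)")
conn.execute("INSERT INTO items VALUES ('widget', 'items/a.json')")
print(rows_as_dicts(conn.execute("SELECT * FROM items")))
# [{'name': 'widget'}]
```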
DirSQL.watch() -> AsyncIterator[RowEvent]
Returns an async iterator that yields RowEvent objects as files change on disk. Starts the filesystem watcher on first iteration.
RowEvent
Emitted by watch() when a file change produces row-level diffs.
- table (str): The affected table name.
- action (str): One of "insert", "update", "delete", "error".
- row (dict | None): The new row (for insert/update) or the deleted row (for delete).
- old_row (dict | None): The previous row (for update only).
- error (str | None): Error message (for error events).
- file_path (str | None): The relative file path that triggered the event.
How It Works
The Rust core (rusqlite + notify + walkdir) does the heavy lifting:
- Startup scan: Walks the directory tree, matches files to tables via glob patterns, calls the user-provided extract function for each file, and inserts rows into an in-memory SQLite database.
- File watching: Uses the notify crate (inotify on Linux, FSEvents on macOS) to detect file creates, modifications, and deletions.
- Row diffing: When a file changes, the new rows are diffed against the previous rows for that file, producing granular insert/update/delete events.
- Python bindings: PyO3 exposes the Rust core as a native Python extension module. The async layer runs blocking operations in a thread pool via asyncio.to_thread.
License
MIT
Download files
Source Distribution

| File | Size | Tags |
|---|---|---|
| dirsql-0.3.2.tar.gz | 227.9 kB | Source |

Built Distributions

| File | Size | Tags |
|---|---|---|
| dirsql-0.3.2-cp312-cp312-win_amd64.whl | 2.2 MB | CPython 3.12, Windows x86-64 |
| dirsql-0.3.2-cp312-cp312-manylinux_2_34_x86_64.whl | 2.6 MB | CPython 3.12, manylinux: glibc 2.34+ x86-64 |
| dirsql-0.3.2-cp312-cp312-manylinux_2_34_aarch64.whl | 2.5 MB | CPython 3.12, manylinux: glibc 2.34+ ARM64 |
| dirsql-0.3.2-cp312-cp312-macosx_11_0_arm64.whl | 2.2 MB | CPython 3.12, macOS 11.0+ ARM64 |
| dirsql-0.3.2-cp312-cp312-macosx_10_12_x86_64.whl | 2.4 MB | CPython 3.12, macOS 10.12+ x86-64 |

All files were uploaded via twine/6.1.0 CPython/3.13.12 using Trusted Publishing.

File hashes

| File | SHA256 | MD5 | BLAKE2b-256 |
|---|---|---|---|
| dirsql-0.3.2.tar.gz | f5a326c901e34bcd9aacd6c7e065560a0aa04b7b117fc596419ecb8cf67faa68 | 1939046bd9cf8839d0ba13d0bef7afa1 | 0790f962551eb2c9f573bd880deb46fc14497590e382884d35a2b236e68f658c |
| dirsql-0.3.2-cp312-cp312-win_amd64.whl | c6ee4c5cb30a69f48a68c4217fd8ebb3e3edf121d1c1ae75a100641820e2c091 | ad9d70f80e89500132d46462acb99b48 | 94d0a2e2d06cbff79484f15719d8feaef13f943a75261dd980a82e5839053923 |
| dirsql-0.3.2-cp312-cp312-manylinux_2_34_x86_64.whl | df0bf4fe0428d2a0681315433b6d4db9e7ecc680e2b3aa03596ae0ee5d30d293 | 58334d87288cfd7240ca101bc4455a5b | 85ed4401928dace0b47734aef189965186e7d5ea0f33f0de2500b43fa19d5ae0 |
| dirsql-0.3.2-cp312-cp312-manylinux_2_34_aarch64.whl | 4ca6464a2a86610f468aaaa12ddbae5fdd581ec79b7e9b9455c068c768673cd9 | 55ea686430c03c276d3bfea9422a2538 | 5e585ba7dbd80030deb4703642377454d6f67f6c9ada28bf513f83500ab9b5f9 |
| dirsql-0.3.2-cp312-cp312-macosx_11_0_arm64.whl | ce53cfd1377a13243f088e2a46928115045c6e9fcc0e798342f6d85dcc3f2fdd | 12da14d9a4c1be9ac0b52fa7422920fe | 91158601c4daabcb31e20f9139422a5e1b13e97b287c3bc032232391151b8c47 |
| dirsql-0.3.2-cp312-cp312-macosx_10_12_x86_64.whl | 45841da254070446085253cba06538bd5db5be296ceea09d74375306282f13ea | 9b0428797d3b516b439985cedfdd5bb0 | 642a23af19bc345ccb3e89dd3d89466d8896b17263489e204128bfdbe708a6e6 |

Provenance

Every file above has an attestation bundle with these common fields:
- Publisher: release.yml on thekevinscott/dirsql
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name / Subject digest: the file name and its SHA256 hash (see File hashes)
- Permalink: thekevinscott/dirsql@47f2cde46c42761a71f1210b5c54a31ba8166cde
- Branch / Tag: refs/heads/main
- Owner: https://github.com/thekevinscott
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@47f2cde46c42761a71f1210b5c54a31ba8166cde
- Trigger Event: push

| File | Sigstore transparency entry |
|---|---|
| dirsql-0.3.2.tar.gz | 1441931371 |
| dirsql-0.3.2-cp312-cp312-win_amd64.whl | 1441931487 |
| dirsql-0.3.2-cp312-cp312-manylinux_2_34_x86_64.whl | 1441932010 |
| dirsql-0.3.2-cp312-cp312-manylinux_2_34_aarch64.whl | 1441931735 |
| dirsql-0.3.2-cp312-cp312-macosx_11_0_arm64.whl | 1441931610 |
| dirsql-0.3.2-cp312-cp312-macosx_10_12_x86_64.whl | 1441931883 |