DuckDB-powered tabular data engine for KAOS — register CSV / Parquet / JSON / SQLite, run SQL, export typed results
Project description
kaos-tabular
Part of Kelvin Agentic OS (KAOS) — open agentic infrastructure for legal work, built by 273 Ventures. See the full KAOS package map for the rest of the stack.
kaos-tabular is the SQL-analytics layer of the Kelvin Agentic OS (KAOS),
the agentic infrastructure 273 Ventures builds for legal work. It wraps a
session-scoped, in-process DuckDB connection in a
typed TabularEngine that can register CSV / TSV / Parquet / JSON / JSONL
/ SQLite files and kaos-content TabularDocument instances, run SQL
against them, and export results back to disk. The engine delegates every
DuckDB call to kaos_content.bridges.duckdb — no parser, no executor,
no second copy of the data — and returns typed Table / TabularDocument
results with provenance instead of raw cursors. Seventeen MCP tools
(kaos-tabular-{register, query, list-tables, describe, sample, count, read-file, export, history, find-duplicates, correlation, join, pivot, unpivot, aggregate, filter, top-k}) expose the same surface to agentic
clients. Every error message that mentions a missing table, column, or
aggregate function carries a "Did you mean …?" suggestion so an agent
can self-correct without an extra describe round-trip. The set is
deliberately bounded — Kelvin's predecessor shipped roughly sixty tools,
and porting them forward exposed two distinct reasons a typed tool earns
its weight: either it removes a syntactic footgun the agent would otherwise
hit (joins with column ambiguity, PIVOT / UNPIVOT, long-form correlation
matrices), or it adds load-bearing validation and structured-event
audit (the GROUP BY / WHERE / ORDER BY trio: agents write the SQL
correctly, but cheap typed wrappers catch column typos at the boundary
and emit replayable history events).
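The suggestion machinery is plain difflib; here is a minimal sketch of the behaviour, assuming a helper along these lines (the name suggest is illustrative — the real logic lives inside the engine's validators):

```python
import difflib

def suggest(missing: str, known: list[str]) -> str:
    # Sketch of the "Did you mean ...?" message: difflib.get_close_matches
    # with the 0.6 similarity cutoff described in the Concepts section.
    matches = difflib.get_close_matches(missing, known, n=1, cutoff=0.6)
    if matches:
        return f"Unknown column '{missing}'. Did you mean '{matches[0]}'?"
    return f"Unknown column '{missing}'."

print(suggest("regoin", ["region", "amount", "order_id"]))
# → Unknown column 'regoin'. Did you mean 'region'?
```

The same pattern covers missing tables and aggregate function names; when nothing clears the cutoff, the error simply omits the suggestion.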
The base install is intentionally small: three runtime dependencies
(kaos-content, kaos-core, duckdb) and no compiled native code
beyond the DuckDB wheel, which ships prebuilt for Linux, macOS, and
Windows on x86_64 and arm64. A
single optional extra, [mcp], adds kaos-mcp so kaos-tabular-serve
can stand up a stdio or streamable-HTTP server. There is no [xlsx]
extra at 0.1.0a1 — the previous _register_xlsx path introduced a
sideways kaos-tabular -> kaos-office dependency that the architecture
DAG forbids, and was removed in audit-01 KTAB-002. The supported XLSX
workflow is now: parse the file with kaos_office.parse_xlsx(path) (in
kaos-office, the right home for OPC reading) and pass each returned
Table to engine.register_table(table, name=...) — that method is
already public and unchanged.
Install
```shell
uv add kaos-tabular
# or
pip install kaos-tabular

# Optional: stdio + streamable-HTTP MCP server entrypoint
uv add 'kaos-tabular[mcp]'
```
kaos-tabular requires Python 3.13 or newer (3.14 is supported).
The package is pure Python, classified Operating System :: OS Independent — DuckDB ships native wheels for the major
platforms, so a clean pip install works on Linux, macOS, and Windows
without a compiler toolchain.
Quick start
Open an in-process engine, register a CSV, run SQL, inspect the schema,
and export the result. TabularEngine is a context manager — use with
so the DuckDB connection is closed deterministically:
```python
from kaos_tabular import TabularEngine

with TabularEngine() as engine:
    # 1. Register a file. DuckDB infers types, delimiter, and encoding.
    name = engine.register_file("orders.csv", table_name="orders")
    print("Registered as:", name)  # → "orders"

    # 2. Run SQL. Results come back as a typed kaos-content Table.
    result = engine.execute(
        "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
    )
    print(result.row_count, "rows,", len(result.columns), "columns")

    # 3. Inspect the schema (column names, types, nullability, samples).
    desc = engine.describe_table("orders")
    print(desc["row_count"], "rows in 'orders'")

    # 4. Export — public API new in 0.1.0a1 (audit-01 KTAB-003).
    engine.export_table("orders", "orders.parquet", format="parquet")
```
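Step 2's row cap comes from a subquery rewrite (documented in Concepts below): the engine wraps the user's SQL rather than trusting it to carry its own LIMIT. A sketch of that rewrite — wrap_with_limit is illustrative, not a kaos-tabular API:

```python
def wrap_with_limit(sql: str, max_rows: int = 1000, hard_cap: int = 10_000) -> str:
    # Wrap user SQL as a subquery so the LIMIT applies no matter what the
    # inner query selects; max_rows is clamped to the 10,000-row hard cap.
    n = min(max_rows, hard_cap)
    return f"SELECT * FROM ({sql}) AS _q LIMIT {n}"

print(wrap_with_limit("SELECT region FROM orders", max_rows=50))
# → SELECT * FROM (SELECT region FROM orders) AS _q LIMIT 50
```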
To expose the same surface to MCP clients, register the seventeen tools
on a KaosRuntime and serve them — the easiest path is the [mcp]
extra and the bundled kaos-tabular-serve entry point, but you can
also wire it into an existing FastMCP app:
```python
from kaos_core import KaosRuntime
from kaos_mcp import KaosMCPServer, KaosMCPSettings
from kaos_tabular.tools import register_tabular_tools

runtime = KaosRuntime()
n_tools = register_tabular_tools(runtime)  # → 17

server = KaosMCPServer(
    runtime=runtime,
    settings=KaosMCPSettings(name="kaos-tabular-server", transport="stdio"),
)
server.run_stdio()  # for Claude Code / Codex / Gemini
```
Per-session engines are keyed off KaosContext.session_id inside
tools.py, so concurrent MCP sessions never share a DuckDB connection.
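That isolation can be modelled as a lock-guarded cache keyed by session id. A simplified sketch (a plain dict stands in for TabularEngine, and the helper name is illustrative — the real cache lives in tools.py):

```python
from threading import Lock

_engines: dict[str, object] = {}
_lock = Lock()

def engine_for_session(session_id: str, factory=dict) -> object:
    # One engine per session id, created lazily under a lock so concurrent
    # MCP sessions never share a DuckDB connection. `factory` stands in
    # for the real TabularEngine constructor.
    with _lock:
        if session_id not in _engines:
            _engines[session_id] = factory()
        return _engines[session_id]

assert engine_for_session("sess-1") is engine_for_session("sess-1")
assert engine_for_session("sess-1") is not engine_for_session("sess-2")
```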
Concepts
The package is a thin, typed surface over DuckDB plus the kaos-content bridges. The most important entries:
| Concept | Purpose |
|---|---|
| TabularEngine(db_path=None, read_only=False) | Session-scoped wrapper around a single DuckDB connection. Context-manager friendly (with TabularEngine() as engine:). db_path selects file-backed persistence; in-memory by default. |
| engine.register_file(path, *, table_name=None) | Register a CSV / TSV / Parquet / JSON / JSONL / SQLite file as a queryable table. Returns the registered name. Multi-table SQLite files are exploded; XLSX is intentionally not handled — see the dependency footprint paragraph. |
| engine.register_table(table, *, name=None) | Register a kaos_content Table (e.g. one returned by kaos_office.parse_xlsx()) as a DuckDB view. Returns the registered name. |
| engine.execute(sql, *, max_rows=1000) | Run arbitrary DuckDB SQL. Wraps the user's SQL as SELECT * FROM (<sql>) AS _q LIMIT N — the hard cap is 10,000 rows. Returns a typed Table. |
| engine.describe_table(name) / engine.list_tables() / engine.count(name) / engine.sample(name, n=5) | Introspection helpers. describe_table returns column metadata, row count, and sample values; sample returns a Table of N random rows. |
| engine.export_table(name, path, *, format) | Public engine method new in 0.1.0a1 (audit-01 KTAB-003). Owns the DuckDB COPY SQL, format mapping (csv / parquet / json), and path quoting that the export tool and CLI used to reach into private internals for. |
| engine.find_duplicates(name, *, columns=None) | Return rows whose values in columns (default: all columns) appear in more than one row. Uses DuckDB QUALIFY COUNT(*) OVER (PARTITION BY …) > 1 so the SQL is one statement. |
| engine.correlation(name, *, columns=None) | Long-form pairwise Pearson correlation matrix — returns (col_a, col_b, corr) rows. Default columns=None auto-selects every numeric column. |
| engine.join(left, right, *, on, how="inner", target=None) | SQL JOIN via DuckDB's USING (col) clause so the join key appears once in the result. how ∈ {inner, left, right, outer, semi, anti, cross}; target materializes via CREATE OR REPLACE TABLE and registers. |
| engine.pivot(name, *, on, using, aggregate="sum", group_by=None, target=None) / engine.unpivot(name, *, columns, name_column="variable", value_column="value", target=None) | Wrap DuckDB's PIVOT / UNPIVOT statements. Pivot accepts aggregate ∈ {sum, avg, min, max, count, first}. Unpivot melts the listed columns into long form. |
| engine.aggregate(name, *, aggregates, group_by=None, where=None, having=None, order_by=None, limit=None, target=None) | Composed GROUP BY with typed validation. aggregates is a list of (func, column[, alias]) tuples; func is one of sum / avg / min / max / count / count_distinct / median / stddev / variance / first / last. The table, every column, and every aggregate function are validated up-front with did-you-mean suggestions on a miss; where and having remain opaque DuckDB SQL fragments (predicate shapes are unbounded). |
| engine.filter(name, *, where, limit=None, target=None) | Typed SELECT * WHERE. The table is validated; where is an opaque DuckDB SQL predicate. Useful when you want a structured-event audit trail (the call shows up in engine.history() as filter:<table>) instead of an opaque query. |
| engine.top_k(name, *, by, n=10, ascending=False, target=None) | ORDER BY ... LIMIT N over one or more columns; defaults to descending so "top N by units" reads naturally. Set ascending=True for bottom-N. |
| engine.save(path) / engine.to_tabular_document(name) | Persist the full database via DuckDB EXPORT DATABASE, or convert one registered table back into a kaos-content TabularDocument (with full row count, even when DuckDB's row stream was truncated). |
| Did-you-mean error messages | Every error mentioning a missing table, column, or aggregate function carries a Did you mean '<closest match>'? suggestion (via difflib.get_close_matches with a 0.6 cutoff). Wired into describe_table, sample, count, find_duplicates, correlation, join, pivot, unpivot, aggregate, filter, top_k, export_table — agents fix typos without an extra describe round-trip. |
| Errors (KaosTabularError, EngineError, QueryError, RegistrationError) | Dedicated exception hierarchy. The MCP layer translates these into ToolResult.create_error() with the documented three-part recovery hint (what / how to fix / alternative tool). |
| EngineEvent + engine.history(*, last_n=20) | Frozen dataclass (timestamp, event_type, detail, table_names) and method returning the recent event log (registers, queries, drops). Provenance trail for an MCP session. |
| The 17 MCP tools | Core 8 — kaos-tabular-{register, query, list-tables, describe, sample, count, read-file, export} — plus the 6 reshape additions: kaos-tabular-{history, find-duplicates, correlation, join, pivot, unpivot} — plus the 3 structured shape tools introduced alongside did-you-mean errors: kaos-tabular-{aggregate, filter, top-k}. Registration paths and pure metadata reads are closed-world; arbitrary-SQL query, file-reading register / read-file, and the destructive export are open-world; all set explicit ToolAnnotations. Register with register_tabular_tools(runtime). |
| Trust model — docs/security.md | DuckDB is in-process; SQL has filesystem access matching the running process. The query tool's description is honest about this; deployments that need stricter isolation should run kaos-tabular in a constrained working directory or container, or use kaos_content.bridges.duckdb.create_safe_connection for an enable_external_access=false connection (which cannot register files). |
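To make the aggregate contract concrete, here is an illustrative sketch of how a (func, column[, alias]) spec could compile down to GROUP BY SQL. The helper name and the exact SQL shape are assumptions for illustration, not kaos-tabular internals:

```python
# Allowed aggregate functions, per the engine.aggregate contract above.
AGG_FUNCS = {"sum", "avg", "min", "max", "count", "count_distinct",
             "median", "stddev", "variance", "first", "last"}

def build_aggregate_sql(table: str, aggregates, group_by=None) -> str:
    parts = []
    for spec in aggregates:
        func, column, *alias = spec
        if func not in AGG_FUNCS:
            # The real engine raises with a did-you-mean suggestion here.
            raise ValueError(f"Unknown aggregate function '{func}'")
        expr = (f"COUNT(DISTINCT {column})" if func == "count_distinct"
                else f"{func.upper()}({column})")
        parts.append(f"{expr} AS {alias[0]}" if alias else expr)
    select = ", ".join((group_by or []) + parts)
    sql = f"SELECT {select} FROM {table}"
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql

print(build_aggregate_sql("orders", [("sum", "amount", "total")], ["region"]))
# → SELECT region, SUM(amount) AS total FROM orders GROUP BY region
```

The point of the typed spec is that table, column, and function names are validated before any SQL is assembled; only where and having pass through as opaque fragments.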
CLI
kaos-tabular ships two console scripts. Every structured subcommand on
the admin CLI accepts --json for machine-readable output piped to
other agents:
```shell
kaos-tabular --help                                # admin CLI
kaos-tabular-serve --help                          # MCP server

kaos-tabular query orders.csv "SELECT region, SUM(amount) FROM orders GROUP BY region"
kaos-tabular describe orders.csv --json            # schema + sample values
kaos-tabular sample orders.csv --rows 10           # random rows as markdown
kaos-tabular count orders.csv --table orders       # fast row count
kaos-tabular export orders.csv -o orders.parquet   # COPY → parquet
kaos-tabular read orders.csv --json                # TabularDocument summary

kaos-tabular-serve                                 # stdio (Claude Code / Desktop)
kaos-tabular-serve --http --port 8000              # streamable HTTP
```
kaos-tabular query opens a fresh in-memory engine, registers the input
file, and runs the SQL — useful for one-shots without standing up a
server. For .duckdb files the engine opens the database directly in
read-only mode. kaos-tabular-serve exposes the seventeen MCP tools listed
in Concepts above; it requires the [mcp] extra.
Compatibility & status
| Aspect | Details |
|---|---|
| Python | 3.13, 3.14 |
| OS | Linux, macOS, Windows (pure Python — Operating System :: OS Independent; DuckDB ships native wheels for x86_64 and arm64 on all three) |
| Maturity | Alpha (Development Status :: 3 - Alpha). The public API is documented in kaos_tabular.__all__: EngineError, EngineEvent, KaosTabularError, QueryError, RegistrationError, TabularEngine, read_csv, read_json, read_parquet, __version__. |
| Stability policy | Pre-1.0: minor bumps may change behaviour. Every change is documented in CHANGELOG.md. The MCP tool surface (kaos-tabular-* names) and the trust contract documented in docs/security.md are public API and follow the same policy. After 1.0 we follow semver. |
| Test coverage | 276 unit tests (tests/unit/) covering the engine, registration paths, error hierarchy, did-you-mean suggestions, MCP tools, CLI, and serve.py; a 32-test integration suite (tests/integration/) exercising real DuckDB sessions; and a relocated benchmark suite under tests/benchmarks/ for wall-clock regressions. Bounded unit gate: pytest tests/unit -m "not benchmark". Coverage floor enforced via fail_under = 70 in [tool.coverage.report]. |
| Type checker | Validated with ty, Astral's Python type checker. |
Companion packages
kaos-tabular is one of the packages in the
Kelvin Agentic OS. The broader stack:
| Package | Layer | What it does |
|---|---|---|
| kaos-core | Core | Foundational runtime, MCP-native types, registries, execution engine, VFS |
| kaos-content | Core | Typed document AST: Block/Inline, provenance, views |
| kaos-mcp | Bridge | FastMCP server, kaos management CLI, MCP resource templates |
| kaos-pdf | Extraction | PDF → AST with provenance |
| kaos-web | Extraction | Web extraction, browser automation, search, domain intelligence |
| kaos-office | Extraction | DOCX / PPTX / XLSX readers + writers to AST |
| kaos-tabular | Extraction | DuckDB-powered SQL analytics |
| kaos-source | Data | Government + financial data connectors (Federal Register, eCFR, EDGAR, GovInfo, PACER, GLEIF) |
| kaos-llm-client | LLM | Multi-provider LLM transport |
| kaos-llm-core | LLM | Typed LLM programming (Signatures, Programs, Optimizers) |
| kaos-nlp-core | Primitives (Rust) | High-performance NLP primitives |
| kaos-nlp-transformers | ML | Dense embeddings + retrieval |
| kaos-graph | Primitives (Rust) | Graph algorithms + RDF/SPARQL |
| kaos-ml-core | Primitives (Rust) | Classical ML on the document AST |
| kaos-citations | Legal | Legal citation extraction, resolution, verification |
| kaos-agents | Agentic | Agent runtime, memory, recipes |
| kaos-reference | Sample | Reference module for module authors |
Packages depend on kaos-core; everything else is opt-in. Mix and match the
ones you need.
Development
```shell
git clone https://github.com/273v/kaos-tabular
cd kaos-tabular
uv sync --group dev
```
Install pre-commit hooks (recommended — they run the same checks as CI on every commit, scoped to staged files):
```shell
uvx pre-commit install
uvx pre-commit run --all-files   # one-time full sweep
```
Manual QA commands (the same set CI runs):
```shell
uv run ruff format --check kaos_tabular tests
uv run ruff check kaos_tabular tests
uv run ty check kaos_tabular tests
uv run pytest tests/unit -m "not benchmark"   # bounded unit gate
uv run pytest tests/benchmarks                # perf regression suite
```
Build from source
```shell
uv build
uv pip install dist/*.whl
python -c "import kaos_tabular; print(kaos_tabular.__version__)"   # smoke import
```
Contributing
Issues and pull requests are welcome. By contributing you certify the
Developer Certificate of Origin v1.1 —
sign every commit with git commit -s. Please open an issue before starting
on a non-trivial change so we can align on scope.
Security
For security issues, please do not file a public issue. Report privately via GitHub Private Vulnerability Reporting or email security@273ventures.com. See SECURITY.md for the full disclosure policy.
License
Apache License 2.0 — see LICENSE and NOTICE.
Copyright 2026 273 Ventures LLC. Built for kelvin.legal.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file kaos_tabular-0.1.0a1.tar.gz.
File metadata
- Download URL: kaos_tabular-0.1.0a1.tar.gz
- Upload date:
- Size: 49.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 967043d1a308e9aa5e86a94e3859bf46b860c90a8814eaf1990d3026c239bf2f |
| MD5 | 9a721f11f7fbc427ebe9b1b297e84ec1 |
| BLAKE2b-256 | 0a7717162aa9468e6bb5ff4491f7f1b554b9d509959eb2aa2621c109bacc603e |
Provenance
The following attestation bundles were made for kaos_tabular-0.1.0a1.tar.gz:
Publisher: release.yml on 273v/kaos-tabular

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kaos_tabular-0.1.0a1.tar.gz
- Subject digest: 967043d1a308e9aa5e86a94e3859bf46b860c90a8814eaf1990d3026c239bf2f
- Sigstore transparency entry: 1474384680
- Sigstore integration time:
- Permalink: 273v/kaos-tabular@a19a832b17907eb7b5d95bad879ebba6c1692fbe
- Branch / Tag: refs/tags/v0.1.0a1
- Owner: https://github.com/273v
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a19a832b17907eb7b5d95bad879ebba6c1692fbe
- Trigger Event: push
File details
Details for the file kaos_tabular-0.1.0a1-py3-none-any.whl.
File metadata
- Download URL: kaos_tabular-0.1.0a1-py3-none-any.whl
- Upload date:
- Size: 44.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e27b6dc15305e93e3d94e8884afdce735a320a0af5090aaa7ec549c136f6c09a |
| MD5 | 2a779952db34926fd2109fca50423d3f |
| BLAKE2b-256 | 253f56f289d7f774b547d3f0da7ec7dcbfdc45fa3089c1ef3d436321f69df9c5 |
Provenance
The following attestation bundles were made for kaos_tabular-0.1.0a1-py3-none-any.whl:
Publisher: release.yml on 273v/kaos-tabular

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kaos_tabular-0.1.0a1-py3-none-any.whl
- Subject digest: e27b6dc15305e93e3d94e8884afdce735a320a0af5090aaa7ec549c136f6c09a
- Sigstore transparency entry: 1474384713
- Sigstore integration time:
- Permalink: 273v/kaos-tabular@a19a832b17907eb7b5d95bad879ebba6c1692fbe
- Branch / Tag: refs/tags/v0.1.0a1
- Owner: https://github.com/273v
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a19a832b17907eb7b5d95bad879ebba6c1692fbe
- Trigger Event: push