Token-lean agent↔data query interface — curated DuckDB primitives, not raw SQL passthrough.
Project description
lacon
Token-lean data query interface for agents. Query CSV, Parquet, and JSON files via curated DuckDB primitives — instead of dumping files into context — and get back only the answer.
lacon describe sales.csv --pretty
# → schema, row count, file size. Zero data rows.
lacon aggregate sales.csv --group-by country --metrics revenue:sum --pretty
# → {"op": "aggregate", "schema": [...], "rows": [...], "shown": 3, "~tokens": 62}
Not another DuckDB-over-MCP
SQL-passthrough DuckDB MCP servers already exist. Lacon is the opposite bet:
- Curated primitives —
describe/sample/profile/aggregate/filter/distinct/find-duplicates/query. An agent can't get them syntactically wrong. - Progressive disclosure — cheap-first:
describe→sample→ targeted primitive. Agents rarely needSELECT *. - Guardrails baked in — read-only, auto-LIMIT (max 1000), SQL validated via sqlglot.
- Token-shaped output — every response includes
~tokensso the agent knows what it costs. - HITL for
query— preview SQL before executing via--show-sql. Curated primitives need no confirmation.
North star: minimize what data costs an LLM.
Sibling to datoon (cheaper representation in-prompt) — Lacon's lever is to not send the data at all.
Install
pip install lacon # CLI + DuckDB + sqlglot
pip install lacon[tokens] # + tiktoken for ~tokens estimates
Or from source:
git clone https://github.com/andrii-su/lacon
cd lacon
uv run lacon --version
Claude Code skill
claude skill install https://github.com/andrii-su/lacon/releases/latest/download/lacon.skill
Once installed, Claude automatically uses lacon when you mention a data file — no manual invocation needed. Raw cat/Read on .csv/.parquet/.json files is replaced by lacon describe → curated query.
Primitives
| Command | What it answers |
|---|---|
describe |
schema, row count, file size — always start here |
sample |
first / random N rows |
count |
row count with optional WHERE |
profile |
per-column stats: nulls, distinct, min/max/mean or top-k |
distinct |
unique values for a column |
aggregate |
GROUP BY with sum / avg / min / max / count |
filter |
rows matching WHERE, with column projection |
find-duplicates |
duplicate groups + counts |
query |
escape hatch — arbitrary read-only SQL, HITL required |
All commands accept --pretty for human-readable output.
Quick examples
# What's in the file?
lacon describe data.csv --pretty
# First 5 rows
lacon sample data.csv --n 5 --pretty
# How many orders from Ukraine?
lacon count orders.csv --where "country = 'Ukraine'"
# Revenue by country
lacon aggregate sales.csv --group-by country --metrics revenue:sum --pretty
# Duplicate emails?
lacon find-duplicates users.csv --columns email --pretty
# Column stats
lacon profile users.csv --column age --pretty
# Rows matching filter, projected columns
lacon filter sales.csv --where "revenue > 5000" --columns name country revenue --pretty
# Custom SQL — HITL: preview first, then execute
lacon query sales.csv "SELECT year, SUM(revenue) FROM {file} GROUP BY year" --show-sql --pretty
lacon query sales.csv "SELECT year, SUM(revenue) FROM {file} GROUP BY year" --pretty
Output envelope
Every response is a shaped JSON object:
{
"op": "aggregate",
"schema": ["country", "sum_revenue"],
"rows": [["UA", 4500.25], ["US", 2001.25], ["DE", 2200.0]],
"shown": 3,
"~tokens": 62
}
schema— always present, agent never guesses shapeshown— how many rows returned (honest truncation)~tokens— estimated token cost of this response (requireslacon[tokens])queryresults also includesql— what actually ran
Human-in-the-loop for query
The query escape hatch runs arbitrary SQL. Before executing, preview:
# Step 1 — see what will run
lacon query data.csv "SELECT country, AVG(revenue) FROM {file} GROUP BY country" --show-sql --pretty
# → {"op": "query", "sql": "SELECT ... FROM read_csv('data.csv') ... LIMIT 50", "will_execute": false}
# Step 2 — confirm, then execute
lacon query data.csv "SELECT country, AVG(revenue) FROM {file} GROUP BY country" --pretty
Curated primitives (describe, filter, etc.) need no confirmation — their SQL is fully determined by the parameters.
Safety
- Read-only — no writes, no DDL, no COPY, no INSTALL
- SQL validation — sqlglot parses every
querycall; rejects non-SELECT statements - Auto-LIMIT — injected if missing, capped at 1000
- Path escaping — single quotes in paths are escaped before passing to DuckDB
Stack
Python 3.12+, DuckDB, sqlglot, tiktoken (optional).
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file lacon-1.0.2.tar.gz.
File metadata
- Download URL: lacon-1.0.2.tar.gz
- Upload date:
- Size: 96.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8fa112b89e2c97adf9da2177497ffcdadb96c0b0007bb86a769d079396e5486b
|
|
| MD5 |
309693b6b721d6b6a3d2484286a48ac2
|
|
| BLAKE2b-256 |
a8a7ebd9973d4ca6c77c1b0ebcf2c8f53ecf8480ff79810b818a3c6d09b2c8a0
|
File details
Details for the file lacon-1.0.2-py3-none-any.whl.
File metadata
- Download URL: lacon-1.0.2-py3-none-any.whl
- Upload date:
- Size: 13.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e5291a287740fed87cb3ba7396100f6dcbd1d78510140642268072982d2ba721
|
|
| MD5 |
b27cfdae2b3ebcb836e3bf80fc0d7995
|
|
| BLAKE2b-256 |
72948e8e16de987bbd9f2a60933fff5116bad0e1dfe107f9fd8cd26b7bf10ae2
|