Token-lean agent↔data query interface — curated DuckDB primitives, not raw SQL passthrough.
Project description
lacon
token-lean agent↔data query interface — query files, don't read them
Before/After • Install • What You Get • Primitives • HITL • Python API • CLI Reference
When an agent needs to understand a CSV or Parquet file, the naive approach is to dump the whole file into context. A 5 MB CSV is ~350k tokens. lacon is the opposite: agents query the file and get back only the answer — shaped, capped, and token-costed. DuckDB does the querying under the hood; lacon's value is the interface and the result shaping.
Sibling to datoon (cheaper representation once data is in the prompt) — lacon's lever is to not put the data there at all.
Before / After
Same answer. Tiny fraction of the token cost.
┌─────────────────────────────────────────────────────┐
│ DESCRIBE A 5MB FILE ████░░░░░░░░ 89 tokens │
│ SAME FILE DUMPED RAW ████████████ 350k tokens │
│ WRITES ALLOWED ░░░░░░░░░░░░ 0 (read-only)│
└─────────────────────────────────────────────────────┘
[!IMPORTANT] lacon saves tokens by not sending data to the model — only the answer. Savings depend on query type:
describeis always cheap;filter/samplereturns rows (but capped). Every result includes~tokensso agents know what they spent.
Install
# core (CLI + DuckDB + SQL validation)
pip install lacon
uv add lacon
# with token estimates (~tokens in every response)
pip install "lacon[tokens]"
# with MCP server (coming in v0.2)
pip install "lacon[mcp]"
Requires Python 3.12+.
Claude Code skill:
claude skill install https://github.com/andrii-su/lacon/releases/latest/download/lacon.skill
Once installed, Claude automatically uses lacon when you mention a .csv, .parquet, or .json file — no manual invocation needed.
What You Get
| What | |
|---|---|
lacon CLI |
9 curated primitives for querying data files from terminal or scripts |
| Python API | describe(), aggregate(), filter(), query() — same primitives programmatically |
| Claude Code Skill | /lacon in-session trigger, progressive disclosure workflow, HITL for query |
| ~tokens | Every response includes its estimated token cost (requires lacon[tokens]) |
| MCP Server | Coming in v0.2 — expose primitives as MCP tools for Claude Desktop, Cursor, Windsurf |
Primitives
9 curated operations. Each builds safe SQL internally, enforces read-only, and applies auto-LIMIT.
| Primitive | Use when the agent asks... | Returns |
|---|---|---|
describe |
"what's in this file?", "what columns?", "how many rows?" | schema + metadata, no data rows |
sample |
"show me some rows", "what does the data look like?" | first/random N rows |
count |
"how many rows?", "how many where X?" | integer |
profile |
"tell me about column X", "any nulls?", "distribution?" | per-column stats |
distinct |
"what are the unique values of X?" | values list + truncation info |
aggregate |
"total/avg X by Y", "group by Z" | grouped rows |
filter |
"show rows where X", "find records matching Y" | matching rows, projected columns |
find-duplicates |
"are there duplicates?", "find duplicate X" | duplicate groups + counts |
query |
anything the primitives above can't express | rows — HITL required |
Supported formats: CSV, Parquet, JSON, JSONL — auto-detected from extension.
Quick Start
# always start here
lacon describe data.csv --pretty
# understand a column
lacon profile data.csv --column revenue --pretty
# answer a count question
lacon count data.csv --where "country = 'UA'"
# group by
lacon aggregate data.csv --group-by country --metrics revenue:sum --pretty
# filter with column projection
lacon filter data.csv --where "year = 2024" --columns name revenue --pretty
# unique values
lacon distinct data.csv --column country --pretty
# find duplicates
lacon find-duplicates data.csv --columns email --pretty
# escape hatch — use HITL (see below)
lacon query data.csv "SELECT country, AVG(revenue) FROM {file} GROUP BY country" --show-sql --pretty
lacon query data.csv "SELECT country, AVG(revenue) FROM {file} GROUP BY country" --pretty
Use {file} as the placeholder for the data source in query. Auto-LIMIT is always enforced (default 50, max 1000).
HITL for query
The query primitive runs arbitrary SQL — an escape hatch when curated primitives aren't enough. Because the SQL is not predictable from fixed parameters, always preview before executing.
# Step 1 — preview resolved SQL (no execution)
lacon query sales.csv \
"SELECT country, name, revenue FROM {file} t \
WHERE revenue = (SELECT MAX(revenue) FROM {file} t2 WHERE t2.country = t.country) \
ORDER BY revenue DESC" \
--show-sql --pretty
{
"op": "query",
"sql": "SELECT country, name, revenue FROM read_csv('sales.csv') t WHERE ...",
"will_execute": false
}
# Step 2 — confirm the SQL, then execute
lacon query sales.csv "..." --pretty
Result always includes the sql field so you can verify what ran:
{
"op": "query",
"schema": ["country", "name", "revenue"],
"rows": [["UA", "Bob", 3400.0], ...],
"shown": 3,
"~tokens": 74,
"sql": "SELECT country, name, revenue FROM read_csv('sales.csv') t WHERE ..."
}
Why: text-to-SQL agents achieve ~80% accuracy on real schemas. A subtly wrong WHERE clause executes silently and looks correct. Showing the SQL before execution catches errors before they produce wrong answers. Curated primitives need no confirmation — their SQL is fully determined by parameters.
Output Envelope
Every response is a shaped JSON object:
{
"op": "aggregate",
"schema": ["country", "sum_revenue"],
"rows": [["UA", 4500.25], ["DE", 2200.0], ["US", 2001.25]],
"shown": 3,
"~tokens": 62
}
| Field | Description |
|---|---|
op |
Which primitive ran |
schema |
Column names — agent never guesses shape |
rows |
Data as arrays — compact, no key repetition |
shown |
How many rows returned (honest truncation) |
~tokens |
Estimated token cost of this response (lacon[tokens]) |
sql |
Resolved SQL that ran (query only) |
describe returns metadata only (no rows). count returns a single integer.
Python API
from lacon import DuckDBEngine, describe, sample, count, aggregate, filter, query
with DuckDBEngine() as engine:
# schema + metadata
r = describe("data.csv", engine)
print(r["row_count"], r["schema"])
# column stats
r = profile("data.csv", column="revenue", engine=engine)
print(r["min"], r["max"], r["mean"])
# grouped aggregation
r = aggregate(
"data.csv",
group_by=["country"],
metrics=[{"col": "revenue", "fn": "sum"}],
engine=engine,
)
# filtered rows with projection
r = filter("data.csv", where="revenue > 2000", columns=["name", "revenue"], engine=engine)
# escape hatch — read-only, auto-LIMIT
r = query("data.csv", "SELECT country, AVG(revenue) FROM {file} GROUP BY country", engine=engine)
Safety
- Read-only — no writes, no DDL, no COPY, no INSTALL
- SQL validation — sqlglot parses every
querycall; rejects INSERT, DROP, CREATE, multi-statement - Auto-LIMIT — injected if missing, capped at 1000
- Path escaping — single quotes in paths escaped before DuckDB
CLI Reference
Global options
| Flag | Description |
|---|---|
--pretty |
Pretty-print JSON output |
--version |
Print version and exit |
describe <path>
Schema, row count, file size. Always start here.
sample <path>
| Flag | Default | Description |
|---|---|---|
--n N |
5 | Number of rows |
--random |
false | Random sample instead of first N |
count <path>
| Flag | Description |
|---|---|
--where EXPR |
Optional WHERE filter |
profile <path>
| Flag | Description |
|---|---|
--column COL |
Column to profile (required) |
--top-k N |
Top-k values for categorical columns (default 10) |
aggregate <path>
| Flag | Description |
|---|---|
--group-by COL [COL ...] |
Group-by columns |
--metrics COL:FN [...] |
Metrics — fn ∈ sum avg min max count |
--where EXPR |
Optional WHERE filter |
--limit N |
Row cap (default 50) |
filter <path>
| Flag | Description |
|---|---|
--where EXPR |
WHERE filter (required) |
--columns COL [...] |
Column projection |
--limit N |
Row cap (default 50) |
distinct <path>
| Flag | Description |
|---|---|
--column COL |
Column to enumerate (required) |
--limit N |
Value cap (default 50) |
find-duplicates <path>
| Flag | Description |
|---|---|
--columns COL [...] |
Key columns (required) |
--limit N |
Group cap (default 50) |
query <path> SQL
| Flag | Description |
|---|---|
--limit N |
Row cap (default 50, max 1000) |
--show-sql |
Preview resolved SQL without executing (HITL) |
Use {file} as the data-source placeholder in SQL.
Development
# setup
uv sync --extra dev
uvx pre-commit install
# tests (46 — primitives, safety, shaping)
uv run --with ".[dev,tokens]" pytest
# build lacon.skill for distribution
python scripts/build_skill.py
Links
- CONTRIBUTING.md — contributor workflow
- CLAUDE.md — agent/maintainer context
- CHANGELOG.md — release history
- SECURITY.md — vulnerability reporting
- Live docs —
docs/ - Issues — bugs, features, questions
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file lacon-1.1.1.tar.gz.
File metadata
- Download URL: lacon-1.1.1.tar.gz
- Upload date:
- Size: 103.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e3cbad79d1a41fcf99a7070134a517056f1c03cc652031dcab93b43102e164cc
|
|
| MD5 |
d4b8eccb5f42c7955c6dc147f4e10ad3
|
|
| BLAKE2b-256 |
4d1efb5087e5408210ca5087cf90de8134731799127e732ea06133101586fac6
|
File details
Details for the file lacon-1.1.1-py3-none-any.whl.
File metadata
- Download URL: lacon-1.1.1-py3-none-any.whl
- Upload date:
- Size: 15.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
56dc39ab05ab6f2beeb9cee0b14dcffcc9a76a81e4494e835920794e2bf09d00
|
|
| MD5 |
979ede50ff071d3b322da85d80f5e44a
|
|
| BLAKE2b-256 |
005320770e5ccc4747f528efbf6894a8925c70b354e3844e5edb16f383d65536
|