Skip to main content

Token-lean agent↔data query interface — curated DuckDB primitives, not raw SQL passthrough.

Project description

lacon

token-lean agent↔data query interface — query files, don't read them

Tests Pre-commit Release PyPI Python License: MIT

Before/AfterInstallWhat You GetPrimitivesHITLPython APICLI Reference


When an agent needs to understand a CSV or Parquet file, the naive approach is to dump the whole file into context. A 5 MB CSV is ~350k tokens. lacon is the opposite: agents query the file and get back only the answer — shaped, capped, and token-costed. DuckDB does the querying under the hood; lacon's value is the interface and the result shaping.

Sibling to datoon (cheaper representation once data is in the prompt) — lacon's lever is to not put the data there at all.

Before / After

Without lacon — dumps file into context

# agent reads the whole file
content = open("sales.csv").read()
# → 350k tokens for a 5 MB file
prompt = f"Analyze this data:\n{content}"

With lacon — gets only the answer

lacon describe sales.csv --pretty
{
  "op": "describe",
  "row_count": 84000,
  "schema": [{"name": "country", "type": "VARCHAR"}, ...],
  "~tokens": 89
}

89 tokens instead of 350k.

Agent wants revenue by country

# without lacon — dumps rows, agent parses
rows = csv.DictReader(open("sales.csv"))
prompt = f"Sum revenue by country from:\n{list(rows)}"
# → thousands of tokens

With lacon — one aggregate call

lacon aggregate sales.csv \
  --group-by country \
  --metrics revenue:sum --pretty
{"op": "aggregate", "rows": [["UA", 4500.25], ...], "~tokens": 62}

Same answer. Tiny fraction of the token cost.

┌─────────────────────────────────────────────────────┐
│  DESCRIBE A 5MB FILE    ████░░░░░░░░   89 tokens    │
│  SAME FILE DUMPED RAW   ████████████  350k tokens   │
│  WRITES ALLOWED         ░░░░░░░░░░░░   0 (read-only)│
└─────────────────────────────────────────────────────┘

[!IMPORTANT] lacon saves tokens by not sending data to the model — only the answer. Savings depend on query type: describe is always cheap; filter/sample returns rows (but capped). Every result includes ~tokens so agents know what they spent.

Install

# core (CLI + DuckDB + SQL validation)
pip install lacon
uv add lacon

# with token estimates (~tokens in every response)
pip install "lacon[tokens]"

# with MCP server (coming in v0.2)
pip install "lacon[mcp]"

Requires Python 3.12+.

Claude Code skill:

claude skill install https://github.com/andrii-su/lacon/releases/latest/download/lacon.skill

Once installed, Claude automatically uses lacon when you mention a .csv, .parquet, or .json file — no manual invocation needed.

What You Get

What
lacon CLI 9 curated primitives for querying data files from terminal or scripts
Python API describe(), aggregate(), filter(), query() — same primitives programmatically
Claude Code Skill /lacon in-session trigger, progressive disclosure workflow, HITL for query
~tokens Every response includes its estimated token cost (requires lacon[tokens])
MCP Server Coming in v0.2 — expose primitives as MCP tools for Claude Desktop, Cursor, Windsurf

Primitives

9 curated operations. Each builds safe SQL internally, enforces read-only, and applies auto-LIMIT.

Primitive Use when the agent asks... Returns
describe "what's in this file?", "what columns?", "how many rows?" schema + metadata, no data rows
sample "show me some rows", "what does the data look like?" first/random N rows
count "how many rows?", "how many where X?" integer
profile "tell me about column X", "any nulls?", "distribution?" per-column stats
distinct "what are the unique values of X?" values list + truncation info
aggregate "total/avg X by Y", "group by Z" grouped rows
filter "show rows where X", "find records matching Y" matching rows, projected columns
find-duplicates "are there duplicates?", "find duplicate X" duplicate groups + counts
query anything the primitives above can't express rows — HITL required

Supported formats: CSV, Parquet, JSON, JSONL — auto-detected from extension.


Quick Start

# always start here
lacon describe data.csv --pretty

# understand a column
lacon profile data.csv --column revenue --pretty

# answer a count question
lacon count data.csv --where "country = 'UA'"

# group by
lacon aggregate data.csv --group-by country --metrics revenue:sum --pretty

# filter with column projection
lacon filter data.csv --where "year = 2024" --columns name revenue --pretty

# unique values
lacon distinct data.csv --column country --pretty

# find duplicates
lacon find-duplicates data.csv --columns email --pretty

# escape hatch — use HITL (see below)
lacon query data.csv "SELECT country, AVG(revenue) FROM {file} GROUP BY country" --show-sql --pretty
lacon query data.csv "SELECT country, AVG(revenue) FROM {file} GROUP BY country" --pretty

Use {file} as the placeholder for the data source in query. Auto-LIMIT is always enforced (default 50, max 1000).


HITL for query

The query primitive runs arbitrary SQL — an escape hatch when curated primitives aren't enough. Because the SQL is not predictable from fixed parameters, always preview before executing.

# Step 1 — preview resolved SQL (no execution)
lacon query sales.csv \
  "SELECT country, name, revenue FROM {file} t \
   WHERE revenue = (SELECT MAX(revenue) FROM {file} t2 WHERE t2.country = t.country) \
   ORDER BY revenue DESC" \
  --show-sql --pretty
{
  "op": "query",
  "sql": "SELECT country, name, revenue FROM read_csv('sales.csv') t WHERE ...",
  "will_execute": false
}
# Step 2 — confirm the SQL, then execute
lacon query sales.csv "..." --pretty

Result always includes the sql field so you can verify what ran:

{
  "op": "query",
  "schema": ["country", "name", "revenue"],
  "rows": [["UA", "Bob", 3400.0], ...],
  "shown": 3,
  "~tokens": 74,
  "sql": "SELECT country, name, revenue FROM read_csv('sales.csv') t WHERE ..."
}

Why: text-to-SQL agents achieve ~80% accuracy on real schemas. A subtly wrong WHERE clause executes silently and looks correct. Showing the SQL before execution catches errors before they produce wrong answers. Curated primitives need no confirmation — their SQL is fully determined by parameters.


Output Envelope

Every response is a shaped JSON object:

{
  "op": "aggregate",
  "schema": ["country", "sum_revenue"],
  "rows": [["UA", 4500.25], ["DE", 2200.0], ["US", 2001.25]],
  "shown": 3,
  "~tokens": 62
}
Field Description
op Which primitive ran
schema Column names — agent never guesses shape
rows Data as arrays — compact, no key repetition
shown How many rows returned (honest truncation)
~tokens Estimated token cost of this response (lacon[tokens])
sql Resolved SQL that ran (query only)

describe returns metadata only (no rows). count returns a single integer.


Python API

from lacon import DuckDBEngine, describe, sample, count, aggregate, filter, query

with DuckDBEngine() as engine:
    # schema + metadata
    r = describe("data.csv", engine)
    print(r["row_count"], r["schema"])

    # column stats
    r = profile("data.csv", column="revenue", engine=engine)
    print(r["min"], r["max"], r["mean"])

    # grouped aggregation
    r = aggregate(
        "data.csv",
        group_by=["country"],
        metrics=[{"col": "revenue", "fn": "sum"}],
        engine=engine,
    )

    # filtered rows with projection
    r = filter("data.csv", where="revenue > 2000", columns=["name", "revenue"], engine=engine)

    # escape hatch — read-only, auto-LIMIT
    r = query("data.csv", "SELECT country, AVG(revenue) FROM {file} GROUP BY country", engine=engine)

Safety

  • Read-only — no writes, no DDL, no COPY, no INSTALL
  • SQL validation — sqlglot parses every query call; rejects INSERT, DROP, CREATE, multi-statement
  • Auto-LIMIT — injected if missing, capped at 1000
  • Path escaping — single quotes in paths escaped before DuckDB

CLI Reference

Global options

Flag Description
--pretty Pretty-print JSON output
--version Print version and exit

describe <path>

Schema, row count, file size. Always start here.

sample <path>

Flag Default Description
--n N 5 Number of rows
--random false Random sample instead of first N

count <path>

Flag Description
--where EXPR Optional WHERE filter

profile <path>

Flag Description
--column COL Column to profile (required)
--top-k N Top-k values for categorical columns (default 10)

aggregate <path>

Flag Description
--group-by COL [COL ...] Group-by columns
--metrics COL:FN [...] Metrics — fn ∈ sum avg min max count
--where EXPR Optional WHERE filter
--limit N Row cap (default 50)

filter <path>

Flag Description
--where EXPR WHERE filter (required)
--columns COL [...] Column projection
--limit N Row cap (default 50)

distinct <path>

Flag Description
--column COL Column to enumerate (required)
--limit N Value cap (default 50)

find-duplicates <path>

Flag Description
--columns COL [...] Key columns (required)
--limit N Group cap (default 50)

query <path> SQL

Flag Description
--limit N Row cap (default 50, max 1000)
--show-sql Preview resolved SQL without executing (HITL)

Use {file} as the data-source placeholder in SQL.


Development

# setup
uv sync --extra dev
uvx pre-commit install

# tests (46 — primitives, safety, shaping)
uv run --with ".[dev,tokens]" pytest

# build lacon.skill for distribution
python scripts/build_skill.py

Links


License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lacon-1.1.1.tar.gz (103.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

lacon-1.1.1-py3-none-any.whl (15.9 kB view details)

Uploaded Python 3

File details

Details for the file lacon-1.1.1.tar.gz.

File metadata

  • Download URL: lacon-1.1.1.tar.gz
  • Upload date:
  • Size: 103.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lacon-1.1.1.tar.gz
Algorithm Hash digest
SHA256 e3cbad79d1a41fcf99a7070134a517056f1c03cc652031dcab93b43102e164cc
MD5 d4b8eccb5f42c7955c6dc147f4e10ad3
BLAKE2b-256 4d1efb5087e5408210ca5087cf90de8134731799127e732ea06133101586fac6

See more details on using hashes here.

File details

Details for the file lacon-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: lacon-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 15.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lacon-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 56dc39ab05ab6f2beeb9cee0b14dcffcc9a76a81e4494e835920794e2bf09d00
MD5 979ede50ff071d3b322da85d80f5e44a
BLAKE2b-256 005320770e5ccc4747f528efbf6894a8925c70b354e3844e5edb16f383d65536

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page