
Token-efficient MCP server for tabular data retrieval via CSV/Excel/Parquet/JSONL indexing

Project description

FREE FOR PERSONAL USE

Use it to make money, and Uncle J. gets a taste. Fair enough? (See Commercial licenses below for details.)


Cut spreadsheet token usage by 99.997%

Most AI agents explore tabular data the expensive way:

dump the whole file into the prompt → skim a million irrelevant rows → repeat.

That is not "a little inefficient." That is a token incinerator.

A 255 MB CSV file with 1 million rows costs 111 million tokens if you paste it raw. A single describe_dataset call answers the same orientation question in 3,849 tokens.

That is a 25,333× reduction — measured, not estimated, on a real 1M-row public dataset.

jDataMunch indexes the file once and lets agents retrieve only the exact data they need: column profiles, filtered rows, and server-side aggregations — with SQL precision.

Benchmark: LAPD crime records — 1,004,894 rows, 28 columns, 255 MB
Baseline (raw file): 111,028,360 tokens  |  jDataMunch: ~3,849 tokens  |  25,333× reduction
Methodology & harness · Full results

Task Traditional approach With jDataMunch
Understand a dataset Paste entire CSV describe_dataset → column names, types, cardinality, samples
Find relevant columns Read every row search_data → column-level results with IDs
Answer a filtered question Load millions of rows get_rows with structured filters → only matching rows
Compute a group-by Return all data aggregate → server-side SQL, one result set

Index once. Query cheaply. Keep moving. Precision retrieval beats brute-force context.


jDataMunch MCP

Structured tabular data retrieval for AI agents


Commercial licenses

jDataMunch-MCP is free for non-commercial use.

Commercial use requires a paid license.

jDataMunch-only licenses

Want the full jMunch suite?

Stop paying your model to read the whole damn spreadsheet.

jDataMunch turns tabular data exploration into structured retrieval.

Instead of forcing an agent to load an entire CSV, scan millions of rows, and burn through context just to find the right column name, jDataMunch lets it navigate by what the data is and retrieve only what matters.

That means:

  • 25,333× lower data-reading token usage on a 1M-row CSV (measured)
  • less irrelevant context polluting the prompt
  • faster dataset orientation — one call tells you everything about the schema
  • accurate filtered queries — the agent asks for Hollywood assaults, it gets Hollywood assaults
  • server-side aggregations — GROUP BY runs in SQLite, not inside the context window

It indexes your files once using a streaming parser and SQLite, stores column profiles and row data with proper type affinity, and retrieves exactly what the agent asked for instead of re-loading the entire file on every question.


Why agents need this

Most agents still handle spreadsheets like someone who prints the entire internet before reading one article:

  • paste the whole CSV to answer a narrow question
  • re-load the same file repeatedly across tool calls
  • consume column headers, empty cells, malformed rows, and irrelevant records
  • burn context window on data that was never part of the question

jDataMunch fixes that by giving them a structured way to:

  • describe a dataset's schema before touching any row data
  • search for the specific column that holds the answer
  • retrieve only the rows that match the filter
  • run aggregations server-side and get back a single result set
  • orient themselves with samples before committing to a full query

Agents do not need bigger context windows.

They need better aim.


What you get

Column-level retrieval

Understand a dataset's full schema — types, cardinality, null rates, value distributions, samples — in a single sub-10ms call. No rows loaded.

Filtered row retrieval

Structured filters with 10 operators (eq, neq, gt, gte, lt, lte, contains, in, is_null, between). All parameterized SQL — no injection surface. Hard cap of 500 rows per call to protect context budgets.

Server-side aggregations

GROUP BY with count, sum, avg, min, max, count_distinct, median. The computation stays in SQLite. One compact result set comes back instead of the raw rows the model would otherwise have to aggregate itself.
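A call shaped like this would count assaults per area. The dataset, group_by, and filter formats match the examples elsewhere on this page; the metrics parameter name is an illustrative guess, so check the tool schema your client exposes:

aggregate(
    dataset="lapd-crime",
    group_by=["AREA NAME"],
    metrics=[{"op": "count"}],
    filters=[{"column": "Crm Cd Desc", "op": "contains", "value": "ASSAULT"}]
)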

Smart column search

search_data searches column names, value indexes, and AI summaries simultaneously. Ask for "weapon type" and get Weapon Used Cd back. Ask for "Hollywood" and get the column whose values contain it.
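For instance (the query parameter name is illustrative; the returned IDs follow the ID scheme described further down):

search_data(dataset="lapd-crime", query="weapon type")
→ matches include "lapd-crime::Weapon Used Cd#column"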

Token savings telemetry

Every call reports tokens_saved and cost_avoided estimates. get_session_stats shows your cumulative savings across the session.

Local-first speed

Indexes are stored at ~/.data-index/ by default. No cloud. No API keys required for core functionality.


How it works

jDataMunch parses local CSV and Excel files using a streaming, single-pass pipeline:

CSV/Excel file
  → Streaming parser (never loads full file into memory)
  → Column profiler (type inference, cardinality, min/max/mean/median, value indexes)
  → SQLite writer (10,000-row batches, WAL mode, indexes on low-cardinality columns)
  → index.json (column profiles, stats, file hash for incremental detection)
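
A minimal sketch of that pipeline in plain Python, for intuition only. This is not the jDataMunch implementation; it skips column profiling, type affinity, and secondary indexes, and the file layout is assumed:

import csv, hashlib, json, sqlite3
from pathlib import Path

BATCH = 10_000  # write rows in 10,000-row batches, as described above

def index_csv(csv_path: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    con = sqlite3.connect(out / "data.sqlite")
    con.execute("PRAGMA journal_mode=WAL")            # WAL mode for fast bulk writes
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)                        # streaming: one row in memory at a time
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        marks = ", ".join("?" for _ in header)
        con.execute(f"CREATE TABLE IF NOT EXISTS rows ({cols})")
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) >= BATCH:
                con.executemany(f"INSERT INTO rows VALUES ({marks})", batch)
                batch.clear()
        if batch:
            con.executemany(f"INSERT INTO rows VALUES ({marks})", batch)
    con.commit()
    con.close()
    # hash the file in chunks so a later run can skip re-indexing an unchanged file
    digest = hashlib.sha256()
    with open(csv_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    (out / "index.json").write_text(json.dumps({"file_hash": digest.hexdigest()}))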

When an agent queries:

describe_dataset  →  reads index.json in memory (< 10ms)
get_rows          →  parameterized SQL on data.sqlite (< 100ms on indexed columns)
aggregate         →  GROUP BY SQL on data.sqlite (< 200ms for simple group-by)
search_data       →  scans column profiles in memory (< 50ms)

No raw file is ever re-read after the initial index. The SQLite database serves all row-level queries.
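
For intuition, here is a simplified sketch (not the actual jDataMunch code) of how structured filters can be compiled into parameterized SQL so that filter values never enter the SQL text:

def filters_to_sql(filters):
    """Compile structured filters into a WHERE clause plus bound parameters."""
    ops = {"eq": "=", "neq": "!=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}
    clauses, params = [], []
    for f in filters:
        col, op = f'"{f["column"]}"', f["op"]
        if op in ops:
            clauses.append(f"{col} {ops[op]} ?")
            params.append(f["value"])
        elif op == "contains":
            clauses.append(f"{col} LIKE ?")           # LIKE is case-insensitive in SQLite for ASCII
            params.append(f"%{f['value']}%")
        elif op == "between":
            clauses.append(f"{col} BETWEEN ? AND ?")  # inclusive range
            params.extend(f["value"])
        elif op == "in":
            marks = ", ".join("?" for _ in f["value"])
            clauses.append(f"{col} IN ({marks})")
            params.extend(f["value"])
        elif op == "is_null":
            clauses.append(f"{col} IS NULL" if f["value"] else f"{col} IS NOT NULL")
    return " AND ".join(clauses), params              # multiple filters are ANDed

# Example:
#   where, params = filters_to_sql([{"column": "AREA NAME", "op": "eq", "value": "Hollywood"}])
#   → ('"AREA NAME" = ?', ["Hollywood"]), ready for
#     con.execute(f"SELECT * FROM rows WHERE {where} LIMIT 500", params)

Because values are only ever bound as parameters, nothing from the filter payload is spliced into the SQL string.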

For a 255 MB, 1,004,894-row CSV (measured on real data):

  • Index time: ~43 seconds (one-time)
  • describe_dataset: 35 ms, 3,849 tokens vs 111,028,360 tokens raw — 25,333×
  • describe_column (single column deep-dive): 22–33 ms, ~600 tokens
  • get_rows (indexed filter): < 100 ms
  • Peak indexing memory: < 500 MB

Start fast

1. Install it

pip install jdatamunch-mcp

For Excel (.xlsx) support:

pip install "jdatamunch-mcp[excel]"

2. Add it to your MCP client

Claude Code (one command)

claude mcp add jdatamunch uvx jdatamunch-mcp

Restart Claude Code. Confirm with /mcp.

Claude Desktop

Add to your config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows):

{
  "mcpServers": {
    "jdatamunch": {
      "command": "uvx",
      "args": ["jdatamunch-mcp"]
    }
  }
}

OpenClaw

Option A — CLI:

openclaw mcp set jdatamunch '{"command":"uvx","args":["jdatamunch-mcp"]}'

Option B — Edit ~/.openclaw/openclaw.json:

{
  "mcpServers": {
    "jdatamunch": {
      "command": "uvx",
      "args": ["jdatamunch-mcp"],
      "transport": "stdio"
    }
  }
}

Restart the gateway: openclaw gateway restart. Verify: openclaw mcp list.

Other clients (Cursor, Windsurf, Roo, etc.)

Any MCP-compatible client accepts the same JSON block in its MCP config file.

3. Index a file and start querying

index_local(path="/path/to/data.csv", name="my-dataset")
describe_dataset(dataset="my-dataset")
get_rows(dataset="my-dataset", filters=[{"column": "City", "op": "eq", "value": "Los Angeles"}], limit=10)

4. Tell your agent to actually use it

Installing jDataMunch makes the tools available, but an agent will keep pasting entire CSVs into prompts unless you tell it to reach for structured retrieval first.

Claude Code / Claude Desktop

Add this to your CLAUDE.md (global or project-level):

## Data Exploration Policy
Use jdatamunch-mcp for tabular data whenever available.
Always call describe_dataset first to understand the schema.
Use get_rows with filters rather than loading raw files.
Use aggregate for any group-by or summary questions.

OpenClaw

Add the same policy to your agent's system prompt file (e.g. ~/.openclaw/agents/analyst.md), then reference it in ~/.openclaw/openclaw.json:

{
  "agents": {
    "named": {
      "analyst": {
        "systemPromptFile": "~/.openclaw/agents/analyst.md"
      }
    }
  }
}

Check your token savings

Ask your agent: "How many tokens has jDataMunch saved me?"

The agent will call get_session_stats, which returns session and lifetime token savings with per-model cost breakdowns. Lifetime stats persist to ~/.data-index/session_stats.json across sessions.


Tools

Tool What it does
index_local Index a CSV or Excel file. Profiles columns, loads rows into SQLite. Incremental by default (skips if file unchanged).
list_datasets List all indexed datasets with row counts, column counts, and file sizes.
describe_dataset Full schema profile: every column's name, type, cardinality, null%, and sample values. Primary orientation tool.
describe_column Deep profile of one column: full value distribution, histogram bins, temporal range.
search_data Search column names and values by keyword. Returns column IDs — tells the agent where to look, not the data.
get_rows Filtered row retrieval with 10 operators. Parameterized SQL. 500-row hard cap.
aggregate Server-side GROUP BY: count, sum, avg, min, max, count_distinct, median.
sample_rows Head, tail, or random sample. Good for a first look at an unfamiliar dataset.
get_schema_drift Compare schema (columns, types, nullability) between two indexed datasets. Detects breaking vs additive changes.
get_data_hotspots Rank columns by data-quality risk: null rate, cardinality anomalies, numeric outlier spread. First-look triage.
get_session_stats Cumulative token savings and cost avoided across the session.
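For the comparison and triage tools, calls along these lines are the likely shape (the dataset_a/dataset_b parameter names and dataset names are illustrative guesses; check the tool schema your client shows):

get_schema_drift(dataset_a="lapd-crime-2023", dataset_b="lapd-crime-2024")
get_data_hotspots(dataset="lapd-crime")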

Filter operators

get_rows and aggregate accept structured filters:

{"column": "AREA NAME",    "op": "eq",      "value": "Hollywood"}
{"column": "Vict Age",     "op": "between", "value": [25, 35]}
{"column": "Crm Cd Desc",  "op": "contains","value": "ASSAULT"}
{"column": "Weapon Used Cd","op": "is_null","value": true}
{"column": "AREA",         "op": "in",      "value": [1, 2, 7]}
Operator Meaning
eq equals
neq not equals
gt, gte greater than (or equal)
lt, lte less than (or equal)
contains case-insensitive substring
in value in list
is_null null / not null check
between inclusive range [min, max]

Multiple filters are ANDed. No raw SQL accepted — injection surface is zero.
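
For example, the "Hollywood assaults" retrieval combines two filters in one call:

get_rows(
    dataset="lapd-crime",
    filters=[
        {"column": "AREA NAME",   "op": "eq",       "value": "Hollywood"},
        {"column": "Crm Cd Desc", "op": "contains", "value": "ASSAULT"}
    ],
    limit=100
)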


Configuration

Variable Default Purpose
DATA_INDEX_PATH ~/.data-index/ Index storage location
JDATAMUNCH_MAX_ROWS 5,000,000 Row cap for indexing
JDATAMUNCH_SHARE_SAVINGS 1 Set to 0 to disable anonymous token savings telemetry
ANTHROPIC_API_KEY (unset) AI column summaries via Claude (v1.1+)
GOOGLE_API_KEY (unset) AI column summaries via Gemini (v1.1+)
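
Most MCP clients, including Claude Desktop, accept an env map in the server's config entry, so these can be set alongside the command (adjust the paths and values to your setup):

{
  "mcpServers": {
    "jdatamunch": {
      "command": "uvx",
      "args": ["jdatamunch-mcp"],
      "env": {
        "DATA_INDEX_PATH": "/data/indexes",
        "JDATAMUNCH_SHARE_SAVINGS": "0"
      }
    }
  }
}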

When does it help?

Scenario Without jDataMunch With jDataMunch Measured savings
Orient on a 255 MB CSV Paste raw file → 111M tokens describe_dataset → 3,849 tokens 25,333×
Schema + column deep-dive Same 111M tokens describe_dataset + describe_column → ~4,400 tokens ~25,000×
Find the crime-type column Scan headers manually search_data("crime type") → column ID structural
Get Hollywood assault rows Load all 1M rows get_rows with 2 filters → matching rows only ~99%+
Crime count by area Return all rows, aggregate in LLM aggregate(group_by=["AREA NAME"]) → 21 rows ~99.9%
Understand weapon nulls Load column, count manually describe_column("Weapon Used Cd") → null_pct: 64.2% ~99.9%
Re-query an unchanged file Re-load file every time Hash check → instant skip if unchanged 100% of re-read cost

The case where it doesn't help: you genuinely need every row for ML training or full exports. For that, read the file directly. For everything else — exploration, filtering, aggregation, orientation — structured retrieval wins every time.


ID scheme

Every column and row gets a stable ID:

{dataset}::{column_name}#column     →  "lapd-crime::AREA NAME#column"
{dataset}::row_{rowid}#row          →  "lapd-crime::row_4421#row"
{dataset}::{pk_col}={value}#row     →  "lapd-crime::DR_NO=211507896#row"

Pass column IDs directly to describe_column. Row IDs are returned in get_rows results.
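
For example, using the column ID form (as shown in the "When does it help?" table, a bare column name also works):

describe_column("lapd-crime::AREA NAME#column")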


Part of the jMunch family

Product Domain Unit of retrieval PyPI
jcodemunch-mcp Source code Symbols (functions, classes) jcodemunch-mcp
jdocmunch-mcp Documentation Sections (headings) jdocmunch-mcp
jdatamunch-mcp Tabular data Columns, row slices, aggregations jdatamunch-mcp

All three implement jMRI — the open retrieval interface spec. Same response envelope, same token tracking, same telemetry pattern.


Best for

  • analysts, finance, ops, and consultants working with large spreadsheets
  • AI agents that answer questions about CSV or Excel data
  • anyone paying token costs to load files they query repeatedly
  • teams that want structured, auditable data access instead of raw file dumps
  • developers building data-aware agents who need a drop-in retrieval layer

New here?

Index a file, run describe_dataset, and look at what comes back.

That single call — 35 milliseconds, 3,849 tokens — tells you everything that would have cost you 111 million tokens to read raw.

That's the whole idea...


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jdatamunch_mcp-0.6.0.tar.gz (70.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

jdatamunch_mcp-0.6.0-py3-none-any.whl (66.5 kB view details)

Uploaded Python 3

File details

Details for the file jdatamunch_mcp-0.6.0.tar.gz.

File metadata

  • Download URL: jdatamunch_mcp-0.6.0.tar.gz
  • Upload date:
  • Size: 70.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for jdatamunch_mcp-0.6.0.tar.gz
Algorithm Hash digest
SHA256 08e203df272ffa30bda9f86a56bdddac65e4c20087b09d3421ec3bd3e3829cf5
MD5 b085ed8315ee8dbb7556590eec9cb21c
BLAKE2b-256 ae89f49d3c4158c15b1174405eb778c54d00f983c5b42910ee5990503385c123

See more details on using hashes here.

File details

Details for the file jdatamunch_mcp-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: jdatamunch_mcp-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 66.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for jdatamunch_mcp-0.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ca85941a8279ec32387d27e870a2fc8a104da671a160bf2050eebbb68564aeec
MD5 38d372d60ae093c012a15f59a1ade0d4
BLAKE2b-256 40735f7507b5fcb4a1bb69b811f1c6f4efb79cf121012a263066ff7f1c75c9d4

See more details on using hashes here.
