agent-xlsx

XLSX file CLI built with Agent Experience (AX) in mind.

agent-xlsx gives LLM agents the same depth of understanding of Excel workbooks that a human gets by opening them in Excel — structure, data, formatting, charts, formulas, VBA, and visual layout — all accessible through a single CLI that returns token-efficient JSON.

# Profile an entire workbook in <10ms
agent-xlsx probe report.xlsx

# Read any range as structured JSON
agent-xlsx read report.xlsx "Sales!A1:F50"

# Search across all sheets
agent-xlsx search report.xlsx "revenue" --ignore-case

# Full-fidelity visual capture (charts, shapes, conditional formatting)
agent-xlsx screenshot report.xlsx

Why agent-xlsx?

LLM agents working with Excel files face a fundamental problem: existing libraries are designed for humans writing Python scripts, not for agents that need to build understanding of a workbook incrementally and efficiently.

agent-xlsx solves this with three design principles:

  1. Progressive Disclosure: probe (structure) → screenshot (visual) → read (data) → inspect (metadata). Each layer adds detail only when needed. No wasted tokens.

  2. Speed First — The primary data backend is Polars + fastexcel (Rust/Calamine), delivering 7-10x faster reads than openpyxl with zero-copy Arrow integration. A full workbook profile completes in under 50ms.

  3. Token Efficiency — Every output is optimised for minimal token consumption. Aggregation over enumeration. Capped lists with counts. An agent builds comprehensive understanding of a workbook in 1-2 round-trips, not 10.


Installation

CLI

uv add agent-xlsx     # or: pip install agent-xlsx

Agent Skill

Give AI agents built-in knowledge of agent-xlsx commands and workflows:

npx skills add apetta/agent-xlsx

Compatible with Claude Code, Cursor, Gemini CLI, and 20+ other agents.

Optional: Aspose.Cells (cross-platform rendering, no Excel/LibreOffice needed)

Note: Aspose.Cells for Python is a proprietary, commercially licensed library, not covered by this project's Apache-2.0 licence. Users who install it are subject to Aspose's EULA.

# Adds screenshot, recalc, and objects support on any platform
uv add --optional aspose aspose-cells-python

Without a licence, Aspose runs in evaluation mode (watermarks on rendered images, 100-file-per-session limit). Set a licence via:

agent-xlsx license --set /path/to/Aspose.Cells.lic
# Or: export ASPOSE_LICENSE_PATH=/path/to/Aspose.Cells.lic

Optional: LibreOffice (free fallback for screenshot and recalc)

# macOS
brew install --cask libreoffice

# Ubuntu / Debian / ECS
apt install libreoffice-calc

# Alpine
apk add libreoffice-calc

All other commands (probe, read, search, export, write, format, inspect, overview, sheet, vba) work with zero system dependencies.


Quick Start

The recommended agent workflow is probe first, then drill down:

# 1. Profile the workbook — lean skeleton in <10ms
agent-xlsx probe workbook.xlsx

# 2. Drill into types / samples if needed
agent-xlsx probe workbook.xlsx --types --sample 3

# 3. Visual understanding — see formatting, charts, layout
agent-xlsx screenshot workbook.xlsx

# 4. Read specific data
agent-xlsx read workbook.xlsx --sheet Sales "A1:F100"

# 5. Inspect metadata — formulas, charts, merged cells, conditional formatting
agent-xlsx inspect workbook.xlsx --sheet Sales
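
The same probe-first order can be planned from Python before anything is executed. The sketch below is illustrative only: `plan_commands` and its parameters are hypothetical names, not part of agent-xlsx, and it uses only the subcommands and flags shown in the steps above.

```python
from typing import List, Optional


def plan_commands(path: str, sheet: Optional[str] = None,
                  cell_range: Optional[str] = None) -> List[List[str]]:
    """Build CLI argument lists in probe-first order (hypothetical helper).

    Nothing is executed here; each list can be passed to subprocess.run.
    """
    # Step 1: structure, always first
    commands = [["agent-xlsx", "probe", path]]
    # Step 2: visual layout
    commands.append(["agent-xlsx", "screenshot", path])
    if sheet is not None:
        # Step 3: data, only once a target sheet is known
        read_cmd = ["agent-xlsx", "read", path, "--sheet", sheet]
        if cell_range is not None:
            read_cmd.append(cell_range)
        commands.append(read_cmd)
        # Step 4: metadata for the same sheet
        commands.append(["agent-xlsx", "inspect", path, "--sheet", sheet])
    return commands


plan = plan_commands("workbook.xlsx", sheet="Sales", cell_range="A1:F100")
for cmd in plan:
    print(" ".join(cmd))
```

Keeping the planner separate from execution lets an agent stop after the probe step when the skeleton already answers the question.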

Commands

probe — Ultra-Fast Workbook Profiling

The first command an agent should run. Lean by default — returns sheet names, dimensions, and headers with zero data parsing (<10ms). Use flags to opt into richer detail.

agent-xlsx probe data.xlsx                    # Lean: sheet names, dims, headers only
agent-xlsx probe data.xlsx --types            # Add column types + null counts
agent-xlsx probe data.xlsx --sample 3         # Add 3 head + 3 tail rows
agent-xlsx probe data.xlsx --stats            # Full stats (implies --types)
agent-xlsx probe data.xlsx --full             # Everything: types + sample(3) + stats
agent-xlsx probe data.xlsx --sheet "Sales"    # Single sheet

Default output (~250 tokens for 6 sheets):

{
  "file": "data.xlsx",
  "size_bytes": 107679,
  "format": "xlsx",
  "probe_time_ms": 7.9,
  "sheets": [
    {
      "name": "txns",
      "index": 0,
      "visible": true,
      "rows": 255,
      "cols": 34,
      "headers": ["user_id", "txn_day", "txn_month", "amount", "currency", "..."]
    }
  ]
}
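
As a rough illustration of consuming this output, the sketch below parses a payload with the same shape and produces one line per visible sheet. The field names come from the sample above; `summarise_sheets` is a hypothetical helper, and the headers list is shortened for the example.

```python
import json

# Sample payload with the same shape as the default probe output (headers shortened).
probe_json = """
{
  "file": "data.xlsx",
  "size_bytes": 107679,
  "format": "xlsx",
  "probe_time_ms": 7.9,
  "sheets": [
    {"name": "txns", "index": 0, "visible": true, "rows": 255, "cols": 34,
     "headers": ["user_id", "txn_day", "txn_month", "amount", "currency"]}
  ]
}
"""


def summarise_sheets(payload: dict) -> list:
    """Return one short line per visible sheet: name, shape, first headers."""
    lines = []
    for sheet in payload["sheets"]:
        if not sheet.get("visible", True):
            continue  # skip hidden sheets
        preview = ", ".join(sheet["headers"][:3])
        lines.append(f'{sheet["name"]}: {sheet["rows"]}x{sheet["cols"]} ({preview})')
    return lines


summary = summarise_sheets(json.loads(probe_json))
print(summary)
```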

With --full (types + sample + stats):

{
  "file": "data.xlsx",
  "size_bytes": 107679,
  "format": "xlsx",
  "probe_time_ms": 18.5,
  "sheets": [
    {
      "name": "txns",
      "index": 0,
      "visible": true,
      "rows": 255,
      "cols": 34,
      "headers": ["user_id", "txn_day", "txn_month", "amount", "currency", "..."],
      "column_types": {
        "user_id": "string",
        "txn_day": "float64",
        "amount": "float64",
        "txn_date": "datetime",
        "category": "string"
      },
      "null_counts": {"user_id": 0, "amount": 0, "currency": 0},
      "sample": {
        "head": [["8bb055ad-...", 1, 12, -39.0, "GBP"]],
        "tail": [["8bb055ad-...", 1, 8, -150.0, "GBP"]]
      },
      "numeric_summary": {
        "amount": {"min": -4888.06, "max": 5000.0, "mean": -142.3, "std": 892.1}
      },
      "string_summary": {
        "category": {"unique": 12, "top_values": ["Software & Technology", "Sales", "Employees"]}
      }
    }
  ]
}

overview — Structural Metadata

Focuses on elements that probe cannot detect: formulas, charts, tables, named ranges. Uses openpyxl for metadata that the Rust backend doesn't expose.

agent-xlsx overview data.xlsx
agent-xlsx overview data.xlsx --include-formulas
agent-xlsx overview data.xlsx --include-formatting

Example output:

{
  "file": "data.xlsx",
  "size_bytes": 107679,
  "overview_time_ms": 157.2,
  "sheets": [
    {
      "name": "txns",
      "index": 0,
      "dimensions": "A1:AZ324",
      "row_count": 324,
      "col_count": 52,
      "has_formulas": false,
      "has_charts": true,
      "chart_count": 1,
      "has_tables": false
    }
  ]
}

read — Data Extraction

Read data from any range or sheet. Default path uses Polars + fastexcel for speed. Use --formulas to fall back to openpyxl for formula string extraction.

agent-xlsx read data.xlsx                          # First sheet, first 100 rows
agent-xlsx read data.xlsx "A1:F50"                 # Specific range
agent-xlsx read data.xlsx --sheet Sales "B2:G100"  # Sheet + range
agent-xlsx read data.xlsx --limit 500 --offset 100 # Pagination
agent-xlsx read data.xlsx --formulas               # Include formula strings
agent-xlsx read data.xlsx --sort amount --descending

Example output:

{
  "range": "A1:E5",
  "dimensions": {"rows": 4, "cols": 5},
  "headers": ["user_id", "txn_day", "txn_month", "txn_year", "txn_hour"],
  "data": [
    ["8bb055ad-caa1-40b6-a577-832425b02408", 1, 12, 2024, 8],
    ["8bb055ad-caa1-40b6-a577-832425b02408", 1, 12, 2024, 4]
  ],
  "row_count": 4,
  "truncated": false,
  "backend": "polars+fastexcel",
  "read_time_ms": 8.9
}
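
Because rows arrive as positional arrays with a separate headers list, a consumer will often want them keyed by column name. The sketch below shows one way to do that, assuming the output shape shown above; `rows_as_records` is a hypothetical helper, not part of agent-xlsx.

```python
from typing import Any, Dict, List

# Trimmed copy of the read output shape: only the fields this example needs.
read_payload = {
    "headers": ["user_id", "txn_day", "txn_month", "txn_year", "txn_hour"],
    "data": [
        ["8bb055ad-caa1-40b6-a577-832425b02408", 1, 12, 2024, 8],
        ["8bb055ad-caa1-40b6-a577-832425b02408", 1, 12, 2024, 4],
    ],
}


def rows_as_records(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each row with the header names to get one dict per row."""
    headers = payload["headers"]
    return [dict(zip(headers, row)) for row in payload["data"]]


records = rows_as_records(read_payload)
print(records[1]["txn_hour"])
```

The array-of-arrays form keeps the CLI output compact; expanding to records is best done only on the consumer side, after the rows of interest have been selected.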

search — Cross-Workbook Search

Search for values across all sheets. Supports regex and case-insensitive matching.

agent-xlsx search data.xlsx "revenue"
agent-xlsx search data.xlsx "rev.*" --regex
agent-xlsx search data.xlsx "stripe" --ignore-case
agent-xlsx search data.xlsx "SUM(" --in-formulas    # Search formula strings
agent-xlsx search data.xlsx "error" --sheet Summary

Example output:

{
  "query": "Stripe",
  "match_count": 25,
  "matches": [
    {"sheet": "txns", "column": "txn_description", "row": 12, "value": "Stripe DemoCompany Ltd. Payout UK"},
    {"sheet": "txns", "column": "merchant_name", "row": 12, "value": "Stripe"}
  ],
  "truncated": true,
  "search_time_ms": 18.8
}
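
A minimal sketch of post-processing search results follows. It groups the listed matches by sheet and column and estimates how many matches were left out of the capped list. It assumes `match_count` is the total number of matches, which this README implies but does not state; `matches_per_column` is a hypothetical helper.

```python
from collections import Counter

# Same shape as the search output above.
search_payload = {
    "query": "Stripe",
    "match_count": 25,
    "matches": [
        {"sheet": "txns", "column": "txn_description", "row": 12,
         "value": "Stripe DemoCompany Ltd. Payout UK"},
        {"sheet": "txns", "column": "merchant_name", "row": 12, "value": "Stripe"},
    ],
    "truncated": True,
}


def matches_per_column(payload: dict) -> Counter:
    """Count the listed matches per (sheet, column) pair."""
    return Counter((m["sheet"], m["column"]) for m in payload["matches"])


counts = matches_per_column(search_payload)
# Assumption: match_count is the full total, so the difference is what the cap hid.
hidden = search_payload["match_count"] - len(search_payload["matches"])
print(counts)
print(hidden)
```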

inspect — Detailed Element Inspection

Deep inspection of workbook elements: formulas, charts, merged cells, named ranges, comments, conditional formatting, data validation, and hyperlinks.

agent-xlsx inspect data.xlsx --sheet Sales             # Sheet-level summary
agent-xlsx inspect data.xlsx --sheet Sales --range A1:C10
agent-xlsx inspect data.xlsx --names                    # Named ranges
agent-xlsx inspect data.xlsx --charts                   # Chart metadata
agent-xlsx inspect data.xlsx --vba                      # VBA module summary
agent-xlsx inspect data.xlsx --comments                 # Cell comments
agent-xlsx inspect data.xlsx --conditional "A1:Z100"    # Conditional formatting rules
agent-xlsx inspect data.xlsx --validation Sales         # Data validation rules
agent-xlsx inspect data.xlsx --hyperlinks Sales         # Hyperlinks

screenshot — Full-Fidelity HD Visual Capture

Export workbook sheets as HD PNG images. Three rendering engines are auto-detected in this order: Excel (xlwings, highest fidelity) → Aspose.Cells (cross-platform, no external app) → LibreOffice (free fallback). Use --engine to force a specific backend.

agent-xlsx screenshot data.xlsx                            # All sheets as HD PNG
agent-xlsx screenshot data.xlsx --sheet Summary            # Specific sheet
agent-xlsx screenshot data.xlsx --sheet "Sales,Summary"    # Multiple sheets
agent-xlsx screenshot data.xlsx "Sales!A1:F20"             # Range capture
agent-xlsx screenshot data.xlsx --engine aspose            # Force Aspose (cross-platform)
agent-xlsx screenshot data.xlsx --dpi 300                  # Higher resolution (default: 200)
agent-xlsx screenshot data.xlsx --output ./shots/          # Custom output directory
agent-xlsx screenshot data.xlsx --timeout 60               # Increase timeout (LibreOffice only)

Single sheet/range output:

{
  "status": "success",
  "format": "png",
  "path": "/tmp/agent-xlsx/data_Summary.png",
  "sheet": "Summary",
  "size_bytes": 245000,
  "dpi": 200,
  "capture_time_ms": 3200.0,
  "engine": "libreoffice+pymupdf"
}

Multi-sheet output:

{
  "status": "success",
  "format": "png",
  "dpi": 200,
  "sheets": [
    {"name": "Sales", "path": "/tmp/agent-xlsx/data_Sales.png", "size_bytes": 245000},
    {"name": "Summary", "path": "/tmp/agent-xlsx/data_Summary.png", "size_bytes": 89000}
  ],
  "capture_time_ms": 4100.0,
  "engine": "libreoffice+pymupdf"
}
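
Since the single-sheet and multi-sheet outputs have different shapes, a caller may want to normalise them. The sketch below does this under the assumption that the two shapes shown above are the only ones; `screenshot_paths` is a hypothetical helper.

```python
from typing import Any, Dict, List


def screenshot_paths(payload: Dict[str, Any]) -> List[str]:
    """Return PNG paths from either the single-sheet or multi-sheet shape."""
    if payload.get("status") != "success":
        return []
    if "sheets" in payload:
        # Multi-sheet shape: one entry per rendered sheet
        return [sheet["path"] for sheet in payload["sheets"]]
    # Single sheet/range shape: path is a top-level field
    return [payload["path"]]


single = {"status": "success", "path": "/tmp/agent-xlsx/data_Summary.png",
          "sheet": "Summary"}
multi = {"status": "success", "sheets": [
    {"name": "Sales", "path": "/tmp/agent-xlsx/data_Sales.png"},
    {"name": "Summary", "path": "/tmp/agent-xlsx/data_Summary.png"},
]}
print(screenshot_paths(single))
print(screenshot_paths(multi))
```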

export — Bulk Data Export

Export entire sheets to JSON, CSV, or Markdown.

agent-xlsx export data.xlsx --format csv               # CSV to stdout
agent-xlsx export data.xlsx --format markdown           # Markdown table
agent-xlsx export data.xlsx --format json               # JSON array
agent-xlsx export data.xlsx --format csv --output out.csv
agent-xlsx export data.xlsx --format csv --sheet Sales

write — Write Values and Formulas

Write values or formulas to cells. Supports single cells, ranges (via JSON), and CSV file imports.

agent-xlsx write data.xlsx "A1" "Hello"                          # Single value
agent-xlsx write data.xlsx "A1" "=SUM(B1:B100)" --formula        # Formula
agent-xlsx write data.xlsx "A1:C3" --json '[[1,2,3],[4,5,6],[7,8,9]]'
agent-xlsx write data.xlsx "A1" --from-csv import.csv
agent-xlsx write data.xlsx "A1" "42" --number-format "0.00%"
agent-xlsx write data.xlsx "A1" "Hello" --output new_file.xlsx   # Preserve original
agent-xlsx write data.xlsx "A1" "Hello" --sheet Summary
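
When the values come from a program rather than a shell, the --json payload is easiest to produce with a JSON serialiser instead of hand-written quoting. The sketch below builds the argument list for a range write. `build_write_cmd` is a hypothetical helper, and combining --json with --sheet and --output in one call is an assumption, since the examples above show these flags separately.

```python
import json
import shlex
from typing import Any, List, Optional, Sequence


def build_write_cmd(path: str, cell_range: str, rows: Sequence[Sequence[Any]],
                    sheet: Optional[str] = None,
                    output: Optional[str] = None) -> List[str]:
    """Build an `agent-xlsx write` argument list for a 2D block of values."""
    # Compact separators keep the payload short
    payload = json.dumps(rows, separators=(",", ":"))
    cmd = ["agent-xlsx", "write", path, cell_range, "--json", payload]
    if sheet is not None:
        cmd += ["--sheet", sheet]
    if output is not None:
        cmd += ["--output", output]  # write to a new file, keep the original
    return cmd


cmd = build_write_cmd("data.xlsx", "A1:C2", [[1, 2, 3], [4, 5, 6]],
                      output="new_file.xlsx")
# shlex.join shows how the command would look when typed in a POSIX shell
print(shlex.join(cmd))
```

Passing the list to subprocess.run without shell=True avoids shell quoting of the JSON payload entirely.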

format — Read and Apply Cell Formatting

Read or modify cell formatting: fonts, fills, borders, number formats.

# Read formatting
agent-xlsx format data.xlsx "A1" --read --sheet Sales

# Apply formatting
agent-xlsx format data.xlsx "A1:D1" --font '{"bold": true, "size": 14}'
agent-xlsx format data.xlsx "B2:B100" --number-format "#,##0.00"
agent-xlsx format data.xlsx "A1:D10" --fill '{"color": "FFFF00"}'
agent-xlsx format data.xlsx "A1:D10" --border '{"style": "thin"}'
agent-xlsx format data.xlsx "A1:D10" --copy-from "G1"

Example --read output:

{
  "cell": "A1",
  "value": "user_id",
  "font": {"name": "Aptos Narrow", "size": 12.0, "bold": false, "italic": false},
  "fill": {"type": "solid", "color": "indexed:9"},
  "border": {
    "top": {"style": "thin", "color": "indexed:10"},
    "bottom": {"style": "thin", "color": "indexed:10"}
  },
  "alignment": {"horizontal": null, "vertical": null, "wrap_text": null},
  "number_format": "@"
}

sheet — Sheet Management

List, create, rename, delete, copy, hide, and unhide sheets.

agent-xlsx sheet data.xlsx --list
agent-xlsx sheet data.xlsx --create "New Sheet"
agent-xlsx sheet data.xlsx --rename "Old Name" --new-name "New Name"
agent-xlsx sheet data.xlsx --delete "Temp"
agent-xlsx sheet data.xlsx --copy "Template" --new-name "Q1 Report"
agent-xlsx sheet data.xlsx --hide "Internal"
agent-xlsx sheet data.xlsx --unhide "Internal"

vba — VBA Macro Analysis

Extract and analyse VBA macros using oletools. Works headless on all platforms without Microsoft Excel.

agent-xlsx vba macros.xlsm --list        # List modules with security summary
agent-xlsx vba macros.xlsm --read Main   # Read a specific module's code
agent-xlsx vba macros.xlsm --read-all    # Read all module code
agent-xlsx vba macros.xlsm --security    # Full security analysis

recalc — Formula Recalculation

Scan for formula errors or trigger a full recalculation. Auto-detects engine: Excel → Aspose → LibreOffice.

agent-xlsx recalc data.xlsx --check-only    # Scan for #REF!, #DIV/0!, etc. (no engine needed)
agent-xlsx recalc data.xlsx                 # Full recalculation
agent-xlsx recalc data.xlsx --engine aspose # Force Aspose (cross-platform)
agent-xlsx recalc data.xlsx --timeout 120   # Timeout (LibreOffice only)

Example --check-only output:

{
  "status": "success",
  "mode": "check_only",
  "total_formulas": 847,
  "total_errors": 3,
  "check_time_ms": 184.1,
  "error_summary": {
    "#REF!": {"count": 2, "locations": ["Sales!F12", "Sales!F15"]},
    "#DIV/0!": {"count": 1, "locations": ["Summary!C8"]}
  }
}
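
To act on a check-only result, the aggregated error_summary can be flattened into individual cell references. This is a minimal sketch assuming the output shape above and locations written as Sheet!Cell; `error_locations` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

# Same shape as the check-only output above.
recalc_payload = {
    "status": "success",
    "mode": "check_only",
    "total_formulas": 847,
    "total_errors": 3,
    "error_summary": {
        "#REF!": {"count": 2, "locations": ["Sales!F12", "Sales!F15"]},
        "#DIV/0!": {"count": 1, "locations": ["Summary!C8"]},
    },
}


def error_locations(payload: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Flatten error_summary into (error_code, sheet, cell) tuples."""
    flat = []
    for code, info in payload.get("error_summary", {}).items():
        for location in info["locations"]:
            # Split on the last "!" so the cell reference is always the tail
            sheet, _, cell = location.rpartition("!")
            flat.append((code, sheet, cell))
    return flat


errors = error_locations(recalc_payload)
for code, sheet, cell in errors:
    print(f"{sheet}!{cell}: {code}")
```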

Architecture

agent-xlsx uses a multi-backend architecture, choosing the fastest backend capable of satisfying each request:

                            agent-xlsx CLI
                                  |
      +---------------+-----------+-----------+---------------+
      |               |           |           |               |
Polars+fastexcel   openpyxl   xlwings    Aspose.Cells   LibreOffice
 (Rust/Calamine)  (Pure Py)  (Excel)    (Cross-plat)    (Headless)

  Data reads      Metadata   Screenshots Screenshots    Screenshots
  Profiling       Formulas   Recalc      Recalc         Recalc
  Search          Formatting Objects     Objects
  Export          Writes

  + oletools (VBA extraction & analysis)

Rendering engine auto-detection: Excel (xlwings) → Aspose.Cells → LibreOffice. Use --engine to force a specific backend.

| Backend                 | Role                       | Speed                      | Used by                                |
|-------------------------|----------------------------|----------------------------|----------------------------------------|
| Polars + fastexcel      | Primary data engine        | 7-10x faster than openpyxl | probe, read, search, export            |
| openpyxl                | Metadata + writes          | Baseline                   | overview, inspect, write, format, sheet |
| xlwings (Excel)         | Highest-fidelity rendering | ~2s per sheet              | screenshot, recalc, objects, vba --run |
| Aspose.Cells (optional) | Cross-platform rendering   | ~1-3s per sheet            | screenshot, recalc, objects            |
| LibreOffice + PyMuPDF   | Free rendering fallback    | ~3s per sheet              | screenshot, recalc                     |
| oletools                | VBA extraction             | Fast                       | vba                                    |
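
The rendering-engine priority described above can be pictured as a simple first-match selection. The sketch below is not the project's implementation: `pick_engine` and the engine identifiers "excel" and "libreoffice" are assumptions made for illustration (only "aspose" appears as an --engine value in this README), and detection of what is installed is left to the caller.

```python
from typing import Iterable, Optional

# Priority order stated in this README: Excel, then Aspose.Cells, then LibreOffice.
ENGINE_PRIORITY = ("excel", "aspose", "libreoffice")


def pick_engine(available: Iterable[str], forced: Optional[str] = None) -> str:
    """Return the forced engine if given, else the highest-priority available one."""
    installed = set(available)
    if forced is not None:
        if forced not in installed:
            raise RuntimeError(f"engine {forced!r} is not available")
        return forced
    for engine in ENGINE_PRIORITY:
        if engine in installed:
            return engine
    # Mirrors the NO_RENDERING_BACKEND error code listed under Error Handling
    raise RuntimeError("NO_RENDERING_BACKEND")


print(pick_engine({"libreoffice", "aspose"}))
print(pick_engine({"libreoffice", "aspose"}, forced="libreoffice"))
```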

Why not just openpyxl?

openpyxl creates a Python object for every cell. For a 100K-row workbook, that's millions of allocations and ~50x the file size in RAM. Polars + fastexcel reads the same data through Rust with zero-copy Arrow transfer — the data never touches Python's heap until the agent needs it.


File Format Support

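| Format                | Extension | Read | Write | Screenshot | VBA |
|-----------------------|-----------|------|-------|------------|-----|
| Excel (Open XML)      | .xlsx     | Yes  | Yes   | Yes        | N/A |
| Excel (Macro-enabled) | .xlsm     | Yes  | Yes   | Yes        | Yes |
| Excel (Binary)        | .xlsb     | Yes  | -     | Yes        | Yes |
| Excel (Legacy)        | .xls      | Yes  | -     | Yes        | -   |
| OpenDocument          | .ods      | Yes  | -     | Yes        | -   |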

Deployment

agent-xlsx is designed for headless deployment in agentic infrastructure — no GUI, no Excel installation, no Docker requirement.

AWS ECS / Container (with Aspose — lightweight, no LibreOffice)

FROM python:3.12-slim

# Aspose system deps for rendering on Linux
RUN apt-get update && \
    apt-get install -y --no-install-recommends libgdiplus libfontconfig1 fonts-liberation && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

RUN pip install agent-xlsx aspose-cells-python

# Optional: set Aspose licence
# COPY Aspose.Cells.lic /app/
# ENV ASPOSE_LICENSE_PATH=/app/Aspose.Cells.lic

RUN agent-xlsx --help

AWS ECS / Container (with LibreOffice — free, larger image)

FROM python:3.12-slim

RUN apt-get update && \
    apt-get install -y --no-install-recommends libreoffice-calc && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

RUN pip install agent-xlsx

RUN agent-xlsx --help

MCP Tool Server

agent-xlsx's JSON output is designed for direct consumption by LLM agents. Each command returns structured JSON to stdout, making it trivial to wrap as an MCP tool:

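import subprocess

from mcp.server.fastmcp import FastMCP  # assumes the official MCP Python SDK

server = FastMCP("agent-xlsx")

@server.tool()
async def probe_workbook(file_path: str) -> str:
    """Profile a workbook and return agent-xlsx's JSON output."""
    result = subprocess.run(
        ["agent-xlsx", "probe", file_path],
        capture_output=True, text=True
    )
    return result.stdout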

Claude Code Integration

Add agent-xlsx commands as custom slash commands or MCP tools for Claude Code to use when working with Excel files in your projects.


Error Handling

All errors return structured JSON with an error code, message, and actionable suggestions:

{
  "error": true,
  "code": "FILE_NOT_FOUND",
  "message": "File not found: missing.xlsx",
  "suggestions": [
    "Check the file path is correct",
    "Ensure the file exists and is readable"
  ]
}

Error codes include: FILE_NOT_FOUND, INVALID_FORMAT, SHEET_NOT_FOUND, INVALID_RANGE, EXCEL_REQUIRED, ASPOSE_NOT_INSTALLED, LIBREOFFICE_REQUIRED, NO_RENDERING_BACKEND, and more.
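
A caller can turn this error shape into an exception so that failures are not mistaken for results. The following is a minimal sketch, assuming every error payload carries `"error": true` as in the example above; `AgentXlsxError` and `parse_output` are hypothetical names, not part of agent-xlsx.

```python
import json
from typing import Any, List


class AgentXlsxError(Exception):
    """Raised when the CLI output is a structured error payload."""

    def __init__(self, code: str, message: str, suggestions: List[str]):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.suggestions = suggestions


def parse_output(raw: str) -> Any:
    """Parse CLI output, raising AgentXlsxError for error payloads."""
    payload = json.loads(raw)
    if isinstance(payload, dict) and payload.get("error") is True:
        raise AgentXlsxError(
            payload.get("code", "UNKNOWN"),
            payload.get("message", ""),
            payload.get("suggestions", []),
        )
    return payload


error_json = json.dumps({
    "error": True,
    "code": "FILE_NOT_FOUND",
    "message": "File not found: missing.xlsx",
    "suggestions": ["Check the file path is correct"],
})

try:
    parse_output(error_json)
except AgentXlsxError as exc:
    caught = exc
    print(exc.code, exc.suggestions)
```

Keeping the suggestions on the exception lets an agent feed them straight back into its next attempt.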


Performance

Benchmarked on a 255-row, 34-column, 6-sheet workbook:

| Operation                             | Time                | Backend               |
|---------------------------------------|---------------------|-----------------------|
| probe (default, lean)                 | ~8ms                | Polars + fastexcel    |
| probe --full (types + sample + stats) | ~20ms               | Polars + fastexcel    |
| read (range)                          | ~9ms                | Polars + fastexcel    |
| search (cross-workbook)               | ~19ms               | Polars + fastexcel    |
| overview                              | ~157ms              | openpyxl              |
| inspect                               | ~120ms              | openpyxl              |
| recalc --check-only                   | ~184ms              | openpyxl              |
| screenshot (PNG, per-sheet)           | ~3s + ~0.1s/page    | LibreOffice + PyMuPDF |
| recalc (full)                         | ~2.5s               | LibreOffice           |

The Polars + fastexcel backend maintains sub-50ms response times even on workbooks with 100K+ rows.


Development

# Clone and install
git clone https://github.com/apetta/agent-xlsx.git
cd agent-xlsx
uv sync

# Run commands
uv run agent-xlsx probe sample_data.xlsx

# Lint
uv run ruff check src/
uv run ruff format src/

Project Structure

src/agent_xlsx/
  cli.py                    # Typer CLI entry point
  commands/                 # 14 command implementations
    probe.py                  # Ultra-fast profiling (Polars)
    overview.py               # Structural metadata (openpyxl)
    read.py                   # Data extraction (Polars)
    search.py                 # Cross-workbook search (Polars)
    export.py                 # Bulk export (Polars)
    inspect.py                # Deep inspection (openpyxl)
    write.py                  # Write operations (openpyxl)
    format.py                 # Formatting read/write (openpyxl)
    sheet.py                  # Sheet management (openpyxl)
    screenshot.py             # Visual capture (Excel/Aspose/LO)
    objects.py                # Embedded objects (Excel/Aspose)
    vba.py                    # VBA analysis (oletools)
    recalc.py                 # Recalculation (Excel/Aspose/LO)
    license_cmd.py            # Aspose licence management
  adapters/                 # Backend adapters
    polars_adapter.py         # Polars + fastexcel (primary data)
    openpyxl_adapter.py       # openpyxl (metadata + writes)
    xlwings_adapter.py        # xlwings/Excel (rendering + objects)
    aspose_adapter.py         # Aspose.Cells (cross-platform rendering)
    libreoffice_adapter.py    # LibreOffice headless (fallback rendering)
    oletools_adapter.py       # oletools (VBA extraction)
  formatters/               # Output formatting
    json_formatter.py         # Token-efficient JSON output
    token_optimizer.py        # Output capping and aggregation
  utils/                    # Shared utilities
    errors.py                 # Error types and handler
    validation.py             # File and range validation
    constants.py              # Caps and limits
    memory.py                 # Memory budget checking
    dates.py                  # Date detection and serial→ISO conversion
    config.py                 # Persistent config (~/.agent-xlsx/)

Licence

Apache-2.0 — see LICENSE for details.
