
Compress Excel spreadsheets for LLM context windows. Rust-powered, 80-95% token reduction.


XLcompress

Compress Excel spreadsheets for LLM context windows. Written in Rust.




Excel files are among the most common data formats, but feeding them to LLMs wastes tokens on empty cells, repetitive formatting, and verbose address-value pairs. xlcompress applies the SpreadsheetLLM compression pipeline to cut token usage by 80-95% on dense, table-like sheets while preserving the structural information LLMs need. Sparse sheets may see little or no reduction (see the results below).

Compression results

| Sheet type | Original tokens | Compressed tokens | Reduction |
|---|---:|---:|---:|
| Financial model (200 rows) | 5,240 | 799 | 84.7% |
| Lookup table (100 rows) | 1,922 | 186 | 90.3% |
| Sparse sheet (few values) | 106 | 106 | 0% (auto-fallback) |

Sparse sheets with scattered values automatically fall back to vanilla (uncompressed) encoding when compression would increase their size.
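The fallback rule amounts to a size comparison between the two encodings. The sketch below is an illustration under stated assumptions, not xlcompress's code: `choose_output` is a hypothetical helper, and character count stands in for whatever tokenizer the library actually uses.

```python
from typing import Callable


def choose_output(
    vanilla: str,
    aggregated: str,
    count_tokens: Callable[[str], int] = len,
) -> tuple[str, str]:
    """Return (mode, text), preferring aggregated unless it is larger.

    `count_tokens` defaults to character length as a stand-in for a real
    tokenizer (an assumption; xlcompress's own counting may differ).
    """
    if count_tokens(aggregated) > count_tokens(vanilla):
        # Sparse sheet: range labels would cost more than listing the cells.
        return "vanilla", vanilla
    return "aggregated", aggregated


mode, text = choose_output("|A1, 1|\n|A2, 2|\n|A3, 3|", "(IntNum|A1:A3)")
print(mode)  # aggregated
```

Passing a real tokenizer's counting function as `count_tokens` would make the comparison match what the LLM is actually billed for.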

Install

```bash
pip install xlcompress
```

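Pre-built wheels are provided for Linux (x86_64, aarch64), macOS (x86_64, ARM64), and Windows (x86_64). The package has no Python dependencies, and no Rust toolchain is needed when a matching wheel exists. The 0.3.2 release ships wheels for CPython 3.11 only; other Python versions build from the source distribution, which requires Rust.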

Quickstart

```python
import xlcompress

# One-liner: compress and get a prompt-ready string
prompt = xlcompress.compress_to_string("financials.xlsx")

# Per-sheet results with token counts
results = xlcompress.compress("financials.xlsx")
for sheet in results:
    print(f"{sheet.name}: {sheet.original_tokens} -> {sheet.compressed_tokens} tokens")
```

Usage

Excel files

```python
import xlcompress

# All sheets
results = xlcompress.compress("report.xlsx")

# Specific sheets
results = xlcompress.compress("data.xlsb", sheets=["Q1", "Q2"])

# From bytes (S3, HTTP responses, file uploads); format is required for bytes.
# file_bytes holds the raw workbook contents.
results = xlcompress.compress(file_bytes, format="xlsx")

# List sheets without compressing
names = xlcompress.list_sheets("workbook.xlsx")
```

CSV

```python
import xlcompress

# From a CSV file
result = xlcompress.compress_csv("data.csv")

# From a CSV string
csv_text = "Name,Age,City\nAlice,30,NYC\nBob,25,LA"
result = xlcompress.compress_csv(csv_text)

# Custom delimiter
result = xlcompress.compress_csv("data.tsv", delimiter="\t")
```

Text and grids

```python
import xlcompress

# Tab-delimited text (e.g. pasted from a spreadsheet)
pasted = "Region\tQ1\tQ2\nNorth\t1000\t2000\nSouth\t800\t1200"
result = xlcompress.compress_text(pasted)

# Raw 2D grid
grid = [["Name", "Score"], ["Alice", "95"], ["Bob", "87"]]
result = xlcompress.compress_grid(grid)
```

Output modes

```python
import xlcompress

# Aggregated (default): best compression, type-labeled ranges
results = xlcompress.compress("file.xlsx", mode="aggregated")
# results[0].compressed -> "(IntNum|A1:B10),(DateData|C1:C5),..."

# Vanilla: preserves all cell values, no compression
results = xlcompress.compress("file.xlsx", mode="vanilla")
# results[0].compressed -> "|A1, Revenue|\n|B1, 1000|\n..."

# Inverted index: value-to-address mapping
results = xlcompress.compress("file.xlsx", mode="inverted")
# results[0].compressed -> {"Revenue": ["A1"], "1000": ["B1", "C3"], ...}
```
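To make the vanilla and inverted shapes concrete, here is a pure-Python sketch that renders a small grid both ways. It only mirrors the example strings shown above; the function names are hypothetical, and xlcompress's exact separators, empty-cell handling, and key ordering may differ.

```python
import json


def col_letters(index: int) -> str:
    """Convert a 0-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters, n = "", index + 1
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def address(row: int, col: int) -> str:
    """0-based (row, col) -> A1-style address."""
    return f"{col_letters(col)}{row + 1}"


def vanilla(grid: list[list[str]]) -> str:
    """One |Address, Value| line per non-empty cell, row by row."""
    return "\n".join(
        f"|{address(r, c)}, {value}|"
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value != ""
    )


def inverted(grid: list[list[str]]) -> str:
    """JSON map from each distinct non-empty value to its addresses."""
    index: dict[str, list[str]] = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value != "":
                index.setdefault(value, []).append(address(r, c))
    return json.dumps(index)


grid = [["Revenue", "1000"], ["", ""], ["Cost", "1000"]]
print(vanilla(grid))
print(inverted(grid))  # {"Revenue": ["A1"], "1000": ["B1", "B3"], "Cost": ["A3"]}
```

The inverted form pays off when many cells share a value: the repeated `1000` is stored once with a list of addresses.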

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `source` | `str`, `PathLike`, `bytes` | required | File path or raw file bytes |
| `format` | `str \| None` | `None` | `"xlsx"` or `"xlsb"`. Required for bytes; auto-detected for paths. |
| `sheets` | `list[str] \| None` | `None` | Filter to specific sheets. `None` = all. |
| `mode` | `str` | `"aggregated"` | `"aggregated"`, `"vanilla"`, or `"inverted"` |
| `structural` | `bool` | `True` | Run table boundary detection |
| `structural_k` | `int` | `4` | Rows/columns to keep around detected boundaries |
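Because `format` is required for bytes, a web handler receiving an upload usually has to derive it from the filename. One way to do that is sketched below with a hypothetical helper, `format_for_upload`; inferring the type from the extension is an assumption, since a misnamed file would still be passed with the wrong format.

```python
from pathlib import PurePath

# The two values the parameter table lists for `format`.
FORMATS_BY_SUFFIX = {".xlsx": "xlsx", ".xlsb": "xlsb"}


def format_for_upload(filename: str) -> str:
    """Return "xlsx" or "xlsb" based on the file extension (case-insensitive)."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in FORMATS_BY_SUFFIX:
        raise ValueError(f"unsupported spreadsheet extension: {filename!r}")
    return FORMATS_BY_SUFFIX[suffix]


print(format_for_upload("Q3 Report.XLSX"))  # xlsx

# With xlcompress installed (upload_name and upload_bytes are placeholders):
# results = xlcompress.compress(upload_bytes, format=format_for_upload(upload_name))
```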

Return type

```python
class SheetResult:
    name: str               # Sheet name
    original: str           # Raw |Address, Value| representation
    compressed: str         # Compressed output
    original_tokens: int    # Token count before compression
    compressed_tokens: int  # Token count after compression
```
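A small helper can report per-sheet and overall savings from these fields. To keep the sketch runnable without a workbook, `FakeSheetResult` is a hypothetical stand-in with the same attribute names, filled with two rows of the results table as sample data; with the library installed you would pass the list returned by `xlcompress.compress(...)` instead.

```python
from dataclasses import dataclass


@dataclass
class FakeSheetResult:
    """Hypothetical stand-in with the SheetResult fields used below."""
    name: str
    original_tokens: int
    compressed_tokens: int


def pct_saved(original: int, compressed: int) -> float:
    """Percent of tokens removed; 0.0 when there was nothing to compress."""
    return 100.0 * (original - compressed) / original if original else 0.0


def summarize(results) -> tuple[dict[str, float], float]:
    """Per-sheet and overall token reduction for a list of sheet results."""
    per_sheet = {s.name: pct_saved(s.original_tokens, s.compressed_tokens) for s in results}
    total_orig = sum(s.original_tokens for s in results)
    total_comp = sum(s.compressed_tokens for s in results)
    return per_sheet, pct_saved(total_orig, total_comp)


results = [
    FakeSheetResult("Lookup table", 1922, 186),
    FakeSheetResult("Sparse sheet", 106, 106),
]
per_sheet, overall = summarize(results)
print({name: round(p, 1) for name, p in per_sheet.items()})
print(round(overall, 1))  # 85.6
```

The overall figure is weighted by token count, so large sheets dominate it; averaging the per-sheet percentages would give a different number.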

How it works

The compression pipeline runs three stages:

1. Structural compression: Detects table boundaries using heuristic analysis of cell patterns, then keeps only the rows and columns within `structural_k` (default 4) of a detected boundary. Large empty regions are removed entirely.

2. Data-format aggregation: Groups contiguous cells of the same type (integers, dates, text, currencies, etc.) into labeled ranges. A column of 100 integers becomes (IntNum|A1:A100) instead of 100 separate entries.

3. Smart output selection: Renders the result in the chosen mode. For sparse sheets where aggregation would inflate the output, it automatically falls back to vanilla encoding.
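The row/column filtering in stage 1 can be sketched as follows, assuming boundary detection has already produced sets of candidate boundary row and column indices. This is a simplified illustration of the `structural_k` idea, not the TableSense-based detector xlcompress uses; `keep_near_boundaries` and `filter_grid` are hypothetical names.

```python
def keep_near_boundaries(boundaries: set[int], size: int, k: int = 4) -> list[int]:
    """Indices in range(size) lying within k of any boundary index."""
    keep: set[int] = set()
    for b in boundaries:
        keep.update(range(max(0, b - k), min(size, b + k + 1)))
    return sorted(keep)


def filter_grid(grid, row_bounds, col_bounds, k=4):
    """Drop rows and columns farther than k from every detected boundary.

    Returns the reduced grid plus the kept original row/column indices, so
    addresses can still be reported against the original sheet.
    """
    n_rows = len(grid)
    n_cols = max((len(r) for r in grid), default=0)
    rows = keep_near_boundaries(row_bounds, n_rows, k)
    cols = keep_near_boundaries(col_bounds, n_cols, k)
    reduced = [
        [grid[r][c] if c < len(grid[r]) else "" for c in cols]
        for r in rows
    ]
    return reduced, rows, cols


# A 20-row sheet whose only boundaries are the first and last rows.
grid = [[f"r{r}c{c}" for c in range(3)] for r in range(20)]
reduced, rows, cols = filter_grid(grid, {0, 19}, {0, 2}, k=1)
print(rows)  # [0, 1, 18, 19]
```

A larger `k` keeps more context around each boundary at the cost of more tokens; rows 2 through 17 here are dropped because no boundary lies within one row of them.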

Why Rust?

The pipeline involves BFS flood-fill over cell grids, regex-based type detection, and boundary analysis — all CPU-bound work that benefits from compiled performance. The Python interface is a thin PyO3 wrapper over the Rust implementation, with no Python dependencies.
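The BFS flood-fill used for data-format aggregation can be sketched in pure Python. The type rules are simplified regexes chosen for illustration (`IntNum` and `DateData` appear in the examples above; `FloatNum` is a hypothetical extra label). Unlike the stage described above, this sketch leaves text and empty cells out of the ranges and lists irregular regions cell by cell; the Rust `compress_aggregation` crate may handle both differently.

```python
import re
from collections import deque
from typing import Optional

# Illustrative type rules, tried in order with fullmatch.
TYPE_RULES = [
    ("IntNum", re.compile(r"-?\d+")),
    ("FloatNum", re.compile(r"-?\d+\.\d+")),
    ("DateData", re.compile(r"\d{4}-\d{2}-\d{2}")),
]


def cell_type(value: str) -> Optional[str]:
    """Return a type label, or None for cells this sketch leaves unaggregated."""
    for label, pattern in TYPE_RULES:
        if pattern.fullmatch(value):
            return label
    return None


def addr(row: int, col: int) -> str:
    """0-based (row, col) -> A1-style address."""
    letters, n = "", col + 1
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row + 1}"


def aggregate(grid: list[list[str]]) -> str:
    """Group 4-connected cells of the same type into (Label|Range) entries."""
    seen: set = set()
    parts: list = []
    for r0, row0 in enumerate(grid):
        for c0, value0 in enumerate(row0):
            label = cell_type(value0)
            if label is None or (r0, c0) in seen:
                continue
            # BFS flood fill over same-labeled up/down/left/right neighbors.
            region, queue = [], deque([(r0, c0)])
            seen.add((r0, c0))
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (
                        0 <= nr < len(grid)
                        and 0 <= nc < len(grid[nr])
                        and (nr, nc) not in seen
                        and cell_type(grid[nr][nc]) == label
                    ):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            rows = [r for r, _ in region]
            cols = [c for _, c in region]
            top, bottom, left, right = min(rows), max(rows), min(cols), max(cols)
            if len(region) == (bottom - top + 1) * (right - left + 1):
                # Region fills its bounding box: one range covers it.
                start, end = addr(top, left), addr(bottom, right)
                span = start if start == end else f"{start}:{end}"
                parts.append(f"({label}|{span})")
            else:
                # Irregular shape: this sketch simply lists each cell.
                parts.extend(f"({label}|{addr(r, c)})" for r, c in sorted(region))
    return ",".join(parts)


print(aggregate([["Region", "Q1", "Q2"], ["North", "1000", "2000"], ["South", "800", "1200"]]))
# (IntNum|B2:C3)
```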

Pipeline

Based on the SheetCompressor pipeline from:

Tian, Y., et al. "SpreadsheetLLM: Encoding Spreadsheets for Large Language Models." arXiv:2407.09025, 2024.

Architecture

| Crate | Role |
|---|---|
| `xlcompress` | Python bindings (PyO3), Excel parsing (calamine), pipeline orchestration |
| `compress_aggregation` | Data-format aggregation via BFS flood-fill |
| `compress_structure` | Structural compression with boundary-based filtering |
| `heuristic_detection` | Table boundary detection (TableSense hybrid algorithm) |
| `compress_excel` | Standalone CLI for JSONL input |
| `xlsb_to_xlsx` | `.xlsb` to `.xlsx` converter |
| `wasm_api` | WebAssembly bindings for browser use |

Supported inputs

| Input | Function | Notes |
|---|---|---|
| `.xlsx` | `compress()` | Excel 2007+ XML format |
| `.xlsb` | `compress()` | Excel binary format |
| Bytes | `compress()` | Raw file bytes with `format="xlsx"` or `format="xlsb"` |
| CSV | `compress_csv()` | File path, string, or file-like object |
| Text | `compress_text()` | Tab-delimited or custom delimiter |
| 2D grid | `compress_grid()` | `list[list[str]]` directly |
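Since `compress_grid()` takes `list[list[str]]`, rows from a database query or other mixed-type source need to be stringified first. A minimal sketch follows; `to_str_grid` is a hypothetical helper, and rendering `None` as an empty string, dates as ISO 8601, and padding short rows are assumptions about how empty and typed cells should look.

```python
import datetime
from typing import Any, Iterable


def to_str_grid(rows: Iterable[Iterable[Any]]) -> list[list[str]]:
    """Convert arbitrary row values to a rectangular grid of strings."""
    grid = []
    for row in rows:
        cells = []
        for value in row:
            if value is None:
                cells.append("")  # treat missing values as empty cells
            elif isinstance(value, datetime.date):  # also covers datetime
                cells.append(value.isoformat())
            else:
                cells.append(str(value))
        grid.append(cells)
    # Pad short rows so every row has the same width.
    width = max((len(r) for r in grid), default=0)
    return [r + [""] * (width - len(r)) for r in grid]


rows = [("Name", "Score", "Date"), ("Alice", 95, datetime.date(2024, 1, 5)), ("Bob", None)]
grid = to_str_grid(rows)
print(grid)
# [['Name', 'Score', 'Date'], ['Alice', '95', '2024-01-05'], ['Bob', '', '']]
# With xlcompress installed: xlcompress.compress_grid(grid)
```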

Browser UI

A drag-and-drop browser interface is included in docs/. Serve it locally or deploy to GitHub Pages.

License

MIT



Download files

Download the file for your platform.

Source Distribution

xlcompress-0.3.2.tar.gz (60.9 kB)

Uploaded Source

Built Distributions


xlcompress-0.3.2-cp311-cp311-win_amd64.whl (1.3 MB)

Uploaded CPython 3.11, Windows x86-64

xlcompress-0.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)

Uploaded CPython 3.11, manylinux: glibc 2.17+ x86-64

xlcompress-0.3.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.6 MB)

Uploaded CPython 3.11, manylinux: glibc 2.17+ ARM64

xlcompress-0.3.2-cp311-cp311-macosx_11_0_arm64.whl (1.4 MB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

xlcompress-0.3.2-cp311-cp311-macosx_10_12_x86_64.whl (1.5 MB)

Uploaded CPython 3.11, macOS 10.12+ x86-64

File details

Details for the file xlcompress-0.3.2.tar.gz.

File metadata

  • Download URL: xlcompress-0.3.2.tar.gz
  • Upload date:
  • Size: 60.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for xlcompress-0.3.2.tar.gz
Algorithm Hash digest
SHA256 e15761dde7fa7392f5648b3f5b33b8a462f018f1fb5c8f0f51c1a6e9379c6eff
MD5 bb7df7bfd79e62f834c38cd9e9e67f59
BLAKE2b-256 3658248bb3a09819354aa893c3c1786c51d4df3dfde44b71efa17775021601c8


Provenance

The following attestation bundles were made for xlcompress-0.3.2.tar.gz:

Publisher: release.yml on JustinStrik/xlcompress

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file xlcompress-0.3.2-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: xlcompress-0.3.2-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for xlcompress-0.3.2-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 0cb8feb00ff770ca545594c327102282376244a664fc00c4a7501f506f35f455
MD5 5a2c232d2c98a6defffe3324a16e5879
BLAKE2b-256 e432013cd23907570b3766f2e9b0fc6d335fb61f5d13df9a5da73623ee380693


Provenance

The following attestation bundles were made for xlcompress-0.3.2-cp311-cp311-win_amd64.whl:

Publisher: release.yml on JustinStrik/xlcompress


File details

Details for the file xlcompress-0.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for xlcompress-0.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b37dfeaf0163d6316ba671fe83dbde74ba8d7cb1c0ae566d436200e0a64d07ef
MD5 c089b3f07c3e534aa49d37111f6879fc
BLAKE2b-256 164cf0c7776db1936ed40b90414f4d01d90cf04df499b2ac0879f6c91478bea9


Provenance

The following attestation bundles were made for xlcompress-0.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on JustinStrik/xlcompress


File details

Details for the file xlcompress-0.3.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for xlcompress-0.3.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 9b086b2ad7ef0dc4f1c11451eb93f998978de6d9fb44f9aac6a0373456e452f5
MD5 bd144b7dbee148aa38d53aa2887e7da1
BLAKE2b-256 83e126b7374da76f9ec75bc99416bcf8474f04e2e7ef5591d12fcf16d3b62cd9


Provenance

The following attestation bundles were made for xlcompress-0.3.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on JustinStrik/xlcompress


File details

Details for the file xlcompress-0.3.2-cp311-cp311-macosx_11_0_arm64.whl.


File hashes

Hashes for xlcompress-0.3.2-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 11d8c7a6ff11d9906f734f8f376935f232a79bf645358a0b63f33b8c13eb17cf
MD5 b42af3f1b0ff8eb6f265126a2ac99abd
BLAKE2b-256 57af744b66257f6d84ddec40320a39f09080c80ee612c45bc9f1954a158b6fa6


Provenance

The following attestation bundles were made for xlcompress-0.3.2-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: release.yml on JustinStrik/xlcompress


File details

Details for the file xlcompress-0.3.2-cp311-cp311-macosx_10_12_x86_64.whl.


File hashes

Hashes for xlcompress-0.3.2-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 01cd1e6849e3f8bc32a2eef6ad6af00a71562fa77465c0e3f60b1d14caf83a66
MD5 8001784a067f06d7803e7c948b340a04
BLAKE2b-256 2e7d25b3d5a3a1707dbf28940c50162aaa588ebbdcdbf5ee10c72ef6550b7885


Provenance

The following attestation bundles were made for xlcompress-0.3.2-cp311-cp311-macosx_10_12_x86_64.whl:

Publisher: release.yml on JustinStrik/xlcompress
