Independent Python implementation of the SpreadsheetLLM SheetCompressor encoding for token-efficient LLM workflows.


sheet-compressor (Python)

Independent Python implementation of the SheetCompressor encoding from the SpreadsheetLLM paper (Dong et al., Microsoft, 2024). The core is pure Python with zero required dependencies and conforms to the shared golden corpus in fixtures/corpus/. See spec/SPEC.md for the language-neutral contract.

Independent, community implementation. Not affiliated with or endorsed by Microsoft. Part of the multi-language sheet-compressor project.

Install

pip install sheet-compressor                 # core, zero required deps (Python >= 3.9)
pip install "sheet-compressor[tokenizer]"    # + tiktoken-backed token counts
pip install "sheet-compressor[xlsx]"         # + openpyxl .xlsx reader

Usage

from sheet_compressor import compress

grid = {
    "rows": [
        ["Name", "Qty", "Price"],
        ["Apple", "3", "1.50"],
        ["", "", ""],
        ["Pear", "5", "0.30"],
    ],
    "origin": {"row": 1, "col": 1},
}
result = compress(grid)
print(result["encodings"]["anchor"]["string"])
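For intuition, an anchor-style string for the grid above can be approximated in a few lines. This is a minimal sketch, not the library's implementation. It assumes that origin gives the 1-based row and column of rows[0][0], that empty cells and fully empty rows are dropped, and that cells are joined with "|" and rows with newlines, matching the format shown under The three encodings. The helper names col_letters and anchor_sketch are hypothetical.

```python
def col_letters(col: int) -> str:
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def anchor_sketch(grid: dict) -> str:
    """Hypothetical approximation of the anchor string: 'A1,value|B1,value' per row.

    Assumes grid["origin"] holds the 1-based row/col of rows[0][0], and that
    empty cells and fully empty rows are skipped (as in the README examples).
    """
    row0 = grid["origin"]["row"]
    col0 = grid["origin"]["col"]
    lines = []
    for r, row in enumerate(grid["rows"]):
        cells = [
            f"{col_letters(col0 + c)}{row0 + r},{value}"
            for c, value in enumerate(row)
            if value != ""  # skip empty cells
        ]
        if cells:  # drop rows with no non-empty cells
            lines.append("|".join(cells))
    return "\n".join(lines)


grid = {
    "rows": [
        ["Name", "Qty", "Price"],
        ["Apple", "3", "1.50"],
        ["", "", ""],
        ["Pear", "5", "0.30"],
    ],
    "origin": {"row": 1, "col": 1},
}
print(anchor_sketch(grid))
# A1,Name|B1,Qty|C1,Price
# A2,Apple|B2,3|C2,1.50
# A4,Pear|B4,5|C4,0.30
```

The printed lines are what this sketch produces; the library's own anchor output is governed by spec/SPEC.md and may differ in detail. Note that row 3 disappears but the surviving rows keep their original addresses (A4, not A3), which is what lets a model map values back to real cells.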

The three encodings

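The same sparse two-table sheet in each encoding. Only the ["string"] form is shown; each encoding also has a JSON form and a ["tokenEstimate"]. Against a raw baseline of 100 tokens, anchor, invertedIndex, and formatAggregation come to 80, 77, and 23 tokens respectively: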

# encodings.anchor.string  — addresses + values, empty rows dropped
A1,Product|B1,Q1|C1,Q2|D1,Q3|E1,Q4
A2,Apples|B2,100|C2,150|D2,200|E2,120
A15,Region|B15,Cost|C15,Margin|D15,Profit|E15,Status
A16,North|B16,500|C16,0.15|D16,75|E16,good

# encodings.invertedIndex.string  — value → cell(s); repeats collapse (B4|D18,60)
A1,Product
B4|D18,60
E16|E18,good

# encodings.formatAggregation.string  — values → type over ranges
IntNum: B2:E4,B16:B18,D16:D18
FloatNum: C16:C18
Text: A1:E1,A2:A4,A15:E15,A16:A18,E16:E18

See the project README for the complete strings.
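The inverted index can be sketched the same way: group cell addresses by value so that a repeated value is written once, followed by every cell that holds it. This is a minimal sketch, assuming cells arrive as (address, value) pairs in row-major order, addresses are joined with "|" and separated from the value by a comma (as in B4|D18,60), and empty cells are skipped. The function name inverted_index_sketch is hypothetical, and the library's own ordering and escaping rules (for example, for values containing "," or "|") are defined in spec/SPEC.md and may differ.

```python
from collections import defaultdict


def inverted_index_sketch(cells: list[tuple[str, str]]) -> str:
    """Hypothetical sketch: map each distinct value to every address holding it.

    `cells` is a list of (address, value) pairs in row-major order. Values are
    emitted in order of first appearance; empty values are skipped.
    """
    index: dict[str, list[str]] = defaultdict(list)
    for address, value in cells:
        if value != "":
            index[value].append(address)
    # dicts preserve insertion order, so output order follows first appearance
    return "\n".join(f"{'|'.join(addrs)},{value}" for value, addrs in index.items())


cells = [
    ("A1", "Product"),
    ("B4", "60"),
    ("E16", "good"),
    ("D18", "60"),
    ("E18", "good"),
]
print(inverted_index_sketch(cells))
# A1,Product
# B4|D18,60
# E16|E18,good
```

Keeping a deterministic order (first appearance, row-major addresses) matters for any encoder that has to match byte-exact golden files.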

Prompts — read the output with an LLM

The shared prompt templates are exposed through prompts: reader explainers (prompts.readers.anchor, .invertedIndex, .formatAggregation), task templates with {ENCODING}, {ADDRESS}, and {QUESTION} placeholders (prompts.tasks.sheetQA, .cellValueLookup, .tableRegionDetection), and prompts.snippets.chartDescriptor. The library itself makes no LLM calls: assemble the messages and send them to any chat model. Example with Claude (pip install anthropic):

from sheet_compressor import compress, prompts
import anthropic

result = compress(grid)
system = prompts.readers.anchor                  # decoder -> system prompt
user = (
    prompts.tasks.sheetQA                        # task + data -> user message
    .replace("{ENCODING}", result["encodings"]["anchor"]["string"])
    .replace("{QUESTION}", "Which region had the highest profit?")
)

client = anthropic.Anthropic()                   # reads ANTHROPIC_API_KEY
msg = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system=system,
    messages=[{"role": "user", "content": user}],
)
print(msg.content[0].text)
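The chained .replace() calls above silently leave a placeholder in the prompt if one is forgotten, for example a task template that also expects {ADDRESS}. A small helper can fill placeholders and fail loudly on any that remain. This is a hypothetical sketch (fill_template is not part of the library); it assumes placeholders are upper-case names in single braces, as in {ENCODING}, {ADDRESS}, and {QUESTION}, and the template in the example is a stand-in, not the library's actual prompt text.

```python
import re

# Upper-case names in single braces, e.g. {ENCODING}; other braces are ignored.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def fill_template(template: str, **values: str) -> str:
    """Replace {NAME} placeholders and raise KeyError if any are left unfilled.

    Literal braces elsewhere in a prompt (e.g. JSON examples) are left alone,
    and substituted values are not re-scanned for placeholders.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {{{name}}}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Stand-in template; the real prompts.tasks.sheetQA text will differ.
template = "Spreadsheet:\n{ENCODING}\n\nQuestion: {QUESTION}"
user = fill_template(template, ENCODING="A1,Name|B1,Qty", QUESTION="What is in B1?")
print(user)
```

A regex with a replacement function is used instead of str.format so that literal braces in a template do not need escaping.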

Real tokenizer (optional)

Install with pip install "sheet-compressor[tokenizer]" and pass a tiktoken-backed counter to compress:

from sheet_compressor import compress, create_token_counter

result = compress(grid, {"tokenCounter": create_token_counter()})

create_token_counter defaults to o200k_base (GPT-4o / GPT-5 family); pass encoding="cl100k_base" for the GPT-3.5 / GPT-4 family. It raises a clear error if tiktoken is not installed.
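Without tiktoken, a rough stand-in can still give ballpark numbers. This is a hypothetical sketch: it assumes the tokenCounter option accepts any callable that takes a string and returns an int (check spec/SPEC.md for the actual contract), and it uses the common rule of thumb of about four characters per token for English text, which can be well off for dense address-and-number strings like these encodings.

```python
import math


def approx_token_count(text: str) -> int:
    """Rough token estimate: about 4 characters per token (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / 4)


print(approx_token_count("A1,Name|B1,Qty|C1,Price"))  # 23 characters -> 6

# Assumed usage, not verified against the library's API:
# result = compress(grid, {"tokenCounter": approx_token_count})
```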

Optional .xlsx adapter

Install with pip install "sheet-compressor[xlsx]" and read a workbook into a Grid via openpyxl:

from sheet_compressor import compress
from sheet_compressor.adapters.xlsx import read_sheet

grid = read_sheet("workbook.xlsx")            # first sheet
grid = read_sheet("workbook.xlsx", {"sheet": "Q3"})  # by name
grid = read_sheet("workbook.xlsx", {"sheet": 1})     # by 0-indexed position
result = compress(grid)

read_sheet accepts a file path, raw bytes, or any binary file-like object. It raises a clear ImportError if openpyxl is not installed. The pure core keeps working without it — build the Grid yourself and pass it to compress() directly.
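For example, CSV data can be turned into a Grid with the standard library alone. This sketch assumes the Grid shape from the Usage example (a "rows" list of string lists plus a 1-based "origin"); csv_to_grid is a hypothetical helper, and ragged rows are padded with empty strings on the assumption that a rectangular grid is expected.

```python
import csv
import io


def csv_to_grid(text: str) -> dict:
    """Hypothetical helper: parse CSV text into the Grid shape used by compress().

    When reading from disk, open the file with newline="" as the csv module
    documentation recommends, then pass its contents here.
    """
    rows = list(csv.reader(io.StringIO(text)))
    width = max((len(row) for row in rows), default=0)
    # Pad short rows so every row has the same number of cells (assumption).
    rows = [row + [""] * (width - len(row)) for row in rows]
    return {"rows": rows, "origin": {"row": 1, "col": 1}}


grid = csv_to_grid("Name,Qty,Price\nApple,3,1.50\n,,\nPear,5\n")
print(grid["rows"][3])  # ['Pear', '5', '']
# result = compress(grid)  # with sheet_compressor installed
```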

Conformance

python3 -m unittest discover -s tests

The conformance suite walks every fixture under fixtures/corpus/ and asserts byte-equal output against the goldens, mirroring the conformance test in the TypeScript reference implementation.
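The byte-equal check at the heart of such a suite can be sketched with pathlib. The fixture layout here (one directory per case holding input.json and expected.txt), the check_corpus function, and the stand-in encoder are all hypothetical; the real corpus layout is defined by fixtures/corpus/ and spec/SPEC.md.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def check_corpus(corpus: Path, encode: Callable[[dict], str]) -> list[str]:
    """Return the names of fixtures whose encoded output differs from the golden bytes.

    Hypothetical layout: corpus/<case>/input.json and corpus/<case>/expected.txt.
    """
    failures = []
    for case in sorted(p for p in corpus.iterdir() if p.is_dir()):
        grid = json.loads((case / "input.json").read_text(encoding="utf-8"))
        actual = encode(grid).encode("utf-8")
        golden = (case / "expected.txt").read_bytes()
        if actual != golden:  # byte-for-byte: no whitespace or newline normalization
            failures.append(case.name)
    return failures


def join_first_row(grid: dict) -> str:
    """Stand-in encoder for the demo; a real suite would call compress()."""
    return "|".join(grid["rows"][0])


with tempfile.TemporaryDirectory() as tmp:
    for name, golden in [("match", b"a|b"), ("mismatch", b"a|b\n")]:
        case = Path(tmp) / name
        case.mkdir()
        (case / "input.json").write_text('{"rows": [["a", "b"]]}', encoding="utf-8")
        (case / "expected.txt").write_bytes(golden)
    failures = check_corpus(Path(tmp), join_first_row)

print(failures)  # ['mismatch']: a trailing newline alone is enough to fail
```

Comparing bytes rather than normalized strings is what makes the check strict enough for several language implementations to agree on identical output.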

Release history

0.1.1 (this version): Jun 18, 2026

0.1.0: Jun 18, 2026

Download files


Source Distribution

sheet_compressor-0.1.1.tar.gz (31.0 kB)

Uploaded Jun 18, 2026 Source

Built Distribution


sheet_compressor-0.1.1-py3-none-any.whl (30.5 kB)

Uploaded Jun 18, 2026 Python 3

File details

Details for the file sheet_compressor-0.1.1.tar.gz.

File metadata

Download URL: sheet_compressor-0.1.1.tar.gz
Upload date: Jun 18, 2026
Size: 31.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for sheet_compressor-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`af041b7631225446994e99ef33635b9bd8e8a8fff183020da215366001a47205`
MD5	`a139477c7e510beecf3288dbf74ce8e7`
BLAKE2b-256	`9b3b8fc7c62af874f44be9c8f713c407153ff40059cc0c47660393b52b354132`
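To check a downloaded file against the SHA256 digest listed above, hashing it in chunks with hashlib is enough. A minimal sketch: sha256_of is a hypothetical helper name, and the commented usage line assumes the sdist sits in the current directory. The chunked loop works on Python 3.9; hashlib.file_digest offers the same on 3.11 and later.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Published SHA256 for sheet_compressor-0.1.1.tar.gz (from the table above).
EXPECTED = "af041b7631225446994e99ef33635b9bd8e8a8fff183020da215366001a47205"
# ok = sha256_of(Path("sheet_compressor-0.1.1.tar.gz")) == EXPECTED

# Self-check on a known test vector: SHA256("abc").
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"abc")
    demo_digest = sha256_of(demo)
print(demo_digest)  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```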


File details

Details for the file sheet_compressor-0.1.1-py3-none-any.whl.

File metadata

Download URL: sheet_compressor-0.1.1-py3-none-any.whl
Upload date: Jun 18, 2026
Size: 30.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for sheet_compressor-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3e3244587bcad4cbab3552f9b44c547e0dca950d907f355962ed5769b54372bc`
MD5	`0a1512e64d248d1a372b30d02ebb6036`
BLAKE2b-256	`0a4f51fe96b5b4dd9591720c115b5435ae7e6b3d84314348fa7ec870dbb1a34b`
