Independent Python implementation of the SpreadsheetLLM SheetCompressor encoding for token-efficient LLM workflows.
sheet-compressor (Python)
Independent Python implementation of the SheetCompressor encoding from the
SpreadsheetLLM paper (Dong et al., Microsoft, 2024).
Pure core with zero required dependencies; conforms to the shared golden corpus in
fixtures/corpus/. See spec/SPEC.md for the
language-neutral contract.
Independent, community implementation. Not affiliated with or endorsed by Microsoft. Part of the multi-language
sheet-compressor project.
Install
pip install sheet-compressor # core, zero required deps (Python >= 3.9)
pip install "sheet-compressor[tokenizer]" # + tiktoken-backed token counts
pip install "sheet-compressor[xlsx]" # + openpyxl .xlsx reader
Usage
from sheet_compressor import compress
grid = {
    "rows": [
        ["Name", "Qty", "Price"],
        ["Apple", "3", "1.50"],
        ["", "", ""],
        ["Pear", "5", "0.30"],
    ],
    "origin": {"row": 1, "col": 1},
}
result = compress(grid)
print(result["encodings"]["anchor"]["string"])
The three encodings
The same sparse two-table sheet in each encoding. Only ["string"] is shown; each encoding also has a JSON
form and a ["tokenEstimate"]. Raw baseline: 100 tokens; encoded: 80 / 77 / 23 tokens.
# encodings.anchor.string — addresses + values, empty rows dropped
A1,Product|B1,Q1|C1,Q2|D1,Q3|E1,Q4
A2,Apples|B2,100|C2,150|D2,200|E2,120
A15,Region|B15,Cost|C15,Margin|D15,Profit|E15,Status
A16,North|B16,500|C16,0.15|D16,75|E16,good
# encodings.invertedIndex.string — value → cell(s); repeats collapse (B4|D18,60)
A1,Product
B4|D18,60
E16|E18,good
# encodings.formatAggregation.string — values → type over ranges
IntNum: B2:E4,B16:B18,D16:D18
FloatNum: C16:C18
Text: A1:E1,A2:A4,A15:E15,A16:A18,E16:E18
See the project README for the complete strings.
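To make the first two shapes concrete, here is a toy sketch that produces anchor-style and inverted-index-style strings from a Grid dict. It is not the library's algorithm: `col_letters`, `toy_anchor`, and `toy_inverted_index` are hypothetical names, and the sketch assumes that empty cells are skipped and that values appear in first-seen order. The real encoder may do more, such as merging addresses into ranges.

```python
from collections import defaultdict


def col_letters(col: int) -> str:
    """Convert a 1-based column number to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def toy_anchor(grid: dict) -> str:
    """One line per non-empty row: 'address,value' pairs joined by '|'."""
    r0, c0 = grid["origin"]["row"], grid["origin"]["col"]
    lines = []
    for r, row in enumerate(grid["rows"]):
        cells = [
            f"{col_letters(c0 + c)}{r0 + r},{value}"
            for c, value in enumerate(row)
            if value != ""  # assumption: empty cells are skipped
        ]
        if cells:  # empty rows are dropped
            lines.append("|".join(cells))
    return "\n".join(lines)


def toy_inverted_index(grid: dict) -> str:
    """One line per distinct value: its addresses joined by '|', then ',value'."""
    r0, c0 = grid["origin"]["row"], grid["origin"]["col"]
    cells_by_value = defaultdict(list)
    for r, row in enumerate(grid["rows"]):
        for c, value in enumerate(row):
            if value != "":
                cells_by_value[value].append(f"{col_letters(c0 + c)}{r0 + r}")
    return "\n".join(
        f"{'|'.join(addrs)},{value}" for value, addrs in cells_by_value.items()
    )


grid = {
    "rows": [["Product", "Q1"], ["Apples", "60"], ["", ""], ["Pears", "60"]],
    "origin": {"row": 1, "col": 1},
}
print(toy_anchor(grid))
print(toy_inverted_index(grid))  # the repeated value collapses to B2|B4,60
```

The inverted index pays off when values repeat, since each distinct value is written once no matter how many cells hold it.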
Prompts — read the output with an LLM
The shared templates load via prompts: reader explainers (prompts.readers.anchor /
.invertedIndex / .formatAggregation), task templates (prompts.tasks.sheetQA /
.cellValueLookup / .tableRegionDetection) with {ENCODING} / {ADDRESS} / {QUESTION}
placeholders, and prompts.snippets.chartDescriptor. The library makes no LLM calls —
assemble the messages and send them to any chat model. Example with Claude (pip install anthropic):
from sheet_compressor import compress, prompts
import anthropic
result = compress(grid)
system = prompts.readers.anchor # decoder -> system prompt
user = (
    prompts.tasks.sheetQA  # task + data -> user message
    .replace("{ENCODING}", result["encodings"]["anchor"]["string"])
    .replace("{QUESTION}", "Which region had the highest profit?")
)
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
msg = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system=system,
    messages=[{"role": "user", "content": user}],
)
print(msg.content[0].text)
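If you fill several placeholders, a small helper can catch one you forgot before the prompt reaches the model. The sketch below is not part of sheet-compressor: `fill_template` is a hypothetical name, and the template string is a stand-in for the real task templates. It assumes placeholders are upper-case names in braces, as in {ENCODING} and {QUESTION}.

```python
import re

# Assumption: placeholders look like {ENCODING}, {ADDRESS}, {QUESTION}.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def fill_template(template: str, **values: str) -> str:
    """Replace every {NAME} placeholder in one pass; raise if a value is missing."""
    missing = sorted(
        {name for name in PLACEHOLDER.findall(template) if name not in values}
    )
    if missing:
        raise KeyError(f"unfilled placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Stand-in template, not the text shipped with the library.
template = "Sheet:\n{ENCODING}\n\nQuestion: {QUESTION}"
user = fill_template(
    template,
    ENCODING="A1,Region|B1,Profit\nA2,North|B2,75",
    QUESTION="Which region had the highest profit?",
)
print(user)
```

Substituting in a single pass also means that brace text inside the sheet data is never treated as a placeholder, which chained `.replace` calls cannot guarantee.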
Real tokenizer (optional)
Install with pip install "sheet-compressor[tokenizer]" and pass a tiktoken-backed
counter to compress:
from sheet_compressor import compress, create_token_counter
result = compress(grid, {"tokenCounter": create_token_counter()})
create_token_counter defaults to o200k_base (GPT-4o / GPT-5 family); pass
encoding="cl100k_base" for the GPT-3.5 / GPT-4 family. It raises a clear error
if tiktoken is not installed.
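Whichever counter produced them, the per-encoding estimates can be compared to pick the cheapest string for a prompt. This is a minimal sketch: `cheapest_encoding` is a hypothetical helper, and `sample` is a hand-written stand-in for a compress() result. The sketch assumes each entry under "encodings" carries a numeric "tokenEstimate", and that the README's 80 / 77 / 23 figures follow the listing order of the encodings.

```python
def cheapest_encoding(result: dict) -> tuple[str, int]:
    """Return (name, tokenEstimate) for the encoding with the lowest estimate."""
    estimates = {
        name: group["tokenEstimate"]
        for name, group in result["encodings"].items()
    }
    name = min(estimates, key=estimates.get)
    return name, estimates[name]


# Stand-in for a compress() result; only the fields used here are filled in.
sample = {
    "encodings": {
        "anchor": {"string": "...", "tokenEstimate": 80},
        "invertedIndex": {"string": "...", "tokenEstimate": 77},
        "formatAggregation": {"string": "...", "tokenEstimate": 23},
    }
}
print(cheapest_encoding(sample))
```

The cheapest encoding is not always the right one: format aggregation replaces values with types, so a question about specific cell values still needs anchor or invertedIndex.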
Optional .xlsx adapter
Install with pip install "sheet-compressor[xlsx]" and read a workbook into a
Grid via openpyxl:
from sheet_compressor import compress
from sheet_compressor.adapters.xlsx import read_sheet
grid = read_sheet("workbook.xlsx") # first sheet
grid = read_sheet("workbook.xlsx", {"sheet": "Q3"}) # by name
grid = read_sheet("workbook.xlsx", {"sheet": 1}) # by 0-indexed position (second sheet)
result = compress(grid)
read_sheet accepts a file path, raw bytes, or any binary file-like object.
It raises a clear ImportError if openpyxl is not installed. The pure core
keeps working without it — build the Grid yourself and pass it to
compress() directly.
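As one way of building the Grid yourself, the sketch below turns CSV text into the dict shape shown under Usage, using only the standard library. `grid_from_csv` is a hypothetical helper, not part of the package. It assumes cells are plain strings and pads short rows with empty strings so every row has the same width; whether compress() needs rectangular rows is not stated here, so treat the padding as a precaution.

```python
import csv
import io


def grid_from_csv(text: str, origin_row: int = 1, origin_col: int = 1) -> dict:
    """Parse CSV text into a Grid-shaped dict with string cells."""
    rows = [list(row) for row in csv.reader(io.StringIO(text))]
    width = max((len(row) for row in rows), default=0)
    # Assumption: pad ragged rows so the grid is rectangular.
    padded = [row + [""] * (width - len(row)) for row in rows]
    return {"rows": padded, "origin": {"row": origin_row, "col": origin_col}}


csv_text = "Name,Qty,Price\nApple,3,1.50\n,,\nPear,5,0.30\n"
grid = grid_from_csv(csv_text)
print(grid["rows"])
# With the package installed, the grid can then go to compress(grid).
```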
Conformance
python3 -m unittest discover -s tests
The conformance suite walks every fixture under fixtures/corpus/ and asserts
byte-equal output against the goldens — the same shape as the TypeScript
reference's conformance test.
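The byte-equal check can be sketched as follows. The fixture layout here is an assumption made for illustration (one directory per case, holding `input.json` and `golden.txt`); the actual layout under fixtures/corpus/ may differ, and `find_mismatches` is a hypothetical name rather than the suite's real entry point.

```python
import json
from pathlib import Path
from typing import Callable


def find_mismatches(corpus: Path, encode: Callable[[dict], str]) -> list[str]:
    """Return the names of cases whose encoded bytes differ from the golden file."""
    mismatches = []
    for case in sorted(p for p in corpus.iterdir() if p.is_dir()):
        # Hypothetical file names: input.json (the grid) and golden.txt (expected output).
        grid = json.loads((case / "input.json").read_text(encoding="utf-8"))
        golden = (case / "golden.txt").read_bytes()
        if encode(grid).encode("utf-8") != golden:
            mismatches.append(case.name)
    return mismatches
```

Comparing bytes rather than parsed values means a stray trailing newline or a different line ending counts as a failure, which is what keeps implementations in different languages identical.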
File details
Details for the file sheet_compressor-0.1.1.tar.gz.
File metadata
- Download URL: sheet_compressor-0.1.1.tar.gz
- Upload date:
- Size: 31.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | af041b7631225446994e99ef33635b9bd8e8a8fff183020da215366001a47205 |
| MD5 | a139477c7e510beecf3288dbf74ce8e7 |
| BLAKE2b-256 | 9b3b8fc7c62af874f44be9c8f713c407153ff40059cc0c47660393b52b354132 |
File details
Details for the file sheet_compressor-0.1.1-py3-none-any.whl.
File metadata
- Download URL: sheet_compressor-0.1.1-py3-none-any.whl
- Upload date:
- Size: 30.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e3244587bcad4cbab3552f9b44c547e0dca950d907f355962ed5769b54372bc |
| MD5 | 0a1512e64d248d1a372b30d02ebb6036 |
| BLAKE2b-256 | 0a4f51fe96b5b4dd9591720c115b5435ae7e6b3d84314348fa7ec870dbb1a34b |