TGC Columnar Compression — beats zstd on structured data (JSON/FHIR)
TGC Compress
Topological Generative Compression — columnar compression that beats zstd on structured JSON data.
Results on FHIR (healthcare) JSON
| Compressor | Compressed size (36 MB FHIR Bundle) | Ratio (compressed / original) |
|---|---|---|
| gzip -9 | 1.50 MB | 3.99% |
| zstd -19 | 636 KB | 1.69% |
| zstd --ultra -22 | 575 KB | 1.53% |
| TGC Columnar | 646 KB | 1.72% |
TGC uses a novel pipeline: JSON structural decomposition → frequency-sorted dedup → BWT + MTF + RUNA/RUNB + rANS entropy coding.
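To make the frequency-sorted dedup stage of that pipeline concrete, here is a minimal Python sketch. It is not TGC's implementation: the function names are hypothetical, and the unsigned LEB128 varint layout is an assumption, since the on-disk format is not described here.

```python
from collections import Counter


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 (assumed layout): 7 payload bits per byte,
    high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(stream: bytes) -> list[int]:
    """Inverse of encode_varint applied to a concatenated stream."""
    values, current, shift = [], 0, 0
    for byte in stream:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def frequency_sorted_dedup(items: list[str]) -> tuple[list[str], bytes]:
    """Return (table, index_stream). The table lists each distinct item once,
    most frequent first, so frequent items get the smallest indices."""
    counts = Counter(items)
    # Ties are broken by the string itself to keep the table deterministic.
    table = sorted(counts, key=lambda s: (-counts[s], s))
    index = {s: i for i, s in enumerate(table)}
    stream = b"".join(encode_varint(index[s]) for s in items)
    return table, stream


def restore(table: list[str], stream: bytes) -> list[str]:
    """Rebuild the original item sequence from the table and index stream."""
    return [table[i] for i in decode_varints(stream)]
```

Under this layout any index below 128 occupies a single byte, so a stream dominated by a few hundred distinct strings stays close to one byte per occurrence before the later stages run.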
Install
pip install tgc-compress
Or build from source:
cargo build --release -p tgc-cli
Usage
# Compress
tgc compress input.json -o output.tgc
# Decompress (byte-perfect roundtrip)
tgc decompress output.tgc -o restored.json
# Analyze
tgc analyze input.json
How It Works
- JSON Decomposition: Separates string content from the structural frame, then deduplicates both lines and strings into 4 small streams
- Frequency-Sorted Dedup: The most common strings/lines get the smallest indices (starting at 0), so most references fit in 1-byte varints
- BWT + MTF: Burrows-Wheeler Transform clusters similar contexts; Move-to-Front converts to zero-heavy sequences
- RUNA/RUNB: Bijective base-2 encoding collapses zero runs (a run of 1000 zeros → 9 symbols)
- rANS Entropy Coding: Asymmetric Numeral Systems encode at fractional-bit precision, approaching Shannon entropy
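The middle stages of the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TGC's code: the BWT is the naive sorted-rotations version (real implementations use suffix arrays), the RUNA/RUNB symbol numbering follows the bzip2 convention (RUNA = 0, RUNB = 1, non-zero ranks shifted up by one), and the rANS stage is omitted.

```python
RUNA, RUNB = 0, 1  # bzip2-style run symbols; bijective base-2 digits 1 and 2


def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler Transform: sort all rotations, keep the last
    column plus the row index of the original string."""
    n = len(data)
    if n == 0:
        return b"", 0
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last_column = bytes(data[(i - 1) % n] for i in order)
    return last_column, order.index(0)


def mtf(data: bytes) -> list[int]:
    """Move-to-Front: emit each byte's current rank, then move it to rank 0.
    Repeated bytes therefore turn into runs of zeros."""
    alphabet = list(range(256))
    ranks = []
    for b in data:
        r = alphabet.index(b)
        ranks.append(r)
        alphabet.pop(r)
        alphabet.insert(0, b)
    return ranks


def encode_run(n: int) -> list[int]:
    """Write a zero-run length n >= 1 in bijective base-2, least significant
    digit first (RUNA is digit 1, RUNB is digit 2)."""
    digits = []
    while n > 0:
        if n & 1:
            digits.append(RUNA)
            n = (n - 1) >> 1
        else:
            digits.append(RUNB)
            n = (n - 2) >> 1
    return digits


def zero_rle_encode(ranks: list[int]) -> list[int]:
    """Replace zero runs with RUNA/RUNB digits; shift other ranks up by 1."""
    out, run = [], 0
    for r in ranks:
        if r == 0:
            run += 1
        else:
            out.extend(encode_run(run))
            run = 0
            out.append(r + 1)
    out.extend(encode_run(run))
    return out


def zero_rle_decode(symbols: list[int]) -> list[int]:
    """Inverse of zero_rle_encode."""
    out, run, weight = [], 0, 1
    for s in symbols:
        if s in (RUNA, RUNB):
            run += weight * (s + 1)  # digit value is 1 for RUNA, 2 for RUNB
            weight <<= 1
        else:
            out.extend([0] * run)
            run, weight = 0, 1
            out.append(s - 1)
    out.extend([0] * run)
    return out
```

For example, `bwt(b"banana")` groups the repeated letters together, `mtf` turns those repeats into zeros, and `zero_rle_encode` shrinks each zero run to a logarithmic number of symbols, which is what leaves the entropy coder with a small, skewed alphabet.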
Each substream is compressed independently with BWT+rANS, grammar compression, or Huffman, or stored raw, whichever yields the smallest output.
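The per-substream selection can be sketched as "try every candidate, keep the smallest, record which one won". In the Python sketch below the candidate codecs are standard-library stand-ins (`zlib`, `bz2`, `lzma`), not TGC's BWT+rANS, grammar, or Huffman coders, and the one-byte method tag is a hypothetical framing chosen for illustration.

```python
import bz2
import lzma
import zlib

# tag -> (name, encode, decode). Stand-in codecs, assumed for illustration only.
CODECS = {
    0: ("raw", lambda b: b, lambda b: b),
    1: ("zlib", lambda b: zlib.compress(b, 9), zlib.decompress),
    2: ("bz2", lambda b: bz2.compress(b, 9), bz2.decompress),
    3: ("lzma", lzma.compress, lzma.decompress),
}


def compress_best(stream: bytes) -> bytes:
    """Encode the stream with every candidate and keep the smallest payload.
    The first byte of the result records which codec was chosen."""
    best_tag, best_payload = 0, stream  # raw is the fallback
    for tag, (_name, encode, _decode) in CODECS.items():
        payload = encode(stream)
        if len(payload) < len(best_payload):
            best_tag, best_payload = tag, payload
    return bytes([best_tag]) + best_payload


def decompress_tagged(blob: bytes) -> bytes:
    """Read the method tag and dispatch to the matching decoder."""
    _name, _encode, decode = CODECS[blob[0]]
    return decode(blob[1:])
```

Keeping raw storage as a candidate bounds the worst case at one byte of overhead per substream, which matters for very short streams where any codec header would outweigh the savings.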
License
MIT