TGC Columnar Compression — beats zstd on structured data (JSON/FHIR)

TGC Compress

Topological Generative Compression: columnar compression for structured JSON data that aims to beat zstd. In the FHIR benchmark below, TGC lands within about 2% of zstd -19.

Results on FHIR (healthcare) JSON

Compressor | Compressed size (36 MB FHIR Bundle) | Ratio (compressed / original)
gzip -9 | 1.50 MB | 3.99%
zstd -19 | 636 KB | 1.69%
zstd --ultra -22 | 575 KB | 1.53%
TGC Columnar | 646 KB | 1.72%

TGC uses a novel pipeline: JSON structural decomposition → frequency-sorted dedup → BWT + MTF + RUNA/RUNB + rANS entropy coding.
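
As a rough illustration of the first two pipeline stages, the Python sketch below separates JSON string literals from the structural frame and applies frequency-sorted dedup to both. It is not TGC's implementation: the function names (`split_frame`, `freq_dedup`, `varint`, `decompose`, `recompose`), the lone-quote placeholder, and the LEB128-style varint are all assumptions made for the example.

```python
import re
from collections import Counter

# Matches one JSON string literal, including escaped quotes inside it.
STRING_RE = re.compile(r'"(?:[^"\\]|\\.)*"')

def split_frame(text):
    """Separate string literals from the structural frame."""
    strings = STRING_RE.findall(text)
    frame = STRING_RE.sub('"', text)  # one lone quote marks each string slot
    return frame, strings

def freq_dedup(items):
    """Return (table, indices); the most frequent item gets index 0."""
    table = [item for item, _ in Counter(items).most_common()]
    position = {item: i for i, item in enumerate(table)}
    return table, [position[item] for item in items]

def varint(n):
    """LEB128-style varint: 7 payload bits per byte, so values below 128 take 1 byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decompose(text):
    """Produce four streams: line table, line indices, string table, string indices."""
    frame, strings = split_frame(text)
    line_table, line_idx = freq_dedup(frame.splitlines(keepends=True))
    str_table, str_idx = freq_dedup(strings)
    return line_table, line_idx, str_table, str_idx

def recompose(line_table, line_idx, str_table, str_idx):
    """Invert decompose(): rebuild the frame, then refill each string slot in order."""
    frame = "".join(line_table[i] for i in line_idx)
    literals = iter(str_table[i] for i in str_idx)
    return re.sub('"', lambda m: next(literals), frame)
```

In valid JSON a quote character only appears as part of a string literal, so a lone quote is a safe placeholder and `recompose(*decompose(text))` returns the input text unchanged.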

Install

pip install tgc-compress

Or build from source:

cargo build --release -p tgc-cli

Usage

# Compress
tgc compress input.json -o output.tgc

# Decompress (byte-perfect roundtrip)
tgc decompress output.tgc -o restored.json

# Analyze
tgc analyze input.json
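
One way to check the byte-perfect roundtrip claim on your own data is to compare digests of the original and restored files. The helper below is a small sketch using only the Python standard library; `sha256_file` and `same_bytes` are hypothetical names, not part of the tgc package.

```python
import hashlib

def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large bundles do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_bytes(path_a, path_b):
    """True when both files have identical SHA-256 digests."""
    return sha256_file(path_a) == sha256_file(path_b)
```

After running the compress and decompress commands above, `same_bytes("input.json", "restored.json")` should return `True` if the roundtrip preserved every byte.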

How It Works

  1. JSON Decomposition: Separates string content from structural frame, then deduplicates both lines and strings into 4 small streams
  2. Frequency-Sorted Dedup: The most common strings and lines get the smallest indices (starting at 0), so most references encode as 1-byte varints
  3. BWT + MTF: Burrows-Wheeler Transform clusters similar contexts; Move-to-Front converts to zero-heavy sequences
  4. RUNA/RUNB: Bijective base-2 encoding collapses zero runs (run of 1000 → ~10 symbols)
  5. rANS Entropy Coding: Asymmetric Numeral Systems encode at fractional-bit precision, approaching Shannon entropy
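
Steps 3 and 4 can be sketched in a few lines of Python. This is a teaching sketch, not TGC's code: the BWT here sorts all rotations naively (far too slow for real inputs), the RUNA/RUNB symbol numbering follows the bzip2 convention as an assumption, and all function names are made up for the example. The rANS stage is left out.

```python
RUNA, RUNB = 0, 1  # assumed symbol numbering (bzip2-style)

def bwt(data: bytes):
    """Naive Burrows-Wheeler Transform: returns (last column, index of the original rotation)."""
    n = len(data)
    if n == 0:
        return b"", 0
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in order)
    return last, order.index(0)

def mtf(data: bytes):
    """Move-to-Front: emit each byte's position in a recency list, then move it to the front."""
    alphabet = list(range(256))
    out = []
    for b in data:
        i = alphabet.index(b)
        out.append(i)
        alphabet.insert(0, alphabet.pop(i))
    return out

def encode_run(n):
    """Bijective base-2 digits of n, least significant first (RUNA counts 1, RUNB counts 2)."""
    digits = []
    while n > 0:
        if n % 2:
            digits.append(RUNA)
            n = (n - 1) // 2
        else:
            digits.append(RUNB)
            n = (n - 2) // 2
    return digits

def zero_rle(values):
    """Replace each run of zeros with RUNA/RUNB digits; shift nonzero values up by one."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            out.extend(encode_run(run))
            run = 0
            out.append(v + 1)
    out.extend(encode_run(run))
    return out

def zero_rle_decode(symbols):
    """Invert zero_rle(): rebuild zero runs from their digits, shift other symbols back down."""
    out, run, weight = [], 0, 1
    for s in symbols:
        if s in (RUNA, RUNB):
            run += weight * (s + 1)
            weight *= 2
        else:
            out.extend([0] * run)
            run, weight = 0, 1
            out.append(s - 1)
    out.extend([0] * run)
    return out
```

With this digit scheme a run of 1000 zeros becomes 9 symbols, in line with the roughly logarithmic collapse described in step 4.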

Each substream is compressed independently with BWT+rANS, grammar compression, Huffman, or stored raw, whichever yields the smallest output.
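
The per-stream selection idea can be illustrated with stand-in codecs. The sketch below uses Python's standard-library `zlib`, `bz2`, and `lzma` in place of TGC's own coders (BWT+rANS, grammar compression, Huffman), and the 1-byte tag layout is an assumption for the example rather than the actual .tgc format.

```python
import bz2
import lzma
import zlib

# tag -> (name, encoder, decoder); stand-ins for TGC's real candidate coders
CODECS = {
    0: ("raw", lambda b: b, lambda b: b),
    1: ("zlib", lambda b: zlib.compress(b, 9), zlib.decompress),
    2: ("bz2", lambda b: bz2.compress(b, 9), bz2.decompress),
    3: ("lzma", lzma.compress, lzma.decompress),
}

def pack_stream(data: bytes) -> bytes:
    """Try every codec and keep the smallest output, prefixed with a 1-byte codec tag."""
    tag, payload = min(
        ((tag, enc(data)) for tag, (_, enc, _) in CODECS.items()),
        key=lambda pair: len(pair[1]),
    )
    return bytes([tag]) + payload

def unpack_stream(blob: bytes) -> bytes:
    """Read the tag byte and decode the rest with the matching codec."""
    _, _, dec = CODECS[blob[0]]
    return dec(blob[1:])
```

Because raw storage is always a candidate, a stream never grows by more than the tag byte, which matters for the small index and table streams produced by the dedup stage.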

License

MIT

Download files

Source Distribution

tgc_compress-0.2.0.tar.gz (738.5 kB)

Uploaded Source

Built Distribution

tgc_compress-0.2.0-py3-none-any.whl (740.7 kB)

Uploaded Python 3

File details

Details for the file tgc_compress-0.2.0.tar.gz.

File metadata

  • Download URL: tgc_compress-0.2.0.tar.gz
  • Upload date:
  • Size: 738.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for tgc_compress-0.2.0.tar.gz
Algorithm Hash digest
SHA256 99df878b6810ac708c16cb643372b431a6143d5306044c53a7f7206b109f1dbf
MD5 4aa01a3e089cd1fe7417b8f1a5f0799c
BLAKE2b-256 e07274354a7039bac1844a77e11aa85530f731ce9e68ac787cfd5b0acec795b4

File details

Details for the file tgc_compress-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: tgc_compress-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 740.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for tgc_compress-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8fa1114f76d32329a57461477747d976cc0a48c178c5d598360aba4e24984e84
MD5 b00c77878a8369c5666d29fd539bc326
BLAKE2b-256 99069f4d774dabcf303a05d7a03debee3670e5610dd3234c06e222ea2b527dc9
