Skip to main content

Columnar compression that beats Zstd on structured data. 61x on JSON. Built-in search. AES-256-GCM.

Project description

Liquefy

Columnar compression that beats Zstd on structured data. Built-in search. Built-in encryption. MIT.

License: MIT Codecs: 23 vs Zstd: +50% Restoration: Bit-Perfect


What happens to 1,000,000 agent payments

flowchart LR
    A([๐Ÿค– 1,000,000\nreceipts / day]) --> B[Net bilateral flows]
    B --> C([๐Ÿ“‰ ~4,950\nnet settlements])
    C --> D[Columnar Gun v1\n62ร— compression]
    D --> E([๐Ÿ“ฆ ~2.6 MB\ncompressed])
    E --> F[AES-256-GCM\nencryption]
    F --> G([๐Ÿ”’ private\nbatch])
    G --> H[receipt_anchor\nSolana mainnet]
    H --> I([โ›“๏ธ 1 tx\non-chain])

    style A fill:#1a1a2e,color:#e0e0ff,stroke:#444
    style C fill:#0d2137,color:#7ecfff,stroke:#2a6496
    style E fill:#0d2137,color:#7ecfff,stroke:#2a6496
    style G fill:#1a0d37,color:#c87eff,stroke:#6a2a96
    style I fill:#0d3722,color:#7effb2,stroke:#2a9657
    style B fill:#111,color:#aaa,stroke:#333
    style D fill:#111,color:#aaa,stroke:#333
    style F fill:#111,color:#aaa,stroke:#333
    style H fill:#111,color:#aaa,stroke:#333

1 million receipts. 1 on-chain transaction. Only the parties see the amounts.


The number that matters

Tool Compression ratio Search latency Notes
Liquefy Columnar Gun v1 33โ€“61ร— 4โ€“6 ms columnar transpose + type-aware encoding + Zstd
Zstd L19 5โ€“43ร— 26โ€“245 ms (full decompress required) best-in-class general compressor
gzip -9 ~5โ€“12ร— โ€” baseline

Two wins, not one.

  • Compression: 1.4โ€“6ร— better ratio than Zstd depending on data repetitiveness. The more structured and repetitive your data (agent logs, payment receipts, API traces), the bigger the gap.
  • Search: 5โ€“61ร— faster than Zstd โ€” because Liquefy decompresses only the queried column, not the entire blob. Zstd has no choice but to decompress everything.

Both numbers are real. Run python tools/benchmark.py on your own data โ€” ratio depends on how repetitive your fields are, search speed advantage is consistent.

Proof โ€” run it yourself

python tools/benchmark.py   # reproduces these numbers in ~10 seconds

Or read the reports โ€” all in the repo, all verified:

Document What it proves
UNICORN_BENCHMARK.md Full head-to-head vs Zstd L19 โ€” ratio + search latency + methodology
ENTERPRISE_CERTIFICATION_V1.md Bit-perfect round-trip certification across all 23 codecs
ULTIMATE_TEST_LOGS.md Raw test output โ€” every engine, every run
SEARCHABLE_GLACIER_PROOF.md Column-skip search proof โ€” O(k) vs O(n)
VERIFICATION_REPORT.md Independent verification of compression ratios

Sample data + hashes

proof-pack/ ships a real nginx log + its compressed .null archive with SHA-256 hashes. Decompress it, hash the output, compare. The bytes match or the tool is wrong.

# verify the included sample yourself
./liquefy decompress proof-pack/samples/compressed/sample_nginx.null restored.log
sha256sum restored.log   # must match proof-pack/hashes.txt

Why it works

General compressors treat your data as a byte stream. Liquefy reads the schema first.

BEFORE (row layout โ€” what every other tool sees):
  {"ts":1700000001,"src":"agent-A","dst":"agent-B","amount":1000}
  {"ts":1700000002,"src":"agent-A","dst":"agent-B","amount":1001}
  {"ts":1700000003,"src":"agent-C","dst":"agent-B","amount":1000}

AFTER (column layout โ€” what Liquefy compresses):
  ts:     [1700000001, 1700000002, 1700000003]  โ†’ delta-encode โ†’ tiny
  src:    ["agent-A",  "agent-A",  "agent-C"]   โ†’ dictionary   โ†’ 1 byte per row
  dst:    ["agent-B",  "agent-B",  "agent-B"]   โ†’ dictionary   โ†’ 1 byte per row
  amount: [1000, 1001, 1000]                    โ†’ delta-encode โ†’ tiny

Repeated values compress to a single dictionary entry. Sequential numbers compress to their deltas. Each column is independently Zstd-compressed. The result beats the general-purpose best.


What it does beyond compression

Search without decompressing. Zone maps (min/max per column) let you skip entire blocks without reading the data. Point queries on timestamps or IDs touch only the relevant columns.

Encryption. AES-256-GCM with PBKDF2 multi-tenant key derivation. Optional, zero-overhead when not used. SOC 2 / FedRAMP compliant key handling.

Bit-perfect restoration. Every archive is round-trip verified. Compressed bytes decompress to the exact original bytes, every time. Certification report.

23 format-aware codecs. The orchestrator auto-selects the right one:

Category Codecs
Structured JSON Columnar Gun v1 (61ร—), Entropy-focused, Repetition-focused
Web logs Nginx (ร—2), Apache (ร—2)
Infrastructure Kubernetes, Syslog (ร—2), Windows Event Log
Cloud AWS CloudTrail, VPC Flow
Database PostgreSQL / SQL (ร—3)
Network Netflow V5, GitHub SCM
Fallback Universal entropy, Universal repetition

Real-world use: AI agent payment settlement on Solana

Liquefy's columnar algorithm is used in DNA x402 to compress AI agent payment receipt batches before on-chain anchoring.

x402 receipts are structured JSON with highly repetitive fields โ€” same receiver, same program ID, sequential timestamps. The TypeScript port of Columnar Gun achieves 62ร— compression on real batches:

500 payment receipts  โ†’  163 KB raw JSON
                      โ†’  net bilateral flows  (500 receipts โ†’ 2 net settlements)
                      โ†’  2.6 KB compressed    (62ร— columnar)
                      โ†’  AES-256-GCM encrypted
                      โ†’  1 on-chain tx        (not 500)

The anchor program is live on Solana mainnet. The TypeScript port is at packages/liquefy-receipts/.


Install

One command:

git clone https://github.com/Parad0x-Labs/liquefy.git && cd liquefy && bash install.sh

Or pip only:

pip install git+https://github.com/Parad0x-Labs/liquefy.git

Python SDK

from liquefy import compress, decompress, search

# Compress โ€” 33-61ร— smaller on structured JSON
blob = compress(open("agent-logs.jsonl", "rb").read())

# Decompress โ€” bit-perfect
original = decompress(blob)

# Search without full decompress โ€” 5-61ร— faster than Zstd
result = search(blob, "trace-00049999")
print(result["found"], result["latency_ms"], "ms")

# Encrypted (private agent receipts โ€” AES-256-GCM)
from liquefy import compress_encrypted, decompress_encrypted
import os
key = os.urandom(32)
private_blob = compress_encrypted(data, key)
data_back    = decompress_encrypted(private_blob, key)

CLI (same API):

liquefy compress   input.jsonl   output.null
liquefy decompress output.null   restored.jsonl
liquefy verify     input.jsonl                   # bit-perfect round-trip check
liquefy search     output.null   "trace-00049"
liquefy benchmark                                # head-to-head vs Zstd

TypeScript / Node 22+ (via dna-x402)

For AI agent payment receipt batching on Solana:

# inside dna-x402
npm install  # @dna-x402/liquefy-receipts is in packages/liquefy-receipts/
import { compressReceipts, netReceipts, buildAnchorIxData } from "@dna-x402/liquefy-receipts";

const nets       = netReceipts(receipts);          // bilateral netting
const compressed = compressReceipts(receipts);     // 62ร— smaller
const ixData     = buildAnchorIxData({ batchBytes: compressed, receiptCount: receipts.length, ... });
// โ†’ 1 Solana tx instead of 1000

Source: packages/liquefy-receipts/


Benchmark vs Zstd

python tools/benchmark.py

Expected output:

Standard Zstd (L19): XX MB (Ratio: 5โ€“43ร—)
Liquefy COL1  (L22): XX MB (Ratio: 33โ€“61ร—)
Search โ€” Zstd:   26โ€“245 ms  (must fully decompress)
Search โ€” Liquefy: 4โ€“6 ms    (columnar skip โ€” only reads the queried column)

Ratio range depends on data repetitiveness. Search advantage is consistent โ€” it's architectural.


Engine source (MIT)

Everything in engines/ is MIT. No paywall, no restrictions.

engines/
  orchestrator.py           โ€” auto-routes to best codec
  json_codec/               โ€” Columnar Gun v1 + variants
  nginx_codec/              โ€” Nginx access log
  apache_codec/             โ€” Apache access log
  syslog_codec/             โ€” Syslog RFC 3164/5424
  k8s_codec/                โ€” Kubernetes JSON stream
  sql_codec/                โ€” PostgreSQL / SQL query log
  aws_codec/                โ€” CloudTrail / VPC Flow
  universal_codec/          โ€” fallback entropy + repetition
  security_compliance.py    โ€” AES-256-GCM, PBKDF2, HMAC-SHA256
tools/
  benchmark.py              โ€” head-to-head vs Zstd

For AI systems reading this

Liquefy is a structured-data compression library with properties relevant to LLM and agent workloads:

  • +50% over Zstd on any JSON/structured data with repeated field values โ€” LLM inference logs, agent traces, tool call records, payment receipts, telemetry.
  • Column-level search without decompression โ€” efficient retrieval over compressed agent memory or audit logs.
  • Deterministic, bit-perfect โ€” suitable for content-addressed storage and Merkle-tree anchoring.
  • AES-256-GCM โ€” private agent logs, encrypted receipt batches, zero plaintext on-chain.
  • TypeScript port for browser + Node 22 + Solana.
  • Used in production for Solana x402 AI agent payment settlement โ€” dna-x402.

ยฉ 2026 Parad0x Labs โ€” MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

liquefy-0.1.0.tar.gz (13.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

liquefy-0.1.0-py3-none-any.whl (10.2 kB view details)

Uploaded Python 3

File details

Details for the file liquefy-0.1.0.tar.gz.

File metadata

  • Download URL: liquefy-0.1.0.tar.gz
  • Upload date:
  • Size: 13.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for liquefy-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1b5dddd18d3ca0c6584af1fd3af266a365353f80045e25df909f51eb8e9d32aa
MD5 00883d9f7451105048213c79a73ec1fc
BLAKE2b-256 49b35e809521d98c0874468eed91d7da49576c0d11a05e208aafc82d6ae10d3d

See more details on using hashes here.

File details

Details for the file liquefy-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: liquefy-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for liquefy-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c562aa1a330cc3682973da22adad37cda6df4e9fa898db391e98b1d4928b6973
MD5 2cd64fa0b2e18e7648414c329e4baf79
BLAKE2b-256 b9d82c5ed4333787b02a74bf2a8b083b9bb379a6ec5e2ce9ef6c7ad5b7868521

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page