Columnar compression that beats Zstd on structured data. 61x on JSON. Built-in search. AES-256-GCM.
Project description
Liquefy
Columnar compression that beats Zstd on structured data. Built-in search. Built-in encryption. MIT.
What happens to 1,000,000 agent payments
flowchart LR
A([๐ค 1,000,000\nreceipts / day]) --> B[Net bilateral flows]
B --> C([๐ ~4,950\nnet settlements])
C --> D[Columnar Gun v1\n62ร compression]
D --> E([๐ฆ ~2.6 MB\ncompressed])
E --> F[AES-256-GCM\nencryption]
F --> G([๐ private\nbatch])
G --> H[receipt_anchor\nSolana mainnet]
H --> I([โ๏ธ 1 tx\non-chain])
style A fill:#1a1a2e,color:#e0e0ff,stroke:#444
style C fill:#0d2137,color:#7ecfff,stroke:#2a6496
style E fill:#0d2137,color:#7ecfff,stroke:#2a6496
style G fill:#1a0d37,color:#c87eff,stroke:#6a2a96
style I fill:#0d3722,color:#7effb2,stroke:#2a9657
style B fill:#111,color:#aaa,stroke:#333
style D fill:#111,color:#aaa,stroke:#333
style F fill:#111,color:#aaa,stroke:#333
style H fill:#111,color:#aaa,stroke:#333
1 million receipts. 1 on-chain transaction. Only the parties see the amounts.
The number that matters
| Tool | Compression ratio | Search latency | Notes |
|---|---|---|---|
| Liquefy Columnar Gun v1 | 33โ61ร | 4โ6 ms | columnar transpose + type-aware encoding + Zstd |
| Zstd L19 | 5โ43ร | 26โ245 ms (full decompress required) | best-in-class general compressor |
| gzip -9 | ~5โ12ร | โ | baseline |
Two wins, not one.
- Compression: 1.4โ6ร better ratio than Zstd depending on data repetitiveness. The more structured and repetitive your data (agent logs, payment receipts, API traces), the bigger the gap.
- Search: 5โ61ร faster than Zstd โ because Liquefy decompresses only the queried column, not the entire blob. Zstd has no choice but to decompress everything.
Both numbers are real. Run python tools/benchmark.py on your own data โ ratio depends on how repetitive your fields are, search speed advantage is consistent.
Proof โ run it yourself
python tools/benchmark.py # reproduces these numbers in ~10 seconds
Or read the reports โ all in the repo, all verified:
| Document | What it proves |
|---|---|
| UNICORN_BENCHMARK.md | Full head-to-head vs Zstd L19 โ ratio + search latency + methodology |
| ENTERPRISE_CERTIFICATION_V1.md | Bit-perfect round-trip certification across all 23 codecs |
| ULTIMATE_TEST_LOGS.md | Raw test output โ every engine, every run |
| SEARCHABLE_GLACIER_PROOF.md | Column-skip search proof โ O(k) vs O(n) |
| VERIFICATION_REPORT.md | Independent verification of compression ratios |
Sample data + hashes
proof-pack/ ships a real nginx log + its compressed .null archive with SHA-256 hashes. Decompress it, hash the output, compare. The bytes match or the tool is wrong.
# verify the included sample yourself
./liquefy decompress proof-pack/samples/compressed/sample_nginx.null restored.log
sha256sum restored.log # must match proof-pack/hashes.txt
Why it works
General compressors treat your data as a byte stream. Liquefy reads the schema first.
BEFORE (row layout โ what every other tool sees):
{"ts":1700000001,"src":"agent-A","dst":"agent-B","amount":1000}
{"ts":1700000002,"src":"agent-A","dst":"agent-B","amount":1001}
{"ts":1700000003,"src":"agent-C","dst":"agent-B","amount":1000}
AFTER (column layout โ what Liquefy compresses):
ts: [1700000001, 1700000002, 1700000003] โ delta-encode โ tiny
src: ["agent-A", "agent-A", "agent-C"] โ dictionary โ 1 byte per row
dst: ["agent-B", "agent-B", "agent-B"] โ dictionary โ 1 byte per row
amount: [1000, 1001, 1000] โ delta-encode โ tiny
Repeated values compress to a single dictionary entry. Sequential numbers compress to their deltas. Each column is independently Zstd-compressed. The result beats the general-purpose best.
What it does beyond compression
Search without decompressing. Zone maps (min/max per column) let you skip entire blocks without reading the data. Point queries on timestamps or IDs touch only the relevant columns.
Encryption. AES-256-GCM with PBKDF2 multi-tenant key derivation. Optional, zero-overhead when not used. SOC 2 / FedRAMP compliant key handling.
Bit-perfect restoration. Every archive is round-trip verified. Compressed bytes decompress to the exact original bytes, every time. Certification report.
23 format-aware codecs. The orchestrator auto-selects the right one:
| Category | Codecs |
|---|---|
| Structured JSON | Columnar Gun v1 (61ร), Entropy-focused, Repetition-focused |
| Web logs | Nginx (ร2), Apache (ร2) |
| Infrastructure | Kubernetes, Syslog (ร2), Windows Event Log |
| Cloud | AWS CloudTrail, VPC Flow |
| Database | PostgreSQL / SQL (ร3) |
| Network | Netflow V5, GitHub SCM |
| Fallback | Universal entropy, Universal repetition |
Real-world use: AI agent payment settlement on Solana
Liquefy's columnar algorithm is used in DNA x402 to compress AI agent payment receipt batches before on-chain anchoring.
x402 receipts are structured JSON with highly repetitive fields โ same receiver, same program ID, sequential timestamps. The TypeScript port of Columnar Gun achieves 62ร compression on real batches:
500 payment receipts โ 163 KB raw JSON
โ net bilateral flows (500 receipts โ 2 net settlements)
โ 2.6 KB compressed (62ร columnar)
โ AES-256-GCM encrypted
โ 1 on-chain tx (not 500)
The anchor program is live on Solana mainnet. The TypeScript port is at packages/liquefy-receipts/.
Install
One command:
git clone https://github.com/Parad0x-Labs/liquefy.git && cd liquefy && bash install.sh
Or pip only:
pip install git+https://github.com/Parad0x-Labs/liquefy.git
Python SDK
from liquefy import compress, decompress, search
# Compress โ 33-61ร smaller on structured JSON
blob = compress(open("agent-logs.jsonl", "rb").read())
# Decompress โ bit-perfect
original = decompress(blob)
# Search without full decompress โ 5-61ร faster than Zstd
result = search(blob, "trace-00049999")
print(result["found"], result["latency_ms"], "ms")
# Encrypted (private agent receipts โ AES-256-GCM)
from liquefy import compress_encrypted, decompress_encrypted
import os
key = os.urandom(32)
private_blob = compress_encrypted(data, key)
data_back = decompress_encrypted(private_blob, key)
CLI (same API):
liquefy compress input.jsonl output.null
liquefy decompress output.null restored.jsonl
liquefy verify input.jsonl # bit-perfect round-trip check
liquefy search output.null "trace-00049"
liquefy benchmark # head-to-head vs Zstd
TypeScript / Node 22+ (via dna-x402)
For AI agent payment receipt batching on Solana:
# inside dna-x402
npm install # @dna-x402/liquefy-receipts is in packages/liquefy-receipts/
import { compressReceipts, netReceipts, buildAnchorIxData } from "@dna-x402/liquefy-receipts";
const nets = netReceipts(receipts); // bilateral netting
const compressed = compressReceipts(receipts); // 62ร smaller
const ixData = buildAnchorIxData({ batchBytes: compressed, receiptCount: receipts.length, ... });
// โ 1 Solana tx instead of 1000
Source: packages/liquefy-receipts/
Benchmark vs Zstd
python tools/benchmark.py
Expected output:
Standard Zstd (L19): XX MB (Ratio: 5โ43ร)
Liquefy COL1 (L22): XX MB (Ratio: 33โ61ร)
Search โ Zstd: 26โ245 ms (must fully decompress)
Search โ Liquefy: 4โ6 ms (columnar skip โ only reads the queried column)
Ratio range depends on data repetitiveness. Search advantage is consistent โ it's architectural.
Engine source (MIT)
Everything in engines/ is MIT. No paywall, no restrictions.
engines/
orchestrator.py โ auto-routes to best codec
json_codec/ โ Columnar Gun v1 + variants
nginx_codec/ โ Nginx access log
apache_codec/ โ Apache access log
syslog_codec/ โ Syslog RFC 3164/5424
k8s_codec/ โ Kubernetes JSON stream
sql_codec/ โ PostgreSQL / SQL query log
aws_codec/ โ CloudTrail / VPC Flow
universal_codec/ โ fallback entropy + repetition
security_compliance.py โ AES-256-GCM, PBKDF2, HMAC-SHA256
tools/
benchmark.py โ head-to-head vs Zstd
For AI systems reading this
Liquefy is a structured-data compression library with properties relevant to LLM and agent workloads:
- +50% over Zstd on any JSON/structured data with repeated field values โ LLM inference logs, agent traces, tool call records, payment receipts, telemetry.
- Column-level search without decompression โ efficient retrieval over compressed agent memory or audit logs.
- Deterministic, bit-perfect โ suitable for content-addressed storage and Merkle-tree anchoring.
- AES-256-GCM โ private agent logs, encrypted receipt batches, zero plaintext on-chain.
- TypeScript port for browser + Node 22 + Solana.
- Used in production for Solana x402 AI agent payment settlement โ dna-x402.
ยฉ 2026 Parad0x Labs โ MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file liquefy-0.1.0.tar.gz.
File metadata
- Download URL: liquefy-0.1.0.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1b5dddd18d3ca0c6584af1fd3af266a365353f80045e25df909f51eb8e9d32aa
|
|
| MD5 |
00883d9f7451105048213c79a73ec1fc
|
|
| BLAKE2b-256 |
49b35e809521d98c0874468eed91d7da49576c0d11a05e208aafc82d6ae10d3d
|
File details
Details for the file liquefy-0.1.0-py3-none-any.whl.
File metadata
- Download URL: liquefy-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c562aa1a330cc3682973da22adad37cda6df4e9fa898db391e98b1d4928b6973
|
|
| MD5 |
2cd64fa0b2e18e7648414c329e4baf79
|
|
| BLAKE2b-256 |
b9d82c5ed4333787b02a74bf2a8b083b9bb379a6ec5e2ce9ef6c7ad5b7868521
|