A standalone lossless JSON/NDJSON compressor. Beats zstd and brotli on every file tested.
datacortex
Python bindings for DataCortex, a lossless JSON/NDJSON compressor that beats zstd-19 and brotli-11 on every file tested.
Built with Rust via PyO3. Native performance, Python convenience.
Install
pip install datacortex
Requires Python 3.8+. Pre-built wheels are available for macOS (ARM).
Quick Start
import datacortex
# Compress JSON bytes
with open("logs.ndjson", "rb") as f:
    data = f.read()
compressed = datacortex.compress(data)
print(f"Ratio: {len(data) / len(compressed):.1f}x")
# Turbo mode: ~30x faster encode, ~2% ratio tradeoff
fast = datacortex.compress(data, turbo=True)
# Decompress (byte-exact)
original = datacortex.decompress(compressed)
assert original == data
API Reference
compress(data, mode="fast", format="auto", level=None, turbo=False)
Compress bytes. Returns compressed bytes in .dcx format.
Args:
- data (bytes): Input data (JSON, NDJSON, or generic text)
- mode (str): "fast" (default), "balanced", or "max"
- format (str): "auto" (default), "json", "ndjson", or "generic"
- level (int, optional): zstd level override (fast mode only)
- turbo (bool): Use turbo mode for ~30x faster encode (fast mode only)
Returns: bytes
decompress(data)
Decompress .dcx bytes. Returns the original data, byte-exact.
Args:
- data (bytes): Compressed .dcx data
Returns: bytes
compress_file(input_path, output_path, mode="fast", level=None, turbo=False)
Compress a file to .dcx format.
Args:
- input_path (str): Path to the input file
- output_path (str): Path for the compressed output
- mode (str): "fast", "balanced", or "max"
- level (int, optional): zstd level override (fast mode only)
- turbo (bool): Use turbo mode for ~30x faster encode (fast mode only)
decompress_file(input_path, output_path)
Decompress a .dcx file back to the original.
Args:
- input_path (str): Path to the .dcx file
- output_path (str): Path for the decompressed output
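The file-based functions can be checked end to end with a small round-trip helper. The sketch below is illustrative only: `roundtrip_file` is a hypothetical name, and it takes the compress and decompress functions as arguments, assuming only the `(input_path, output_path, ...)` call shape documented above.

```python
import filecmp
import os
import tempfile


def roundtrip_file(src_path, compress_file, decompress_file, **options):
    """Compress src_path to a temporary .dcx file, decompress it again,
    and return True if the restored file is byte-identical to the source.

    compress_file and decompress_file are passed in by the caller and are
    assumed to accept (input_path, output_path); extra keyword options
    are forwarded to compress_file only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        packed = os.path.join(tmp, "payload.dcx")
        restored = os.path.join(tmp, "payload.out")
        compress_file(src_path, packed, **options)
        decompress_file(packed, restored)
        # shallow=False forces a byte-by-byte comparison of the contents.
        return filecmp.cmp(src_path, restored, shallow=False)
```

With the package installed, a call might look like `roundtrip_file("logs.ndjson", datacortex.compress_file, datacortex.decompress_file, turbo=True)`.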
detect_format(data)
Detect the format of input data.
Args:
- data (bytes): Input data to analyze
Returns: str -- "ndjson", "json", "json_array", or "generic"
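One possible use of `detect_format` is to survey a batch of payloads before deciding whether to pass an explicit `format=` argument. The helper below is a sketch under assumptions: `tally_formats` is a hypothetical name, and the detector is injected as an argument so the function relies only on the four labels listed above.

```python
from collections import Counter

# The four labels documented for detect_format().
KNOWN_FORMATS = {"ndjson", "json", "json_array", "generic"}


def tally_formats(payloads, detect):
    """Count how many payloads fall into each detected format.

    detect is any callable mapping bytes to a format label, for example
    datacortex.detect_format. Unknown labels raise ValueError.
    """
    counts = Counter()
    for payload in payloads:
        label = detect(payload)
        if label not in KNOWN_FORMATS:
            raise ValueError(f"unexpected format label: {label!r}")
        counts[label] += 1
    return counts
```

With the package installed, this could be called as `tally_formats(payloads, datacortex.detect_format)`.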
info(data)
Inspect compressed .dcx data.
Args:
- data (bytes): Compressed .dcx data
Returns: dict with keys: mode, format, original_size, compressed_size, crc32, entropy_coder
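The dict returned by `info` can be condensed into a one-line summary. This is a minimal sketch: `summarize_info` is a hypothetical helper, and it assumes `original_size` and `compressed_size` are integer byte counts, which the reference above does not state explicitly.

```python
def summarize_info(meta):
    """Turn a dict shaped like the result of info() into a short summary.

    Assumes original_size and compressed_size are byte counts; the other
    keys are formatted as-is.
    """
    original = meta["original_size"]
    compressed = meta["compressed_size"]
    # Guard against a zero-length compressed size to avoid dividing by zero.
    ratio = original / compressed if compressed else float("inf")
    return (
        f"{meta['format']} via {meta['mode']} mode: "
        f"{original} -> {compressed} bytes ({ratio:.1f}x)"
    )
```

A call such as `summarize_info(datacortex.info(compressed))` would then print the format, mode, sizes, and ratio on one line.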
Compression Modes
| Mode | Engine | Speed | Best for |
|---|---|---|---|
| "fast" | Columnar + typed encoding + zstd/brotli | 2.7 MB/s encode | Best ratio on JSON/NDJSON |
| "fast" + turbo=True | Columnar + typed encoding + zstd-3 | 99 MB/s encode | Speed-sensitive pipelines |
| "balanced" | Context mixing engine | <1 MB/s | General text |
| "max" | CM with larger context maps | <1 MB/s | Maximum compression |
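The "Best for" column can be turned into a small lookup that yields keyword arguments for `compress` or `compress_file`. The goal names (`"ratio"`, `"speed"`, `"text"`, `"max"`) and the `options_for` helper are hypothetical and exist only for illustration; the mode and turbo values come from the table above.

```python
def options_for(goal):
    """Map a goal to keyword arguments for compress()/compress_file()."""
    table = {
        "ratio": {"mode": "fast"},                 # best ratio on JSON/NDJSON
        "speed": {"mode": "fast", "turbo": True},  # speed-sensitive pipelines
        "text": {"mode": "balanced"},              # general text
        "max": {"mode": "max"},                    # maximum compression
    }
    if goal not in table:
        raise ValueError(
            f"unknown goal {goal!r}; expected one of {sorted(table)}"
        )
    # Return a copy so callers can modify the result safely.
    return dict(table[goal])
```

Usage might look like `datacortex.compress(data, **options_for("speed"))`. Turbo is paired only with "fast" here because the reference describes it as a fast-mode option.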
Benchmarks (v0.6.0)
| File | Size | DataCortex | zstd -19 | vs zstd | Turbo Encode |
|---|---|---|---|---|---|
| k8s structured logs | 9.9 MB | ~40x | 18.9x | +113% | -- |
| NDJSON 10K rows | 3.3 MB | 27.9x | 16.0x | +68% | 68 MB/s |
| GH Archive | 10 MB | 8.0x | 6.3x | +26% | 169 MB/s |
| Twitter API | 617 KB | 19.7x | 14.7x | +34% | 87 MB/s |
| Event tickets | 1.7 MB | 221.7x | 189.8x | +17% | 36 MB/s |
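To get comparable numbers on your own data, a rough harness like the one below can measure compression ratio and encode throughput. It is not the methodology behind the table above: `measure` is a hypothetical helper, it takes any `bytes -> bytes` compressor, reports the best of a few runs, and treats 1 MB as 10^6 bytes.

```python
import time


def measure(data, compress, repeats=3):
    """Return (ratio, encode_mb_per_s) for one compressor on one input.

    compress is any callable taking bytes and returning bytes. The
    fastest of `repeats` runs is used to reduce timing noise.
    """
    best = float("inf")
    compressed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        compressed = compress(data)
        best = min(best, time.perf_counter() - start)
    ratio = len(data) / len(compressed)
    # 1 MB is taken as 1e6 bytes; guard against a zero-length timing.
    throughput = len(data) / 1e6 / best if best > 0 else float("inf")
    return ratio, throughput
```

With the package installed, `measure(data, datacortex.compress)` and `measure(data, lambda d: datacortex.compress(d, turbo=True))` would give the default and turbo figures for the same input.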
CLI
For command-line usage, install the Rust CLI:
cargo install datacortex-cli
License
MIT