A standalone lossless JSON/NDJSON compressor. Beats zstd -19 and brotli -11 on every file tested.

Project description

datacortex

Python bindings for DataCortex, a lossless JSON/NDJSON compressor that beats zstd-19 and brotli-11 on every file tested.

Built with Rust via PyO3. Native performance, Python convenience.

Install

pip install datacortex

Requires Python 3.8+. A pre-built wheel is available for macOS 11.0+ on ARM64 (CPython 3.9 for v0.6.0).

Quick Start

import datacortex

# Compress JSON bytes
with open("logs.ndjson", "rb") as f:
    data = f.read()

compressed = datacortex.compress(data)
print(f"Ratio: {len(data) / len(compressed):.1f}x")

# Turbo mode: ~30x faster encode, ~2% ratio tradeoff
fast = datacortex.compress(data, turbo=True)

# Decompress (byte-exact)
original = datacortex.decompress(compressed)
assert original == data

API Reference

compress(data, mode="fast", format="auto", level=None, turbo=False)

Compress bytes. Returns compressed bytes in .dcx format.

Args:

  • data (bytes): Input data (JSON, NDJSON, or generic text)
  • mode (str): "fast" (default), "balanced", or "max"
  • format (str): "auto" (default), "json", "ndjson", "generic"
  • level (int, optional): zstd level override (fast mode only)
  • turbo (bool): Use turbo mode for ~30x faster encode (fast mode only)

Returns: bytes
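
To see how the options above trade size for speed on your own data, a small comparison loop can help. The sketch below is not part of the package: compare_settings is a hypothetical helper that accepts any compress(data, **kwargs) callable, so it can be tried without DataCortex installed. The keyword arguments in the trailing usage comment are the ones documented above.

```python
import time


def compare_settings(data, compress, settings):
    """Compress `data` once per settings dict and report size, ratio and time.

    `compress` is any callable shaped like compress(data, **kwargs) -> bytes.
    `settings` maps a label to the keyword arguments used for that run.
    """
    results = {}
    for label, kwargs in settings.items():
        start = time.perf_counter()
        blob = compress(data, **kwargs)
        elapsed = time.perf_counter() - start
        results[label] = {
            "compressed_size": len(blob),
            "ratio": len(data) / len(blob),
            "seconds": elapsed,
        }
    return results


# Usage with the documented keyword arguments (requires `pip install datacortex`):
#
#   import datacortex
#   report = compare_settings(data, datacortex.compress, {
#       "fast": {"mode": "fast"},
#       "turbo": {"mode": "fast", "turbo": True},
#   })
```

Passing the compressor in as an argument keeps the helper independent of any one library, which also makes it easy to put a zlib or zstd baseline in the same report.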

decompress(data)

Decompress .dcx bytes. Returns the original data, byte-exact.

Args:

  • data (bytes): Compressed .dcx data

Returns: bytes

compress_file(input_path, output_path, mode="fast", level=None, turbo=False)

Compress a file to .dcx format.

Args:

  • input_path (str): Path to the input file
  • output_path (str): Path for the compressed output
  • mode (str): "fast" (default), "balanced", or "max"
  • level (int, optional): zstd level override (fast mode only)
  • turbo (bool): Use turbo mode for ~30x faster encode (fast mode only)

decompress_file(input_path, output_path)

Decompress a .dcx file back to the original.

Args:

  • input_path (str): Path to the .dcx file
  • output_path (str): Path for the decompressed output
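
A file-level round trip is a quick way to confirm that the two path-based functions restore the input exactly. This is a minimal sketch, not code from the package: roundtrip_file is a hypothetical helper, the file names are arbitrary, and the two callables only need the (input_path, output_path) shape documented above.

```python
import os
import tempfile


def roundtrip_file(payload, compress_file, decompress_file):
    """Write `payload` to disk, compress and restore it, return both file sizes.

    `compress_file` and `decompress_file` are callables taking
    (input_path, output_path) as string paths.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.ndjson")
        packed = os.path.join(tmp, "input.ndjson.dcx")
        restored = os.path.join(tmp, "restored.ndjson")

        with open(src, "wb") as f:
            f.write(payload)

        compress_file(src, packed)
        decompress_file(packed, restored)

        # Fail loudly if the restored bytes differ from what was written.
        with open(restored, "rb") as f:
            if f.read() != payload:
                raise ValueError("round trip did not reproduce the input")

        return os.path.getsize(src), os.path.getsize(packed)


# Usage (requires `pip install datacortex`):
#
#   import datacortex
#   original, packed = roundtrip_file(
#       payload, datacortex.compress_file, datacortex.decompress_file
#   )
```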

detect_format(data)

Detect the format of input data.

Args:

  • data (bytes): Input data to analyze

Returns: str -- "ndjson", "json", "json_array", or "generic"
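
The four labels describe different input shapes. The sketch below illustrates what sets them apart using only the standard library. It is an assumption-laden approximation, not DataCortex's detection logic: guess_format is a hypothetical name, and detect_format may classify edge cases differently (for example, a one-line NDJSON file is also a valid JSON document).

```python
import json


def guess_format(data):
    """Classify bytes as "json_array", "json", "ndjson", or "generic".

    Illustrative heuristic only; datacortex.detect_format may decide differently.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "generic"

    # Input that parses as one document is JSON; a top-level array gets its own label.
    try:
        doc = json.loads(text)
    except ValueError:
        pass
    else:
        return "json_array" if isinstance(doc, list) else "json"

    # Otherwise treat it as NDJSON if every non-empty line is its own JSON value.
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        return "generic"
    try:
        for line in lines:
            json.loads(line)
    except ValueError:
        return "generic"
    return "ndjson"
```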

info(data)

Inspect compressed .dcx data.

Args:

  • data (bytes): Compressed .dcx data

Returns: dict with keys: mode, format, original_size, compressed_size, crc32, entropy_coder
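
The two size fields are enough to derive the compression ratio and space saving for a .dcx payload. This is a minimal sketch: summarize_info is a hypothetical helper, both sizes are assumed to be integer byte counts, and the sample dict holds made-up numbers rather than real info() output.

```python
def summarize_info(meta):
    """Derive ratio and space saving from a metadata dict.

    Only `original_size` and `compressed_size` are read, and both are
    assumed to be byte counts (ints).
    """
    original = meta["original_size"]
    compressed = meta["compressed_size"]
    if original <= 0 or compressed <= 0:
        raise ValueError("sizes must be positive")
    return {
        "ratio": original / compressed,
        "saved_bytes": original - compressed,
        "saved_percent": 100.0 * (1 - compressed / original),
    }


# Made-up sizes for illustration, not real output from info().
example = {"original_size": 1_000_000, "compressed_size": 50_000}
summary = summarize_info(example)
print(f"{summary['ratio']:.1f}x, {summary['saved_percent']:.1f}% smaller")

# With the package installed: summarize_info(datacortex.info(compressed))
```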

Compression Modes

| Mode | Engine | Speed | Best for |
|---|---|---|---|
| "fast" | Columnar + typed encoding + zstd/brotli | 2.7 MB/s encode | Best ratio on JSON/NDJSON |
| "fast" + turbo=True | Columnar + typed encoding + zstd-3 | 99 MB/s encode | Speed-sensitive pipelines |
| "balanced" | Context mixing engine | <1 MB/s | General text |
| "max" | CM with larger context maps | <1 MB/s | Maximum compression |

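The encode speeds listed for "fast" mode translate directly into wall-clock time. As a rough worked example, the snippet below estimates encode time for a 100 MB input, where that size is an arbitrary assumption and encode_seconds is a hypothetical helper. Real throughput varies by file, so treat the result as an order-of-magnitude guide.

```python
def encode_seconds(size_mb, throughput_mb_per_s):
    """Estimate encode time from input size and a sustained throughput."""
    return size_mb / throughput_mb_per_s


# Throughputs are the "fast" mode figures quoted in this section;
# the 100 MB input size is an arbitrary example.
size_mb = 100
fast_s = encode_seconds(size_mb, 2.7)   # "fast" mode
turbo_s = encode_seconds(size_mb, 99)   # "fast" + turbo=True
print(f"fast: {fast_s:.0f} s, turbo: {turbo_s:.1f} s")
```
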
Benchmarks (v0.6.0)

| File | Size | DataCortex ratio | zstd -19 ratio | Gain vs zstd -19 | Turbo encode speed |
|---|---|---|---|---|---|
| k8s structured logs | 9.9 MB | ~40x | 18.9x | +113% | -- |
| NDJSON 10K rows | 3.3 MB | 27.9x | 16.0x | +68% | 68 MB/s |
| GH Archive | 10 MB | 8.0x | 6.3x | +26% | 169 MB/s |
| Twitter API | 617 KB | 19.7x | 14.7x | +34% | 87 MB/s |
| Event tickets | 1.7 MB | 221.7x | 189.8x | +17% | 36 MB/s |
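
To make the ratios concrete, the snippet below works through the Twitter API row. It assumes the "vs zstd" column is the relative gain in compression ratio. The published percentages were presumably computed from unrounded ratios, so recomputing other rows from the rounded figures may differ by a few points. Both helper names are hypothetical.

```python
def gain_percent(ratio, baseline_ratio):
    """Relative improvement of one compression ratio over another, in percent."""
    return (ratio / baseline_ratio - 1) * 100


def compressed_kb(size_kb, ratio):
    """Output size implied by an input size and a compression ratio."""
    return size_kb / ratio


# Twitter API row: 617 KB input, 19.7x (DataCortex) vs 14.7x (zstd -19).
twitter_gain = gain_percent(19.7, 14.7)
twitter_dcx_kb = compressed_kb(617, 19.7)
twitter_zstd_kb = compressed_kb(617, 14.7)
print(f"gain: +{twitter_gain:.0f}%")
print(f"DataCortex: {twitter_dcx_kb:.1f} KB, zstd -19: {twitter_zstd_kb:.1f} KB")
```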

CLI

For command-line usage, install the Rust CLI:

cargo install datacortex-cli

License

MIT

Download files

Download the file for your platform.

Source Distribution

datacortex-0.6.0.tar.gz (176.9 kB)

Uploaded Source

Built Distribution

datacortex-0.6.0-cp39-cp39-macosx_11_0_arm64.whl (1.4 MB)

Uploaded CPython 3.9 macOS 11.0+ ARM64

File details

Details for the file datacortex-0.6.0.tar.gz.

File metadata

  • Download URL: datacortex-0.6.0.tar.gz
  • Upload date:
  • Size: 176.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.6

File hashes

Hashes for datacortex-0.6.0.tar.gz
Algorithm Hash digest
SHA256 2d722c53b9ce543b39611196354e803d468faa89fecede41ef88c13c67467a3b
MD5 5988e4d532de8c61924dc8559556cd20
BLAKE2b-256 f61f3fdfab06eaad92313f8e32fd88dd66988e648b974a59fa84f0e2d7784b40

File details

Details for the file datacortex-0.6.0-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for datacortex-0.6.0-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c6b0375322605bac6b99158516eda644900477ad64b9ad94ca583cc167ba1810
MD5 ab696c3cc262cdbaacdc0b890d5716a8
BLAKE2b-256 0b720f5ba35f9323355e6f29677f6c90b147b43221aa95be3f091f2fcfdf3b79