
Extract events from Splunk journal archives to raw format (JSON, CSV, Parquet)


Splunk DDSS Extractor

Convert Splunk Dynamic Data Self Storage (DDSS) archives from compressed journal format to raw formats.

Overview

Splunk DDSS Extractor is a Python library (with a Rust native extension for performance) that processes Splunk journal archives, extracts events, and converts them to common formats for analysis and long-term storage.

Key highlights:

  • Rust-accelerated decoding via PyO3 native extension (~2.7x faster than pure Python)
  • Automatic compression detection (.zst, .gz, uncompressed)
  • Multiple output formats: JSON Lines, CSV, Parquet
  • Streaming S3 support (read/write directly, no temp files)
  • CLI tool and Python API
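The README does not describe how compression is detected. A common technique, sketched below under that assumption, is to peek at the first bytes of the input and compare them with the Zstandard frame magic number (RFC 8878) and the gzip member header (RFC 1952). The function name detect_compression is hypothetical and not part of the library's API.

```python
import gzip
import io
from typing import BinaryIO

# Well-known magic numbers; using them here is an assumed strategy, not the library's code.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # Zstandard frame header (RFC 8878)
GZIP_MAGIC = b"\x1f\x8b"          # gzip member header (RFC 1952)


def detect_compression(stream: BinaryIO) -> str:
    """Return 'zstd', 'gzip' or 'none' by peeking at the first bytes.

    The stream position is restored afterwards, so the stream must be seekable.
    """
    start = stream.tell()
    header = stream.read(4)
    stream.seek(start)
    if header.startswith(ZSTD_MAGIC):
        return "zstd"
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    return "none"  # treat anything else as an uncompressed journal


if __name__ == "__main__":
    sample = io.BytesIO(gzip.compress(b"raw journal bytes"))
    print(detect_compression(sample))  # gzip
```

Non-seekable inputs such as stdin or an S3 stream would need a buffered peek instead of seek(); that variant is left out to keep the sketch short.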

Based on the concept from fionera/splunker, extended with streaming S3 support, multiple output formats, Rust performance, and production-ready features.

Installation

pip install splunk-ddss-extractor

With optional extras:

pip install "splunk-ddss-extractor[s3]"       # S3 streaming support (boto3)
pip install "splunk-ddss-extractor[parquet]"  # Parquet output (pyarrow)
pip install "splunk-ddss-extractor[cli]"      # CLI dependencies (click)

Pre-built wheels with the native Rust extension are available for Linux x86_64 and aarch64 (Python 3.10-3.13). The library falls back to a pure-Python decoder if the native extension is unavailable.

Quick Start

CLI

# Extract to JSON Lines
splunk-extract -i journal.zst -o output.json -f ndjson

# Extract to CSV
splunk-extract -i journal.zst -o output.csv -f csv

# Extract to Parquet
splunk-extract -i journal.zst -o output.parquet -f parquet

# S3 to local (streaming)
splunk-extract -i s3://bucket/path/journal.zst -o output.json

# S3 to S3 (streaming, no temp files)
splunk-extract -i s3://bucket/input/journal.zst -o s3://bucket/output/data.json

# Stdin/stdout
cat journal.zst | splunk-extract > output.json

# Enable debug tracing
splunk-extract -i journal.zst -o output.json --trace

Python API

from splunk_ddss_extractor.extractor import Extractor

extractor = Extractor()

# All compression formats are auto-detected
extractor.extract("journal.zst", "output.json", "ndjson")    # Zstandard
extractor.extract("journal.gz", "output.csv", "csv")          # Gzip
extractor.extract("journal", "output.parquet", "parquet")      # Uncompressed

# Streaming S3 (no temp files)
extractor.extract("s3://bucket/journal.zst", "s3://bucket/output.json", "ndjson")
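Streaming without temp files generally means reading the input in fixed-size chunks and decompressing each chunk as it arrives. The sketch below shows that idea for gzip using only the standard library's zlib; it is an illustration, not the Extractor's internal code, and stream_gunzip is a hypothetical name. The source argument can be anything with a read(n) method, such as an open local file or the StreamingBody that boto3 returns from get_object()["Body"].

```python
import gzip
import io
import zlib
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # bytes read per iteration; illustrative value


def stream_gunzip(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield decompressed bytes from a single-member gzip stream, chunk by chunk.

    Only one compressed chunk plus its output is held in memory at a time.
    Multi-member gzip files (concatenated archives) are not handled here.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        out = decomp.decompress(chunk)
        if out:
            yield out
    tail = decomp.flush()  # emit anything zlib still buffers
    if tail:
        yield tail


if __name__ == "__main__":
    payload = b"event\n" * 1000
    compressed = io.BytesIO(gzip.compress(payload))
    restored = b"".join(stream_gunzip(compressed, chunk_size=1024))
    print(restored == payload)  # True
```

For .zst input, the same loop shape applies with zstandard's stream_reader, as in the low-level decoder example.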

Low-level Decoder (Advanced)

The native decoder provides direct access to the binary journal format:

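from splunk_ddss_extractor import NativeJournalDecoder
import zstandard as zstd  # Zstandard bindings, used here to decompress the .zst archive

with open("journal.zst", "rb") as f:
    dctx = zstd.ZstdDecompressor()
    # stream_reader decompresses on the fly instead of loading the whole archive
    with dctx.stream_reader(f) as reader:
        decoder = NativeJournalDecoder(reader=reader)
        # scan() moves to the next event and returns a falsy value at the end
        while decoder.scan():
            event = decoder.get_event()
            # Metadata comes from the decoder; time and payload come from the event
            print(f"Host: {decoder.host()}")
            print(f"Source: {decoder.source()}")
            print(f"Sourcetype: {decoder.source_type()}")
            print(f"Timestamp: {event.index_time}")
            print(f"Message: {event.message_string()}")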
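If you prefer a for loop over the scan()/get_event() pattern, the calls from the example above can be wrapped in a small generator. This is a sketch: iter_events is a hypothetical helper, and naming the index_time field "timestamp" simply mirrors the JSON Lines example further down rather than any documented mapping.

```python
from typing import Any, Dict, Iterator


def iter_events(decoder: Any) -> Iterator[Dict[str, Any]]:
    """Turn the scan()/get_event() loop into a generator of plain dicts.

    Works with any object exposing the methods used in the example above,
    e.g. NativeJournalDecoder or the pure-Python JournalDecoder.
    """
    while decoder.scan():
        event = decoder.get_event()
        # Build the whole record before yielding, while the decoder is on this event.
        yield {
            "timestamp": event.index_time,
            "host": decoder.host(),
            "source": decoder.source(),
            "sourcetype": decoder.source_type(),
            "message": event.message_string(),
        }
```

Because each dict is complete before it is yielded, host(), source() and source_type() are always read for the event they belong to, even if the consumer pauses between items.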

A pure-Python JournalDecoder is also available with the same interface, useful when the native extension cannot be built.
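Because both decoders share an interface, calling code can prefer the native one and fall back to the pure-Python one when the extension is missing. The helper below is a generic sketch of that pattern using importlib; first_importable is a hypothetical name, and it treats an ImportError, a missing attribute, or an attribute set to None as "not available", since the README does not say which of these happens when the extension is absent.

```python
import importlib
from typing import Any, Sequence, Tuple


def first_importable(candidates: Sequence[Tuple[str, str]]) -> Any:
    """Return the first attribute that imports successfully.

    candidates is an ordered list of (module_name, attribute_name) pairs,
    most preferred first.
    """
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # the module (or one of its dependencies) is unavailable
        value = getattr(module, attr, None)
        if value is not None:
            return value
    raise ImportError(f"none of {list(candidates)!r} could be imported")
```

For example, first_importable([("splunk_ddss_extractor", "NativeJournalDecoder"), ("splunk_ddss_extractor.decoder", "JournalDecoder")]) would return whichever class loads first. The splunk_ddss_extractor.decoder module path is inferred from the package layout and is an assumption.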

Architecture

splunk_ddss_extractor/
├── src/splunk_ddss_extractor/    # Python package
│   ├── __init__.py
│   ├── decoder.py                # Pure-Python journal decoder
│   ├── native_decoder.py         # Rust-backed decoder (NativeJournalDecoder)
│   ├── extractor.py              # High-level extraction API
│   ├── main.py                   # CLI entry point
│   └── utils.py                  # Output writers, logging
├── rust/                         # Native extension (PyO3 + maturin)
│   ├── Cargo.toml
│   └── src/
│       ├── lib.rs                # PyO3 module
│       ├── decoder.rs            # Rust journal decoder
│       └── varint.rs             # Variable-length integer parsing
├── tests/
├── docker/
└── scripts/

Components:

  1. Native Decoder (rust/ + native_decoder.py) - Rust extension for high-throughput decoding via scan_batch, wrapped in Python for ergonomic use
  2. Python Decoder (decoder.py) - Pure-Python fallback with identical interface
  3. Extractor (extractor.py) - High-level API with auto-compression, S3 streaming, multi-format output
  4. CLI (main.py) - Command-line interface via splunk-extract
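The README only says that varint.rs parses variable-length integers; it does not specify the encoding. Assuming the common unsigned LEB128 scheme (the same base-128 encoding used by Go's encoding/binary Uvarint and by Protocol Buffers), decoding works as in this Python sketch. decode_uvarint is a hypothetical name, and the real journal format may differ.

```python
from typing import Tuple


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one unsigned LEB128 varint starting at buf[offset].

    Returns (value, next_offset). Each byte contributes its low 7 bits,
    least-significant group first; a set high bit means more bytes follow.
    """
    value = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint longer than 64 bits")
```

Returning the next offset lets a caller decode consecutive fields by feeding each result's offset into the following call.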

Output Formats

Format       Extension        Use case
JSON Lines   .json / .jsonl   Streaming, log aggregation
CSV          .csv             Spreadsheets, simple analytics
Parquet      .parquet         Columnar analytics, data lakes
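Extractor.extract() takes the output format as an explicit argument. When the format should follow the output file name, a small lookup based on the Output Formats table is enough. This is a hypothetical helper, not part of the library; the README does not say whether the CLI infers the format itself, and defaulting to ndjson is an assumption based on the S3 examples that omit -f.

```python
from pathlib import PurePosixPath

# Mapping taken from the Output Formats table; .json and .jsonl both mean JSON Lines.
EXTENSION_TO_FORMAT = {
    ".json": "ndjson",
    ".jsonl": "ndjson",
    ".csv": "csv",
    ".parquet": "parquet",
}


def format_for_path(path: str, default: str = "ndjson") -> str:
    """Pick an output format string from a local path or s3:// URI."""
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix, default)
```

The result can be passed as the third argument of extractor.extract(), e.g. extractor.extract(src, dst, format_for_path(dst)).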

Example JSON Lines output:

{"timestamp": 1761643257, "host": "server01", "source": "s3://logs/app.log", "sourcetype": "aws:cloudtrail", "message": "{\"eventVersion\":\"1.08\",...}"}
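To show how such records map onto the two text formats, here is a standard-library sketch that writes event dicts with the fields from the example above as JSON Lines and as CSV. It is not the code in utils.py; the function names and the CSV column order are assumptions.

```python
import csv
import io
import json
from typing import Dict, Iterable, TextIO

FIELDS = ["timestamp", "host", "source", "sourcetype", "message"]  # assumed column order


def write_ndjson(events: Iterable[Dict], out: TextIO) -> int:
    """Write one JSON object per line; returns the number of events written."""
    count = 0
    for event in events:
        out.write(json.dumps(event, ensure_ascii=False) + "\n")
        count += 1
    return count


def write_csv(events: Iterable[Dict], out: TextIO) -> int:
    """Write a header row plus one row per event; csv quotes embedded commas and quotes."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for event in events:
        writer.writerow(event)
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    write_ndjson([{"timestamp": 1761643257, "host": "server01"}], buf)
    print(buf.getvalue(), end="")
```

Since the message field is itself a JSON string, json.dumps escapes its inner quotes, which produces the \" sequences seen in the example line.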

Integration Examples

AWS Lambda (streaming S3-to-S3):

from splunk_ddss_extractor.extractor import Extractor

def lambda_handler(event, context):
    extractor = Extractor()
    count = extractor.extract(
        input_path="s3://input-bucket/journal.zst",
        output_path="s3://output-bucket/output.json",
        output_format="ndjson",
    )
    return {"statusCode": 200, "events_extracted": count}
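The handler above hard-codes its paths. When the function is triggered by an S3 event notification, the bucket and key can be read from the event payload instead; S3 URL-encodes object keys in these notifications, so they need decoding first. The helper name s3_paths_from_event and the output naming scheme (same key with the archive suffix replaced by .json) are hypothetical.

```python
from typing import Any, Dict, Tuple
from urllib.parse import unquote_plus


def s3_paths_from_event(event: Dict[str, Any], output_bucket: str) -> Tuple[str, str]:
    """Build (input_path, output_path) from the first record of an S3 event notification."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])  # keys arrive URL-encoded, spaces as '+'

    # Hypothetical naming scheme: drop a known archive suffix and append .json.
    stem = key
    for suffix in (".zst", ".gz"):
        if key.endswith(suffix):
            stem = key[: -len(suffix)]
            break
    return f"s3://{bucket}/{key}", f"s3://{output_bucket}/{stem}.json"
```

Inside lambda_handler, input_path, output_path = s3_paths_from_event(event, "output-bucket") can then replace the hard-coded arguments to extractor.extract().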

Docker:

make docker
docker run -v /path/to/data:/data splunk-ddss-extractor:latest \
    splunk-extract -i /data/journal.zst -o /data/output.json

Development

Setup

# Full setup (venv + deps + Rust extension)
make dev-setup

# Or step by step:
make venv
source venv/bin/activate
make install
make rust-build-release

Commands

make test              # Run Python tests
make test-coverage     # Tests with coverage report
make rust-test         # Run Rust unit tests
make rust-build        # Build Rust extension (debug)
make rust-build-release # Build Rust extension (release)
make docker            # Build Docker image
make check             # Run all checks
make env               # Show all available commands

Versioning and Releases

Version is tracked in three files kept in sync by the bump script:

  • pyproject.toml (Python package version)
  • rust/Cargo.toml (Rust crate version)
  • src/splunk_ddss_extractor/__init__.py (runtime __version__)

make version           # Show current version
make bump-patch        # 0.3.0 -> 0.3.1
make bump-minor        # 0.3.0 -> 0.4.0
make bump-major        # 0.3.0 -> 1.0.0
make release           # Bump patch, commit, tag, and push
make release-minor     # Bump minor, commit, tag, and push
make release-major     # Bump major, commit, tag, and push
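The bump script itself is not shown. The version arithmetic it performs, as implied by the examples above, fits in a few lines; this sketch assumes plain MAJOR.MINOR.PATCH strings with no pre-release or build suffixes, and bump is a hypothetical name.

```python
def bump(version: str, part: str) -> str:
    """Return the next version for part in {'major', 'minor', 'patch'}.

    Lower components reset to zero when a higher one is incremented.
    """
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

Keeping the three version files in sync then amounts to writing the same returned string into each of them.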

The publish.yml GitHub Actions workflow builds native wheels for Linux x86_64 and aarch64 (Python 3.10-3.13) and publishes to PyPI when a GitHub release is created.

Performance

Decoder                               Throughput
Pure Python (JournalDecoder)          ~29K events/s
Rust native (NativeJournalDecoder)    ~80K events/s

The Extractor class uses the native decoder by default when available.
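As a rough worked example, the table's figures can be turned into decode-time estimates. The calculation ignores decompression, I/O and output writing, and the README does not state the hardware the figures were measured on, so treat the numbers as order-of-magnitude only. The 10 million event archive is hypothetical.

```python
# Throughput figures from the table above (approximate, events per second).
PURE_PYTHON_EPS = 29_000
NATIVE_EPS = 80_000


def decode_seconds(num_events: int, events_per_second: float) -> float:
    """Estimated decode time, ignoring I/O, decompression and output writing."""
    return num_events / events_per_second


speedup = NATIVE_EPS / PURE_PYTHON_EPS  # about 2.76

if __name__ == "__main__":
    n = 10_000_000  # hypothetical archive size
    print(f"speedup: {speedup:.2f}x")
    print(f"pure Python: {decode_seconds(n, PURE_PYTHON_EPS):.0f} s")
    print(f"native:      {decode_seconds(n, NATIVE_EPS):.0f} s")
```

For 10 million events this gives roughly 345 s versus 125 s, a ratio of about 2.76, consistent with the ~2.7x figure in the highlights.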

Credits

Based on the concept from fionera/splunker (Go). This Python/Rust implementation adds streaming S3 support, multiple output formats, native performance, and a production-ready API.

License

MIT


Download files

Source Distribution

splunk_ddss_extractor-0.4.3.tar.gz (21.2 kB)

Uploaded Source

Built Distributions

splunk_ddss_extractor-0.4.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (255.1 kB)

Uploaded CPython 3.13, manylinux: glibc 2.17+ x86-64

splunk_ddss_extractor-0.4.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (253.7 kB)

Uploaded CPython 3.13, manylinux: glibc 2.17+ ARM64

splunk_ddss_extractor-0.4.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (255.4 kB)

Uploaded CPython 3.12, manylinux: glibc 2.17+ x86-64

splunk_ddss_extractor-0.4.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (253.8 kB)

Uploaded CPython 3.12, manylinux: glibc 2.17+ ARM64

splunk_ddss_extractor-0.4.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (256.5 kB)

Uploaded CPython 3.11, manylinux: glibc 2.17+ x86-64

splunk_ddss_extractor-0.4.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (255.0 kB)

Uploaded CPython 3.11, manylinux: glibc 2.17+ ARM64

splunk_ddss_extractor-0.4.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (256.8 kB)

Uploaded CPython 3.10, manylinux: glibc 2.17+ x86-64

splunk_ddss_extractor-0.4.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (255.3 kB)

Uploaded CPython 3.10, manylinux: glibc 2.17+ ARM64

File details

Details for the file splunk_ddss_extractor-0.4.3.tar.gz.

File metadata

  • Download URL: splunk_ddss_extractor-0.4.3.tar.gz
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for splunk_ddss_extractor-0.4.3.tar.gz
Algorithm Hash digest
SHA256 eb2ea295be88fe44f39eee24b72fd51a15791bbf78cd50822343dd1a8f5e242f
MD5 27d2e675a5248acde58c7ed8be6d78a1
BLAKE2b-256 f38c67cc6fa0ece10f8ac7a51e4747da9ef470cc62151dc5117384ca3f71302e

Provenance

The following attestation bundles were made for splunk_ddss_extractor-0.4.3.tar.gz:

Publisher: publish.yml on ponquersohn/splunk_ddss_extractor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file splunk_ddss_extractor-0.4.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for splunk_ddss_extractor-0.4.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 65a043a150208d07baf5fbf5eb603accd37ff3c710a906580eb4f23a42092345
MD5 8c7ceeab02f47aa0f62f154be18c6567
BLAKE2b-256 85011e6b683410567401ffee642e2c36038cd2002df57ebb7fe0959af94500ec

Provenance

The following attestation bundles were made for splunk_ddss_extractor-0.4.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on ponquersohn/splunk_ddss_extractor

File details

Details for the file splunk_ddss_extractor-0.4.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for splunk_ddss_extractor-0.4.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 6cc1f9e36f3cdebdd3c1476746cf16df63017e2dd69541bad396d6c20c6437bd
MD5 ef5d524f2d4192476acfa212496347d1
BLAKE2b-256 28a8ff8c9e71848a4b1c04f021049e2be6dc97a2a89c4095c64c2743381d3a24

Provenance

The following attestation bundles were made for splunk_ddss_extractor-0.4.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on ponquersohn/splunk_ddss_extractor

File details

Details for the file splunk_ddss_extractor-0.4.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for splunk_ddss_extractor-0.4.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5fd64071f6dc7792eaf74f8fbd5735d19cc6bf591bcc3c3ce7e7e931eebf7776
MD5 c27cbbd5d9909442fe57300647c67aee
BLAKE2b-256 93f68fbfd20864c622a8e1fceef115d9a1251c19ae5a953d8ae33456aadca2e5

Provenance

The following attestation bundles were made for splunk_ddss_extractor-0.4.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on ponquersohn/splunk_ddss_extractor

File details

Details for the file splunk_ddss_extractor-0.4.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for splunk_ddss_extractor-0.4.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 d7d7bd5e76355bd80adc0ce8d80c21b8142294d68fcd5a324349e0c5d669e8f0
MD5 09c187f1394973ddc523071761e52fb0
BLAKE2b-256 5ae080af76272045b1696f8579eb49a669c648bb0e95b398d93fd2f66c0f7239

Provenance

The following attestation bundles were made for splunk_ddss_extractor-0.4.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on ponquersohn/splunk_ddss_extractor

File details

Details for the file splunk_ddss_extractor-0.4.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for splunk_ddss_extractor-0.4.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6b92e8430d243898d1c8daeb8c314032e3d7c80b2cf07aad90c392b6ee9fbfe9
MD5 61cfac2c40e06fe6328a6991497378fe
BLAKE2b-256 87d830908d092491121974c7a340168f7ba398aea710f8a1ad9172c14bd0629b

Provenance

The following attestation bundles were made for splunk_ddss_extractor-0.4.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on ponquersohn/splunk_ddss_extractor

File details

Details for the file splunk_ddss_extractor-0.4.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for splunk_ddss_extractor-0.4.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 3650ac5fd80f7461a279a7b045139ba78760642bd47c7e2ccc425514417ca38d
MD5 787cbbbe207d548c48db763551c79b16
BLAKE2b-256 1dfdde2ae9f9dee90cee979588bbc806e91fdd0f9e6c725f4079489f503ec514

Provenance

The following attestation bundles were made for splunk_ddss_extractor-0.4.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on ponquersohn/splunk_ddss_extractor

File details

Details for the file splunk_ddss_extractor-0.4.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for splunk_ddss_extractor-0.4.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ecd13bacccfc136efa8e4a50a2ba837be7e97500acd238908611b0aae1314a61
MD5 f05025ad2112873dad642a1f53b1a4e3
BLAKE2b-256 a85b18aa586eb15fe0d0b8113431dbbbde2e2cc7018ad3d322f0b61c574a9c21

Provenance

The following attestation bundles were made for splunk_ddss_extractor-0.4.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on ponquersohn/splunk_ddss_extractor

File details

Details for the file splunk_ddss_extractor-0.4.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for splunk_ddss_extractor-0.4.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 901f1caf00b310ca76057a4d0d521c4678f04e80a7ec0c5b29bc70bcc852e7e0
MD5 dc1f794b3e5e549f9472d8b040cf36db
BLAKE2b-256 d0aecbd4238762d3baaa806c2e14a9855cbc66170f61b39ccf2aaa135b9e6ebf

Provenance

The following attestation bundles were made for splunk_ddss_extractor-0.4.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on ponquersohn/splunk_ddss_extractor
