
Project description

Ignis

CI PyPI Python License: MIT

ESLint for Apache Spark jobs. Point it at an event log and get actionable diagnostics for data skew, shuffle size, spill, and bad partitioning.

$ ignis analyze /path/to/spark-event-log

──────────────────── ignis  my-spark-app ────────────────────

2 issue(s) found

  Severity   Rule              Stage  Message
 ────────────────────────────────────────────────────────────
  WARNING    data-skew             2  Stage 2 ('groupBy at job.py:42'):
                                      max task 42,300ms vs median 1,800ms (23.5x ratio)
  WARNING    partition-count       3  Stage 3 ('join at job.py:71'):
                                      2 shuffle partition(s) across 8 executor core(s)
                                      — cluster is under-utilized

╭───────────────────── data-skew — Stage 2 ──────────────────╮
│ Repartition before the shuffle with a higher partition     │
│ count, or salt the join/groupBy key to spread work across  │
│ more tasks.                                                │
╰────────────────────────────────────────────────────────────╯
╭─────────────────── partition-count — Stage 3 ──────────────╮
│ Raise spark.sql.shuffle.partitions to at least 16          │
│ (2× your 8 executor cores).                                │
╰────────────────────────────────────────────────────────────╯

Installation

pip install spark-ignis            # core only
pip install "spark-ignis[s3]"      # + AWS S3
pip install "spark-ignis[gcs]"     # + Google Cloud Storage
pip install "spark-ignis[azure]"   # + Azure Data Lake Storage

Or install from source:

git clone https://github.com/skatz1990/ignis
cd ignis
python3 -m venv .venv && source .venv/bin/activate
pip install -e .               # local files only
pip install -e ".[s3]"         # + AWS S3
pip install -e ".[gcs]"        # + Google Cloud Storage
pip install -e ".[azure]"      # + Azure Data Lake Storage

Usage

# Analyze a local event log (terminal output, exits 1 if issues found)
ignis analyze /path/to/spark-event-log

# Analyze directly from cloud storage
ignis analyze s3://my-bucket/spark-logs/application_1234_0001
ignis analyze gs://my-bucket/spark-logs/application_1234_0001
ignis analyze abfs://my-container/spark-logs/application_1234_0001

# Machine-readable JSON output — pipe to jq, store in CI artifacts
ignis analyze s3://my-bucket/spark-logs/application_1234_0001 --output json

# Pipe findings directly to a Slack channel
ignis analyze s3://my-bucket/spark-logs/application_1234_0001 --output json \
  | ignis notify slack https://hooks.slack.com/services/...

# Send findings by email
ignis analyze s3://my-bucket/spark-logs/application_1234_0001 --output json \
  | ignis notify email ops@example.com \
      --from ignis@example.com --smtp-host smtp.example.com

# List all rules with their thresholds
ignis rules

Exits 0 if no issues are found and 1 if any are, in both terminal and JSON modes.

Spark event logs are standard NDJSON files (Spark 3.x) or zstd-compressed directories (Spark 4.0+). Databricks writes them to DBFS, S3, GCS, or ADLS after each job.
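Each line of an NDJSON event log is one JSON object whose `Event` field names a standard `SparkListener` event. A minimal sketch of reading such a log (the two log lines here are synthetic; real events carry many more fields):

```python
import json
from collections import Counter

def count_events(lines):
    """Count Spark listener event types in an NDJSON event log."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line)["Event"]] += 1
    return counts

# Two synthetic log lines for illustration
log = [
    '{"Event": "SparkListenerApplicationStart", "App Name": "my-spark-app"}',
    '{"Event": "SparkListenerStageCompleted", "Stage Info": {"Stage ID": 2}}',
]
print(count_events(log))
```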

Notifications

ignis notify reads findings JSON from stdin and routes them to a notification channel. It is silent (exit 0, no message) when there are no findings. Pass --always to send a clean-run confirmation.

Slack

ignis analyze /path/to/spark-event-log --output json \
  | ignis notify slack https://hooks.slack.com/services/YOUR/WEBHOOK/URL

Create an incoming webhook at api.slack.com/apps → your app → Incoming Webhooks.

Email

ignis analyze /path/to/spark-event-log --output json \
  | ignis notify email ops@example.com \
      --from ignis@example.com \
      --smtp-host smtp.example.com \
      --smtp-port 587 \
      --username user \
      --password pass

Sends a plain-text + HTML multipart email via SMTP with STARTTLS. Port 25 skips TLS (useful for local relay). --username and --password are optional if your relay doesn't require authentication.
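The multipart structure is the standard `multipart/alternative` layout: a plain-text part first, then an HTML part so capable clients prefer the richer rendering. A sketch of building such a message with the Python standard library (illustrative only, not the actual ignis implementation; hosts and credentials are placeholders):

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_report_email(findings_text, findings_html, sender, recipient):
    """Build a plain-text + HTML multipart message like the one ignis sends."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = "ignis findings"
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text part first; HTML last so capable clients prefer it
    msg.attach(MIMEText(findings_text, "plain"))
    msg.attach(MIMEText(findings_html, "html"))
    return msg

msg = build_report_email("2 issue(s) found", "<b>2 issue(s) found</b>",
                         "ignis@example.com", "ops@example.com")

# Sending (sketch): STARTTLS on port 587, plain SMTP on port 25
# with smtplib.SMTP("smtp.example.com", 587) as s:
#     s.starttls()
#     s.login("user", "pass")
#     s.send_message(msg)
```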

Cloud storage

AWS S3

pip install -e ".[s3]"
ignis analyze s3://my-bucket/spark-logs/application_1234_0001

Credentials from the standard AWS chain:

Source                   How
Environment variables    AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY
Named profile            AWS_PROFILE=my-profile ignis analyze s3://...
Instance role (EC2/ECS)  No configuration needed
SSO                      aws sso login, then run ignis normally

Google Cloud Storage

pip install -e ".[gcs]"
ignis analyze gs://my-bucket/spark-logs/application_1234_0001

Credentials from the standard GCP chain:

Source                   How
User credentials         gcloud auth application-default login
Service account key      GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
Workload Identity (GKE)  No configuration needed

Azure Data Lake Storage (ADLS Gen2)

pip install -e ".[azure]"
ignis analyze abfs://my-container/spark-logs/application_1234_0001

Credentials from the standard Azure chain:

Source             How
Service principal  AZURE_TENANT_ID + AZURE_CLIENT_ID + AZURE_CLIENT_SECRET
Azure CLI          az login, then run ignis normally
Managed identity   No configuration needed

Rules

Rule             What it detects                                                           Default threshold
data-skew        One task takes far longer than its peers in a shuffle stage               max ≥ 5× median task duration
shuffle-size     A stage writes an excessive amount of data to shuffle files               total shuffle write ≥ 1 GB
spill            Tasks spill execution data to disk or show significant memory pressure    any disk spill (WARNING); memory spill ≥ 500 MB (INFO)
partition-count  Shuffle partition count leaves the cluster idle or overwhelms the driver  < 2× executor cores or > 10,000 partitions
failed-tasks     High rate of task failures or speculative task launches in a stage        failure rate ≥ 10% (WARNING); speculation rate ≥ 25% (INFO)
gc-pressure      JVM garbage collection consumes a large fraction of executor run time     GC time ≥ 10% of executor run time (WARNING)

Run ignis rules for a live summary with thresholds.
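At its core, the data-skew check reduces to a max-vs-median comparison over task durations within a stage. A hypothetical sketch (not the actual rule implementation), using the default 5× threshold from the table and the durations from the example run above:

```python
from statistics import median

SKEW_RATIO_THRESHOLD = 5.0  # default from the rules table: max >= 5x median

def detect_skew(task_durations_ms):
    """Flag a stage whose slowest task dwarfs the median task duration."""
    med = median(task_durations_ms)
    worst = max(task_durations_ms)
    ratio = worst / med if med else float("inf")
    return ratio >= SKEW_RATIO_THRESHOLD, ratio

# One 42,300 ms straggler against a 1,800 ms median, as in the example report
skewed, ratio = detect_skew([1700, 1800, 1900, 1800, 42300])
print(skewed, round(ratio, 1))  # -> True 23.5
```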

JSON output

--output json emits a structured document to stdout:

{
  "app_id": "application_1234_0001",
  "app_name": "my-spark-app",
  "finding_count": 1,
  "findings": [
    {
      "rule": "data-skew",
      "severity": "warning",
      "stage_id": 2,
      "stage_name": "groupBy at job.py:42",
      "message": "Stage 2 ('groupBy at job.py:42'): max task 42,300ms vs median 1,800ms (23.5x ratio)",
      "recommendation": "Repartition before the shuffle with a higher partition count, or salt the join/groupBy key to spread work across more tasks."
    }
  ]
}
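One way to consume this document downstream, e.g. to collect the rules that fired at warning severity in a CI script (a sketch over a trimmed copy of the sample above):

```python
import json

# Trimmed copy of the sample document ignis emits on stdout
report = json.loads("""{
  "app_id": "application_1234_0001",
  "finding_count": 1,
  "findings": [
    {"rule": "data-skew", "severity": "warning", "stage_id": 2}
  ]
}""")

# Rules that fired at warning severity, e.g. to gate a CI step
warnings = [f["rule"] for f in report["findings"] if f["severity"] == "warning"]
print(warnings)  # -> ['data-skew']
```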

Development

git clone https://github.com/skatz1990/ignis
cd ignis
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

Project layout

ignis/
  parser/     NDJSON event log parsing → Application/Stage/Task models
  rules/      Diagnostic rules (one module per rule)
  reporter/   Terminal (rich) and JSON output
  cli.py      Entry point — ignis analyze <path>, ignis rules
tests/
  fixtures/   Hand-crafted NDJSON snippets that trigger each rule
docs/
  rules.md    Detailed explanation of each rule and its detection logic

Download files

Download the file for your platform.

Source Distribution

spark_ignis-0.3.0.tar.gz (96.6 kB)

Uploaded Source

Built Distribution


spark_ignis-0.3.0-py3-none-any.whl (21.7 kB)

Uploaded Python 3

File details

Details for the file spark_ignis-0.3.0.tar.gz.

File metadata

  • Download URL: spark_ignis-0.3.0.tar.gz
  • Upload date:
  • Size: 96.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spark_ignis-0.3.0.tar.gz
Algorithm Hash digest
SHA256 fc96ad887d8686a0324b861467d3be63300fdb417dec2d8c89cdcae80f59aa9e
MD5 35f4e86e1dd97cc7a4bd9122a7a7f9af
BLAKE2b-256 699a22f62fd9bc8657e9d30acfd7b4641a49fd9aa82b850b9da1f420ffed9585

See more details on using hashes here.

Provenance

The following attestation bundles were made for spark_ignis-0.3.0.tar.gz:

Publisher: publish.yml on skatz1990/ignis

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file spark_ignis-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: spark_ignis-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 21.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spark_ignis-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 88c375db27dac28e7bc0d0d9017c866165867f20d77af033d26f4cd15e30e1d7
MD5 4d31a730a4bd4562e69d0b05a69b42c8
BLAKE2b-256 18fd3fd2cfeed5798caf324a2bffab12bece3a5cc355688d20ef99b5a9dcc713


Provenance

The following attestation bundles were made for spark_ignis-0.3.0-py3-none-any.whl:

Publisher: publish.yml on skatz1990/ignis

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
