
Ignis


ESLint for Apache Spark jobs. Point it at an event log and get actionable diagnostics for data skew, shuffle size, spill, and bad partitioning.

$ ignis analyze /path/to/spark-event-log

──────────────────── ignis  my-spark-app ────────────────────

2 issue(s) found

  Severity   Rule              Stage  Message
 ────────────────────────────────────────────────────────────
  WARNING    data-skew             2  Stage 2 ('groupBy at job.py:42'):
                                      max task 42,300ms vs median 1,800ms (23.5x ratio)
  WARNING    partition-count       3  Stage 3 ('join at job.py:71'):
                                      2 shuffle partition(s) across 8 executor core(s)
                                      — cluster is under-utilized

╭───────────────────── data-skew — Stage 2 ──────────────────╮
│ Repartition before the shuffle with a higher partition     │
│ count, or salt the join/groupBy key to spread work across  │
│ more tasks.                                                │
╰────────────────────────────────────────────────────────────╯
╭─────────────────── partition-count — Stage 3 ──────────────╮
│ Raise spark.sql.shuffle.partitions to at least 16          │
│ (2× your 8 executor cores).                                │
╰────────────────────────────────────────────────────────────╯
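The salting fix suggested above can be sketched in plain Python, no Spark required: one hot key is split into N pseudo-keys so the downstream groups even out. Everything here (`salt_key`, `N_SALTS`, the sample data) is illustrative and not part of ignis.

```python
import random
from collections import Counter

random.seed(0)   # deterministic only for the sake of the example
N_SALTS = 8      # illustrative salt count; tune to your parallelism

def salt_key(key: str, n_salts: int = N_SALTS) -> str:
    """Spread one hot key across n_salts pseudo-keys, e.g. 'user_42#3'."""
    return f"{key}#{random.randrange(n_salts)}"

# 10,000 records that all share one hot key: one giant task if grouped as-is.
records = ["user_42"] * 10_000
buckets = Counter(salt_key(k) for k in records)

# After salting, the work is split across N_SALTS groups; a second, cheap
# aggregation over the unsalted key recombines the partial results.
```

In Spark the same idea means adding a salt column before the `groupBy` (or to the skewed join key) and aggregating twice, partial then final.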

Installation

pip install spark-ignis            # core only
pip install "spark-ignis[s3]"      # + AWS S3
pip install "spark-ignis[gcs]"     # + Google Cloud Storage
pip install "spark-ignis[azure]"   # + Azure Data Lake Storage

Or install from source:

git clone https://github.com/skatz1990/ignis
cd ignis
python3 -m venv .venv && source .venv/bin/activate
pip install -e .               # local files only
pip install -e ".[s3]"         # + AWS S3
pip install -e ".[gcs]"        # + Google Cloud Storage
pip install -e ".[azure]"      # + Azure Data Lake Storage

Usage

# Analyze a local event log (terminal output, exits 1 if issues found)
ignis analyze /path/to/spark-event-log

# Analyze directly from cloud storage
ignis analyze s3://my-bucket/spark-logs/application_1234_0001
ignis analyze gs://my-bucket/spark-logs/application_1234_0001
ignis analyze abfs://my-container/spark-logs/application_1234_0001

# Machine-readable JSON output — pipe to jq, store in CI artifacts
ignis analyze s3://my-bucket/spark-logs/application_1234_0001 --output json

# List all rules with their thresholds
ignis rules

Exits 0 if no issues are found, 1 if any are — in both terminal and JSON modes.

Spark event logs are standard NDJSON files (Spark 3.x) or zstd-compressed directories (Spark 4.0+). Databricks writes them to DBFS, S3, GCS, or ADLS after each job.
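Because each line of an event log is one JSON object, a minimal reader needs nothing beyond the standard library. The sample events below are hand-written stand-ins showing only a subset of the fields a real log contains:

```python
import json
from collections import Counter

# Hand-written stand-ins for Spark listener events (field subset only).
sample_log = """\
{"Event": "SparkListenerApplicationStart", "App Name": "my-spark-app"}
{"Event": "SparkListenerTaskEnd", "Stage ID": 2, "Task Info": {"Finish Time": 100}}
{"Event": "SparkListenerTaskEnd", "Stage ID": 2, "Task Info": {"Finish Time": 150}}
{"Event": "SparkListenerApplicationEnd", "Timestamp": 1700000000000}
"""

# One json.loads per non-empty line is all NDJSON parsing amounts to.
events = [json.loads(line) for line in sample_log.splitlines() if line.strip()]
by_type = Counter(e["Event"] for e in events)
```

ignis builds its Application/Stage/Task models from exactly this kind of per-line event stream.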

Cloud storage

AWS S3

pip install -e ".[s3]"
ignis analyze s3://my-bucket/spark-logs/application_1234_0001

Credentials from the standard AWS chain:

  Source                   How
 ─────────────────────────────────────────────────────────────
  Environment variables    AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY
  Named profile            AWS_PROFILE=my-profile ignis analyze s3://...
  Instance role (EC2/ECS)  No configuration needed
  SSO                      aws sso login, then run ignis normally

Google Cloud Storage

pip install -e ".[gcs]"
ignis analyze gs://my-bucket/spark-logs/application_1234_0001

Credentials from the standard GCP chain:

  Source                   How
 ─────────────────────────────────────────────────────────────
  User credentials         gcloud auth application-default login
  Service account key      GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
  Workload Identity (GKE)  No configuration needed

Azure Data Lake Storage (ADLS Gen2)

pip install -e ".[azure]"
ignis analyze abfs://my-container/spark-logs/application_1234_0001

Credentials from the standard Azure chain:

  Source             How
 ─────────────────────────────────────────────────────────────
  Service principal  AZURE_TENANT_ID + AZURE_CLIENT_ID + AZURE_CLIENT_SECRET
  Azure CLI          az login, then run ignis normally
  Managed identity   No configuration needed

Rules

  Rule             What it detects                                               Default threshold
 ───────────────────────────────────────────────────────────────────────────────────────────────────────────
  data-skew        One task takes far longer than its peers in a shuffle stage   max ≥ 5× median task duration
  shuffle-size     A stage writes an excessive amount of data to shuffle files   total shuffle write ≥ 1 GB
  spill            Tasks spill execution data to disk or show significant        any disk spill (WARNING);
                   memory pressure                                               memory spill ≥ 500 MB (INFO)
  partition-count  Shuffle partition count leaves the cluster idle or            < 2× executor cores or
                   overwhelms the driver                                         > 10,000 partitions
  failed-tasks     High rate of task failures or speculative task launches       failure rate ≥ 10% (WARNING);
                   in a stage                                                    speculation rate ≥ 25% (INFO)
  gc-pressure      JVM garbage collection consumes a large fraction of           GC time ≥ 10% of executor
                   executor run time                                             run time (WARNING)

Run ignis rules for a live summary with thresholds.
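The data-skew check in the table boils down to a max-vs-median comparison per stage. A standalone sketch, with the threshold taken from the table and a function name that is illustrative rather than ignis's actual API:

```python
from statistics import median

SKEW_RATIO = 5.0  # default threshold: max ≥ 5× median task duration

def is_skewed(task_durations_ms: list[int], ratio: float = SKEW_RATIO) -> bool:
    """Flag a stage whose slowest task dwarfs the median task."""
    if len(task_durations_ms) < 2:
        return False  # a single task can't be skewed relative to peers
    med = median(task_durations_ms)
    return med > 0 and max(task_durations_ms) / med >= ratio

# Mirrors the example report: one straggler far beyond the median.
healthy = [1_700, 1_800, 1_900, 2_000]
skewed = [1_700, 1_800, 1_900, 42_300]
```

The other rules follow the same pattern: aggregate per-stage task metrics, compare against a threshold, emit a finding.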

JSON output

--output json emits a structured document to stdout:

{
  "app_id": "application_1234_0001",
  "app_name": "my-spark-app",
  "finding_count": 1,
  "findings": [
    {
      "rule": "data-skew",
      "severity": "warning",
      "stage_id": 2,
      "stage_name": "groupBy at job.py:42",
      "message": "Stage 2 ('groupBy at job.py:42'): max task 42,300ms vs median 1,800ms (23.5x ratio)",
      "recommendation": "Repartition before the shuffle with a higher partition count, or salt the join/groupBy key to spread work across more tasks."
    }
  ]
}
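The JSON document is easy to post-process with just the standard library. This sketch filters findings by severity; the report string abbreviates the example above:

```python
import json

# Abbreviated copy of the example report shown above.
report = json.loads("""
{
  "app_id": "application_1234_0001",
  "app_name": "my-spark-app",
  "finding_count": 1,
  "findings": [
    {"rule": "data-skew", "severity": "warning", "stage_id": 2,
     "stage_name": "groupBy at job.py:42",
     "message": "...", "recommendation": "..."}
  ]
}
""")

# Keep only warning-level findings and collect the affected stages.
warnings = [f for f in report["findings"] if f["severity"] == "warning"]
stages = sorted({f["stage_id"] for f in warnings})
```

In CI you rarely need even this much: the non-zero exit code already encodes "issues found", so a filter like this is only needed for custom severity policies or reporting.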

Development

git clone https://github.com/skatz1990/ignis
cd ignis
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

Project layout

ignis/
  parser/     NDJSON event log parsing → Application/Stage/Task models
  rules/      Diagnostic rules (one module per rule)
  reporter/   Terminal (rich) and JSON output
  cli.py      Entry point — ignis analyze <path>, ignis rules
tests/
  fixtures/   Hand-crafted NDJSON snippets that trigger each rule
docs/
  rules.md    Detailed explanation of each rule and its detection logic

Download files

Download the file for your platform.

Source Distribution

spark_ignis-0.2.2.tar.gz (89.6 kB)


Built Distribution


spark_ignis-0.2.2-py3-none-any.whl (18.1 kB)


File details

Details for the file spark_ignis-0.2.2.tar.gz.

File metadata

  • Download URL: spark_ignis-0.2.2.tar.gz
  • Upload date:
  • Size: 89.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spark_ignis-0.2.2.tar.gz
  Algorithm    Hash digest
 ─────────────────────────────────────────────────────────────────────────────
  SHA256       3d2fd8923be164950beee0f38e12d539f97fa289e64b521e7860063122f3136d
  MD5          4b353e383736566753b9cd1f49a47877
  BLAKE2b-256  5cb0424eae69f8d800af042d09ca8389bca5517a6d68853a4a2bef73d9fc9d31


Provenance

The following attestation bundles were made for spark_ignis-0.2.2.tar.gz:

Publisher: publish.yml on skatz1990/ignis

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file spark_ignis-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: spark_ignis-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 18.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spark_ignis-0.2.2-py3-none-any.whl
  Algorithm    Hash digest
 ─────────────────────────────────────────────────────────────────────────────
  SHA256       6db2b81a747e7e4374f903819f07c36d3375e09e6da07cc642963bc7bf89891e
  MD5          848d7eb03d433d4c7de180c1ddcbfc07
  BLAKE2b-256  b1b502822d6e8517161f9d177d8255b8badd3cb4d6519e599016c5aa5a4d8000


Provenance

The following attestation bundles were made for spark_ignis-0.2.2-py3-none-any.whl:

Publisher: publish.yml on skatz1990/ignis

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
