spark-advisor-rules

Deterministic rules engine for Apache Spark job analysis. Part of the spark-advisor ecosystem.

Install

pip install spark-advisor-rules

What it detects

11 rules that identify common Spark performance problems:

Rule                         Detects
DataSkewRule                 Task duration skew (max/median > 5x)
SpillToDiskRule              Disk spill indicating insufficient memory
GCPressureRule               GC time > 20% of task time
ShufflePartitionsRule        Partition count far from optimal (128 MB target)
ExecutorIdleRule             Slot utilization < 40%
TaskFailureRule              Failed tasks (OOM, fetch failures)
SmallFileRule                Avg input bytes per task < 10 MB
BroadcastJoinThresholdRule   Broadcast join disabled or threshold too low
SerializerChoiceRule         Java serializer used with shuffle stages
DynamicAllocationRule        Dynamic allocation disabled or missing min/max bounds
ExecutorMemoryOverheadRule   High GC + high memory utilization
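
To make the numeric thresholds in the table concrete, here is a minimal sketch of how a few of these checks could be computed from raw task metrics. It is not the package's implementation: the function names, parameter names, the binary interpretation of MB, and the definition of slot utilization as task time over available slot time are all assumptions made for illustration.

```python
from statistics import median

# Limits copied from the table; every name below is illustrative and
# is NOT part of the spark-advisor-rules API.
SKEW_RATIO_LIMIT = 5.0                      # max/median task duration
GC_FRACTION_LIMIT = 0.20                    # GC time / task time
SLOT_UTILIZATION_MIN = 0.40                 # assumed: task time / slot time
TARGET_PARTITION_BYTES = 128 * 1024 * 1024  # assumed binary megabytes


def skew_ratio(task_durations_ms):
    """Max/median task duration; 0.0 for empty or zero-median input."""
    if not task_durations_ms:
        return 0.0
    med = median(task_durations_ms)
    return max(task_durations_ms) / med if med > 0 else 0.0


def gc_fraction(gc_time_ms, task_time_ms):
    """Share of task time spent in garbage collection."""
    return gc_time_ms / task_time_ms if task_time_ms > 0 else 0.0


def slot_utilization(total_task_time_ms, total_slots, job_duration_ms):
    """Task time divided by the slot time that was available."""
    available = total_slots * job_duration_ms
    return total_task_time_ms / available if available > 0 else 0.0


def suggested_shuffle_partitions(shuffle_bytes):
    """Ceiling of shuffle size over the 128 MB target, at least 1."""
    return max(1, -(-shuffle_bytes // TARGET_PARTITION_BYTES))


durations = [100, 110, 95, 105, 900]  # one straggler task
print("skewed:", skew_ratio(durations) > SKEW_RATIO_LIMIT)
print("gc pressure:", gc_fraction(30_000, 100_000) > GC_FRACTION_LIMIT)
print("idle:", slot_utilization(120_000, 10, 60_000) < SLOT_UTILIZATION_MIN)
print("partitions:", suggested_shuffle_partitions(10 * 1024**3))
```

Using the median rather than the mean for the skew ratio keeps a single straggler from inflating the baseline it is compared against.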

All thresholds are configurable via the Thresholds model.
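
The idea of configurable thresholds can be sketched with a plain dataclass. The class below is a hypothetical stand-in, not the package's Thresholds model: its name, field names, and defaults are assumptions that only mirror the limits listed in the table, so check the package source for the real fields before relying on any of them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExampleThresholds:
    """Hypothetical stand-in; fields are NOT taken from the package."""
    skew_ratio: float = 5.0
    gc_time_fraction: float = 0.20
    min_slot_utilization: float = 0.40
    small_file_bytes: int = 10 * 1024 * 1024
    target_partition_bytes: int = 128 * 1024 * 1024


def is_skewed(ratio, thresholds):
    """Apply the skew limit from whichever threshold set is passed in."""
    return ratio > thresholds.skew_ratio


defaults = ExampleThresholds()
# Derive a stricter profile without mutating the defaults.
strict = replace(defaults, skew_ratio=3.0, gc_time_fraction=0.10)

print(is_skewed(4.0, defaults), is_skewed(4.0, strict))
```

Keeping the limits in one immutable object means a rule never hard-codes a number, and a team can hold several profiles side by side.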

Usage

from spark_advisor_rules.static_analysis import StaticAnalysisService

# job_analysis is the job analysis input; building it is outside the scope of this snippet.
service = StaticAnalysisService()
results = service.analyze(job_analysis)

License

Apache 2.0
