Pre-commit hooks for Apache Spark development (Databricks, EMR, Dataproc, and more)

SparkGrep

Pre-commit hook that detects debugging leftovers in Apache Spark applications.

🎯 Purpose

SparkGrep helps maintain clean Apache Spark codebases by detecting common debugging leftovers and performance anti-patterns that developers often forget to remove before committing code.

🔍 What it Detects

  • display() calls - Jupyter/Databricks debugging function
  • .show() methods - DataFrame inspection calls
  • .collect() without assignment - Potential performance issues
  • .count() without assignment - Unnecessary computations
  • Custom patterns - User-defined patterns via configuration
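As a rough illustration of the checks listed above, the following minimal sketch scans source text line by line with regular expressions. The pattern list and the `find_leftovers` helper are hypothetical and written only for this example; they are not SparkGrep's actual pattern definitions, and the "without assignment" heuristic is deliberately simple.

```python
import re

# Hypothetical patterns approximating the checks listed above.
# Assumption: these are illustrative only, not SparkGrep's real definitions.
PATTERNS = [
    (re.compile(r"^\s*display\s*\("), "display() call"),
    (re.compile(r"\.show\s*\("), ".show() call"),
    # "Without assignment" is approximated as: no '=' anywhere before the call.
    # Chains that contain '=' (for example a filter condition) are missed.
    (re.compile(r"^\s*[^=#]*\.collect\s*\(\s*\)\s*$"), ".collect() without assignment"),
    (re.compile(r"^\s*[^=#]*\.count\s*\(\s*\)\s*$"), ".count() without assignment"),
]


def find_leftovers(source: str) -> list[tuple[int, str]]:
    """Return (line number, description) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip lines that are entirely comments
        for regex, description in PATTERNS:
            if regex.search(line):
                findings.append((lineno, description))
    return findings


if __name__ == "__main__":
    sample = "display(df)\nrows = df.collect()\ndf.count()\n"
    for lineno, description in find_leftovers(sample):
        print(f"line {lineno}: {description}")
```

A real checker would likely need to handle multi-line expressions and notebook cells, which this sketch ignores.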

🚀 Installation

pip install sparkgrep

📋 Usage

As a Pre-commit Hook

Add to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/leandroasaservice/sparkgrep
    rev: v0.1.0a1  # Use this preview version.
    hooks:
      - id: sparkgrep

Command Line

# Check specific files
sparkgrep src/my_script.py notebook.ipynb

# Check with additional patterns
sparkgrep --additional-patterns "debug_print:Debug print statement" src/

# Disable default patterns and use only custom ones
sparkgrep --disable-default-patterns --additional-patterns "my_pattern:My description" src/
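The `--additional-patterns` values above pair a pattern with a description, separated by a colon. The sketch below shows one way such a value could be parsed. It assumes the pattern is a regular expression and that the split happens at the first colon; both are assumptions made for this example, and `parse_pattern_spec` is a hypothetical helper, not part of SparkGrep's API.

```python
import re


def parse_pattern_spec(spec: str) -> tuple[re.Pattern, str]:
    """Split a 'pattern:description' string and compile the pattern.

    Assumption: the first colon separates the pattern from the description,
    so a pattern that itself contains a colon is not supported here.
    """
    pattern, separator, description = spec.partition(":")
    if not separator or not pattern:
        raise ValueError(f"expected 'pattern:description', got {spec!r}")
    # Fall back to the pattern text when the description is empty.
    return re.compile(pattern), description.strip() or pattern


if __name__ == "__main__":
    regex, description = parse_pattern_spec("debug_print:Debug print statement")
    line = "debug_print(df)"
    if regex.search(line):
        print(f"{description}: {line}")
```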

🛡️ Security & Quality

This project maintains high security and code quality standards:

🔒 Security Measures

  • Daily security scans with Bandit, Safety, and GitGuardian
  • Automated vulnerability detection and issue creation
  • Admin-protected CI/CD pipelines
  • Dependency vulnerability monitoring

📊 Code Quality

  • 80% minimum code coverage enforced in CI
  • SonarCloud integration for continuous code quality analysis
  • Automated testing on every PR
  • Code formatting with Ruff

📁 Project Structure

sparkgrep/
├── src/sparkgrep/          # Main package
│   ├── cli.py              # Command-line interface
│   ├── patterns.py         # Pattern definitions
│   ├── file_processors.py  # File processing logic
│   └── utils.py            # Utility functions
├── tests/                  # Test suite
│   ├── unit/               # Unit tests
│   └── integration/        # Integration tests
├── .github/                # GitHub configuration
│   ├── workflows/          # CI/CD pipelines
│   └── ISSUE_TEMPLATE/     # Issue templates
└── docs/                   # Documentation

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes with tests
  4. Ensure all checks pass (task test, security scans)
  5. Submit a pull request

Contribution Guidelines

  • Tests required for all new features
  • Security scans must pass
  • Code coverage must remain ≥ 80%
  • Admin approval required for all PRs to main
  • Follow existing code style and patterns

See CONTRIBUTING.md for details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Support


Made with ❤️ for the Apache Spark community

Download files

Source Distribution

sparkgrep-0.1.0a1.tar.gz (10.1 kB)

Uploaded Source

Built Distribution

sparkgrep-0.1.0a1-py3-none-any.whl (9.0 kB)

Uploaded Python 3

File details

Details for the file sparkgrep-0.1.0a1.tar.gz.

File metadata

  • Download URL: sparkgrep-0.1.0a1.tar.gz
  • Upload date:
  • Size: 10.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for sparkgrep-0.1.0a1.tar.gz
Algorithm Hash digest
SHA256 86635a0c8835a66b1d722a32c5868f3e927c61d549e341a763c7eb2f995db015
MD5 a3563f3808836a22139374587fb04970
BLAKE2b-256 935a0d86312aba33784d5aaae2da46784f8867af3539743b3adf8c5e5ce368da

File details

Details for the file sparkgrep-0.1.0a1-py3-none-any.whl.

File metadata

  • Download URL: sparkgrep-0.1.0a1-py3-none-any.whl
  • Upload date:
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for sparkgrep-0.1.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 a2c9fdbefd8cb3014684eec1d402588440d32409213806cbf599baac1dac265c
MD5 770614075e16e590948f959dda923ebe
BLAKE2b-256 0aa7f8750318630bdbd15fb830c49ff499ae68a280be8b0ae5eb8a5046d5e8a5
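The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib` module. It assumes the source distribution has been saved to the current directory under its published file name; the helper names `sha256_of` and `matches_published_digest` are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for sparkgrep-0.1.0a1.tar.gz.
EXPECTED_SHA256 = "86635a0c8835a66b1d722a32c5868f3e927c61d549e341a763c7eb2f995db015"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected: str) -> bool:
    """Return True when the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumption: the archive was downloaded into the current directory.
    archive = Path("sparkgrep-0.1.0a1.tar.gz")
    if archive.exists():
        ok = matches_published_digest(archive, EXPECTED_SHA256)
        print("digest matches" if ok else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```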
