Pre-commit hooks for Apache Spark development (Databricks, EMR, Dataproc, and more)

SparkGrep

Pre-commit hook that detects debugging leftovers in Apache Spark applications.

🎯 Purpose

SparkGrep helps maintain clean Apache Spark codebases by detecting common debugging leftovers and performance anti-patterns that developers often forget to remove before committing code.

🔍 What it Detects

  • display() calls - Jupyter/Databricks debugging function
  • .show() methods - DataFrame inspection calls
  • .collect() without assignment - Potential performance issues
  • .count() without assignment - Unnecessary computations
  • Custom patterns - User-defined patterns via configuration
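
The checks listed above can be approximated with a few regular expressions. The following is a minimal sketch under stated assumptions, not SparkGrep's actual implementation: the `PATTERNS` table, the `find_leftovers` helper, and the regexes themselves are hypothetical and only illustrate what "without assignment" could mean in practice (here: no `=` appears before the call on the same line).

```python
import re

# Hypothetical pattern table: (compiled regex, description).
# These regexes are illustrative assumptions, not SparkGrep's real patterns.
PATTERNS = [
    (re.compile(r"^display\s*\("), "display() call"),
    (re.compile(r"\.show\s*\("), ".show() call"),
    # "Without assignment" is approximated as: no '=' before the call.
    (re.compile(r"^[^=]*\.collect\s*\(\s*\)\s*$"), ".collect() without assignment"),
    (re.compile(r"^[^=]*\.count\s*\(\s*\)\s*$"), ".count() without assignment"),
]


def find_leftovers(source: str) -> list[tuple[int, str]]:
    """Return (line number, description) pairs for lines that match a pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and full-line comments.
        if not stripped or stripped.startswith("#"):
            continue
        for regex, description in PATTERNS:
            if regex.search(stripped):
                findings.append((lineno, description))
    return findings


SAMPLE = """\
df = spark.read.parquet("data")
display(df)
df.show()
rows = df.collect()
df.collect()
n = df.count()
df.count()
# df.show()
"""

for lineno, description in find_leftovers(SAMPLE):
    print(f"line {lineno}: {description}")
```

In this sketch, lines 2, 3, 5, and 7 of the sample are reported, while the assigned `rows = df.collect()` and `n = df.count()` calls and the commented-out line are not. A line-based regex is a deliberate simplification: it would miss a call such as `df.filter(df.a == 1).collect()` because of the `==`, which is one reason a real tool may use different rules.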

🚀 Installation

pip install sparkgrep

📋 Usage

As a Pre-commit Hook

Add to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/leandroasaservice/sparkgrep
    rev: v0.1.1a1  # Use this preview version.
    hooks:
      - id: sparkgrep

Command Line

# Check specific files
sparkgrep src/my_script.py notebook.ipynb

# Check with additional patterns
sparkgrep --additional-patterns "debug_print:Debug print statement" src/

# Disable default patterns and use only custom ones
sparkgrep --disable-default-patterns --additional-patterns "my_pattern:My description" src/
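
The `--additional-patterns` examples above pass each custom pattern as a single `pattern:description` string. As a rough illustration of that format, the sketch below splits such a string into its two parts. The `parse_pattern_spec` helper is hypothetical, and splitting on the first colon is an assumption; how SparkGrep itself parses the value (for example, when the pattern contains a colon) is not stated here.

```python
def parse_pattern_spec(spec: str) -> tuple[str, str]:
    """Split a 'pattern:description' string into (pattern, description).

    Hypothetical helper: splitting on the FIRST colon is an assumption,
    not documented SparkGrep behavior.
    """
    pattern, separator, description = spec.partition(":")
    pattern = pattern.strip()
    description = description.strip()
    if not separator or not pattern or not description:
        raise ValueError(f"expected 'pattern:description', got {spec!r}")
    return pattern, description


pattern, description = parse_pattern_spec("debug_print:Debug print statement")
print(pattern)      # debug_print
print(description)  # Debug print statement
```

Validating the string up front gives a clear error for a malformed value such as `"debug_print"` instead of silently registering a pattern with an empty description.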

🛡️ Security & Quality

This project maintains high security and code quality standards:

🔒 Security Measures

  • Automated vulnerability detection and issue creation
  • Admin-protected CI/CD pipelines
  • Dependency vulnerability monitoring

📊 Code Quality

  • 80% minimum code coverage enforced in CI
  • SonarCloud integration for continuous code quality analysis
  • Automated testing on every PR
  • Code formatting with Ruff

๐Ÿ“ Project Structure

sparkgrep/
├── src/sparkgrep/          # Main package
│   ├── cli.py              # Command-line interface
│   ├── patterns.py         # Pattern definitions
│   ├── file_processors.py  # File processing logic
│   └── utils.py            # Utility functions
├── tests/                  # Test suite
│   ├── unit/               # Unit tests
│   └── integration/        # Integration tests
├── .github/                # GitHub configuration
│   ├── workflows/          # CI/CD pipelines
│   └── ISSUE_TEMPLATE/     # Issue templates
└── docs/                   # Documentation

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes with tests
  4. Ensure all checks pass (task quality, task test)
  5. Submit a pull request

Contribution Guidelines

  • Tests required for all new features
  • Security scans must pass
  • Code coverage must remain ≥ 80%
  • Admin approval required for all PRs to main
  • Follow existing code style and patterns

See CONTRIBUTING.md for details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Support


Made with ❤️ for the Apache Spark community

Download files

Source Distribution

sparkgrep-0.1.1a1.tar.gz (10.0 kB view details)

Uploaded Source

Built Distribution

sparkgrep-0.1.1a1-py3-none-any.whl (8.9 kB view details)

Uploaded Python 3

File details

Details for the file sparkgrep-0.1.1a1.tar.gz.

File metadata

  • Download URL: sparkgrep-0.1.1a1.tar.gz
  • Upload date:
  • Size: 10.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sparkgrep-0.1.1a1.tar.gz
Algorithm Hash digest
SHA256 3d30cea61b3e315b1eef82f0ebf13b3ef676689bd52360f670e58004ef425473
MD5 4e3c258c51b247a3e3fb754dff21281e
BLAKE2b-256 72deaa525ba89fdfa4b26460ffca69068dbb7cd39b967044e01024b82a9bce19

Provenance

The following attestation bundles were made for sparkgrep-0.1.1a1.tar.gz:

Publisher: cicd.yml on leandroasaservice/sparkgrep

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sparkgrep-0.1.1a1-py3-none-any.whl.

File metadata

  • Download URL: sparkgrep-0.1.1a1-py3-none-any.whl
  • Upload date:
  • Size: 8.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sparkgrep-0.1.1a1-py3-none-any.whl
Algorithm Hash digest
SHA256 fb363d714c7883e4e6c344b973b83905164f401a18f7c670954deaf91baf00be
MD5 7f62ebbf6dda0472df4b05621210945c
BLAKE2b-256 84e925d6bc4400d372013c12cc5b098778c98d762dd4edab0ee19a32d3b9f650

Provenance

The following attestation bundles were made for sparkgrep-0.1.1a1-py3-none-any.whl:

Publisher: cicd.yml on leandroasaservice/sparkgrep

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
