Pre-commit hooks for Apache Spark development (Databricks, EMR, Dataproc, and more)
SparkGrep
Pre-commit hook that detects debugging leftovers in Apache Spark applications.
Purpose
SparkGrep helps maintain clean Apache Spark codebases by detecting common debugging leftovers and performance anti-patterns that developers often forget to remove before committing code.
What It Detects
- display() calls - Jupyter/Databricks debugging function
- .show() methods - DataFrame inspection calls
- .collect() without assignment - Potential performance issues
- .count() without assignment - Unnecessary computations
- Custom patterns - User-defined patterns via configuration
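To make the categories above concrete, here is a minimal sketch of line-based detection with regular expressions. The pattern list and the `find_leftovers` helper are hypothetical and written for illustration only; SparkGrep's real pattern definitions and matching logic may differ. The sketch only handles simple dotted names such as `df.show()`, not longer call chains.

```python
import re

# Hypothetical patterns for illustration; not SparkGrep's actual definitions.
# Each pattern is anchored at the start of the line, so a call that follows
# an assignment ("rows = df.collect()") is not reported.
PATTERNS = [
    (r"^\s*display\s*\(", "display() call"),
    (r"^\s*\w[\w.]*\.show\s*\(", ".show() call"),
    (r"^\s*\w[\w.]*\.collect\s*\(\s*\)\s*$", ".collect() without assignment"),
    (r"^\s*\w[\w.]*\.count\s*\(\s*\)\s*$", ".count() without assignment"),
]
COMPILED = [(re.compile(regex), description) for regex, description in PATTERNS]


def find_leftovers(source):
    """Return (line_number, description) pairs for lines matching a pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for regex, description in COMPILED:
            if regex.search(line):
                findings.append((lineno, description))
    return findings


sample = """\
df = spark.read.parquet("events")
df.show()
rows = df.collect()
df.count()
display(df)
"""

for lineno, description in find_leftovers(sample):
    print(f"line {lineno}: {description}")
```

In this sketch, the assigned `rows = df.collect()` is left alone, while the bare `df.show()`, `df.count()`, and `display(df)` lines are reported.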
Installation
pip install sparkgrep
Usage
As a Pre-commit Hook
Add to your .pre-commit-config.yaml:
repos:
  - repo: https://github.com/leandroasaservice/sparkgrep
    rev: v0.1.0a1  # Use this preview version.
    hooks:
      - id: sparkgrep
Command Line
# Check specific files
sparkgrep src/my_script.py notebook.ipynb
# Check with additional patterns
sparkgrep --additional-patterns "debug_print:Debug print statement" src/
# Disable default patterns and use only custom ones
sparkgrep --disable-default-patterns --additional-patterns "my_pattern:My description" src/
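The `--additional-patterns` values above take the form `pattern:description`. The sketch below shows one way such values could be parsed and combined with a default list. The helper names `parse_pattern_spec` and `build_patterns` are hypothetical, and splitting at the first colon is an assumption; the README does not say how SparkGrep handles patterns that themselves contain a colon.

```python
def parse_pattern_spec(spec):
    """Split a 'pattern:description' string at the first colon.

    Splitting at the first colon is an assumed rule for this sketch,
    not documented SparkGrep behavior.
    """
    pattern, separator, description = spec.partition(":")
    if not separator or not pattern:
        raise ValueError(f"expected 'pattern:description', got {spec!r}")
    return pattern.strip(), description.strip()


def build_patterns(defaults, additional, disable_defaults=False):
    """Combine default patterns with user-supplied ones.

    Mirrors the two command-line options shown above: extra patterns are
    appended, and the defaults can be dropped entirely.
    """
    patterns = [] if disable_defaults else list(defaults)
    patterns.extend(parse_pattern_spec(spec) for spec in additional)
    return patterns


# Hypothetical default list, for illustration only.
defaults = [(r"display\(", "display() call")]

print(build_patterns(defaults, ["debug_print:Debug print statement"]))
print(build_patterns(defaults, ["my_pattern:My description"], disable_defaults=True))
```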
Security & Quality
This project maintains high security and code quality standards:
Security Measures
- Daily security scans with Bandit, Safety, and GitGuardian
- Automated vulnerability detection and issue creation
- Admin-protected CI/CD pipelines
- Dependency vulnerability monitoring
Code Quality
- 80% minimum code coverage enforced in CI
- SonarCloud integration for continuous code quality analysis
- Automated testing on every PR
- Code formatting with Ruff
Project Structure
sparkgrep/
├── src/sparkgrep/          # Main package
│   ├── cli.py              # Command-line interface
│   ├── patterns.py         # Pattern definitions
│   ├── file_processors.py  # File processing logic
│   └── utils.py            # Utility functions
├── tests/                  # Test suite
│   ├── unit/               # Unit tests
│   └── integration/        # Integration tests
├── .github/                # GitHub configuration
│   ├── workflows/          # CI/CD pipelines
│   └── ISSUE_TEMPLATE/     # Issue templates
└── docs/                   # Documentation
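The command-line examples accept `.ipynb` notebooks as well as `.py` files, and the structure lists a `file_processors.py` module for file processing. How that module works is not described here, so the following is only a sketch of one plausible approach: a notebook is a JSON document whose `cells` list holds code and markdown cells, and only the code cells need scanning. The function name `code_lines_from_notebook` is hypothetical.

```python
import json


def code_lines_from_notebook(notebook_json):
    """Return (cell_index, line_number, line) for every code-cell line.

    Assumes the nbformat v4 layout: a top-level "cells" list in which each
    cell has a "cell_type" and a "source" that is a string or list of strings.
    """
    notebook = json.loads(notebook_json)
    lines = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # Markdown and raw cells cannot contain Spark calls.
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        for line_number, line in enumerate(text.splitlines(), start=1):
            lines.append((cell_index, line_number, line))
    return lines


# A tiny in-memory notebook, built here so the example is self-contained.
example_notebook = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["# Exploration"]},
            {"cell_type": "code", "source": ["df = load()\n", "df.show()"]},
        ]
    }
)

for cell_index, line_number, line in code_lines_from_notebook(example_notebook):
    print(f"cell {cell_index}, line {line_number}: {line}")
```

Keeping the cell index alongside the line number lets a report point at a location the notebook author can find, since notebooks have no file-wide line numbers.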
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes with tests
- Ensure all checks pass (task test, security scans)
- Submit a pull request
Contribution Guidelines
- Tests required for all new features
- Security scans must pass
- Code coverage must remain ≥ 80%
- Admin approval required for all PRs to main
- Follow existing code style and patterns
See CONTRIBUTING.md for details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Project Docs
Made with ❤️ for the Apache Spark community