An extremely fast Python linter for Apache Airflow DAG files
DagRuff
An extremely fast Python linter for Apache Airflow DAG files, written in Python.
DagRuff is a linter designed to catch common errors and enforce best practices in Apache Airflow DAG files. It implements 31 rules covering DAG structure, best practices, and Airflow-specific patterns.
Features
- Fast: Built with performance in mind, using AST parsing for static analysis
- Caching: Results are cached based on file hash for improved performance
- Comprehensive: 31 lint rules covering DAG structure, best practices, and Airflow patterns
- Auto-fix: Automatically fix many common issues with --fix
- Configurable: Configure rules via pyproject.toml or .dagruff.toml, with validation
- Plugin Support: Extend functionality with custom rule plugins via entry points
- No Airflow Required: Works without Airflow for AST-based checks (optional DagBag validation requires Airflow)
Installation
# Basic installation (no Airflow, AST checks only)
pip install dagruff
# With Airflow support (recommended for full DagBag validation)
pip install "dagruff[airflow]"
Or install from source:
git clone https://github.com/dkfancska/dagruff.git
cd dagruff
pip install -e ".[airflow]"
Note: Basic installation works without Airflow and performs all static checks via AST. For DagBag validation (import checking and code execution), install with the airflow extra.
Usage
After installation, use the dagruff command:
# Check a single file
dagruff examples/example_dag_good.py
# Check a directory
dagruff examples/
# Filter by severity
dagruff examples/ --severity warning
# JSON output
dagruff examples/ --format json
# Use configuration file
dagruff --config .dagruff.toml
# Without path - uses paths from config
dagruff
# Auto-fix all fixable issues
dagruff examples/ --fix
# Auto-fix specific rules
dagruff examples/ --fix DAG001 DAG009 AIR003
# Ignore specific rules
dagruff examples/ --ignore DAG006 DAG007
# Disable caching (useful for CI/CD)
dagruff examples/ --no-cache
# Verbose logging
dagruff examples/ --log-level debug
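The --severity option (like the severity setting in the configuration file) keeps only issues at or above a given level. The sketch below shows one way such a threshold filter can work, assuming the three levels named in the configuration section are ordered info < warning < error. Level, Issue and filter_by_severity are hypothetical names, not DagRuff's API, and the levels attached to the sample rules are illustrative only.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Level(IntEnum):
    # Assumed ordering: a higher value means a more severe issue.
    INFO = 1
    WARNING = 2
    ERROR = 3


@dataclass
class Issue:
    # Hypothetical stand-in for a lint result, not DagRuff's LintIssue model.
    rule_id: str
    message: str
    level: Level


def filter_by_severity(issues: List[Issue], minimum: str) -> List[Issue]:
    """Keep issues whose level is at or above `minimum` ("info", "warning", "error")."""
    threshold = Level[minimum.upper()]
    return [issue for issue in issues if issue.level >= threshold]


# Illustrative levels only; the real severity of each rule is defined by DagRuff.
issues = [
    Issue("AIR002", "start_date is missing", Level.ERROR),
    Issue("AIR003", "catchup is not set", Level.WARNING),
    Issue("BP005", "task has no docstring", Level.INFO),
]
kept = filter_by_severity(issues, "warning")
print([issue.rule_id for issue in kept])  # ['AIR002', 'AIR003']
```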
Lint Rules
DagRuff implements 31 lint rules from various sources:
DAG Rules (13 rules)
- DAG import and definition checks
- dag_id validation and uniqueness
- Required DAG parameters (dag_id, start_date)
- Recommended parameters (dag_md)
- Special checks for KubernetesPodOperator (requires container_resources and executor_resources)
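Because these checks work on the AST, a DAG file can be inspected without importing Airflow. Below is a minimal sketch of the required-parameter and dag_id-uniqueness checks, assuming DAGs are created by calling DAG(...) (directly or as airflow.DAG(...)) and that dag_id is either the first positional argument or a keyword. check_dags and dag_calls are hypothetical helpers, not DagRuff's implementation; the sketch ignores the @dag decorator and only compares literal string IDs within one file, whereas a real uniqueness check may also compare across files.

```python
import ast
from typing import Dict, Iterator, List

REQUIRED_DAG_PARAMS = ("dag_id", "start_date")


def dag_calls(tree: ast.AST) -> Iterator[ast.Call]:
    """Yield calls to `DAG(...)` or `<module>.DAG(...)` anywhere in the tree."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name == "DAG":
                yield node


def check_dags(source: str) -> List[str]:
    problems: List[str] = []
    first_seen: Dict[str, int] = {}  # dag_id -> line of its first definition
    for call in dag_calls(ast.parse(source)):
        params = {kw.arg: kw.value for kw in call.keywords if kw.arg}
        if call.args and "dag_id" not in params:
            params["dag_id"] = call.args[0]  # the DAG("my_dag", ...) form
        for name in REQUIRED_DAG_PARAMS:
            if name not in params:
                problems.append(f"line {call.lineno}: DAG is missing {name}")
        dag_id = params.get("dag_id")
        # Only literal string IDs can be compared statically.
        if isinstance(dag_id, ast.Constant) and isinstance(dag_id.value, str):
            if dag_id.value in first_seen:
                problems.append(
                    f"line {call.lineno}: duplicate dag_id {dag_id.value!r} "
                    f"(first defined on line {first_seen[dag_id.value]})"
                )
            else:
                first_seen[dag_id.value] = call.lineno
    return problems


source = '''\
from datetime import datetime
from airflow import DAG

with DAG("etl", start_date=datetime(2024, 1, 1)) as dag:
    pass

with DAG(dag_id="etl") as other:
    pass
'''
for problem in check_dags(source):
    print(problem)
```

For this input the second DAG on line 7 is reported twice: once for the missing start_date and once for reusing the dag_id first defined on line 4.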
Ruff AIR Rules (4 rules)
- AIR002: Check for start_date presence
- AIR003: Check catchup parameter
- AIR013: Recommend max_active_runs
- AIR014: Recommend max_active_tasks for Airflow 2+ (warn about deprecated concurrency)
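Keyword-presence checks such as AIR003, AIR013 and AIR014 reduce to inspecting the keyword names passed to each DAG(...) call. A hedged sketch follows; check_air_rules is hypothetical, its messages are illustrative, and it does not look inside default_args or handle the @dag decorator.

```python
import ast
from typing import List, Tuple


def check_air_rules(source: str) -> List[Tuple[int, str, str]]:
    """Return (line, rule_id, message) tuples for every DAG(...) call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "DAG"):
            continue
        given = {kw.arg for kw in node.keywords if kw.arg}
        if "catchup" not in given:
            findings.append((node.lineno, "AIR003", "set catchup explicitly"))
        if "max_active_runs" not in given:
            findings.append((node.lineno, "AIR013", "consider setting max_active_runs"))
        if "concurrency" in given:
            # Deprecated name: suggest the Airflow 2 replacement instead.
            findings.append((node.lineno, "AIR014",
                             "concurrency is deprecated; use max_active_tasks"))
        elif "max_active_tasks" not in given:
            findings.append((node.lineno, "AIR014", "consider setting max_active_tasks"))
    return findings


for finding in check_air_rules('dag = DAG("legacy", catchup=False, concurrency=4)'):
    print(finding)
```

For DAG("legacy", catchup=False, concurrency=4) this reports AIR013 and the deprecated-concurrency variant of AIR014, while catchup is already set so AIR003 stays silent.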
flake8-airflow Rules (4 rules)
- AF001: Forbid SubDagOperator usage
- AF002: Security warnings for BashOperator
- AF003: Check task_id uniqueness
- AF004: Detect deprecated operators
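AF001 and AF003 can both be approximated with one pass over the call nodes. The sketch below is an assumption about how such checks might look, not DagRuff's code: check_tasks is hypothetical, it only counts literal task_id strings, and it ignores TaskGroup prefixes (the same task_id inside two different TaskGroups is not a real clash in Airflow, but this sketch would still report it).

```python
import ast
from collections import Counter
from typing import List


def check_tasks(source: str) -> List[str]:
    """Flag SubDagOperator calls (AF001) and repeated literal task_ids (AF003)."""
    problems: List[str] = []
    task_ids: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
        if name == "SubDagOperator":
            problems.append(f"AF001 line {node.lineno}: SubDagOperator is not allowed")
        for kw in node.keywords:
            if kw.arg == "task_id" and isinstance(kw.value, ast.Constant):
                task_ids[kw.value.value] += 1
    for task_id, count in task_ids.items():
        if count > 1:
            problems.append(f"AF003: task_id {task_id!r} is used {count} times")
    return problems


source = '''\
start = EmptyOperator(task_id="start")
again = EmptyOperator(task_id="start")
section = SubDagOperator(task_id="section", subdag=None)
'''
for problem in check_tasks(source):
    print(problem)
```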
airflint AST Rules (4 rules)
- AIRFLINT001: Check task dependencies
- AIRFLINT002: Check XCom usage
- AIRFLINT003: Check Variables usage
- AIRFLINT004: Check required operator parameters
Best Practices Rules (6 rules)
- BP001: Check for top-level code avoidance
- BP002: Check datetime function usage
- BP003: Recommend execution_timeout for tasks
- BP004: Check dependency method consistency
- BP005: Recommend docstrings for tasks
- BP006: Recommend dagrun_timeout for DAGs
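This README describes BP002 only as "Check datetime function usage". One common datetime pitfall in Airflow is a dynamic start_date such as datetime.now(), whose value changes every time the scheduler parses the file; the Airflow documentation advises against it. Assuming BP002 covers that case, a sketch of the detection could look like this (dynamic_start_dates and the MOVING_NOW set are hypothetical):

```python
import ast
from typing import List

# Method names whose result changes every time the DAG file is parsed.
MOVING_NOW = {"now", "utcnow", "today"}


def _is_moving_now(value: ast.expr) -> bool:
    return (isinstance(value, ast.Call)
            and isinstance(value.func, ast.Attribute)
            and value.func.attr in MOVING_NOW)


def dynamic_start_dates(source: str) -> List[int]:
    """Return line numbers where start_date is set from a now()-style call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # DAG(..., start_date=datetime.now())
        if isinstance(node, ast.keyword) and node.arg == "start_date":
            if _is_moving_now(node.value):
                hits.append(node.value.lineno)
        # default_args = {"start_date": datetime.now()}
        elif isinstance(node, ast.Dict):
            for key, value in zip(node.keys, node.values):
                if (isinstance(key, ast.Constant) and key.value == "start_date"
                        and _is_moving_now(value)):
                    hits.append(value.lineno)
    return sorted(hits)


source = '''\
default_args = {"start_date": datetime.now()}
dag = DAG("a", start_date=datetime(2024, 1, 1))
dag2 = DAG("b", start_date=pendulum.today("UTC"))
'''
print(dynamic_start_dates(source))  # [1, 3]
```

A fixed date such as datetime(2024, 1, 1) on line 2 is not flagged, because its function is a plain name rather than a now()-style method.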
Full documentation:
- RULES.md - Complete rule descriptions with examples, quick reference, and grouping
- PLUGINS.md - Plugin system documentation
- CONTRIBUTING.md - Contribution guidelines
- PRE_COMMIT.md - Pre-commit hooks setup
Auto-fix (--fix)
DagRuff supports automatic fixing of many issues via the --fix flag:
Fixable Rules:
- DAG001 - Adds the missing "from airflow import DAG" import
- DAG005 - Removes extra spaces in dag_id
- DAG009 - Adds "owner": "airflow" to default_args
- DAG010 - Adds "retries": 1 to default_args
- AIR003 - Adds catchup=False to the DAG
- AIR013 - Adds max_active_runs=1 to the DAG
- AIR014 - Replaces concurrency with max_active_tasks, or adds max_active_tasks=1
Usage:
# Fix all fixable issues
dagruff examples/ --fix
# Fix only specific rules
dagruff examples/ --fix DAG001 DAG009
# Combine with other options
dagruff examples/ --fix DAG001 --severity warning
Note: Auto-fix preserves code formatting and checks for duplicates before adding parameters. It uses an AST-based approach for more reliable fixes, falling back to regex when needed.
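The AST-plus-text-edit idea in the note can be illustrated with one fix (AIR003, adding catchup=False). The sketch uses the AST only to locate DAG(...) calls and the end position of their last argument, then inserts text at that byte offset, so surrounding formatting and comments are untouched; a call that already passes catchup is skipped, which serves as the duplicate check. add_catchup_false is a hypothetical helper, it omits the regex fallback, and it assumes Python 3.9+ (where keyword nodes carry end positions). It does not cover every edge case, for example a positional-only call whose last argument is wrapped in redundant parentheses.

```python
import ast


def add_catchup_false(source: str) -> str:
    """Add `catchup=False` to each DAG(...) call that does not already set catchup."""
    data = source.encode("utf-8")  # AST column offsets are UTF-8 byte offsets
    line_starts = [0]
    for line in data.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    def offset(lineno: int, col: int) -> int:
        return line_starts[lineno - 1] + col

    edits = []  # (byte offset, bytes to insert)
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "DAG"):
            continue
        if any(kw.arg == "catchup" for kw in node.keywords):
            continue  # duplicate check: never add a second catchup
        arguments = list(node.args) + list(node.keywords)
        if arguments:
            # Insert right after the last argument, so a trailing comma or a
            # comment before the closing ")" stays where it is.
            last = max(arguments, key=lambda n: (n.end_lineno, n.end_col_offset))
            edits.append((offset(last.end_lineno, last.end_col_offset), b", catchup=False"))
        else:
            # Empty call: the final byte of the node is the closing ")".
            edits.append((offset(node.end_lineno, node.end_col_offset) - 1, b"catchup=False"))
    # Apply edits from the end of the file backwards so earlier offsets stay valid.
    for pos, text in sorted(edits, reverse=True):
        data = data[:pos] + text + data[pos:]
    return data.decode("utf-8")


before = 'dag = DAG(\n    "etl",\n    start_date=start,  # keep this comment\n)\n'
print(add_catchup_false(before))
```

Inserting after the last argument rather than before the closing parenthesis is what keeps the trailing comma and the inline comment valid in this example.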
Configuration
DagRuff can be configured via pyproject.toml or .dagruff.toml:
[tool.dagruff]
# Enable/disable specific rules
select = ["DAG001", "DAG002", "AIR003"]
ignore = ["DAG006", "BP005"]
# Set minimum severity level
severity = "error" # or "warning", "info"
# Paths to check (automatically validated)
paths = ["dags/", "custom_dags/"]
# Per-file ignores
[tool.dagruff.per-file-ignores]
"legacy_dags/*.py" = ["DAG006", "DAG007"]
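Per-file ignores combine the global ignore list with the rules of every pattern that matches a file. A sketch of that lookup, assuming glob-style matching on forward-slash paths relative to the project root; the in-memory dicts and ignored_rules are hypothetical, and DagRuff's exact matching semantics are not specified here.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Set

# In-memory form of the [tool.dagruff] example above (hypothetical structure).
GLOBAL_IGNORES: List[str] = ["DAG006", "BP005"]
PER_FILE_IGNORES: Dict[str, List[str]] = {"legacy_dags/*.py": ["DAG006", "DAG007"]}


def ignored_rules(path: str) -> Set[str]:
    """Rules to skip for `path`: global ignores plus those of every matching pattern."""
    rules = set(GLOBAL_IGNORES)
    for pattern, codes in PER_FILE_IGNORES.items():
        # fnmatch-style globbing: "*" also matches "/", so nested folders match too.
        if fnmatchcase(path, pattern):
            rules.update(codes)
    return rules


print(sorted(ignored_rules("legacy_dags/old_etl.py")))  # ['BP005', 'DAG006', 'DAG007']
print(sorted(ignored_rules("dags/new_etl.py")))         # ['BP005', 'DAG006']
```

fnmatchcase is used instead of fnmatch so that matching is case-sensitive on every platform.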
Configuration Validation: DagRuff validates configuration values:
- Ensures paths and ignore are lists of strings
- Validates rule ID format (e.g., DAG001, AIR002)
- Normalizes whitespace and filters empty values
- Gracefully handles invalid values with warnings
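The validation steps listed above can be sketched as a single cleaning function. The rule-ID pattern (uppercase letters followed by three digits) is an assumption inferred from the IDs in this README, clean_rule_ids is hypothetical, and Python's warnings module stands in for however DagRuff actually reports problems.

```python
import re
import warnings
from typing import Any, List

# Assumed ID shape, inferred from this README: DAG001, AIR002, AF003, BP004, AIRFLINT001.
RULE_ID = re.compile(r"[A-Z]+[0-9]{3}")


def clean_rule_ids(value: Any, field: str) -> List[str]:
    """Validate a list of rule IDs, warning about and dropping bad entries."""
    if not isinstance(value, list):
        warnings.warn(f"{field} must be a list of strings, got {type(value).__name__}")
        return []
    cleaned = []
    for item in value:
        if not isinstance(item, str):
            warnings.warn(f"{field}: ignoring non-string entry {item!r}")
            continue
        item = item.strip()  # normalize surrounding whitespace
        if not item:
            continue  # filter out empty values
        if RULE_ID.fullmatch(item) is None:
            warnings.warn(f"{field}: {item!r} is not a valid rule ID")
            continue
        cleaned.append(item)
    return cleaned


print(clean_rule_ids([" DAG006 ", "", "bp005", 42, "AIR002"], "ignore"))  # ['DAG006', 'AIR002']
```

Invalid entries produce a warning and are skipped instead of aborting the run, which matches the "gracefully handles invalid values" behavior described above.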
Caching: Results are cached by default, keyed on each file's content hash. Use --no-cache to disable caching. The cache provides:
- Automatic invalidation when a file changes
- A memory-efficient singleton cache
- Deep copies of cached results, so callers cannot modify the cached data
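The three cache properties above (hash-based invalidation, a single shared instance, deep-copied results) fit in a small class. This is a hedged sketch, not DagRuff's cache.py: ResultCache is hypothetical, and it assumes SHA-256 over the raw file bytes as the hash.

```python
import copy
import hashlib
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional, Tuple


class ResultCache:
    """Process-wide lint-result cache, invalidated by a file's content hash."""

    _instance: Optional["ResultCache"] = None
    _entries: Dict[str, Tuple[str, Any]]  # path -> (content digest, results)

    def __new__(cls) -> "ResultCache":
        # Singleton: every ResultCache() call returns the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._entries = {}
        return cls._instance

    @staticmethod
    def _digest(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def get(self, path: Path) -> Any:
        entry = self._entries.get(str(path))
        if entry is None or entry[0] != self._digest(path):
            return None  # never cached, or the file changed since caching
        return copy.deepcopy(entry[1])  # callers get a copy they may mutate

    def put(self, path: Path, results: Any) -> None:
        self._entries[str(path)] = (self._digest(path), copy.deepcopy(results))


with tempfile.TemporaryDirectory() as tmp:
    dag_file = Path(tmp) / "dag.py"
    dag_file.write_text("x = 1\n")
    cache = ResultCache()
    cache.put(dag_file, ["AIR003"])
    first = cache.get(dag_file)
    first.append("tampered")          # mutates only the returned copy
    second = cache.get(dag_file)      # still ['AIR003']
    dag_file.write_text("x = 2\n")
    after_edit = cache.get(dag_file)  # None: the content hash changed
```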
Examples
The examples/ directory contains:
- example_dag_good.py - Example of a correct DAG
- example_dag_bad.py - Example DAG with errors to demonstrate the linter
Plugins
DagRuff supports custom rule plugins via Python entry points. See PLUGINS.md for detailed documentation.
Quick Example:
# my_plugin/__init__.py
from typing import List
from dagruff.rules.ast_collector import ASTCollector
from dagruff.models import LintIssue, Severity
def check_all_custom_rules(collector: ASTCollector, file_path: str) -> List[LintIssue]:
    """Custom rule checker following the RuleChecker protocol."""
    issues = []
    # Your custom logic here: inspect `collector` and append LintIssue objects to `issues`
    return issues
# pyproject.toml
[project.entry-points."dagruff.rules"]
my_custom_rule = "my_plugin:check_all_custom_rules"
Contributing
Contributions are welcome and highly appreciated! To get started:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Ensure tests pass (pytest tests/); the suite has 296+ tests with 77% code coverage
- Ensure code is formatted (ruff format) and linted (ruff check)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
Pre-commit Hooks: Tests run automatically before each commit. See PRE_COMMIT.md for setup.
Development
Setup
# Clone the repository
git clone https://github.com/dkfancska/dagruff.git
cd dagruff
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode (with Airflow for full functionality)
pip install -e ".[airflow,dev]"
# or using uv
uv pip install -e ".[airflow,dev]"
# Run tests (296+ tests)
pytest tests/
# Run tests with coverage (current coverage: 77%)
pytest --cov=dagruff tests/
# Format code
ruff format dagruff tests/
# Lint code
ruff check dagruff tests/
# Run specific test file
pytest tests/test_linter.py -v
Project Structure
dagruff/
├── dagruff/                    # Main package
│   ├── __init__.py
│   ├── cli/                    # CLI package (refactored)
│   │   ├── __init__.py         # Main entry point
│   │   ├── runner.py           # CLI orchestrator
│   │   ├── linter.py           # Linting functions
│   │   ├── commands/           # Command pattern
│   │   │   ├── base.py         # BaseCommand
│   │   │   ├── check.py        # CheckCommand
│   │   │   └── fix.py          # FixCommand
│   │   ├── formatters/         # Output formatters
│   │   │   ├── human.py        # Human-readable format
│   │   │   └── json.py         # JSON format
│   │   └── utils/              # CLI utilities
│   │       ├── args.py         # Argument parsing
│   │       ├── files.py        # File utilities
│   │       ├── config_handler.py
│   │       └── autofix_handler.py
│   ├── config.py               # Configuration handling with validation
│   ├── linter.py               # Main linter with caching
│   ├── cache.py                # Caching implementation
│   ├── models.py               # Data models
│   ├── autofix.py              # Auto-fix implementation
│   ├── plugins.py              # Plugin system
│   ├── validation.py           # Input validation
│   ├── logger.py               # Logging setup
│   └── rules/                  # Lint rules
│       ├── base.py             # Protocols (RuleChecker, Linter, Autofixer)
│       ├── ast_collector.py    # AST data collector
│       ├── dag_rules.py        # DAG-specific rules
│       ├── ruff_air_rules.py
│       ├── best_practices_rules.py
│       ├── airflint_rules.py
│       └── utils.py            # Rule utilities
├── tests/                      # Tests (296+ tests)
├── examples/                   # Example DAG files
├── pyproject.toml              # Project configuration
├── README.md                   # This file
└── RULES.md                    # Rule descriptions
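The commands/ package (BaseCommand, CheckCommand, FixCommand) suggests a command pattern in which each CLI mode is one class behind a shared interface. The sketch below shows that pattern under stated assumptions: the class bodies, build_command and the argparse wiring are hypothetical illustrations, not DagRuff's actual code.

```python
import argparse
from abc import ABC, abstractmethod
from typing import List, Optional


class BaseCommand(ABC):
    """Hypothetical command interface: each CLI mode is one class."""

    def __init__(self, paths: List[str]) -> None:
        self.paths = paths

    @abstractmethod
    def execute(self) -> int:
        """Run the command and return a process exit code."""


class CheckCommand(BaseCommand):
    def execute(self) -> int:
        print(f"checking {self.paths}")
        return 0


class FixCommand(BaseCommand):
    def __init__(self, paths: List[str], rules: List[str]) -> None:
        super().__init__(paths)
        self.rules = rules  # an empty list means "all fixable rules"

    def execute(self) -> int:
        print(f"fixing {self.rules or 'all fixable rules'} in {self.paths}")
        return 0


def build_command(paths: List[str], fix: Optional[List[str]]) -> BaseCommand:
    # With nargs="*", a bare --fix parses to [] (not None), so test for None explicitly.
    return CheckCommand(paths) if fix is None else FixCommand(paths, fix)


parser = argparse.ArgumentParser(prog="dagruff")
parser.add_argument("paths", nargs="*")
parser.add_argument("--fix", nargs="*", metavar="RULE")

args = parser.parse_args(["examples/", "--fix", "DAG001", "DAG009"])
command = build_command(args.paths, args.fix)
exit_code = command.execute()
```

Keeping dispatch in one small factory means the runner only ever calls execute(), and adding a new mode means adding a class rather than another branch in the CLI entry point.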
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
DagRuff draws inspiration from:
- Ruff - For project structure and design philosophy
- flake8-airflow - For Airflow-specific rules
- airflint - For AST-based linting approaches
- Astronomer Guides - For best practices
Special thanks to the Apache Airflow community for their excellent documentation and tooling.
Support
Having trouble? Check out the existing Issues or feel free to open a new one.
Download files
File details
Details for the file dagruff-1.0.0.tar.gz.
File metadata
- Download URL: dagruff-1.0.0.tar.gz
- Upload date:
- Size: 67.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e7743c75b0d16b541e387e955bc683864507c70032e80a4d1d246bdea3920831 |
| MD5 | 4d7a22f184416fdc9876a04da606fa88 |
| BLAKE2b-256 | c3de072c0f88ca2b1583acac57a84beb57c54351c3ed39b3f0605c9229ff2895 |
Provenance
The following attestation bundles were made for dagruff-1.0.0.tar.gz:
Publisher: publish.yml on dkfancska/DagRuff
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dagruff-1.0.0.tar.gz
- Subject digest: e7743c75b0d16b541e387e955bc683864507c70032e80a4d1d246bdea3920831
- Sigstore transparency entry: 673757509
- Sigstore integration time:
- Permalink: dkfancska/DagRuff@d26d65cdcdd8ce36dea5d892b392cda3487a167d
- Branch / Tag: refs/heads/master
- Owner: https://github.com/dkfancska
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d26d65cdcdd8ce36dea5d892b392cda3487a167d
- Trigger Event: push
File details
Details for the file dagruff-1.0.0-py3-none-any.whl.
File metadata
- Download URL: dagruff-1.0.0-py3-none-any.whl
- Upload date:
- Size: 49.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ed94bdba2d0f17a4567d545a74c365a7ea3edb88a4e2b17e862c2a26856ed88c |
| MD5 | 3e7e2398cb2c4e42a7dda772b85b350e |
| BLAKE2b-256 | 596134a42f7dae425da43da9b08f67e8156cb2a7d232a76bbd7ed5fe67cfb84d |
Provenance
The following attestation bundles were made for dagruff-1.0.0-py3-none-any.whl:
Publisher: publish.yml on dkfancska/DagRuff
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dagruff-1.0.0-py3-none-any.whl
- Subject digest: ed94bdba2d0f17a4567d545a74c365a7ea3edb88a4e2b17e862c2a26856ed88c
- Sigstore transparency entry: 673757516
- Sigstore integration time:
- Permalink: dkfancska/DagRuff@d26d65cdcdd8ce36dea5d892b392cda3487a167d
- Branch / Tag: refs/heads/master
- Owner: https://github.com/dkfancska
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d26d65cdcdd8ce36dea5d892b392cda3487a167d
- Trigger Event: push