DagRuff

An extremely fast Python linter for Apache Airflow DAG files, written in Python.


DagRuff is a linter designed to catch common errors and enforce best practices in Apache Airflow DAG files. It implements 31 rules covering DAG structure, best practices, and Airflow-specific patterns.

Features

  • Fast: Built with performance in mind, using AST parsing for static analysis
  • Caching: Results are cached based on file hash for improved performance
  • Comprehensive: 31 lint rules covering DAG structure, best practices, and Airflow patterns
  • Auto-fix: Automatically fix many common issues with --fix
  • Configurable: Configure rules via pyproject.toml or .dagruff.toml with validation
  • Plugin Support: Extend functionality with custom rule plugins via entry points
  • No Airflow Required: Works without Airflow for AST-based checks (optional DagBag validation requires Airflow)

Installation

# Basic installation (no Airflow, AST checks only)
pip install dagruff

# With Airflow support (recommended for full DagBag validation)
pip install dagruff[airflow]

Or install from source:

git clone https://github.com/dkfancska/dagruff.git
cd dagruff
pip install -e ".[airflow]"

Note: Basic installation works without Airflow and performs all static checks via AST. For DagBag validation (import checking and code execution), install with the airflow extra.

Usage

After installation, use the dagruff command:

# Check a single file
dagruff examples/example_dag_good.py

# Check a directory
dagruff examples/

# Filter by severity
dagruff examples/ --severity warning

# JSON output
dagruff examples/ --format json

# Use configuration file
dagruff --config .dagruff.toml

# Without path - uses paths from config
dagruff

# Auto-fix all fixable issues
dagruff examples/ --fix

# Auto-fix specific rules
dagruff examples/ --fix DAG001 DAG009 AIR003

# Ignore specific rules
dagruff examples/ --ignore DAG006 DAG007

# Disable caching (useful for CI/CD)
dagruff examples/ --no-cache

# Verbose logging
dagruff examples/ --log-level debug
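For scripted use, such as a CI step, the flags shown above can be assembled programmatically. The following is a minimal sketch: build_dagruff_command is a hypothetical helper, not part of DagRuff, and it relies only on flags listed in this section.

```python
from typing import List, Optional, Sequence


def build_dagruff_command(
    paths: Sequence[str],
    severity: Optional[str] = None,
    output_format: Optional[str] = None,
    ignore: Sequence[str] = (),
    no_cache: bool = False,
) -> List[str]:
    """Assemble a dagruff argv list (hypothetical helper, not a DagRuff API)."""
    cmd = ["dagruff", *paths]
    if severity:
        cmd += ["--severity", severity]
    if output_format:
        cmd += ["--format", output_format]
    if ignore:
        cmd += ["--ignore", *ignore]
    if no_cache:
        cmd.append("--no-cache")
    return cmd


# Build the argv for a CI run with caching disabled.
print(build_dagruff_command(["dags/"], severity="warning", no_cache=True))
```

The resulting list can be passed to subprocess.run on a machine where dagruff is installed. How DagRuff's exit code maps to findings is not documented here, so check it before gating a pipeline on it.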

Lint Rules

DagRuff implements 31 lint rules from various sources:

DAG Rules (13 rules)

  • DAG import and definition checks
  • dag_id validation and uniqueness
  • Required DAG parameters (dag_id, start_date)
  • Recommended parameters (doc_md)
  • Special checks for KubernetesPodOperator (requires container_resources and executor_resources)

Ruff AIR Rules (4 rules)

  • AIR002: Check for start_date presence
  • AIR003: Check catchup parameter
  • AIR013: Recommend max_active_runs
  • AIR014: Recommend max_active_tasks for Airflow 2+ (warn about deprecated concurrency)
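To illustrate how a rule such as AIR003 can run without Airflow installed, here is a minimal sketch of an AST-based check for a missing catchup argument. It is an illustration only, not DagRuff's implementation; the function name and message text are assumptions.

```python
import ast
from typing import List, Tuple

MESSAGE = "DAG call has no explicit catchup argument"  # Assumed wording.


def find_dags_missing_catchup(source: str) -> List[Tuple[int, str]]:
    """Return (line, message) for each DAG(...) call without a catchup keyword."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both `DAG(...)` and attribute access such as `models.DAG(...)`.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "DAG":
            continue
        if "catchup" not in {kw.arg for kw in node.keywords}:
            findings.append((node.lineno, MESSAGE))
    return findings


example = '''
from airflow import DAG

with DAG(dag_id="daily_report", start_date=None) as dag:
    pass
'''
print(find_dags_missing_catchup(example))
```

Because the source is only parsed and never imported, the check works even though airflow is not installed in the linting environment.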

flake8-airflow Rules (4 rules)

  • AF001: Forbid SubDagOperator usage
  • AF002: Security warnings for BashOperator
  • AF003: Check task_id uniqueness
  • AF004: Detect deprecated operators

airflint AST Rules (4 rules)

  • AIRFLINT001: Check task dependencies
  • AIRFLINT002: Check XCom usage
  • AIRFLINT003: Check Variables usage
  • AIRFLINT004: Check required operator parameters

Best Practices Rules (6 rules)

  • BP001: Check for top-level code avoidance
  • BP002: Check datetime function usage
  • BP003: Recommend execution_timeout for tasks
  • BP004: Check dependency method consistency
  • BP005: Recommend docstrings for tasks
  • BP006: Recommend dagrun_timeout for DAGs

Full documentation:

  • 📖 RULES.md - Complete rule descriptions with examples, quick reference, and grouping
  • 🔌 PLUGINS.md - Plugin system documentation
  • 🔧 CONTRIBUTING.md - Contribution guidelines
  • ✅ PRE_COMMIT.md - Pre-commit hooks setup

Auto-fix (--fix)

DagRuff supports automatic fixing of many issues via the --fix flag:

Fixable Rules:

  • DAG001 - Adds from airflow import DAG import
  • DAG005 - Removes extra spaces in dag_id
  • DAG009 - Adds "owner": "airflow" to default_args
  • DAG010 - Adds "retries": 1 to default_args
  • AIR003 - Adds catchup=False to DAG
  • AIR013 - Adds max_active_runs=1 to DAG
  • AIR014 - Replaces concurrency with max_active_tasks or adds max_active_tasks=1
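As an illustration of how a fix such as AIR003 can leave the surrounding formatting untouched, the sketch below uses the AST only to locate the closing parenthesis of a DAG(...) call and then edits the original text at that position. This is an assumption about one workable approach, not DagRuff's actual autofix code. It handles only a bare DAG name, patches the first call that lacks catchup, and does not account for comments directly before the closing parenthesis.

```python
import ast


def add_catchup_false(source: str) -> str:
    """Insert catchup=False into the first DAG(...) call that lacks it."""
    data = source.encode("utf-8")  # ast column offsets are UTF-8 byte offsets.
    lines = data.splitlines(keepends=True)
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and getattr(node.func, "id", None) == "DAG"):
            continue
        if any(kw.arg == "catchup" for kw in node.keywords):
            continue  # Already set: never add a duplicate keyword.
        # Absolute byte offset of the call's closing parenthesis.
        close = sum(len(line) for line in lines[: node.end_lineno - 1]) + node.end_col_offset - 1
        before = data[:close].rstrip()
        # Skip the separator after "(" or an existing trailing comma.
        separator = b"" if before.endswith((b",", b"(")) else b", "
        return (data[:close] + separator + b"catchup=False" + data[close:]).decode("utf-8")
    return source


print(add_catchup_false('dag = DAG(dag_id="x")\n'))
```

Running the function on its own output returns the text unchanged, which is the duplicate check in action.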

Usage:

# Fix all fixable issues
dagruff examples/ --fix

# Fix only specific rules
dagruff examples/ --fix DAG001 DAG009

# Combine with other options
dagruff examples/ --fix DAG001 --severity warning

Note: Auto-fix preserves code formatting and checks for duplicates before adding parameters. It uses an AST-based approach for more reliable fixes and falls back to regex when needed.

Configuration

DagRuff can be configured via pyproject.toml or .dagruff.toml:

[tool.dagruff]
# Enable/disable specific rules
select = ["DAG001", "DAG002", "AIR003"]
ignore = ["DAG006", "BP005"]

# Set minimum severity level
severity = "error"  # or "warning", "info"

# Paths to check (automatically validated)
paths = ["dags/", "custom_dags/"]

# Per-file ignores
[tool.dagruff.per-file-ignores]
"legacy_dags/*.py" = ["DAG006", "DAG007"]

Configuration Validation: DagRuff validates configuration values:

  • Ensures paths and ignore are lists of strings
  • Validates rule ID format (e.g., DAG001, AIR002)
  • Normalizes whitespace and filters empty values
  • Gracefully handles invalid values with warnings
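The validation steps above can be sketched as follows. This is a hypothetical illustration rather than DagRuff's implementation: the function name, the warning wording, and the rule ID pattern (uppercase letters followed by three digits) are assumptions inferred from the rule IDs listed in this README.

```python
import re
from typing import Any, List

# Assumed rule ID format, e.g. DAG001, AIR002, AIRFLINT004.
RULE_ID = re.compile(r"[A-Z]+\d{3}")


def clean_rule_list(raw: Any, key: str, warnings: List[str]) -> List[str]:
    """Normalize a list of rule IDs, collecting warnings instead of raising."""
    if not isinstance(raw, list):
        warnings.append(f"{key}: expected a list of strings, got {type(raw).__name__}")
        return []
    cleaned = []
    for item in raw:
        if not isinstance(item, str):
            warnings.append(f"{key}: skipping non-string value {item!r}")
            continue
        rule_id = item.strip()  # Normalize surrounding whitespace.
        if not rule_id:
            continue  # Silently drop empty values.
        if not RULE_ID.fullmatch(rule_id):
            warnings.append(f"{key}: skipping malformed rule ID {rule_id!r}")
            continue
        cleaned.append(rule_id)
    return cleaned


problems: List[str] = []
print(clean_rule_list([" DAG006 ", "", "bp5", 7], "ignore", problems))
print(problems)
```

Collecting warnings in a list rather than raising keeps one bad entry from aborting the whole lint run, which matches the "gracefully handles invalid values" behavior described above.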

Caching: Results are cached by default, keyed by file hash; use --no-cache to disable caching. The cache provides:

  • Automatic cache invalidation on file changes
  • Memory-efficient singleton cache
  • Deep copy returns for safety
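The three cache properties above can be illustrated with a small sketch. It is not DagRuff's cache module: the class name, the choice of SHA-256, and the module-level instance standing in for the singleton are all assumptions.

```python
import copy
import hashlib
from typing import Callable, Dict, List


class ResultCache:
    """Hypothetical in-memory cache keyed by a hash of the file content."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[dict]] = {}

    @staticmethod
    def _key(content: str) -> str:
        # Changed content produces a new key, so stale results are never reused.
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def get_or_compute(self, content: str, lint: Callable[[str], List[dict]]) -> List[dict]:
        key = self._key(content)
        if key not in self._entries:
            self._entries[key] = lint(content)
        # Return a deep copy so callers cannot mutate the cached results.
        return copy.deepcopy(self._entries[key])


# One shared instance per process plays the role of the singleton.
CACHE = ResultCache()
```

Here lint is any callable that maps file content to a list of issue dictionaries. Calling get_or_compute twice with identical content invokes it only once.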

Examples

The examples/ directory contains:

  • example_dag_good.py - Example of a correct DAG
  • example_dag_bad.py - Example DAG with errors to demonstrate the linter

Plugins

DagRuff supports custom rule plugins via Python entry points. See PLUGINS.md for detailed documentation.

Quick Example:

# my_plugin/__init__.py
from typing import List
from dagruff.rules.ast_collector import ASTCollector
from dagruff.models import LintIssue, Severity

def check_all_custom_rules(collector: ASTCollector, file_path: str) -> List[LintIssue]:
    """Custom rule checker following RuleChecker protocol."""
    issues = []
    # Your custom logic here
    return issues

# pyproject.toml
[project.entry-points."dagruff.rules"]
my_custom_rule = "my_plugin:check_all_custom_rules"

Contributing

Contributions are welcome and highly appreciated! To get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Ensure tests pass (pytest tests/) - 296+ tests with 77% code coverage
  5. Ensure code is formatted (ruff format) and linted (ruff check)
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

Pre-commit Hooks: Tests run automatically before each commit. See PRE_COMMIT.md for setup.

Development

Setup

# Clone the repository
git clone https://github.com/dkfancska/dagruff.git
cd dagruff

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode (with Airflow for full functionality)
pip install -e ".[airflow,dev]"
# or using uv
uv pip install -e ".[airflow,dev]"

# Run tests (296+ tests)
pytest tests/

# Run tests with coverage (current coverage: 77%)
pytest --cov=dagruff tests/

# Format code
ruff format dagruff tests/

# Lint code
ruff check dagruff tests/

# Run specific test file
pytest tests/test_linter.py -v

Project Structure

dagruff/
├── dagruff/                # Main package
│   ├── __init__.py
│   ├── cli/                # CLI package (refactored)
│   │   ├── __init__.py     # Main entry point
│   │   ├── runner.py       # CLI orchestrator
│   │   ├── linter.py       # Linting functions
│   │   ├── commands/       # Command pattern
│   │   │   ├── base.py     # BaseCommand
│   │   │   ├── check.py    # CheckCommand
│   │   │   └── fix.py      # FixCommand
│   │   ├── formatters/     # Output formatters
│   │   │   ├── human.py    # Human-readable format
│   │   │   └── json.py     # JSON format
│   │   └── utils/          # CLI utilities
│   │       ├── args.py     # Argument parsing
│   │       ├── files.py    # File utilities
│   │       ├── config_handler.py
│   │       └── autofix_handler.py
│   ├── config.py           # Configuration handling with validation
│   ├── linter.py           # Main linter with caching
│   ├── cache.py            # Caching implementation
│   ├── models.py           # Data models
│   ├── autofix.py          # Auto-fix implementation
│   ├── plugins.py          # Plugin system
│   ├── validation.py       # Input validation
│   ├── logger.py           # Logging setup
│   └── rules/              # Lint rules
│       ├── base.py         # Protocols (RuleChecker, Linter, Autofixer)
│       ├── ast_collector.py # AST data collector
│       ├── dag_rules.py    # DAG-specific rules
│       ├── ruff_air_rules.py
│       ├── best_practices_rules.py
│       ├── airflint_rules.py
│       └── utils.py        # Rule utilities
├── tests/                  # Tests (296+ tests)
├── examples/               # Example DAG files
├── pyproject.toml          # Project configuration
├── README.md               # This file
└── RULES.md                # Rule descriptions

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

DagRuff draws inspiration from:

Special thanks to the Apache Airflow community for their excellent documentation and tooling.

Support

Having trouble? Check out the existing Issues or feel free to open a new one.
