
Detect configuration drift in AWS Glue jobs against a source-of-truth YAML


glue-drift

Detect configuration drift in AWS Glue jobs.

glue-drift compares your live AWS Glue job configurations against a source-of-truth YAML file and reports exactly what has drifted — field by field.

Built for data engineering teams managing multi-environment Glue deployments (DEV / QA / UAT / PROD).
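A minimal sketch of what a field-by-field comparison can look like. It assumes, without confirmation from this README, that only keys declared in the YAML are compared and that nested sections such as Command are walked recursively. FieldDrift, diff_fields and job_status are hypothetical names, not part of glue-drift's API; the three statuses mirror the OK / DRIFTED / MISSING labels used in the report.

```python
from dataclasses import dataclass
from typing import Any, Optional

_ABSENT = object()  # sentinel: key not present in the live config


@dataclass
class FieldDrift:
    field: str  # dotted path for nested keys, e.g. "Command.ScriptLocation" (hypothetical format)
    expected: Any
    actual: Any  # None when the key is absent from the live config


def diff_fields(expected: dict, actual: dict, prefix: str = "") -> list[FieldDrift]:
    """Compare only the fields declared in `expected`, recursing into nested dicts."""
    drifts: list[FieldDrift] = []
    for key, exp_val in expected.items():
        path = f"{prefix}{key}"
        act_val = actual.get(key, _ABSENT)
        if isinstance(exp_val, dict) and isinstance(act_val, dict):
            drifts.extend(diff_fields(exp_val, act_val, prefix=f"{path}."))
        elif act_val is _ABSENT:
            drifts.append(FieldDrift(path, exp_val, None))
        elif exp_val != act_val:
            drifts.append(FieldDrift(path, exp_val, act_val))
    return drifts


def job_status(expected: dict, live: Optional[dict]) -> str:
    """Classify a job as OK, DRIFTED, or MISSING (live is None when AWS has no such job)."""
    if live is None:
        return "MISSING"
    return "DRIFTED" if diff_fields(expected, live) else "OK"


# Keys that exist only in the live config (CreatedOn here) are ignored by this sketch.
expected = {"WorkerType": "G.1X", "Command": {"Name": "glueetl"}}
live = {"WorkerType": "G.2X", "Command": {"Name": "glueetl"}, "CreatedOn": "(set by AWS)"}

for d in diff_fields(expected, live):
    print(f"Field: {d.field}  Expected: {d.expected}  Actual: {d.actual}")
print(job_status(expected, live))  # DRIFTED
```

Comparing only declared keys keeps the YAML short, at the cost of not flagging settings someone added in the console that the YAML never mentions.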


Why glue-drift?

AWS Glue jobs can drift from their intended configuration due to:

  • Manual edits in the AWS Console
  • Failed or partial deployments

Naive comparisons also report false positives because of:

  • Keys that AWS injects automatically (such as --job-language) polluting the diff
  • Key-order differences between otherwise identical configurations

glue-drift reports the real drift and normalizes away the noise.


Installation

pip install glue-drift

Quickstart

1. Create your source-of-truth jobs.yaml:

jobs:
  my-glue-job:
    Name: my-glue-job
    Role: arn:aws:iam::123456789012:role/my-glue-role
    GlueVersion: "4.0"
    WorkerType: G.1X
    NumberOfWorkers: 2
    Timeout: 120
    MaxRetries: 0
    Command:
      Name: glueetl
      ScriptLocation: s3://my-bucket/scripts/my_script.py
      PythonVersion: "3"
    DefaultArguments:
      --enable-metrics: "true"
      --TempDir: s3://my-temp-bucket/

2. Run the drift check:

glue-drift check --config jobs.yaml

3. Example output (from a config that defines three jobs):

============================================================
  GLUE DRIFT REPORT
============================================================
  Jobs checked : 3
  OK       : 1
  Drifted  : 1
  Missing  : 1
============================================================

  ✔  my-glue-job-ok

  ✘  my-glue-job-drifted  [DRIFTED]
      Field: WorkerType
        Expected: G.1X
        Actual:   G.2X

  ✘  my-glue-job-missing  [MISSING in AWS]
      Job 'my-glue-job-missing' not found in AWS Glue.

============================================================
  ❌ Drift detected! Review the above jobs.
============================================================
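The example jobs.yaml quotes GlueVersion, PythonVersion and the DefaultArguments values. A short illustration of why that likely matters: AWS Glue's GetJob API returns these fields as strings, while a YAML loader turns unquoted 4.0, 3 and true into a float, an int and a bool. Assuming values are compared as-is, without type coercion (this README does not say how glue-drift handles types), unquoted values would be reported as drift. The dictionaries below stand in for parsed YAML so the sketch needs only the standard library; mismatched_keys is a hypothetical helper.

```python
# What a YAML loader (e.g. PyYAML's safe_load) produces if these values are left UNQUOTED.
unquoted = {
    "GlueVersion": 4.0,        # GlueVersion: 4.0
    "PythonVersion": 3,        # PythonVersion: 3
    "--enable-metrics": True,  # --enable-metrics: true
}

# The same values quoted, as in the example jobs.yaml.
quoted = {
    "GlueVersion": "4.0",
    "PythonVersion": "3",
    "--enable-metrics": "true",
}

# Shape of the live values: GetJob returns these fields as strings.
live = {
    "GlueVersion": "4.0",
    "PythonVersion": "3",
    "--enable-metrics": "true",
}


def mismatched_keys(expected: dict, actual: dict) -> list[str]:
    """Return the keys whose values differ under plain == comparison."""
    return sorted(k for k in expected if expected[k] != actual.get(k))


print(mismatched_keys(unquoted, live))  # ['--enable-metrics', 'GlueVersion', 'PythonVersion']
print(mismatched_keys(quoted, live))    # []
```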

CLI Options

glue-drift check --config jobs.yaml [OPTIONS]

Options:
  -c, --config PATH       Path to source-of-truth YAML config  [required]
  -r, --region TEXT       AWS region  [default: us-east-2]
  -p, --profile TEXT      AWS CLI profile name (optional)
  -o, --output PATH       Write JSON report to file (e.g. report.json)
  --fail-on-drift         Exit code 1 if drift found (for CI/CD pipelines)
  --version               Show version and exit
  --help                  Show this message and exit
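A sketch of the exit-code rule that --fail-on-drift implies: exit 1 when drift is found and the flag is set, otherwise 0. Two points are assumptions, not documented behavior: that MISSING jobs count as drift, and that the exit code is 0 without the flag even when drift is found. exit_code is a hypothetical helper.

```python
def exit_code(statuses: list[str], fail_on_drift: bool) -> int:
    """Return the process exit code for a finished check (hypothetical logic).

    Assumes MISSING jobs count as drift; the README does not say either way.
    """
    drift_found = any(s in ("DRIFTED", "MISSING") for s in statuses)
    return 1 if (fail_on_drift and drift_found) else 0


# The three jobs from the sample report: one OK, one drifted, one missing.
print(exit_code(["OK", "DRIFTED", "MISSING"], fail_on_drift=True))   # 1
print(exit_code(["OK", "DRIFTED", "MISSING"], fail_on_drift=False))  # 0
```

A non-zero exit code is what CI systems such as GitHub Actions and CodeBuild treat as a failed step, which is how the flag can block a deployment.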

CI/CD Integration

Use --fail-on-drift to block deployments when drift is detected:

# Example GitHub Actions step. In a CodeBuild buildspec.yaml, run the same command under phases.build.commands.
- name: Check Glue job drift
  run: glue-drift check --config jobs.yaml --output drift-report.json --fail-on-drift

Python API

Use glue-drift programmatically:

from glue_drift import check_all_jobs, print_terminal_report

# Check every job in jobs.yaml against its live AWS Glue configuration
results = check_all_jobs(config_path="jobs.yaml", region="us-east-2")
print_terminal_report(results)

# List each drifted job together with the fields that differ
for result in results:
    if result.has_drift:
        print(f"Job {result.job_name} has drifted!")
        for drift in result.drifts:
            print(f"  {drift.field}: expected={drift.expected}, actual={drift.actual}")
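One way to reuse those results is to reshape them into a compact, JSON-serializable summary, for example to post to a chat channel. This sketch relies only on the attributes used in the example above (job_name, has_drift, drifts, and field / expected / actual on each drift). The SimpleNamespace objects are hypothetical stand-ins so it runs without AWS access; drift_summary is not part of glue-drift.

```python
import json
from types import SimpleNamespace


def drift_summary(results) -> dict:
    """Map each drifted job name to {field: {"expected": ..., "actual": ...}}."""
    return {
        r.job_name: {d.field: {"expected": d.expected, "actual": d.actual} for d in r.drifts}
        for r in results
        if r.has_drift
    }


# Hypothetical stand-ins shaped like the objects returned by check_all_jobs().
results = [
    SimpleNamespace(job_name="my-glue-job-ok", has_drift=False, drifts=[]),
    SimpleNamespace(
        job_name="my-glue-job-drifted",
        has_drift=True,
        drifts=[SimpleNamespace(field="WorkerType", expected="G.1X", actual="G.2X")],
    ),
]

print(json.dumps(drift_summary(results), indent=2))
```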

What glue-drift normalizes automatically

  • AWS auto-injected keys stripped: --job-language, --class
  • AWS managed metadata ignored: CreatedOn, LastModifiedOn, LastModifiedBy
  • JSON key ordering normalized — no false positives from key-order differences
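A minimal sketch of that normalization, using the key names listed above. How glue-drift implements it internally is not documented here; normalize_job and canonical are hypothetical helpers. A plain Python dict comparison already ignores key order, so sorting keys matters mainly when configurations are compared or stored as serialized JSON.

```python
import json

# Key names taken from the list above; glue-drift's exact handling may differ.
AUTO_INJECTED_ARGS = {"--job-language", "--class"}
MANAGED_METADATA = {"CreatedOn", "LastModifiedOn", "LastModifiedBy"}


def normalize_job(job: dict) -> dict:
    """Return a copy of a GetJob-style dict with AWS-managed noise removed."""
    cleaned = {k: v for k, v in job.items() if k not in MANAGED_METADATA}
    args = cleaned.get("DefaultArguments")
    if isinstance(args, dict):
        cleaned["DefaultArguments"] = {
            k: v for k, v in args.items() if k not in AUTO_INJECTED_ARGS
        }
    return cleaned


def canonical(job: dict) -> str:
    """Serialize with sorted keys so key order never affects string comparison.

    default=str turns any non-JSON values (e.g. datetimes) into strings.
    """
    return json.dumps(normalize_job(job), sort_keys=True, default=str)


live = {
    "LastModifiedOn": "2024-01-01",
    "WorkerType": "G.1X",
    "DefaultArguments": {"--job-language": "python", "--enable-metrics": "true"},
}
expected = {
    "DefaultArguments": {"--enable-metrics": "true"},
    "WorkerType": "G.1X",
}

print(canonical(live) == canonical(expected))  # True
```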

Authentication

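glue-drift uses standard boto3 credential resolution, so any credential source boto3 supports will work, including:

  • Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  • The shared credentials and config files (~/.aws/credentials, ~/.aws/config)
  • An IAM role (recommended for EC2 / Lambda / CI runners)
  • A named AWS CLI profile, selected with --profile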
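A sketch of how --region and --profile map onto a boto3 session, assuming (not confirmed by this README) that glue-drift builds its Glue client this way. session_kwargs and fetch_live_job are hypothetical helpers; boto3.Session, get_job and EntityNotFoundException are real boto3 / AWS Glue names. boto3 is imported inside fetch_live_job so the option handling can be exercised without AWS credentials.

```python
from typing import Optional


def session_kwargs(region: str = "us-east-2", profile: Optional[str] = None) -> dict:
    """Build boto3.Session keyword arguments from --region / --profile style options."""
    kwargs = {"region_name": region}
    if profile:  # pin a profile only when given; otherwise boto3 searches its default chain
        kwargs["profile_name"] = profile
    return kwargs


def fetch_live_job(job_name: str, region: str = "us-east-2",
                   profile: Optional[str] = None) -> Optional[dict]:
    """Return the live job definition, or None if the job does not exist in AWS Glue."""
    import boto3  # imported lazily so session_kwargs() works without boto3 installed

    glue = boto3.Session(**session_kwargs(region, profile)).client("glue")
    try:
        return glue.get_job(JobName=job_name)["Job"]
    except glue.exceptions.EntityNotFoundException:
        return None
```

Returning None for a missing job gives a caller a simple way to report the MISSING status instead of failing the whole run.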

Development

git clone https://github.com/Pushpalatha58/glue-drift
cd glue-drift
pip install -e ".[dev]"
pytest

License

MIT



Download files


Source Distribution

glue_drift-0.1.0.tar.gz (8.6 kB)

Uploaded Source

Built Distribution


glue_drift-0.1.0-py3-none-any.whl (8.8 kB)

Uploaded Python 3

File details

Details for the file glue_drift-0.1.0.tar.gz.

File metadata

  • Download URL: glue_drift-0.1.0.tar.gz
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for glue_drift-0.1.0.tar.gz
Algorithm Hash digest
SHA256 22c3f0af14c3cf9e9a4ecd7786ae841f64bc07ebba7fd258df264cef2bd8a472
MD5 9e8087d0ffe9d24a7ac9f41fb2408518
BLAKE2b-256 3ef79bb0be65789203ab29637cfe2b06c004794f214f402995ced53341240eb4


File details

Details for the file glue_drift-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: glue_drift-0.1.0-py3-none-any.whl
  • Size: 8.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for glue_drift-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ab805717c6f44118f09775d22e5289c6c0f2eb81eb82c377e5ba903b103eb7f9
MD5 e1a7419f5d8b5cdaec56610f31dd09e8
BLAKE2b-256 f0391aadff61e04df22dabe4757c55b69d9c15d5fee7492d82fb744523d008dd
