A smart Dockerfile linter, optimizer, and explainer.

🐳 Docktor

A Dockerfile Linter and Optimizer Built on AST Parsing
Static analysis and transformation of Dockerfile instructions using structured syntax trees.

Overview

Docktor is a static analysis tool for Dockerfiles that uses Abstract Syntax Tree (AST) parsing to identify issues and apply optimizations. Unlike regex-based approaches, Docktor constructs a structured representation of Dockerfile instructions, enabling reliable pattern matching and safe transformations.

The project demonstrates end-to-end architecture design: recursive descent parsing → plugin-based rule engine → automated optimization → Docker SDK benchmarking.

✨ Key Technical Features

  • AST-Based Parsing – Recursive descent parser with multi-line continuation handling (\), not regex-based pattern matching
  • Extensible Rule Engine – Plugin architecture using Python decorators for linting rules (best practices, performance, security, registry checks)
  • Safe Optimization Pipeline – 8-stage transformation pipeline with isolated optimization passes and change tracking
  • Benchmarking Harness – Direct Docker SDK integration to measure real build metrics (image size, layer count, build duration) in isolated temp environments
  • Structured Output – Both human-readable (Rich) and machine-readable (JSON) formats for CI/CD integration

📦 What's New in v0.2.0

  • Registry Rule (REG001) – Docker Hub API integration to detect newer patch versions of base images
  • GitHub Actions Composite Action – Pre-built workflow for CI/CD automation
  • Improved CLI – Encoding auto-detection (chardet) for non-UTF-8 Dockerfiles, better error handling

🚀 Quick Start

Requirements

  • Python 3.8+
  • Docker (for benchmarking feature; linting works without it)

Installation

pip install docktor-py

Usage

1. Lint a Dockerfile

Run static analysis against 21 rules:

docktor lint Dockerfile

2. View Detailed Explanations

Each rule includes a structured explanation of the issue and suggested fix:

docktor lint Dockerfile --explain

3. Generate Optimized Dockerfile

Apply automated transformations (RUN merging, layer reduction, cache cleanup):

# View transformations with change summary
docktor optimize Dockerfile

# Output clean Dockerfile without pretty printing (for piping)
docktor optimize Dockerfile --raw > Dockerfile.optimized

4. Benchmark Optimization Impact

Build both images in isolated temp environments and compare metrics:

# Must run from directory containing all COPY/ADD source files
docktor benchmark Dockerfile Dockerfile.optimized

5. Export Results as JSON

For CI/CD integration:

docktor lint Dockerfile --format json

⚙️ Linting Rules Reference

Docktor enforces 21 rules across four categories. Each rule is implemented as a plugin with explicit checks against the AST:

Best Practice Rules (BP)

| Rule ID | Description                              | Auto-Optimized? |
|---------|------------------------------------------|-----------------|
| BP001   | FROM uses :latest or no tag              | Yes             |
| BP002   | EXPOSE present without HEALTHCHECK       | No              |
| BP003   | EXPOSE missing /tcp or /udp protocol     | Yes             |
| BP004   | LABEL instruction missing for metadata   | No              |
| BP005   | RUN command used in scratch image        | No (error)      |
| BP006   | COPY --from refers to non-existent stage | No (error)      |
| BP007   | CMD/ENTRYPOINT uses shell form           | No              |
| BP008   | WORKDIR path is not absolute             | No              |
| BP009   | apt-get install missing apt-get update   | No (error)      |

Performance Rules (PERF)

| Rule ID | Description                                  | Auto-Optimized? |
|---------|----------------------------------------------|-----------------|
| PERF001 | Consecutive RUN commands can be merged       | Yes             |
| PERF002 | apt-get install missing cache cleanup        | Yes             |
| PERF003 | Broad COPY before dependency install         | No              |
| PERF004 | Build-time packages installed in final image | No              |
| PERF005 | Unsafe apt-get upgrade command used          | No              |
| PERF006 | Broad COPY . . pattern used                  | No              |
| PERF007 | Redundant apt-get update command             | No              |

Security Rules (SEC)

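| Rule ID | Description                            | Auto-Optimized? |
|---------|----------------------------------------|-----------------|
| SEC001  | ADD used instead of COPY               | Yes             |
| SEC002  | Container runs as root user            | No              |
| SEC003  | Potential secrets in ENV variables     | No              |
| SEC004  | COPY missing --chown for non-root user | No              |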

Registry Rules (REG) - New in v0.2.0

| Rule ID | Description                                | Auto-Optimized? |
|---------|--------------------------------------------|-----------------|
| REG001  | Newer patch version available on Docker Hub | No              |

How It Works

1. Parsing Phase

The DockerfileParser uses recursive descent, with anchored regular expressions for tokenization, to structure Dockerfile content:

  • Strips and normalizes lines
  • Handles line continuations (backslash escape)
  • Constructs DockerInstruction objects with metadata (line number, type, value, image/tag/alias)
  • Tolerates malformed input gracefully
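The continuation handling described above can be illustrated with a small line-oriented sketch. This is not Docktor's DockerfileParser and it is not a recursive descent parser; the `Instruction` class and `parse_dockerfile` function are hypothetical names, used only to show how backslash-continued lines might be joined into one instruction that remembers its starting line.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instruction:
    """Hypothetical stand-in for Docktor's DockerInstruction."""
    line_number: int  # 1-based line where the instruction starts
    keyword: str      # upper-cased instruction name, e.g. "RUN"
    value: str        # text after the keyword, continuations joined


def _to_instruction(start_line: int, parts: List[str]) -> Instruction:
    # Join the physical lines, then split off the first word as the keyword.
    logical = " ".join(parts)
    keyword, _, value = logical.partition(" ")
    return Instruction(start_line, keyword.upper(), value.strip())


def parse_dockerfile(text: str) -> List[Instruction]:
    instructions: List[Instruction] = []
    parts: List[str] = []
    start_line: Optional[int] = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and comments are skipped, even inside a continuation.
        if not line or line.startswith("#"):
            continue
        if start_line is None:
            start_line = number
        continued = line.endswith("\\")
        if continued:
            line = line[:-1].strip()  # drop the trailing backslash
        if line:
            parts.append(line)
        if not continued:
            instructions.append(_to_instruction(start_line, parts))
            parts, start_line = [], None
    # Tolerate a file that ends in the middle of a continuation.
    if parts and start_line is not None:
        instructions.append(_to_instruction(start_line, parts))
    return instructions
```

Given a Dockerfile whose third line is `RUN apt-get update && \` followed by `apt-get install -y curl`, the sketch yields a single RUN instruction with `line_number` 3 and both commands in `value`.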

2. Analysis Phase

The Analyzer loads all rule implementations as plugins (via Rule.__subclasses__()) and runs them:

  • Each rule performs AST traversal over instructions
  • Rules check for specific patterns (e.g., instruction.instruction_type == InstructionType.RUN)
  • Issues are collected with severity, explanation, and fix suggestions
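A minimal sketch of subclass-based rule discovery follows. Only `Rule.__subclasses__()`, `instruction_type`, and `InstructionType.RUN` come from the description above; the `Issue` class, the `check` method, the `run_all_rules` function, and the two example rule bodies are assumptions made for illustration and are simplified compared with what a real linter needs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class InstructionType(Enum):
    FROM = "FROM"
    RUN = "RUN"
    EXPOSE = "EXPOSE"


@dataclass
class DockerInstruction:
    line_number: int
    instruction_type: InstructionType
    value: str


@dataclass
class Issue:  # hypothetical result record
    rule_id: str
    line_number: int
    message: str


class Rule:
    """Base class; every direct subclass is picked up as a plugin."""
    rule_id = "BASE"

    def check(self, instructions: List[DockerInstruction]) -> List[Issue]:
        raise NotImplementedError


class UnpinnedBaseImageRule(Rule):
    rule_id = "BP001"

    def check(self, instructions):
        issues = []
        for ins in instructions:
            if ins.instruction_type != InstructionType.FROM:
                continue
            # First token that is not a flag such as --platform=...
            image = next((t for t in ins.value.split() if not t.startswith("--")), "")
            name = image.rsplit("/", 1)[-1]  # ignore registry host and port
            if not name or name == "scratch" or "@" in name:
                continue  # nothing to check, or pinned by digest
            # Simplification: references to earlier build stages are not excluded.
            if name.partition(":")[2] in ("", "latest"):
                issues.append(Issue(self.rule_id, ins.line_number,
                                    f"Base image '{image}' is not pinned to a tag"))
        return issues


class ExposeProtocolRule(Rule):
    rule_id = "BP003"

    def check(self, instructions):
        issues = []
        for ins in instructions:
            if ins.instruction_type == InstructionType.EXPOSE:
                for port in ins.value.split():
                    if "/" not in port:
                        issues.append(Issue(self.rule_id, ins.line_number,
                                            f"EXPOSE {port} has no /tcp or /udp protocol"))
        return issues


def run_all_rules(instructions: List[DockerInstruction]) -> List[Issue]:
    issues: List[Issue] = []
    # __subclasses__() returns the direct subclasses defined so far.
    for rule_cls in Rule.__subclasses__():
        issues.extend(rule_cls().check(instructions))
    return issues
```

With this layout, adding a rule means defining one more `Rule` subclass; the runner needs no registration list.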

3. Optimization Phase

The DockerfileOptimizer applies 8 sequential transformations:

  1. RUN Merging – Combines consecutive RUN commands with &&
  2. FROM Pinning – Tags untagged base images with :latest
  3. apt-get Cache Cleanup – Appends rm -rf /var/lib/apt/lists/*
  4. EXPOSE Protocol – Adds /tcp suffix to port numbers
  5. ADD → COPY – Security-motivated instruction replacement
  6. Metadata Combining – Merges consecutive LABEL/ENV/ARG instructions
  7. sudo Removal – Strips unnecessary sudo from RUN commands
  8. apt-get Update – Prepends apt-get update where required

Each pass is isolated, but the order matters: passes run in the sequence listed above. Changes are tracked and reported.
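The idea of isolated passes with change tracking can be sketched as a list of functions, each returning a new instruction list plus a description of what it changed. This is an illustration under stated assumptions, not the DockerfileOptimizer itself: instructions are modeled as plain `(keyword, value)` tuples, only two of the eight passes are shown, and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Instruction = Tuple[str, str]  # (keyword, value), a simplified model
PassResult = Tuple[List[Instruction], List[str]]
OptimizationPass = Callable[[List[Instruction]], PassResult]


def merge_consecutive_runs(instructions: List[Instruction]) -> PassResult:
    merged: List[Instruction] = []
    changes: List[str] = []
    for keyword, value in instructions:
        previous_is_run = bool(merged) and merged[-1][0] == "RUN"
        # Exec-form RUN (a JSON array) cannot be chained with &&, so skip it.
        shell_form = not value.startswith("[")
        if (keyword == "RUN" and previous_is_run and shell_form
                and not merged[-1][1].startswith("[")):
            merged[-1] = ("RUN", f"{merged[-1][1]} && {value}")
            changes.append("Merged consecutive RUN instructions")
        else:
            merged.append((keyword, value))
    return merged, changes


def add_expose_protocol(instructions: List[Instruction]) -> PassResult:
    result: List[Instruction] = []
    changes: List[str] = []
    for keyword, value in instructions:
        if keyword == "EXPOSE":
            ports = [p if "/" in p else f"{p}/tcp" for p in value.split()]
            new_value = " ".join(ports)
            if new_value != value:
                changes.append(f"EXPOSE {value} -> EXPOSE {new_value}")
            value = new_value
        result.append((keyword, value))
    return result, changes


def optimize(instructions: List[Instruction],
             passes: List[OptimizationPass]) -> PassResult:
    all_changes: List[str] = []
    # Each pass sees only the output of the previous one.
    for optimization_pass in passes:
        instructions, changes = optimization_pass(instructions)
        all_changes.extend(changes)
    return instructions, all_changes
```

Because every pass returns a fresh list instead of mutating shared state, a pass can be tested, reordered, or disabled on its own.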

4. Benchmarking Phase

The DockerBenchmarker uses the Docker SDK to build images in isolated temporary directories:

  • Creates temp directory with Dockerfile
  • Calls docker.client.api.build() to measure build duration
  • Captures final image size and layer count from image metadata
  • Cleans up images after measurement
  • Computes % improvement across metrics

Scope Note: Benchmarking metrics are measured in local test environments (validated on 8GB RAM, Docker daemon on host). No multi-machine or cluster testing has been performed.
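A rough sketch of one measured build with the Docker SDK for Python is shown below. It is not the DockerBenchmarker: it uses the SDK's high-level `client.images.build()` rather than the low-level build call mentioned above, and copying the whole build context into the temporary directory is an assumption about how isolation could be done. The names `benchmark_build` and `summarize_image` are hypothetical.

```python
import shutil
import tempfile
import time
from pathlib import Path


def summarize_image(attrs: dict) -> dict:
    """Read size and layer count from image-inspect style attributes."""
    return {
        "size_bytes": attrs["Size"],
        "layer_count": len(attrs["RootFS"]["Layers"]),
    }


def benchmark_build(dockerfile: str, context_dir: str = ".") -> dict:
    # Imported lazily so summarize_image() works without the SDK installed.
    import docker

    client = docker.from_env()  # requires a running Docker daemon
    with tempfile.TemporaryDirectory() as tmp:
        # Assumption: isolate the build by copying the context and Dockerfile.
        build_dir = Path(tmp) / "context"
        shutil.copytree(context_dir, build_dir)
        shutil.copy(dockerfile, build_dir / "Dockerfile")

        started = time.perf_counter()
        # nocache=True forces a fresh build so timings are comparable.
        image, _logs = client.images.build(path=str(build_dir), nocache=True, rm=True)
        duration = time.perf_counter() - started

    try:
        metrics = summarize_image(image.attrs)
    finally:
        # Remove the image so repeated runs do not accumulate disk usage.
        client.images.remove(image.id, force=True)
    metrics["build_seconds"] = duration
    return metrics
```

Wall-clock time measured this way includes sending the build context to the daemon, so small differences between two runs should be read with care.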

CI/CD Integration

GitHub Actions

Automate linting in workflows:

name: Dockerfile Quality Check
on: [push, pull_request]

jobs:
  docktor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Docktor Linter
        uses: nash0810/docktor@v0.2.0
        with:
          dockerfile: "./Dockerfile"
          explain: "true"

Inputs:

  • dockerfile (optional, default: Dockerfile)
  • explain (optional, default: false)
  • format (optional, default: text, accepts json)

Other Platforms

Docktor runs on any CI/CD system that supports Python and Docker:

pip install docktor-py
docktor lint Dockerfile --format json

Benchmarking Methodology

The docktor benchmark command measures real Docker builds using the Docker SDK:

  • Builds each Dockerfile in an isolated temporary directory
  • Extracts image size (bytes), layer count, and build duration from Docker metadata
  • Computes percentage improvements ((original - optimized) / original * 100)
  • Cleans up images after measurement
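The percentage formula above, written out as a small function. The function name and the zero-baseline guard are assumptions added for the example; the arithmetic is the formula as stated.

```python
def percent_improvement(original: float, optimized: float) -> float:
    """Return (original - optimized) / original * 100.

    Positive values mean the optimized build is smaller or faster;
    negative values mean it got worse.
    """
    if original == 0:
        # Assumption: report no change rather than dividing by zero.
        return 0.0
    return (original - optimized) / original * 100


# An image that shrinks from 200 MB to 150 MB:
print(percent_improvement(200, 150))  # 25.0
```

The same function applies to each metric (image size, layer count, build duration), since all three are "lower is better".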

Requirements:

  • Docker daemon must be running
  • Must run from directory containing all source files referenced in COPY/ADD instructions
  • Builds are not cached between runs (fresh builds each time)

Tested Scenario: Image size dropped by about 40% in sample Python and Node.js multi-stage builds with aggressive layer merging.


Development

Clone and install in editable mode:

git clone https://github.com/Nash0810/docktor.git
cd docktor
pip install -e ".[dev]"
pytest

📄 License

This project is licensed under the MIT License.
