dokter: The doctor for your Dockerfiles

The objective of Dokter is to make your Dockerfiles better. It makes sure that your Dockerfiles:

  • build secure images
  • build smaller images
  • build faster
  • follow best practices
  • are neatly formatted

Rules information

For an overview of the rules see: rules

DevOps lifecycle

Typically, a CI/CD pipeline consists of roughly the following steps:

  • lint code
  • build Docker image
  • run tests in Docker image
  • scan image for vulnerabilities (hopefully)
  • push image to registry
  • deploy image

Dokter fits into the first step (linting) and aims to prevent building an image that exposes credentials or contains vulnerabilities. At the bare minimum, this saves CI/CD minutes.

Separate processes like container registry scanning will also run, but they may run only after an image has been pushed, potentially already exposing a vulnerable image to the public.

Video explaining Dokter

Watch on YouTube: https://www.youtube.com/embed/8aKScUQjMWY

What makes Dokter special?

Good question. Dokter is the byproduct of a much bigger effort, GitLab AI Assist, for which Dockerfiles were chosen as the starting point. A parser was developed to fully parse Dockerfiles into a format designed for machine learning. Training ML models requires a large, rich dataset, and building that dataset requires a good analysis of Dockerfiles. Hence the creation of Dokter. It will start improving your Dockerfiles from day 1 but will become much more powerful in the future; eventually it will automatically create Dockerfiles for you.

No telemetry

No worries, your Dockerfiles remain private. Dokter won't share any telemetry with GitLab. If machine learning models would at some point benefit from user feedback, an option to provide anonymous feedback may be introduced, with plenty of user awareness and on an opt-in basis.

Dynamic parser

The parser behind Dokter has been designed with data and ML in mind. It supports parsing all Dockerfile instructions and adds support for comments, both actual comments and commented-out code.

The parser also supports dynamic analysis, meaning it is context-aware. For example:

COPY . /app

A purely static analysis would approve the above instruction. Dokter, however, lists the files that are actually in . and checks them against files known to contain credentials, while also filtering them against your .dockerignore file.
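
To make the idea concrete, here is a minimal Python sketch of this kind of context-aware check. It is not Dokter's implementation: the function names, the SENSITIVE_PATTERNS list, and the simplified .dockerignore matching (plain fnmatch, no `!` negation or `**`) are all assumptions made for illustration.

```python
import fnmatch
import os

# Hypothetical filename patterns that often hold credentials.
# Dokter's real list is not shown in this document; this one is an assumption.
SENSITIVE_PATTERNS = [".env", "*.pem", "id_rsa", ".npmrc", "credentials*"]


def read_dockerignore(context_dir):
    """Return the non-empty, non-comment patterns from .dockerignore, if any."""
    path = os.path.join(context_dir, ".dockerignore")
    if not os.path.isfile(path):
        return []
    with open(path, encoding="utf-8") as handle:
        lines = [line.strip() for line in handle]
    return [line for line in lines if line and not line.startswith("#")]


def risky_files(context_dir):
    """List files that `COPY . <dest>` would include and that look sensitive.

    Simplification: ignore patterns are matched with fnmatch against both the
    relative path and the base name; real .dockerignore rules are richer.
    """
    ignored = read_dockerignore(context_dir)
    found = []
    for root, _dirs, files in os.walk(context_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), context_dir)
            rel = rel.replace(os.sep, "/")
            if any(fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(name, p) for p in ignored):
                continue  # excluded from the build context
            if any(fnmatch.fnmatch(name, p) for p in SENSITIVE_PATTERNS):
                found.append(rel)
    return sorted(found)
```

Calling risky_files(".") from the directory that holds the Dockerfile returns the relative paths that would end up in the image and match one of the assumed patterns.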

Usage

There are a couple of ways you can use Dokter:

  • Local
  • CI/CD

We suggest always using both, but at a minimum run it wherever you actually build and publish your images.

Local usage

Install Dokter with pip:

pip install dokter

# Or from GitLab
pip install --upgrade dokter --extra-index-url https://gitlab.com/api/v4/projects/36078023/packages/pypi/simple

# Analyze a Dockerfile
dokter -d path/to/Dockerfile

If you want more information, you can either run it in verbose mode or ask it to explain a specific rule:

# Explain rule dfa001
dokter -e dfa001

# Run in verbose mode (this will be a lot of text)
dokter -v -d path/to/Dockerfile

You can also use Docker:

docker run -it -v "$(pwd)":/app registry.gitlab.com/gitlab-org/incubation-engineering/ai-assist/dokter/dokter:latest dokter -d dokter.Dockerfile

Dockerfile formatting and auto-correction

Dokter can produce a neatly formatted Dockerfile and autocorrect some of the errors found by the analyzer. It can either show (-s) or write (-w) the result. When writing, it overwrites the given Dockerfile, which makes it easier to review and commit the changes.

Shell commands are analyzed using ShellCheck and, where possible, errors are corrected automatically.

dokter -d Dockerfile -w

When showing, Dokter first outputs the analysis report, followed by the Dockerfile. At this moment, the Dockerfile it outputs has some errors corrected, but not all.

dokter -d Dockerfile -s

You can also combine -s and -w to both show and write the Dockerfile.

CI/CD

Example usage in GitLab CI:

dokter:
  image: registry.gitlab.com/gitlab-org/incubation-engineering/ai-assist/dokter/dokter:latest
  stage: lint
  script:
    - dokter -d Dockerfile

GitLab Static Application Security Testing (SAST)

To output Dokter's results to the GitLab security overview, run it with the --sast flag. Support for remediations will be added in a future release.

GitLab Code Quality

To use the output in GitLab Code Quality, you can use the following as an example:

dokter:
  image: registry.gitlab.com/gitlab-org/incubation-engineering/ai-assist/dokter/dokter:latest
  stage: lint
  script:
    - dokter -d dokter.Dockerfile --gitlab-codequality
  artifacts:
    name: "$CI_JOB_NAME artifacts from $CI_PROJECT_NAME on $CI_COMMIT_REF_SLUG"
    expire_in: 1 day
    when: always
    reports:
      codequality:
        - "dokter-$CI_COMMIT_SHA.json"
    paths:
      - "dokter-$CI_COMMIT_SHA.json"
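
If you want to inspect the report artifact outside GitLab, a small Python sketch like the one below can summarize it. It assumes the file follows GitLab's Code Quality report format (a JSON array of findings, each with a "severity" field); the function name is hypothetical and the exact fields Dokter emits are not documented here.

```python
import json
from collections import Counter


def summarize_codequality(report_path):
    """Count findings per severity in a Code Quality style JSON report."""
    with open(report_path, encoding="utf-8") as handle:
        findings = json.load(handle)
    # Assumption: the report is a JSON array of objects with a "severity" key.
    return Counter(item.get("severity", "unknown") for item in findings)
```

For example, summarize_codequality("dokter-<commit sha>.json") would return a Counter mapping each severity to the number of findings, which is handy for failing a job above a chosen threshold.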

Automatic merge requests with resolutions

Below is an example in which Dokter analyzes a Dockerfile and autocorrects it. The output is then committed to a new branch named dokter/<source_branch_name>, and a merge request is created and assigned to the user who started the pipeline.

dokter:
  image: registry.gitlab.com/gitlab-org/incubation-engineering/ai-assist/dokter/dokter:latest
  stage: lint
  variables:
    DOKTER_DOCKERFILE: Dockerfile
  before_script:
    - mkdir -p ~/.ssh && echo "$DOKTER_SSH_KEY" > ~/.ssh/id_rsa && chmod -R 700 ~/.ssh
  script:
    - dokter -d $DOKTER_DOCKERFILE --gitlab-codequality -w
  after_script:
    - bash /create-mr.sh
  artifacts:
    name: "$CI_JOB_NAME artifacts from $CI_PROJECT_NAME on $CI_COMMIT_REF_SLUG"
    expire_in: 1 day
    when: always
    reports:
      codequality:
        - "dokter-$CI_COMMIT_SHA.json"
    paths:
      - "dokter-$CI_COMMIT_SHA.json"
  rules:
    # Very important to prevent a loop :-)
    - if: $CI_COMMIT_REF_NAME !~ /^dokter/ && $CI_PIPELINE_SOURCE == "merge_request_event"
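
The loop-prevention rule above can be read as the following Python sketch. The function name is hypothetical; the logic mirrors the rule as written: the job runs only for merge request pipelines whose branch name does not start with dokter, so the branch that Dokter itself creates never triggers another autocorrect run.

```python
import re


def job_should_run(ref_name, pipeline_source):
    """Return True when the rules:if expression above would match."""
    # $CI_COMMIT_REF_NAME !~ /^dokter/
    not_dokter_branch = re.search(r"^dokter", ref_name) is None
    # $CI_PIPELINE_SOURCE == "merge_request_event"
    return not_dokter_branch and pipeline_source == "merge_request_event"
```

With this reading, a merge request from feature/x runs the job, while the follow-up merge request from dokter/feature/x does not.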

Gotchas

Below are some topics that could raise questions or cause errors.

Jinja templating

Jinja is ignored: the templated lines are skipped and the Docker instructions are parsed.

Example:

FROM scratch

{% if something %} # This line will be considered empty
RUN echo "some command" # This line will be parsed
{% endif %} # This line will be considered empty
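
The behavior in the example can be approximated with the Python sketch below. It is an illustration only, not Dokter's parser: the regular expression and the function name are assumptions, and a "templated line" is taken to mean any line that starts with a Jinja tag, expression, or comment.

```python
import re

# Assumption: a templated line starts with {% ... %}, {{ ... }} or {# ... #}.
JINJA_LINE = re.compile(r"^\s*(\{%|\{\{|\{#)")


def blank_jinja_lines(dockerfile_text):
    """Replace Jinja lines with empty lines so line numbers stay stable."""
    return "\n".join(
        "" if JINJA_LINE.match(line) else line
        for line in dockerfile_text.splitlines()
    )
```

Keeping an empty line in place of each templated line means any reported line numbers still point at the right place in the original file.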

Here strings (EOF)

At this moment, if you have a here string in a bash command, analysis of the Dockerfile will fail because it cannot be parsed. Support will be added in the future.
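
Until then, a pre-check along the following lines can tell you which lines are likely to trip the parser. This is a hedged sketch, not part of Dokter: the function name and the pattern are assumptions, and the pattern deliberately covers both heredocs (<<EOF) and bash here strings (<<<), since the heading mentions EOF.

```python
import re

# Matches <<EOF, <<-EOF, <<'EOF', << "EOF" and the here string operator <<<.
HERE_PATTERN = re.compile(r"<<<|<<-?\s*['\"]?\w+")


def find_here_constructs(dockerfile_text):
    """Return 1-based line numbers that contain a heredoc or here string."""
    hits = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip Dockerfile comments
        if HERE_PATTERN.search(line):
            hits.append(number)
    return hits
```

An empty result suggests the file contains none of these constructs; any reported line is a candidate for rewriting, for example by moving the content into a separate file that is copied into the image.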

