CLI tool for extracting Docker image filesystems, inspecting large files, and rebuilding optimized Docker images.

docker-assemble

docker-assemble is a Python CLI tool for extracting Docker image filesystems, inspecting image contents, finding large files, and rebuilding optimized Docker images.

It helps developers, researchers, and DevOps engineers understand what is inside a Docker image by exporting the image filesystem into a local directory. You can use it to analyze container images, inspect files, identify oversized files, and optionally create a new Docker image after removing selected files.

Features

  • Extract the filesystem of a Docker image into a local directory
  • Pull an image automatically if it is not available locally
  • Inspect Docker image contents for research, debugging, and optimization
  • Detect files larger than a configurable size limit
  • Optionally remove selected large files
  • Rebuild a new Docker image from the filtered filesystem
  • Simple command-line interface built with Python

Why use docker-assemble?

Docker images can contain unnecessary files, large artifacts, cached dependencies, logs, build leftovers, or other filesystem content that increases image size. docker-assemble makes it easier to inspect the full filesystem of an image and understand what contributes to its size.

This can be useful for:

  • Docker image analysis
  • Container image optimization
  • DevOps research
  • Security and filesystem inspection
  • Finding large files inside Docker images
  • Rebuilding smaller Docker images
  • Understanding image contents without manually creating containers

Installation

Install from PyPI:

pip install docker-assemble

Requirements

  • Python 3.8+
  • Docker installed and running
  • Access to the Docker daemon

Basic usage

Extract a Docker image filesystem into a local directory:

docker-assemble -d ubuntu:20.04 output_dir

This extracts the filesystem of ubuntu:20.04 into output_dir.

Analyze large files

You can scan the extracted filesystem for files larger than a given size:

docker-assemble -d ubuntu:20.04 output_dir --maximum-file-size 100M
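
The kind of scan described here can be approximated in a few lines of Python. This is a minimal sketch and not docker-assemble's actual implementation: the function name find_large_files and the choice to skip symlinks are assumptions made for illustration, and the threshold is given in bytes.

```python
import os
from pathlib import Path
from typing import List, Tuple


def find_large_files(root: str, max_bytes: int) -> List[Tuple[str, int]]:
    """Return (relative_path, size_in_bytes) for files larger than max_bytes.

    Hypothetical helper for illustration; not part of docker-assemble.
    """
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so link targets are not counted twice and
            # broken links in the extracted tree do not raise.
            if path.is_symlink():
                continue
            try:
                size = path.stat().st_size
            except OSError:
                # Unreadable entries are ignored in this sketch.
                continue
            if size > max_bytes:
                results.append((str(path.relative_to(root)), size))
    # Largest first, which is usually the most useful order for a report.
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

Calling find_large_files("output_dir", 100 * 1024 * 1024) on an extracted image would list candidate files for removal, largest first.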

Supported size suffixes include:

  • K for kilobytes
  • M for megabytes
  • G for gigabytes

Examples:

docker-assemble -d ubuntu:20.04 output_dir --maximum-file-size 10M
docker-assemble -d python:3.11 output_dir --maximum-file-size 500M
docker-assemble -d node:20 output_dir --maximum-file-size 1G
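
To make the suffixes concrete, here is a hedged sketch of how a value such as 100M could be converted to bytes. The function parse_size is hypothetical, and two details are assumptions because the README does not state them: that the suffixes are binary multiples (1K = 1024 bytes) and that a bare number means bytes.

```python
import re

# Assumption: binary multiples. The tool may use powers of 1000 instead.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
_SIZE_RE = re.compile(r"(\d+)([KMG]?)")


def parse_size(text: str) -> int:
    """Convert a size string such as '100M' into a number of bytes.

    Hypothetical helper for illustration; not part of docker-assemble.
    """
    match = _SIZE_RE.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix]
```

Under these assumptions, parse_size("100M") gives 104857600 and parse_size("1G") gives 1073741824.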

Rebuild a Docker image

Pass --new-image-name to rebuild the extracted filesystem as a single-layer image (FROM scratch followed by COPY . /). The --maximum-file-size option is optional in this mode:

  • Without --maximum-file-size: no files are filtered out. The new image contains the same content as the original, consolidated into one layer. This is useful for comparing a multi-layer original against a squashed single-layer version, because any size difference then comes from squashing alone and not from file removal.

    docker-assemble -d ubuntu:20.04 output_dir \
      --new-image-name ubuntu-squashed
    
  • With --maximum-file-size: docker-assemble lists files above the threshold, asks which should be removed, and rebuilds the image without them:

    docker-assemble -d ubuntu:20.04 output_dir \
      --maximum-file-size 100M \
      --new-image-name ubuntu-optimized
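
The rebuild step can be pictured as writing a two-line Dockerfile into the extracted directory and building from it. The sketch below assumes the Docker SDK for Python; the helper names write_squash_dockerfile and build_single_layer_image are hypothetical, and the tool's real build logic may differ.

```python
from pathlib import Path


def write_squash_dockerfile(context_dir: str) -> Path:
    """Write a Dockerfile that copies the whole build context into an empty image.

    Hypothetical helper for illustration; not part of docker-assemble.
    """
    dockerfile = Path(context_dir) / "Dockerfile"
    # Note: COPY . / also copies this Dockerfile into the image unless it is
    # excluded, for example through a .dockerignore file.
    dockerfile.write_text("FROM scratch\nCOPY . /\n")
    return dockerfile


def build_single_layer_image(context_dir: str, new_image_name: str):
    """Build the image with the Docker SDK for Python (needs a running daemon)."""
    import docker  # imported lazily so the helper above works without the SDK

    write_squash_dockerfile(context_dir)
    client = docker.from_env()
    # images.build returns the built image and a generator of build log entries.
    image, _build_logs = client.images.build(path=context_dir, tag=new_image_name)
    return image
```

Because the only filesystem-changing instruction is a single COPY, the resulting image has one layer regardless of how many layers the original had.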
    

Package

docker-assemble is available on PyPI: https://pypi.org/project/docker-assemble/

Debug mode

Enable debug logging with:

docker-assemble --debug -d ubuntu:20.04 output_dir

Example workflow

# Extract a Docker image filesystem
docker-assemble -d python:3.11 python-image-filesystem

# Find files larger than 100 MB
docker-assemble -d python:3.11 python-image-filesystem --maximum-file-size 100M

# Rebuild a new image after removing selected large files
docker-assemble -d python:3.11 python-image-filesystem \
  --maximum-file-size 100M \
  --new-image-name python-optimized

Use cases

docker-assemble is useful when you need to:

  • inspect the contents of a Docker image
  • analyze why a Docker image is large
  • identify unnecessary files in a container image
  • export an image filesystem for research
  • compare Docker image contents
  • create a smaller image after removing selected files
  • debug container filesystem structure

How it works

docker-assemble uses the Docker SDK for Python to access Docker images. If the requested image is not available locally, it pulls the image. It then creates a temporary container, exports the container filesystem, extracts it into the selected output directory, and optionally rebuilds a new image from a filtered filesystem.
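
The steps above can be sketched with the Docker SDK for Python and the standard tarfile module. This is an illustration under stated assumptions, not the project's source: the function names are hypothetical, the export stream is spooled to a temporary file before unpacking, and the "tar" extraction filter is used only where the running Python version provides it.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Iterable


def extract_export_stream(chunks: Iterable[bytes], output_dir: str) -> None:
    """Spool a chunked tar stream to a temporary file, then unpack it.

    Hypothetical helper for illustration; not part of docker-assemble.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryFile() as spool:
        for chunk in chunks:
            spool.write(chunk)
        spool.seek(0)
        with tarfile.open(fileobj=spool, mode="r:") as archive:
            # Use the "tar" extraction filter where this Python supports it.
            kwargs = {"filter": "tar"} if hasattr(tarfile, "tar_filter") else {}
            archive.extractall(output_dir, **kwargs)


def export_image_filesystem(image_name: str, output_dir: str) -> None:
    """Pull if needed, create a temporary container, export it, and clean up."""
    import docker  # lazy import: only this function needs the SDK and a daemon

    client = docker.from_env()
    try:
        client.images.get(image_name)
    except docker.errors.ImageNotFound:
        client.images.pull(image_name)

    # The container is created but never started; export only reads its filesystem.
    container = client.containers.create(image_name)
    try:
        # container.export() yields the filesystem as chunks of a tar archive.
        extract_export_stream(container.export(), output_dir)
    finally:
        container.remove()
```

Keeping the extraction separate from the Docker calls means it can be exercised with any tar stream, without a running daemon.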

Project status

This project is in active development. Contributions, issues, and suggestions are welcome.

License

This project is licensed under the Apache License 2.0.

Docker is a trademark of Docker, Inc. This project is not affiliated with or endorsed by Docker, Inc.
