Kedro-Docker makes it easy to package Kedro projects with Docker.
Kedro-Docker
Docker is a tool that makes it easier to create, deploy and run applications. It uses containers to package an application along with its dependencies and then runs the application in an isolated virtualised environment.
Configuring a Docker container environment may become complex and tedious. Kedro-Docker significantly simplifies this process and reduces it to 2 steps:
- Build a Docker image
- Run your Kedro project in a Docker environment
Note: Kedro-Docker also makes it easy for you to run IPython and Jupyter Notebooks in a Docker container.
How do I install Kedro-Docker?
Kedro-Docker is a Python plugin. To install it:
pip install kedro-docker
How do I use Kedro-Docker?
Prerequisites
The following conditions must be true for Kedro-Docker to package your project:
- Docker is installed
- The Docker daemon is up and running on your system (Kedro-Docker assumes this)
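If you script your builds, you may want to check both prerequisites before calling Kedro-Docker. The following is a minimal Python sketch, not part of Kedro-Docker itself: the function name docker_ready is hypothetical, and it assumes that docker info exits with a non-zero status when it cannot reach the daemon.

```python
import shutil
import subprocess


def docker_ready(timeout: float = 30.0) -> bool:
    """Hypothetical helper: True if Docker is installed and its daemon responds."""
    # Prerequisite 1: the docker CLI must be on PATH
    if shutil.which("docker") is None:
        return False
    # Prerequisite 2: `docker info` queries the daemon and fails if it is down
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A CI job could call docker_ready() first and fail early with a clear message instead of a Docker connection error.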
Generating a Dockerfile
In order to generate a Dockerfile for your project, navigate to the project's root directory and then run the following from the command line:
kedro docker init
This command generates the Dockerfile, .dockerignore and .dive-ci files for your project.
Options:
- --with-spark: optional flag to generate a Dockerfile with Spark and Hadoop support
- -h, --help: show command help and exit.
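To confirm that kedro docker init produced all three files, you could run a check like the one below from the project root. This is an illustrative Python sketch; missing_init_files is a hypothetical helper, and the file names are the three listed above.

```python
from __future__ import annotations

from pathlib import Path

# The files that `kedro docker init` generates, as listed above
INIT_FILES = ("Dockerfile", ".dockerignore", ".dive-ci")


def missing_init_files(project_root: Path) -> list[str]:
    """Hypothetical helper: names from INIT_FILES not yet present in project_root."""
    return [name for name in INIT_FILES if not (project_root / name).is_file()]
```

For example, missing_init_files(Path.cwd()) returns an empty list once initialisation has succeeded.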
Build a Docker image
In order to build a Docker image for your project, navigate to the project's root directory and then run the following from the command line:
kedro docker build
Behind the scenes Kedro does the following:
- Creates a template Dockerfile and .dockerignore in the project root directory if those files don't already exist
- Builds the project image using the Dockerfile from the project root directory
Note: When calling kedro docker build, you can also pass any specific options for docker build by specifying the --docker-args option. For example, kedro docker build --docker-args="--no-cache" instructs Docker not to use the cache when building the image. You can learn more about the available options in the docker build documentation.
Note: By default, kedro docker build creates an image without Spark and Hadoop.
Note: By default, kedro docker build builds the image from the python:VERSION-buster base image, where VERSION is the Python version (major + minor) of the current environment. To use a different base image, specify the --base-image option, for example kedro docker build --base-image="python:3.8-buster".
Note: You can generate the Dockerfile, .dockerignore or .dive-ci files without building the image by running kedro docker init. This is useful if you want to modify these files before the first build.
The project Docker image will automatically be tagged as <project-root-dir>:latest, where <project-root-dir> is the name of the project root directory. To change the tag, add the --image command line option, for example: kedro docker build --image my-project-tag.
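The two defaults described above (base image and image tag) can be reproduced in a few lines of Python, which may help if you want to predict them before building. This is a sketch of the documented behaviour, not Kedro-Docker's own code; both function names are hypothetical.

```python
import sys
from pathlib import Path


def default_base_image() -> str:
    """Documented default: python:<major>.<minor>-buster for the running interpreter."""
    major, minor = sys.version_info[:2]
    return f"python:{major}.{minor}-buster"


def default_image_tag(project_root: Path) -> str:
    """Documented default: <project-root-dir>:latest."""
    # resolve() turns "." into an absolute path, so .name is the directory name
    return f"{project_root.resolve().name}:latest"
```

The resolve() call matters because Path(".").name is an empty string.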
When building the image, Kedro copies the contents of the current project into the image, but it ignores the locations specified in the .dockerignore file to prevent potentially sensitive data from propagating into the image. We recommend mounting those locations as volumes at runtime.
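Before building, it can be useful to list exactly which locations your .dockerignore excludes, since those are the ones you will need to provide at runtime. A minimal Python sketch, assuming the standard .dockerignore syntax (one pattern per line, blank lines and lines starting with # skipped, a leading ! marking an exception); dockerignore_patterns is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def dockerignore_patterns(project_root: Path) -> list[str]:
    """Hypothetical helper: exclusion patterns listed in <project_root>/.dockerignore."""
    path = project_root / ".dockerignore"
    if not path.is_file():
        return []  # no .dockerignore yet, e.g. before `kedro docker init`
    patterns = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        # Docker skips blank lines and treats lines starting with '#' as comments;
        # a leading '!' marks an exception that re-includes matching paths
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns
```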
Options:
- --uid: optional integer User ID for the kedro user inside the container. Defaults to the current user's UID
- --gid: optional integer Group ID for the kedro user inside the container. Defaults to the current user's GID
- --image: optional Docker image tag. Defaults to the project directory name
- --docker-args: optional string containing extra options for the docker build command
- --with-spark: optional flag to create an image additionally with Spark and Hadoop
- --base-image: optional base Docker image. Default is Debian buster with the current environment Python version, e.g. python:3.8-buster
- -h, --help: show command help and exit.
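If you drive Kedro-Docker from a Python script or CI job, building the command as an argument list avoids shell-quoting problems with --docker-args. The sketch below uses only the options listed above; the helper name is hypothetical, and passing --docker-args and --base-image in the --option=value form simply mirrors the examples above.

```python
from __future__ import annotations


def kedro_docker_build_cmd(
    uid: int | None = None,
    gid: int | None = None,
    image: str | None = None,
    docker_args: str | None = None,
    with_spark: bool = False,
    base_image: str | None = None,
) -> list[str]:
    """Hypothetical helper: assemble a `kedro docker build` argument list."""
    cmd = ["kedro", "docker", "build"]
    if uid is not None:
        cmd += ["--uid", str(uid)]
    if gid is not None:
        cmd += ["--gid", str(gid)]
    if image is not None:
        cmd += ["--image", image]
    if docker_args is not None:
        # A single list element, so no shell quoting is needed around the value
        cmd.append(f"--docker-args={docker_args}")
    if with_spark:
        cmd.append("--with-spark")
    if base_image is not None:
        cmd.append(f"--base-image={base_image}")
    return cmd
```

You would then pass the list to subprocess.run(..., check=True) from the project root, so a failed build raises an exception instead of passing silently.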
Run your project in a Docker environment
Once the project image has been built, you can run the project using a Docker environment:
kedro docker run
The command above will:
- Locate the image built in the previous section
- Copy the whole project directory into the /home/kedro container path
- Execute the kedro run command in a new container
Note: The kedro docker run command adds the --rm flag to the underlying docker run call, so the container is automatically removed when it exits. Please make sure that you persist all necessary data outside the container at runtime to avoid data loss.
By default, kedro docker run uses an image tagged as <project-root-dir>:latest to create a container. If you renamed your image in the previous step, please also provide the --image option with the corresponding image tag, for example: kedro docker run --image "my-custom-image:latest".
When calling kedro docker run, you can also pass any specific options for docker run by providing the --docker-args option. Since --docker-args may contain multiple arguments, it's a good idea to wrap its value in quotation marks. For example, kedro docker run --docker-args="--env KEY=MYVALUE" instructs Docker to set the environment variable KEY to MYVALUE in the container. You can learn more about the available options in the docker run documentation.
All other CLI options are appended to the kedro run command inside the container. For example, kedro docker run --parallel runs kedro run --parallel inside the container.
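The same argument-list approach works for kedro docker run. The sketch below (the helper name is hypothetical; shlex.join needs Python 3.8+) prints the shell-quoted form of the command, which shows why the --docker-args value needs quotation marks on the command line: without them, the shell would split it so that KEY=MYVALUE became a separate argument. Extra options such as --parallel are appended unchanged, mirroring how Kedro-Docker forwards them to kedro run.

```python
from __future__ import annotations

import shlex


def kedro_docker_run_cmd(
    image: str | None = None,
    docker_args: str | None = None,
    run_args: list[str] | None = None,
) -> list[str]:
    """Hypothetical helper: assemble a `kedro docker run` argument list."""
    cmd = ["kedro", "docker", "run"]
    if image is not None:
        cmd += ["--image", image]
    if docker_args is not None:
        cmd.append(f"--docker-args={docker_args}")
    # Any remaining options are forwarded to `kedro run` inside the container
    cmd += run_args or []
    return cmd


argv = kedro_docker_run_cmd(docker_args="--env KEY=MYVALUE", run_args=["--parallel"])
print(shlex.join(argv))  # kedro docker run '--docker-args=--env KEY=MYVALUE' --parallel
```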
Options:
- --image: Docker image name to be used, defaults to project root directory name
- --docker-args: optional string containing extra options for the docker run command
- -h, --help: show command help and exit
- Any other options will be treated as kedro run command options.
Interactive development with Docker
In addition to kedro docker run, Kedro also supports the following commands:
- kedro docker ipython: run an IPython session inside the container
- kedro docker jupyter notebook: start a Jupyter Notebook inside the container
- kedro docker jupyter lab: start a Jupyter Lab inside the container
Options:
- --image: Docker image name to be used, defaults to project root directory name
- --docker-args: optional string containing extra options for the docker run command
- --port: host port that a container's port will be mapped to, defaults to 8888. This option applies to kedro docker jupyter commands only
- -h, --help: show command help and exit
- Any other options will be treated as corresponding kedro command CLI options. For example, kedro docker jupyter lab --NotebookApp.token='' --NotebookApp.password='' will run the Jupyter Lab server without the password and token.
Important: Please note that the source code directory of a project (the src folder) is not mounted into the Docker container by default. This means that if you change any code in the src directory inside the container, those changes will not be saved to the host machine and will be completely lost when the container is terminated. To mount the whole project when running Jupyter Lab, for example, run the following command:
kedro docker jupyter lab --docker-args "-v ${PWD}:/home/kedro"
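From Python, the same mount can be expressed with the resolved project path in place of ${PWD}. This sketch (hypothetical helper name) also sets the documented --port option; like the shell example, it assumes the project path contains no whitespace, since the -v value is embedded in the --docker-args string.

```python
from __future__ import annotations

from pathlib import Path


def jupyter_lab_cmd(project_root: Path, port: int = 8888) -> list[str]:
    """Hypothetical helper: `kedro docker jupyter lab` with the whole project mounted."""
    root = project_root.resolve()
    return [
        "kedro", "docker", "jupyter", "lab",
        "--port", str(port),  # host port mapped to the container's Jupyter port
        # Bind-mount the host project over /home/kedro so edits to src persist
        f"--docker-args=-v {root}:/home/kedro",
    ]
```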
Image analysis with Dive
Kedro-Docker lets you analyse the size efficiency of your project image by leveraging Dive:
kedro docker dive
Note: By default, Kedro-Docker calls Dive in CI mode. If you want to explore your image in the UI, run kedro docker dive --no-ci.
Options:
- --ci / --no-ci: flag for running Dive in non-interactive mode. Defaults to true
- --ci-config-path: path to Dive CI config file. Defaults to .dive-ci in the project root directory
- --image: Docker image name to be used, defaults to project root directory name
- --docker-args: optional string containing extra options for the docker run command
- -h, --help: show command help and exit.
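Because CI mode is non-interactive while --no-ci opens the UI, one option is to choose the flag from the environment. The sketch below assumes the common convention that CI services set a CI environment variable (GitHub Actions and GitLab CI both set CI=true); the helper name is hypothetical.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def kedro_docker_dive_cmd(
    env: Mapping[str, str] | None = None,
    ci_config_path: str | None = None,
) -> list[str]:
    """Hypothetical helper: use --ci on CI runners and --no-ci for local exploration."""
    env = os.environ if env is None else env
    on_ci = env.get("CI", "").lower() in {"1", "true", "yes"}
    cmd = ["kedro", "docker", "dive", "--ci" if on_ci else "--no-ci"]
    if ci_config_path is not None:
        cmd += ["--ci-config-path", ci_config_path]
    return cmd
```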
Running custom commands with Docker
You can also run an arbitrary command inside the Docker container by executing kedro docker cmd <CMD>, where <CMD> is the command that you want to execute. If <CMD> is not specified, kedro run is executed inside the container.
Note: Unlike in the previous sections, if you are running a kedro command you must specify the whole command, including the kedro keyword. This allows non-Kedro commands to be executed as well.
For example:
- kedro docker cmd kedro test will run kedro test inside the container
- kedro docker cmd will run kedro run inside the container
- kedro docker cmd --docker-args="-it" /bin/bash will create an interactive bash shell in the container (and allocate a pseudo-TTY connected to the container’s stdin).
Options:
- --image: Docker image name to be used, defaults to project root directory name
- --docker-args: optional string containing extra options for the docker run command
- -h, --help: show command help and exit.
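The rule above can be summarised as: whatever follows the kedro docker cmd options runs verbatim in the container, and an empty command falls back to kedro run. A minimal Python sketch of that behaviour, with hypothetical helper names:

```python
from __future__ import annotations


def kedro_docker_cmd(command: list[str] | None = None, docker_args: str | None = None) -> list[str]:
    """Hypothetical helper: assemble a `kedro docker cmd` argument list."""
    cmd = ["kedro", "docker", "cmd"]
    if docker_args is not None:
        cmd.append(f"--docker-args={docker_args}")
    # The command is passed through verbatim, so include `kedro` for Kedro commands
    cmd += command or []
    return cmd


def command_in_container(command: list[str] | None = None) -> list[str]:
    """What runs inside the container: the given command, or `kedro run` if empty."""
    return list(command) if command else ["kedro", "run"]
```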
Running Kedro-Docker with Kedro-Viz
These instructions allow you to access Kedro-Viz, Kedro's data pipeline visualisation tool, via Docker. In your terminal, run the following commands:
pip download -d data --no-deps kedro-viz
kedro docker build
kedro docker cmd bash --docker-args="-it -u=0 -p=4141:4141"
Then, in the interactive shell that opens inside the container, run:
pip install data/*.whl
kedro viz --host=0.0.0.0 --no-browser
Finally, open 127.0.0.1:4141 in your preferred browser. Alternatively, if kedro-viz is already installed in the Docker container (via requirements), you can run:
kedro docker cmd --docker-args="-p=4141:4141" kedro viz --host=0.0.0.0
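After starting the server, you can check from the host that something is listening on the mapped port before opening the browser. This sketch (hypothetical helper) only tests that a TCP connection to 127.0.0.1:4141 succeeds; it does not confirm that the listener is Kedro-Viz.

```python
import socket


def viz_reachable(host: str = "127.0.0.1", port: int = 4141, timeout: float = 1.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False
```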
Can I contribute?
Yes! Want to help build Kedro-Docker? Check out our guide to contributing.
What licence do you use?
Kedro-Docker is licensed under the Apache 2.0 License.