Lightweight ML Deployment Platform
productionize - deploy ML models directly from Python [WIP]
productionize is an open-source lightweight ML deployment tool.
You can containerize, deploy and ship your model, without ever
having to leave your beloved Python.
productionize in a nutshell
What does it do? Well, it does exactly what the catchy library name says it does.
productionize helps you to productionize your API. As a Data Scientist, most of the projects I have worked on ran into issues when productionizing code. Often, the code is not tested, standardized or environment-agnostic enough to just deploy it somewhere. This is where containers come in very handy. Containerization helps you to freeze the environment and decouple your model, or just your code, from the host system. This makes deployment much, much easier.
However, working with Docker, Kubernetes and all these other fancy tools is not as simple as one might hope. The good news, though, is that some steps can be automated, and this is exactly what
productionize does. As a Data Scientist, you can focus on your model, while containerization and deployment are handled by productionize.
The workflow with
productionize is very simple. First, you develop your API in Python. Next,
productionize lets you easily set up a local Kubernetes cluster on which you can test your API. In
productionize, this local Kubernetes cluster is called a workbench, because it is Kubernetes, with a little extra stuff to help you work. Next, you deploy your API. You don't have to change your standard API script for that,
productionize will handle that for you. Within a matter of seconds, your API is built into a container and deployed to your workbench. Here you can test your API and see if it works. If you are happy with it, you can simply export the container and deploy it to any Kubernetes cluster you like.
productionize makes it super easy to turn your local API into a production-ready container and the best part: you don't even have to leave Python.
productionize is a Python library, which is hosted on PyPI. Currently, the functions are only supported on macOS. On the darwin platform you can therefore install the package using
pip install productionize
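Because the functions are currently only supported on macOS, a script can check the platform before importing the library. This is a minimal sketch; the helper name `is_supported_platform` is hypothetical and not part of productionize.

```python
import sys


def is_supported_platform(platform: str = sys.platform) -> bool:
    """Return True on macOS, where sys.platform is 'darwin'."""
    return platform == "darwin"


if __name__ == "__main__":
    if not is_supported_platform():
        print("productionize currently supports macOS only; found:", sys.platform)
```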
Once the library is properly installed from PyPI, you can load it with a standard Python import statement. The core of the library is its two main classes, which can be imported as follows:
# import lib
from productionize import workbench, product
The library contains two major classes. The first one is the
workbench() class. This class allows you to set up and manage a proper ML workbench on your local machine. The second one is the
product() class. This class allows you to deploy your ML APIs to the local workbench and to any Kubernetes cluster.
Once the main classes are imported, you can set up your very own workbench on your local machine. The workbench consists of several tools:
- Docker: a container technology that helps us build Docker containers, which are the quasi-standard in Machine Learning deployment. You can read more in the Docker documentation.
- VirtualBox: a driver that is needed to create a VM on your local machine to host the Kubernetes cluster, which is at the heart of the workbench. You can read more in the VirtualBox documentation.
- Kubectl: a CLI that allows you to interact with Kubernetes. You won't have to do that yourself, but productionize runs Kubernetes commands in the background.
- Minikube: a local implementation of Kubernetes. Minikube runs on a VM, which is administered by VirtualBox.
The components are assembled in a simple fashion. The only special thing is that Minikube is installed on top of VirtualBox.
To set up the workbench, these tools need to be installed. You can do this by simply running the
setup() method of the workbench class. Once the class is instantiated, you can call the method.
# initiate class
cluster = workbench()
# install and setup components
cluster.setup()
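Before installing anything, it can help to see which of the workbench tools are already on your machine. The sketch below only checks whether an executable with each name is on the PATH. The executable names are assumptions (`VBoxManage` is VirtualBox's command-line tool), the helper `find_installed_tools` is hypothetical, and this is not how productionize itself detects components.

```python
import shutil

# Executable names are assumptions; VBoxManage is VirtualBox's command-line tool.
WORKBENCH_TOOLS = ("docker", "VBoxManage", "kubectl", "minikube")


def find_installed_tools(tools=WORKBENCH_TOOLS):
    """Map each tool name to True if an executable of that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, present in find_installed_tools().items():
        print(f"{tool}: {'found' if present else 'missing'}")
```

A check like this only tells you what is reachable from your shell; an application installed outside the PATH would still show up as missing.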
To fire up the entire workbench, you first need to log in to Docker Desktop. It is installed for you, but you need to have it running. To do this, search your computer for Docker (on a Mac, use Spotlight search) and start the application.
Next you will have to sign in. If you don't have an account already, you can create one for free at Docker Hub, which is a lot like GitHub, just for containers.
Once you have done this, you are good to go. You can now start the cluster using the
start_cluster() method. This method allows you to set the resource quota for the cluster. The defaults are two CPUs and 2 GB of memory.
# start the cluster
cluster.start_cluster(cpus='2', memory='2G')
When the cluster is running, you can create a project. This helps to keep the cluster clean and well structured. You can do this with the open_project() method.
# open project
cluster.open_project(name="my-project")
In case you want to delete the project you can use the
delete_project() method. Technically, the projects are namespaces on Kubernetes.
# delete project
cluster.delete_project(name="my-project")
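Since projects are namespaces on Kubernetes, opening and deleting a project roughly corresponds to creating and deleting a namespace with kubectl. The sketch below builds those two kubectl commands as argument lists. The helper `namespace_command` is hypothetical, and whether productionize issues exactly these commands in the background is an assumption.

```python
def namespace_command(action: str, name: str) -> list:
    """Build the kubectl argument list that creates or deletes a namespace."""
    if action not in ("create", "delete"):
        raise ValueError(f"unsupported action: {action}")
    return ["kubectl", action, "namespace", name]


if __name__ == "__main__":
    # Print rather than execute, so the sketch is safe to run without a cluster.
    print(" ".join(namespace_command("create", "my-project")))
    print(" ".join(namespace_command("delete", "my-project")))
```

To actually run one of these against a cluster, the list can be passed to `subprocess.run(command, check=True)`.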
To stop the cluster you can simply use the
stop_cluster() method. This only idles the cluster; it doesn't remove any of the components.
# stop the cluster
cluster.stop_cluster()
To cleanly uninstall all the components, you can just run the
uninstall() method and even specify which components to delete. By default, components that already existed on your machine beforehand will not be removed.
# cleanly uninstall cluster components
cluster.uninstall(
    docker=None,
    kubectl=None,
    virtualbox=None,
    minikube=None,
    report=True,
)
While the workbench class mainly concerns infrastructure management, the
product class deals with your API. The
product class turns your API into a deployable product. Once you have programmed an API, for instance with Flask, the
product class will do the rest for you.
Let's consider the following python script containing a Flask API:
#!flask/bin/python
from flask import Flask

app = Flask(__name__)


@app.route('/hello')
def index():
    return "Hello, World!"


if __name__ == '__main__':
    app.run(port='8000', host='0.0.0.0')
You can, of course, create any kind of API you like. You can also add new routes or whatever you need. To deploy an API to Kubernetes, you would typically need to containerize the API.
productionize does that for you. The
product class contains the
prepare_deployment() method. This method produces a Dockerfile from your API script and a requirements file.
# initiate the class and say which project the product belongs to
my_api = product(name="my-product", project="my-project")
# prepare the deployment
my_api.prepare_deployment(
    api_file="path_to/api.py",  # path to the api file
    requirements_file="path_to/requirements.txt",  # path to the req file
    port="8000",  # the port your API is exposed to
)
Note: I would advise against any directory stunts here. The code in this library is flexible, but unusual directory layouts might be a bit tricky.
Once you run the prepare_deployment() method,
productionize will build a Dockerfile in your current working directory.
You can, of course, modify and edit the Dockerfile, but at your own risk. If you intend to work in an enterprise context, it might be necessary to change permissions within the container. This does not have an effect on
productionize. By default,
productionize containers run as root.
FROM python:3.7.7
RUN mkdir -p /api
COPY api.py /api/api.py
COPY requirements.txt /api/requirements.txt
RUN python -m pip install -r /api/requirements.txt
EXPOSE 8000
ENTRYPOINT ["python", "api/api.py"]
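The Dockerfile above can be read as a template filled with three values: the API file name, the requirements file name and the port. The sketch below reproduces that idea with plain string formatting. The function `render_dockerfile` and the template constant are hypothetical, and using only the file names without their directories is an assumption; this is not the library's actual implementation.

```python
from pathlib import PurePosixPath

# Template mirroring the generated Dockerfile shown in this README.
DOCKERFILE_TEMPLATE = """\
FROM python:3.7.7
RUN mkdir -p /api
COPY {api} /api/{api}
COPY {req} /api/{req}
RUN python -m pip install -r /api/{req}
EXPOSE {port}
ENTRYPOINT ["python", "api/{api}"]
"""


def render_dockerfile(api_file: str, requirements_file: str, port: str) -> str:
    """Fill the template using only the file names, not their directories."""
    return DOCKERFILE_TEMPLATE.format(
        api=PurePosixPath(api_file).name,
        req=PurePosixPath(requirements_file).name,
        port=port,
    )


if __name__ == "__main__":
    print(render_dockerfile("path_to/api.py", "path_to/requirements.txt", "8000"))
```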
Once you have run the
prepare_deployment() method, you can deploy your API to the workbench. Why would you do this? Well, the workbench should serve as your local test environment. Using the deploy() method, you can easily deploy your "product" to the workbench.
deploy() does not require any arguments, as all the info is stored in the my_api object after
prepare_deployment(). However, if you want, you can also deploy your product on your localhost.
Technically speaking, this will just create a Docker container that runs on localhost. This can be achieved with the local argument
in the method call.
my_api.deploy(local = True)
Once your product is deployed, the method will return the URL under which you can reach your API. Don't forget to append your custom route to it.
Your output should look somewhat like this:
>>> my_api.deploy()
Deployment Report:
------------------
This is an automatically generated report on the status of your deployed product.
Your API is now containerized and hosted on the workbench.
You can access the API using: http://XXX.XXX.XX.XXX:XXXXX/<your_route>
You can call the API in whatever way it is designed.
If you want to get rid of it, just use the delete_deployment() method.
If you just want to update the API, you can just use prepare_deployment() to create a new Dockerfile and then deploy() again.
Your Product
-----------------------
Name: my-product
Project: my-project
Status: deployed and healthy
Access: http://XXX.XXX.XX.XXX:XXXXX/<your_route>
If you want to export the image to your local machine just use the export_product() method.
If you want to push it to another registry, you can use the push_product() method.
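To test the deployed API from Python, you can append your route to the address from the report and send a request. The sketch below uses only the standard library. The helpers `route_url` and `call_route` are hypothetical, and the base address must be replaced by the one printed in your own deployment report.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def route_url(base_url: str, route: str) -> str:
    """Join the reported base URL and a route such as '/hello'."""
    return urljoin(base_url.rstrip("/") + "/", route.lstrip("/"))


def call_route(base_url: str, route: str, timeout: float = 5.0) -> str:
    """Send a GET request to the route and return the response body as text."""
    with urlopen(route_url(base_url, route), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the Flask example from this README, calling `call_route` with your reported address and the route `/hello` should return the greeting string if the deployment is healthy.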
Now you know how to reach your API. If you find out it doesn't work and change something in the code, you can just re-run
prepare_deployment() and then deploy().
deploy() will automatically detect that the "product" has already been deployed and will just update the existing one. In case you want to delete a product, you can just use the
delete_deployment() method. This will also work for local deployments.
# delete product
my_api.delete_deployment(product="my-product", project="my-project")
When you are satisfied with your API, you might want to deploy or ship it to an enterprise-ready or collaborative cluster. As the workbench is at heart a Kubernetes cluster, everything you do on the workbench will work on any other cluster. To give you the freedom of choice,
productionize implements a method to deploy anywhere.
This is the
push_product() method. This method pushes the product, in the form of a Docker image, to any registry you want. The default is Docker Hub, but you can select any registry you like. For secure registries, you will need credentials or a token. You will be prompted for those.
# push the product
my_api.push_product(product="my-product", registry="my.registry:5000/image-name")
This method will automatically tag the image and run
docker push to push the image to the remote registry.
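The tag-and-push step corresponds to two Docker CLI commands, `docker tag` and `docker push`. The sketch below builds them as argument lists. The helper `push_commands` is hypothetical, and the local image name is an assumption, since this README does not state how productionize names the image it builds.

```python
def push_commands(local_image: str, target: str) -> list:
    """Return the docker tag and docker push argument lists for one image."""
    return [
        ["docker", "tag", local_image, target],
        ["docker", "push", target],
    ]


if __name__ == "__main__":
    # "my-product" as the local image name is an assumption for illustration.
    for command in push_commands("my-product", "my.registry:5000/image-name"):
        print(" ".join(command))
```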
productionize is far from ready and is still a work in progress. I started this project around mid-May 2020, when I was super annoyed that I had to build a new test cluster on my local machine because I had messed up the others too much. As this all started with me sitting at my Mac, the project is at the moment only stable on macOS. I have already started to work on other UNIX systems; Windows, however, might take a bit of time. The next steps are the following:
- Functional Features:
- Ease the export of products from workbench to local machine
- Integrate the push feature to external cluster registries
- Non-functional Features:
- Update unit testing for product() class
- Functional Features:
- Add workbench management feature
- Non-functional Features:
- Support latest Ubuntu version
- Support latest CentOS version