Lineage and tracing for ML pipelines

mltrace

mltrace tracks data flow through the components of an ML pipeline and provides a UI and API that show the trace of steps that produced a given output. It consists of an ORM-backed database, helper functions for users to perform logging in their pipelines, and a UI for users to view metadata and trace outputs.

The prototype is very lo-fi, but this README contains instructions for running it on your machine.

Quickstart

You should have Docker installed on your machine. To get started, you will need to do three things:

  1. Set up the database and Flask server
  2. Run some pipelines with logging
  3. Launch the UI

Database and server setup

We use Postgres-backed SQLAlchemy. Unfortunately, the database URI is currently hardcoded in multiple files, which I will change at some point.

Assuming you have Docker installed, you can run the following commands from the root directory:

docker-compose build
docker-compose up

To tear down the containers, run docker-compose down.

Run pipelines

The files populate_db.py and populate_db_logging.py include some fake pipeline components with the relevant logging mechanisms. Pick one to run (I suggest populate_db.py). To run populate_db.py, execute make run; to run populate_db_logging.py, execute make logrun. Make will handle the dependencies.
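
To give a feel for what "pipeline components with logging" can mean, here is a minimal, self-contained sketch. It does not use mltrace's API: the ComponentRun record, the log_component decorator, the in-memory RUNS list, and the trace function are all hypothetical names invented for illustration, and the file names are made up. The real logging helpers live in the package and may look quite different.

```python
import functools
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ComponentRun:
    """One execution of a pipeline component (hypothetical, not mltrace's schema)."""
    component: str
    inputs: list
    outputs: list
    start: datetime
    end: datetime


RUNS = []  # in-memory stand-in for the database


def log_component(name, inputs, outputs):
    """Hypothetical decorator: record a ComponentRun every time the function is called."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = datetime.now(timezone.utc)
            result = func(*args, **kwargs)
            end = datetime.now(timezone.utc)
            RUNS.append(ComponentRun(name, list(inputs), list(outputs), start, end))
            return result
        return wrapper
    return decorator


def trace(output_id):
    """Walk backward from an output to the components that produced it."""
    steps = []
    seen = set()
    frontier = [output_id]
    while frontier:
        current = frontier.pop()
        # Look for the most recent run that produced this artifact.
        for run in reversed(RUNS):
            if current in run.outputs and id(run) not in seen:
                seen.add(id(run))
                steps.append(run.component)
                frontier.extend(run.inputs)
                break
    return steps


@log_component("etl", inputs=["raw.csv"], outputs=["clean.csv"])
def etl():
    return "clean.csv"


@log_component("training", inputs=["clean.csv"], outputs=["model.pkl"])
def train():
    return "model.pkl"


etl()
train()
print(trace("model.pkl"))  # ['training', 'etl']
```

The sketch records inputs and outputs per run so that a trace can be rebuilt by matching one run's inputs against earlier runs' outputs. A real implementation would store the runs in the database rather than a list.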

Launch UI

To launch the UI, navigate to ./mltrace/server/ui and execute yarn install then yarn start. The UI is based on create-react-app. Hopefully navigating the UI is intuitive.

Commands supported in the UI

| Command | Description |
| --- | --- |
| recent | Shows recent component runs; also the home page |
| history COMPONENT_NAME | Shows the history of runs for the component name. Defaults to 10 runs. To specify the number of runs, append a positive integer to the command, as in history etl 15 |
| inspect COMPONENT_RUN_ID | Shows info for that component run ID |
| trace OUTPUT_ID | Shows a trace of steps for the output ID |
| tag TAG_NAME | Shows all components with the tag name |
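
The command grammar above can be sketched as a small parser. This is an illustration only, written in Python for readability; the UI itself is a React app and its actual parsing code is not shown here. The function name parse_command, the tuple return shape, and the error messages are assumptions, while the command names and the default of 10 runs come from the table.

```python
DEFAULT_HISTORY_LIMIT = 10  # the table states that history defaults to 10 runs


def parse_command(text):
    """Split a UI command string into (command, argument, limit).

    Hypothetical helper: limit is only set for the history command.
    """
    parts = text.split()
    if not parts:
        raise ValueError("empty command")
    command, args = parts[0], parts[1:]

    if command == "recent":
        if args:
            raise ValueError("recent takes no arguments")
        return ("recent", None, None)

    if command == "history":
        if len(args) not in (1, 2):
            raise ValueError("usage: history COMPONENT_NAME [NUM_RUNS]")
        limit = DEFAULT_HISTORY_LIMIT
        if len(args) == 2:
            # The optional trailing argument must be a positive integer.
            if not args[1].isdecimal() or int(args[1]) <= 0:
                raise ValueError("number of runs must be a positive integer")
            limit = int(args[1])
        return ("history", args[0], limit)

    if command in ("inspect", "trace", "tag"):
        if len(args) != 1:
            raise ValueError(f"{command} takes exactly one argument")
        return (command, args[0], None)

    raise ValueError(f"unknown command: {command}")


print(parse_command("history etl"))     # ('history', 'etl', 10)
print(parse_command("history etl 15"))  # ('history', 'etl', 15)
```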
