Track Jupyter notebook cell execution and export a clean, ordered Python script

Project description

jupytertracker

Part of an end-to-end ML model management system for replicable machine learning.

The problem

Building a machine learning model in a Jupyter notebook is iterative and messy — cells run out of order, code gets modified and re-run, hyperparameters get tweaked. When a model reviewer asks "how did you build this?", the data scientist has to manually reconstruct the process. When a compliance team asks for documentation, someone has to write it by hand.

The result: models that can't be independently replicated, and whitepapers that are written after the fact from memory rather than from the actual process.

System vision

This library is Component 1 of a three-part system for making the ML modeling process fully replicable and auditable:

┌─────────────────────────────────────────────────────────────────┐
│                  ML Model Management System                      │
├──────────────────┬──────────────────────┬───────────────────────┤
│  Component 1     │  Component 2         │  Component 3          │
│  JupyterTracker  │  MLflow Integration  │  Whitepaper Generator │
│  (this library)  │                      │                        │
├──────────────────┼──────────────────────┼───────────────────────┤
│ Records every    │ Registers models,    │ Generates a structured│
│ cell execution   │ tracks experiments,  │ report (data, method, │
│ in order. Exports│ parameters, metrics, │ results, limitations) │
│ an honest Python │ and serves models.   │ from code annotations │
│ script of what   │ Uses MLflow as-is.   │ using an LLM.         │
│ actually ran.    │                      │                        │
├──────────────────┴──────────────────────┴───────────────────────┤
│  Together: a non-technical reviewer can verify what was built,  │
│  how it was built, and reproduce the result independently.      │
└─────────────────────────────────────────────────────────────────┘

Data flow:

Notebook session
  │
  ├── JupyterTracker records every cell execution (parallel, live)
  │     └── export_script() → ordered .py file with timing
  │
  ├── MLflow tracks experiments, parameters, and metrics (parallel, live)
  │     └── model registry → reproducible run IDs
  │
  └── On demand: Whitepaper generator
        ├── pulls execution log from JupyterTracker
        ├── pulls run metadata from MLflow
        └── uses wpr_-prefixed function outputs as report sections
              └── LLM assembles → structured whitepaper (PDF/Markdown)
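
The last branch of the data flow relies on wpr_-prefixed functions supplying report sections. As a rough sketch of how such functions might be discovered and called: the `collect_report_sections` helper, the section naming, and the two example functions are all hypothetical, and Component 3 is still on the roadmap, so its real interface may look quite different.

```python
from typing import Dict, Mapping

PREFIX = "wpr_"  # prefix named in the data flow above


def collect_report_sections(namespace: Mapping[str, object]) -> Dict[str, str]:
    """Call every zero-argument wpr_* function and keep its output as text.

    Hypothetical helper: it illustrates the idea only, not the real generator.
    """
    sections: Dict[str, str] = {}
    for name, obj in namespace.items():
        if name.startswith(PREFIX) and callable(obj):
            # The section title is the function name without the prefix.
            sections[name[len(PREFIX):]] = str(obj())
    return sections


# Example notebook functions (made up for illustration).
def wpr_data() -> str:
    return "Training data: train.csv"


def wpr_method() -> str:
    return "Model trained with lr=0.1"


# An explicit namespace stands in for the notebook's globals.
notebook_namespace = {"wpr_data": wpr_data, "wpr_method": wpr_method, "helper": len}
sections = collect_report_sections(notebook_namespace)
```

In this sketch, names without the prefix (such as `helper`) are ignored, so only functions the author explicitly marks contribute to the report.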

Component 1: JupyterTracker

Track Jupyter notebook cell executions and export a clean, ordered Python script — exactly what ran, in the order it ran.

Install

pip install jupytertracker

Usage

Add one line at the top of your notebook:

import jupytertracker
jupytertracker.start()

When you're done, export:

jupytertracker.export_script("my_analysis.py")

The output is a .py file with every cell execution in order, one block per run:

# Generated by jupytertracker (sequential mode)
# Total execution time: 4m 14.6s
# Cells recorded: 5

# execution 1  [340ms]
x = load_data("train.csv")

# execution 2  [1m 52.1s]
model = train(x, lr=0.01)

# execution 3  [18.4s]
evaluate(model)

# execution 4 (re-run)  [1m 48.7s]
model = train(x, lr=0.1)

# execution 5 (re-run)  [15.1s]
evaluate(model)
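
The timing labels in the sample output use three shapes: milliseconds, seconds, and minutes plus seconds. A minimal sketch of a formatter that produces those shapes is shown here. The function name `format_duration` and the rounding rules are assumptions, not the library's actual implementation.

```python
def format_duration(seconds: float) -> str:
    """Render a duration in the style of the sample: 340ms, 18.4s, 1m 52.1s."""
    millis = round(seconds * 1000)
    if millis < 1000:
        return f"{millis}ms"
    # Work in whole tenths of a second so rounding never yields "1m 60.0s".
    minutes, tenths = divmod(round(seconds * 10), 600)
    secs = tenths / 10
    return f"{minutes}m {secs:.1f}s" if minutes else f"{secs:.1f}s"


labels = [format_duration(s) for s in (0.340, 18.4, 112.1)]
# labels == ["340ms", "18.4s", "1m 52.1s"]
```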

API

jupytertracker.start(ip=None)        # start tracking; idempotent
jupytertracker.stop()                # stop tracking; next start() begins fresh
jupytertracker.export_script(path)   # write execution log to .py file
jupytertracker.clear()               # clear the log without stopping
jupytertracker.get_log()             # return list of ExecutionRecord
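
To make the lifecycle semantics above concrete, here is a small stand-in that mimics the documented behavior: start() is idempotent, a start() after stop() begins fresh, and clear() empties the log without stopping. `ToyTracker`, its `record` method, and the `ExecutionRecord` field names are all hypothetical; the real classes may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExecutionRecord:
    # Field names are assumptions; the real ExecutionRecord may differ.
    index: int
    source: str
    duration_s: float


class ToyTracker:
    """Stand-in that mimics the documented start/stop/clear/get_log behavior."""

    def __init__(self) -> None:
        self._active = False
        self._log: List[ExecutionRecord] = []

    def start(self) -> None:
        if self._active:   # idempotent: a second start() changes nothing
            return
        self._active = True
        self._log = []     # a start() after stop() begins fresh

    def stop(self) -> None:
        # Assumption: the log survives stop() so it can still be exported.
        self._active = False

    def clear(self) -> None:
        self._log = []     # tracking stays on

    def record(self, source: str, duration_s: float) -> None:
        if self._active:
            self._log.append(ExecutionRecord(len(self._log) + 1, source, duration_s))

    def get_log(self) -> List[ExecutionRecord]:
        return list(self._log)


t = ToyTracker()
t.start()
t.record("x = 1", 0.01)
t.start()                          # no effect: the log is kept
t.record("y = x + 1", 0.02)
after_double_start = len(t.get_log())

t.clear()                          # log emptied, still tracking
t.record("z = 3", 0.01)
after_clear = len(t.get_log())

t.stop()
t.record("ignored", 0.0)           # not recorded while stopped
t.start()                          # fresh session
after_restart = len(t.get_log())
```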

Notes

Call start() in your very first cell, right after importing jupytertracker and before any other imports or data loading. The tracker only records what runs after start() is called. Any state built up earlier (loaded dataframes, imported libraries, defined variables) is invisible to the tracker and will be missing from the exported script.
The exported script is an execution record, not a guaranteed reproducible script. If cells depended on kernel state that was never captured (for example, anything created before start() was called), the script may fail when run top-to-bottom, typically with a NameError.
Failed cells are excluded. Cells that raise an exception, have a syntax error, or are interrupted by the user are not recorded — only successful executions appear in the output.
Kernel restart resets tracking automatically (Python state is cleared). Call export_script() before restarting if you want to preserve the session.
Magic commands (%matplotlib inline, !pip install ...) are included with a comment noting they require a Jupyter environment.
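
Two of the notes above, skipping failed cells and flagging magic commands, can be illustrated with a short sketch. The `render_cell` function and the exact comment text are hypothetical. This version comments the magic line out so the exported file stays valid Python, which may not match how the library actually renders it.

```python
from typing import Optional

MAGIC_NOTE = "# requires a Jupyter environment"


def render_cell(source: str, success: bool) -> Optional[str]:
    """Return the text to export for one cell, or None if it should be skipped."""
    if not success:
        return None  # failed or interrupted cells are left out
    lines = []
    for line in source.splitlines():
        # Naive check: treats any line starting with % or ! as a magic/shell command.
        if line.lstrip().startswith(("%", "!")):
            lines.append(f"# {line}  {MAGIC_NOTE}")
        else:
            lines.append(line)
    return "\n".join(lines)


exported = render_cell("%matplotlib inline\nimport math", success=True)
skipped = render_cell("1 / 0", success=False)
```

The prefix check is deliberately simple and would misfire on, say, a continuation line that begins with the `%` operator; a real implementation would need to inspect the cell more carefully.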

Related projects

ipyflow — reactive Python kernel that tracks dataflow between cells and can recover the minimal set of cells needed to reproduce an output. Requires switching kernels; takes a "prevent the mess" approach vs. jupytertracker's "record the mess" approach.
papermill — parameterizes and executes notebooks top-to-bottom. Good for batch runs; doesn't handle interactive out-of-order execution.
reprozip-jupyter — packs the full notebook environment (libraries, data) for portability. Solves environment reproducibility, not execution-order reproducibility.
MLflow — experiment tracking, model registry, and model serving. Component 2 of this system.

Roadmap

v2: mode='dedup' — deduplicate to the last version of each cell, ordered by last execution. For "clean up my notebook" workflows.
Component 2: MLflow integration — link JupyterTracker sessions to MLflow run IDs automatically.
Component 3: Whitepaper generator — wpr_-prefixed functions collect outputs for LLM-generated structured reports.
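
The planned dedup mode keeps the last version of each cell, ordered by last execution. One way this could work is sketched below, assuming each execution carries a stable cell identifier. The `dedup_last` function and the `cell_id` values are hypothetical, and the v2 design may differ.

```python
from typing import Dict, Iterable, List, Tuple


def dedup_last(executions: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the last source per cell, ordered by each cell's last execution.

    `executions` holds (cell_id, source) pairs in execution order.
    """
    latest: Dict[str, str] = {}
    for cell_id, source in executions:
        # Drop any older entry so re-insertion moves the cell to the end.
        latest.pop(cell_id, None)
        latest[cell_id] = source
    return list(latest.items())


session = [
    ("load", 'x = load_data("train.csv")'),
    ("train", "model = train(x, lr=0.01)"),
    ("eval", "evaluate(model)"),
    ("train", "model = train(x, lr=0.1)"),  # re-run with a new learning rate
    ("eval", "evaluate(model)"),            # re-run
]
cleaned = dedup_last(session)
# cleaned keeps three cells: load, train (lr=0.1), eval
```

This relies on Python dictionaries preserving insertion order, so removing and re-adding a key is enough to reorder cells by their most recent run.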

Project details

This version: 0.1.0 (Jun 6, 2026)

Download files

Source Distribution

jupytertracker-0.1.0.tar.gz (8.6 kB)

Uploaded Jun 6, 2026 Source

Built Distribution

jupytertracker-0.1.0-py3-none-any.whl (6.7 kB)

Uploaded Jun 6, 2026 Python 3

File details

Details for the file jupytertracker-0.1.0.tar.gz.

File metadata

Download URL: jupytertracker-0.1.0.tar.gz
Upload date: Jun 6, 2026
Size: 8.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for jupytertracker-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`23bfb5641fc2dc1f7b9b91fd9e3455393ca4bd16d3fa1946601031acdaf359a5`
MD5	`0aed6ff60b30da0063cad7d905075488`
BLAKE2b-256	`f5ed00a4405529ada1349537aedbdb61574c7659a6e69a02e7ac0baf48cf53f0`

File details

Details for the file jupytertracker-0.1.0-py3-none-any.whl.

File metadata

Download URL: jupytertracker-0.1.0-py3-none-any.whl
Upload date: Jun 6, 2026
Size: 6.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for jupytertracker-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ec194374a84e355c8a401600203c21e62bffbb646b7b30527fd7280d87456054`
MD5	`824b6762352089f2e8b88033f0b6ca57`
BLAKE2b-256	`8e7782a86a588ad6da247a6f5e72e8d16da6003af102c5b76f0e9d5782d206a3`
