Track Jupyter notebook cell execution and export a clean, ordered Python script
jupytertracker
Part of an end-to-end ML model management system for replicable machine learning.
The problem
Building a machine learning model in a Jupyter notebook is iterative and messy — cells run out of order, code gets modified and re-run, hyperparameters get tweaked. When a model reviewer asks "how did you build this?", the data scientist has to manually reconstruct the process. When a compliance team asks for documentation, someone has to write it by hand.
The result: models that can't be independently replicated, and whitepapers that are written after the fact from memory rather than from the actual process.
System vision
This library is Component 1 of a three-part system for making the ML modeling process fully replicable and auditable:
┌─────────────────────────────────────────────────────────────────┐
│                   ML Model Management System                    │
├──────────────────┬──────────────────────┬───────────────────────┤
│ Component 1      │ Component 2          │ Component 3           │
│ JupyterTracker   │ MLflow Integration   │ Whitepaper Generator  │
│ (this library)   │                      │                       │
├──────────────────┼──────────────────────┼───────────────────────┤
│ Records every    │ Registers models,    │ Generates a structured│
│ cell execution   │ tracks experiments,  │ report (data, method, │
│ in order. Exports│ parameters, metrics, │ results, limitations) │
│ an honest Python │ and serves models.   │ from code annotations │
│ script of what   │ Uses MLflow as-is.   │ using an LLM.         │
│ actually ran.    │                      │                       │
├──────────────────┴──────────────────────┴───────────────────────┤
│ Together: a non-technical reviewer can verify what was built,   │
│ how it was built, and reproduce the result independently.       │
└─────────────────────────────────────────────────────────────────┘
Data flow:
Notebook session
    │
    ├── JupyterTracker records every cell execution (parallel, live)
    │       └── export_script() → ordered .py file with timing
    │
    ├── MLflow tracks experiments, parameters, and metrics (parallel, live)
    │       └── model registry → reproducible run IDs
    │
    └── On demand: Whitepaper generator
            ├── pulls execution log from JupyterTracker
            ├── pulls run metadata from MLflow
            └── uses wpr_-prefixed function outputs as report sections
                    └── LLM assembles → structured whitepaper (PDF/Markdown)
Component 1: JupyterTracker
Track Jupyter notebook cell executions and export a clean, ordered Python script — exactly what ran, in the order it ran.
Install
pip install jupytertracker
Usage
Add one line at the top of your notebook:
import jupytertracker
jupytertracker.start()
When you're done, export:
jupytertracker.export_script("my_analysis.py")
The output is a .py file with every cell execution in order, one block per run:
# Generated by jupytertracker (sequential mode)
# Total execution time: 4m 14.6s
# Cells recorded: 5
# execution 1 [340ms]
x = load_data("train.csv")
# execution 2 [1m 52.1s]
model = train(x, lr=0.01)
# execution 3 [18.4s]
evaluate(model)
# execution 4 (re-run) [1m 48.7s]
model = train(x, lr=0.1)
# execution 5 (re-run) [15.1s]
evaluate(model)
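Because each block in the export starts with a regular header comment, the file is easy to post-process. The sketch below parses those headers to find the slowest execution. It assumes the header format is exactly as in the sample above; HEADER, parse_duration and parse_executions are illustrative names and not part of jupytertracker.

```python
import re

# Header format assumed from the sample output,
# e.g. "# execution 4 (re-run) [1m 48.7s]".
HEADER = re.compile(r"^# execution (\d+)( \(re-run\))? \[(.+)\]$")


def parse_duration(text):
    """Convert '340ms', '18.4s' or '1m 52.1s' to seconds."""
    total = 0.0
    for part in text.split():
        if part.endswith("ms"):
            total += float(part[:-2]) / 1000
        elif part.endswith("s"):
            total += float(part[:-1])
        elif part.endswith("m"):
            total += float(part[:-1]) * 60
        else:
            raise ValueError(f"unrecognised duration: {part!r}")
    return total


def parse_executions(script_text):
    """Return one dict per execution block, in file order."""
    executions = []
    for line in script_text.splitlines():
        match = HEADER.match(line)
        if match:
            executions.append({
                "index": int(match.group(1)),
                "rerun": match.group(2) is not None,
                "seconds": parse_duration(match.group(3)),
                "code": [],
            })
        elif executions:
            # Belongs to the most recent block; preamble lines that come
            # before the first header never reach this branch.
            executions[-1]["code"].append(line)
    return executions


sample = '''\
# Generated by jupytertracker (sequential mode)
# execution 1 [340ms]
x = load_data("train.csv")
# execution 2 [1m 52.1s]
model = train(x, lr=0.01)
# execution 3 [18.4s]
evaluate(model)
# execution 4 (re-run) [1m 48.7s]
model = train(x, lr=0.1)
# execution 5 (re-run) [15.1s]
evaluate(model)
'''

executions = parse_executions(sample)
slowest = max(executions, key=lambda e: e["seconds"])
print(f"slowest: execution {slowest['index']} ({slowest['seconds']:.1f}s)")
```

Keeping the parser tied to the comment headers means it never has to execute or import the recorded code.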
API
jupytertracker.start(ip=None) # start tracking; idempotent
jupytertracker.stop() # stop tracking; next start() begins fresh
jupytertracker.export_script(path) # write execution log to .py file
jupytertracker.clear() # clear the log without stopping
jupytertracker.get_log() # return list of ExecutionRecord
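The start(ip=None) signature suggests the tracker attaches to an IPython shell. As an illustration only (this is not jupytertracker's source), the sketch below shows one way such a recorder could be built on IPython's documented pre_run_cell and post_run_cell events. Record, Recorder and attach are hypothetical names, and the demo drives the callbacks with stand-in objects so that it runs outside a notebook.

```python
import time
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Record:
    """Hypothetical stand-in for ExecutionRecord; the real fields may differ."""
    index: int
    source: str
    seconds: float


class Recorder:
    def __init__(self, clock=time.perf_counter):
        self.log = []
        self._clock = clock
        self._started = None

    def pre_run_cell(self, info):
        # IPython passes an ExecutionInfo object; only the start time is needed here.
        self._started = self._clock()

    def post_run_cell(self, result):
        # IPython passes an ExecutionResult with .info.raw_cell and .success.
        if self._started is None:
            return
        elapsed = self._clock() - self._started
        self._started = None
        if result.success:  # skip cells that failed
            self.log.append(Record(len(self.log) + 1, result.info.raw_cell, elapsed))

    def attach(self, ip):
        # In a notebook: recorder.attach(get_ipython())
        ip.events.register("pre_run_cell", self.pre_run_cell)
        ip.events.register("post_run_cell", self.post_run_cell)


# Demo with stand-in objects instead of a live IPython shell.
recorder = Recorder()
for source, ok in [("x = 1", True), ("1 / 0", False), ("y = x + 1", True)]:
    info = SimpleNamespace(raw_cell=source)
    recorder.pre_run_cell(info)
    recorder.post_run_cell(SimpleNamespace(info=info, success=ok))

for record in recorder.log:
    print(record.index, record.source)
```

Timing between the two events measures the cell body only, and checking result.success is one simple way to leave failed cells out of the log.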
Notes
- Call start() in your very first cell, before any imports or data loading. The tracker only records what runs after start() is called. Any state built up before that (loaded dataframes, imported libraries, defined variables) is invisible to the tracker and will be missing from the exported script.
- The exported script is an execution record, not a guaranteed reproducible script. If cells depended on state that existed in the kernel but wasn't captured (see above), the script will fail with a NameError when run top-to-bottom.
- Failed cells are excluded. Cells that raise an exception, have a syntax error, or are interrupted by the user are not recorded; only successful executions appear in the output.
- Kernel restart resets tracking automatically (Python state is cleared). Call export_script() before restarting if you want to preserve the session.
- Magic commands (%matplotlib inline, !pip install ...) are included with a comment noting they require a Jupyter environment.
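Since the export is a record and not a guarantee, one way to find out whether it replays is to run it in a fresh interpreter, where no kernel state exists. The helper below is a minimal sketch using only the standard library. replay is a hypothetical name and not part of jupytertracker, and magic commands may need removing first because they require a Jupyter environment.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def replay(script_path, timeout=600):
    """Run a script in a fresh interpreter; return (ok, last stderr line)."""
    proc = subprocess.run(
        [sys.executable, str(script_path)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    stderr_lines = proc.stderr.strip().splitlines()
    return proc.returncode == 0, (stderr_lines[-1] if stderr_lines else "")


# Demo: a script that relies on a name defined before tracking started.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "my_analysis.py"
    script.write_text("print(len(df))\n")
    ok, message = replay(script)

print(ok, message)
```

A failing replay that ends in a NameError usually points at state created before start() was called.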
Related projects
- ipyflow — reactive Python kernel that tracks dataflow between cells and can recover the minimal set of cells needed to reproduce an output. Requires switching kernels; takes a "prevent the mess" approach vs. jupytertracker's "record the mess" approach.
- papermill — parameterizes and executes notebooks top-to-bottom. Good for batch runs; doesn't handle interactive out-of-order execution.
- reprozip-jupyter — packs the full notebook environment (libraries, data) for portability. Solves environment reproducibility, not execution-order reproducibility.
- MLflow — experiment tracking, model registry, and model serving. Component 2 of this system.
Roadmap
- v2: mode='dedup', which deduplicates to the last version of each cell, ordered by last execution. For "clean up my notebook" workflows.
- Component 2: MLflow integration, linking JupyterTracker sessions to MLflow run IDs automatically.
- Component 3: Whitepaper generator, in which wpr_-prefixed functions collect outputs for LLM-generated structured reports.
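The planned dedup mode can be pictured as a small pass over the execution log. The sketch below is an assumption about how it might behave, not the planned implementation. It uses a hypothetical cell_id field to tell cells apart, since the README does not say how cells are identified.

```python
def dedup_last(records):
    """Keep the last record for each cell, ordered by last execution."""
    latest = {}
    for record in records:  # records arrive in execution order
        # Remove any earlier version so the re-inserted key moves to the end.
        latest.pop(record["cell_id"], None)
        latest[record["cell_id"]] = record
    return list(latest.values())


# The session from the Usage sample, with made-up cell ids.
records = [
    {"cell_id": "a", "source": 'x = load_data("train.csv")'},
    {"cell_id": "b", "source": "model = train(x, lr=0.01)"},
    {"cell_id": "c", "source": "evaluate(model)"},
    {"cell_id": "b", "source": "model = train(x, lr=0.1)"},
    {"cell_id": "c", "source": "evaluate(model)"},
]

for record in dedup_last(records):
    print(record["source"])
```

The pass relies on Python dictionaries preserving insertion order, so popping a key and inserting it again is enough to order cells by their last execution.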