nbctl

The Swiss Army Knife for Jupyter Notebooks

A comprehensive, production-ready CLI toolkit for Jupyter notebooks that solves all major pain points: version control, collaboration, code quality, security, and workflow automation.

Features

  • Clean - Remove outputs and metadata for git
  • Info - Analyze notebook statistics and dependencies
  • Export - Convert to HTML, PDF, Markdown, Python, etc.
  • Extract - Extract outputs (images, graphs, data) from notebooks
  • ML-Split - Split ML notebooks into production Python pipelines
  • Run - Execute notebooks from command line
  • Lint - Check code quality and best practices
  • Format - Auto-format with black
  • Git Setup - Configure git for notebooks
  • Diff - Compare notebooks intelligently
  • Combine - Concatenate notebooks
  • Resolve - 3-way merge with conflict detection (powered by nbdime)
  • Security - Find security vulnerabilities

Installation

pip install nbctl

Or install from source:

git clone https://github.com/VenkatachalamSubramanianPeriyaSubbu/nbctl.git
cd nbctl
pip install -e .

Quick Start

Clean notebooks for git

nbctl clean notebook.ipynb

Removes: Outputs, execution counts, metadata
Result: Smaller files, cleaner diffs, fewer conflicts

Get notebook insights

nbctl info notebook.ipynb

Shows: Statistics, code metrics, dependencies, imports

Scan for security issues

nbctl security notebook.ipynb

Detects: Hardcoded secrets, SQL injection, unsafe pickle, and more

Extract outputs from notebooks

nbctl extract notebook.ipynb

Extracts: Images (PNG, JPEG, SVG), data (JSON, CSV, DataFrames)
Saves to: outputs/data/ and outputs/images/

Split ML notebook into Python pipeline

nbctl ml-split ml_notebook.ipynb
cd ml_pipeline && python main.py

Creates: Production-ready Python modules with automatic context passing

Compare notebooks

nbctl diff notebook1.ipynb notebook2.ipynb

Compares: Only source code (ignores outputs/metadata)

Resolve merge conflicts

nbctl resolve base.ipynb ours.ipynb theirs.ipynb -o merged.ipynb

Uses: nbdime's intelligent 3-way merge with conflict detection

Commands Reference

nbctl clean

Remove outputs and metadata from notebooks for version control.

nbctl clean notebook.ipynb [OPTIONS]

Options:

  • --output, -o PATH - Save to different file
  • --keep-outputs - Preserve cell outputs
  • --keep-execution-count - Preserve execution counts
  • --keep-metadata - Preserve metadata
  • --dry-run - Preview changes without modifying

Examples:

# Clean in place
nbctl clean notebook.ipynb

# Preview changes
nbctl clean notebook.ipynb --dry-run

# Save to new file
nbctl clean notebook.ipynb -o clean.ipynb
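
To make the effect of cleaning concrete, here is a minimal Python sketch of the idea, not the tool's actual implementation. It assumes the standard nbformat v4 JSON layout; the function name `clean_notebook` and the decision to keep `kernelspec` are illustrative assumptions.

```python
import copy
import json


def clean_notebook(nb, keep_outputs=False, keep_execution_count=False, keep_metadata=False):
    """Return a cleaned copy of a notebook dict (nbformat v4 JSON layout)."""
    cleaned = copy.deepcopy(nb)  # never mutate the caller's notebook
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            if not keep_outputs:
                cell["outputs"] = []
            if not keep_execution_count:
                cell["execution_count"] = None
        if not keep_metadata:
            cell["metadata"] = {}
    if not keep_metadata:
        # Assumption: keep kernelspec so the notebook still opens with the right kernel.
        kernelspec = cleaned.get("metadata", {}).get("kernelspec")
        cleaned["metadata"] = {"kernelspec": kernelspec} if kernelspec else {}
    return cleaned


if __name__ == "__main__":
    demo = {
        "cells": [
            {
                "cell_type": "code",
                "source": "print('hi')",
                "outputs": [{"output_type": "stream", "name": "stdout", "text": "hi\n"}],
                "execution_count": 7,
                "metadata": {"collapsed": False},
            }
        ],
        "metadata": {"kernelspec": {"name": "python3"}},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    print(json.dumps(clean_notebook(demo), indent=1))
```

The keyword arguments mirror the --keep-* options above in spirit only; how the real command treats notebook-level metadata may differ.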

nbctl info

Display comprehensive notebook statistics and analysis.

nbctl info notebook.ipynb [OPTIONS]

Options:

  • --code-metrics - Show only code metrics
  • --imports - Show only import statements

Shows:

  • Cell counts (code, markdown, raw)
  • File size
  • Code metrics (lines, complexity, empty cells)
  • All import statements and dependencies

Examples:

# Full analysis
nbctl info notebook.ipynb

# Just imports
nbctl info notebook.ipynb --imports
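
As a rough illustration of how import statements can be collected from a notebook, the sketch below parses each code cell with Python's standard `ast` module. It is an assumption about one reasonable approach, not a description of the command's internals; `collect_imports` is a hypothetical name.

```python
import ast


def collect_imports(nb):
    """Return sorted top-level module names imported by the notebook's code cells."""
    modules = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # Cells containing IPython magics (%matplotlib, !pip ...) are not
            # valid Python; this sketch simply skips them.
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return sorted(modules)
```

A real implementation would likely strip magics before parsing rather than skipping the whole cell.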

nbctl export

Convert notebooks to multiple formats simultaneously.

nbctl export notebook.ipynb --format FORMATS [OPTIONS]

Supported Formats:

  • html - HTML document
  • pdf - PDF (requires LaTeX)
  • markdown, md - Markdown
  • python, py - Python script
  • latex, tex - LaTeX
  • rst - reStructuredText
  • slides - Reveal.js presentations

Options:

  • --format, -f - Output formats (comma-separated, required)
  • --output-dir, -o - Output directory
  • --no-input - Exclude input cells
  • --no-prompt - Exclude prompts

Examples:

# Export to multiple formats
nbctl export notebook.ipynb -f html,pdf,py

# Export without input cells
nbctl export notebook.ipynb -f html --no-input

# Export presentation
nbctl export notebook.ipynb -f slides

nbctl extract

Extract outputs (images, graphs, data) from notebook cells.

nbctl extract notebook.ipynb [OPTIONS]

Features:

  • Extract data: JSON, CSV, HTML tables (DataFrames), text
  • Extract images: PNG, JPEG, SVG (matplotlib plots, graphs)
  • Organized folders: outputs/data/ and outputs/images/
  • Traceable filenames: cell_{idx}_output_{idx}_type_{counter}.ext

Options:

  • --output, -o PATH - Output directory (default: outputs/)
  • --data - Extract only data outputs
  • --images - Extract only image outputs
  • --all - Extract all outputs without prompting

Interactive Mode:

# Prompts you to choose: both/data/images/all
nbctl extract notebook.ipynb

Examples:

# Interactive mode
nbctl extract ml_analysis.ipynb

# Extract everything
nbctl extract ml_analysis.ipynb --all

# Only images (plots, graphs)
nbctl extract ml_analysis.ipynb --images

# Only data (CSV, JSON, DataFrames)
nbctl extract ml_analysis.ipynb --data

# Custom output directory
nbctl extract ml_analysis.ipynb --output my_outputs/

Output Structure:

outputs/
├── data/
│   ├── cell_0_output_0_data_0.json
│   ├── cell_1_output_0_data_1.html   # DataFrame
│   └── cell_2_output_0_data_2.csv
└── images/
    ├── cell_3_output_0_img_0.png     # Matplotlib plot
    ├── cell_4_output_0_img_1.svg     # Vector graphic
    └── cell_5_output_0_img_2.jpeg
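
The naming scheme above can be illustrated with a short Python sketch that writes base64-encoded image outputs to disk. This is a simplified assumption of how such extraction might work (PNG and JPEG only, nbformat v4 layout); `extract_images` and `IMAGE_EXT` are hypothetical names, not part of the tool.

```python
import base64
from pathlib import Path

# MIME types handled by this sketch and the file extension used for each.
IMAGE_EXT = {"image/png": "png", "image/jpeg": "jpeg"}


def extract_images(nb, out_dir="outputs"):
    """Write base64-encoded PNG/JPEG outputs to <out_dir>/images/ and return the paths."""
    image_dir = Path(out_dir) / "images"
    image_dir.mkdir(parents=True, exist_ok=True)
    written = []
    counter = 0  # running count of images, used as the last filename component
    for cell_idx, cell in enumerate(nb.get("cells", [])):
        for out_idx, output in enumerate(cell.get("outputs", [])):
            data = output.get("data", {})
            for mime, ext in IMAGE_EXT.items():
                if mime not in data:
                    continue
                payload = data[mime]
                if isinstance(payload, list):  # may be stored as a list of lines
                    payload = "".join(payload)
                path = image_dir / f"cell_{cell_idx}_output_{out_idx}_img_{counter}.{ext}"
                path.write_bytes(base64.b64decode(payload))
                written.append(path)
                counter += 1
    return written
```

Because the cell index and output index are part of each filename, every extracted file can be traced back to the cell that produced it.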

nbctl ml-split

Split ML notebooks into production-ready Python pipeline modules.

nbctl ml-split notebook.ipynb [OPTIONS]

Features:

  • Intelligent section detection - Recognizes 7 ML workflow patterns
  • Context passing - Variables flow between pipeline steps
  • Complete package - Generates __init__.py + main.py runner
  • Auto-dependencies - Creates requirements.txt from imports

Detected Sections:

  • Data Collection
  • Data Preprocessing/Cleaning
  • Feature Engineering
  • Data Splitting (train/test)
  • Model Training
  • Model Evaluation
  • Model Saving

Options:

  • --output, -o PATH - Output directory (default: ml_pipeline/)
  • --create-main - Create main.py runner (default: True)

Examples:

# Split ML notebook into pipeline
nbctl ml-split ml_notebook.ipynb

# Custom output directory
nbctl ml-split ml_notebook.ipynb --output src/ml/

# Run the generated pipeline
cd ml_pipeline
python main.py

Generated Structure:

ml_pipeline/
├── data_collection.py        # Module for each section
├── data_preprocessing.py
├── feature_engineering.py
├── data_splitting.py
├── model_training.py
├── model_evaluation.py
├── model_saving.py
├── __init__.py               # Package init
├── main.py                   # Pipeline runner
└── requirements.txt          # Auto-generated deps

How It Works:

  1. Analyzes markdown headers in your notebook
  2. Groups code cells by ML workflow section
  3. Generates Python modules with run(context) functions
  4. Creates main.py that executes the entire pipeline
  5. Variables pass automatically between steps

Each Module:

def run(context=None):
    """Execute pipeline step with context from previous steps."""
    # Your notebook code here
    return locals()  # Pass variables to next step

Main Pipeline:

# Executes all steps in sequence
context = data_collection.run()
context = data_preprocessing.run(context) # Gets 'df' from step 1
context = feature_engineering.run(context) # Gets 'df' from step 2
# ... and so on
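
The context-passing pattern can be tried in isolation with the self-contained sketch below. It is an illustration under assumptions: the three step functions, their toy data, and `run_pipeline` are hypothetical, and the steps return an explicit dict rather than `locals()` so the hand-off between steps is easy to see.

```python
def data_collection(context=None):
    """Step 1: produce raw rows (toy data standing in for a real load)."""
    context = dict(context or {})
    context["rows"] = [
        {"x": 1.0, "y": 2.0},
        {"x": 2.0, "y": 4.0},
        {"x": None, "y": 6.0},  # a row with a missing feature
    ]
    return context


def data_preprocessing(context):
    """Step 2: drop rows with missing values, reading 'rows' from step 1."""
    context = dict(context)
    context["rows"] = [r for r in context["rows"] if r["x"] is not None]
    return context


def model_training(context):
    """Step 3: fit y = slope * x by least squares through the origin."""
    context = dict(context)
    rows = context["rows"]
    context["slope"] = sum(r["x"] * r["y"] for r in rows) / sum(r["x"] ** 2 for r in rows)
    return context


def run_pipeline(steps):
    """Run each step in order, feeding the previous step's context to the next."""
    context = {}
    for step in steps:
        context = step(context)
    return context


if __name__ == "__main__":
    result = run_pipeline([data_collection, data_preprocessing, model_training])
    print(result["slope"])
```

Returning an explicit dict keeps the interface between steps narrow; returning `locals()`, as the generated modules are described as doing, passes every local variable along instead.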

nbctl run

Execute Jupyter notebooks from the command line.

nbctl run notebook1.ipynb notebook2.ipynb [OPTIONS]

Features:

  • Execute notebooks in specified or alphabetical order
  • No timeout by default (perfect for long ML training)
  • Save executed notebooks with all outputs
  • Detailed execution summary
  • Error handling and reporting

Options:

  • --order - Run notebooks in alphabetical order
  • --timeout, -t INT - Timeout per cell in seconds (default: None)
  • --allow-errors - Continue execution even if cells fail
  • --save-output, -o PATH - Directory to save executed notebooks
  • --kernel, -k TEXT - Kernel name to use (default: python3)

Examples:

# Run single notebook
nbctl run analysis.ipynb

# Run multiple notebooks in specified order
nbctl run 01_load.ipynb 02_process.ipynb 03_analyze.ipynb

# Run all notebooks alphabetically
nbctl run *.ipynb --order

# Save executed notebooks to directory
nbctl run *.ipynb --save-output executed/

# Continue on errors
nbctl run notebook.ipynb --allow-errors

# Set timeout for safety (e.g., prevent infinite loops)
nbctl run notebook.ipynb --timeout 600

Execution Summary:

Execution Summary

┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ Notebook         ┃ Status  ┃ Time  ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ 01_load.ipynb    │ Success │ 2.3s  │
│ 02_process.ipynb │ Success │ 5.1s  │
│ 03_analyze.ipynb │ Success │ 3.7s  │
└──────────────────┴─────────┴───────┘

Total: 3 notebooks | Successful: 3 | Total time: 11.1s

Use Cases:

  • Execute ML training notebooks overnight
  • Run data pipelines in sequence
  • Automate report generation
  • Batch process multiple notebooks
  • CI/CD notebook testing

nbctl lint

Check code quality and identify issues.

nbctl lint notebook.ipynb [OPTIONS]

Checks:

  • Unused imports
  • Overly long cells
  • Empty code cells
  • Code quality issues

Options:

  • --max-cell-length INT - Max lines per cell (default: 100)

Examples:

# Standard linting
nbctl lint notebook.ipynb

# Custom cell length limit
nbctl lint notebook.ipynb --max-cell-length 150
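
The three named checks (unused imports, overly long cells, empty code cells) can be approximated in a few lines of Python. The sketch below is an assumption about how such checks might be written with the standard `ast` module, not the linter's real logic; `lint_notebook` and its message strings are hypothetical.

```python
import ast


def lint_notebook(nb, max_cell_length=100):
    """Return a list of human-readable issues for a notebook dict (nbformat v4)."""
    issues = []
    imported = {}  # bound name -> imported module or object name
    used = set()   # every bare name referenced anywhere in the notebook
    for idx, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if not source.strip():
            issues.append(f"cell {idx}: empty code cell")
            continue
        n_lines = len(source.splitlines())
        if n_lines > max_cell_length:
            issues.append(f"cell {idx}: {n_lines} lines exceeds limit of {max_cell_length}")
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # e.g. cells with IPython magics; skipped in this sketch
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    imported[alias.asname or alias.name.split(".")[0]] = alias.name
            elif isinstance(node, ast.ImportFrom):
                for alias in node.names:
                    if alias.name != "*":
                        imported[alias.asname or alias.name] = alias.name
            elif isinstance(node, ast.Name):
                used.add(node.id)
    # Imports are checked notebook-wide, since a later cell may use an earlier import.
    for bound in sorted(set(imported) - used):
        issues.append(f"unused import: {imported[bound]}")
    return issues
```

Checking usage across the whole notebook rather than per cell avoids flagging the common pattern of a single imports cell at the top.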

nbctl format

Auto-format code cells with black.

nbctl format notebook.ipynb [OPTIONS]

Options:

  • --output-dir, -o - Output directory
  • --line-length INT - Max line length (default: 88)

Examples:

# Format in place
nbctl format notebook.ipynb

# Custom line length
nbctl format notebook.ipynb --line-length 100

nbctl git-setup

Configure git for optimal notebook workflows.

nbctl git-setup

Configures:

  • .gitattributes for notebook handling
  • .gitignore for Python projects
  • Custom diff driver using nbctl
  • Custom merge driver using nbctl

Run once per repository to enable git integration.


nbctl diff

Compare notebooks intelligently (ignores outputs and metadata).

nbctl diff notebook1.ipynb notebook2.ipynb [OPTIONS]

Options:

  • --format, -f - Output format: table, unified, json (default: table)
  • --code-only - Show only code cell changes
  • --stats - Show only statistics

Features:

  • Ignores outputs and metadata
  • Focuses on actual code changes
  • Multiple output formats

Examples:

# Table view (default)
nbctl diff old.ipynb new.ipynb

# Unified diff format
nbctl diff old.ipynb new.ipynb --format unified

# Show only code changes
nbctl diff old.ipynb new.ipynb --code-only

# JSON output for automation
nbctl diff old.ipynb new.ipynb --format json
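
To show what "ignores outputs and metadata" means in practice, the following sketch compares only the source of code cells using Python's standard `difflib`. It is a simplified stand-in, roughly in the spirit of the unified format combined with --code-only, and not the command's actual algorithm; `code_sources` and `source_diff` are hypothetical names.

```python
import difflib


def code_sources(nb):
    """Return the source text of every code cell, in notebook order."""
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def source_diff(old_nb, new_nb):
    """Unified diff of code-cell sources; outputs and metadata never enter the comparison."""
    old_lines = "\n".join(code_sources(old_nb)).splitlines()
    new_lines = "\n".join(code_sources(new_nb)).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines, "old", "new", lineterm=""))
```

Two notebooks that differ only in their outputs or execution counts produce an empty diff under this approach.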

nbctl combine

Concatenate or combine two notebooks.

nbctl combine notebook1.ipynb notebook2.ipynb -o output.ipynb [OPTIONS]

Strategies:

  • append - Concatenate all cells from both (default)
  • first - Keep only first notebook
  • second - Keep only second notebook

Options:

  • --output, -o - Output file (required)
  • --strategy - Combine strategy
  • --report - Show detailed report

Examples:

# Concatenate notebooks
nbctl combine analysis1.ipynb analysis2.ipynb -o full.ipynb

# Keep only first notebook (copy)
nbctl combine nb1.ipynb nb2.ipynb -o output.ipynb --strategy first

Note: For true merging with conflict detection, use nbctl resolve.
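
The three strategies listed above are simple enough to sketch directly on notebook dicts. The code below is an illustration only; `combine_notebooks` is a hypothetical name, and taking notebook-level metadata from the first notebook under `append` is an assumption.

```python
import copy


def combine_notebooks(first, second, strategy="append"):
    """Combine two notebook dicts using the append, first, or second strategy."""
    if strategy == "first":
        return copy.deepcopy(first)
    if strategy == "second":
        return copy.deepcopy(second)
    if strategy != "append":
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Assumption: the combined notebook inherits metadata from the first notebook.
    combined = copy.deepcopy(first)
    combined["cells"] = combined.get("cells", []) + copy.deepcopy(second.get("cells", []))
    return combined
```

Nothing here detects conflicting edits; the sketch only concatenates or selects.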


nbctl resolve

Intelligent 3-way merge with conflict detection (powered by nbdime).

nbctl resolve base.ipynb ours.ipynb theirs.ipynb -o merged.ipynb [OPTIONS]

Arguments:

  • BASE - Common ancestor (before changes)
  • OURS - Your version (local changes)
  • THEIRS - Other version (remote changes)

Options:

  • --output, -o - Output file (required unless --check-conflicts)
  • --strategy - Merge strategy: auto, ours, theirs, cell-append
  • --check-conflicts - Check for conflicts only (no output file needed)
  • --report - Show detailed merge report

Features:

  • Production-grade merging with nbdime
  • Automatic conflict detection
  • Conflict markers for manual resolution
  • Multiple merge strategies

Examples:

# Check for conflicts first
nbctl resolve base.ipynb ours.ipynb theirs.ipynb --check-conflicts

# Perform merge
nbctl resolve base.ipynb ours.ipynb theirs.ipynb -o merged.ipynb

# Use with Git
git show :1:notebook.ipynb > base.ipynb
git show :2:notebook.ipynb > ours.ipynb
git show :3:notebook.ipynb > theirs.ipynb
nbctl resolve base.ipynb ours.ipynb theirs.ipynb -o notebook.ipynb
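
For intuition about what a 3-way merge decides, here is a deliberately simplified Python sketch that works on lists of cell source strings. It is not nbdime's algorithm: it assumes all three versions have the same number of cells aligned by position, and `merge_cell` and `three_way_merge` are hypothetical names.

```python
def merge_cell(base, ours, theirs):
    """Merge one cell's source; return (merged_text, is_conflict)."""
    if ours == theirs:
        return ours, False    # both sides agree (or neither changed)
    if ours == base:
        return theirs, False  # only the other side changed
    if theirs == base:
        return ours, False    # only our side changed
    # Both sides changed the same cell differently: emit conflict markers.
    marked = f"<<<<<<< ours\n{ours}\n=======\n{theirs}\n>>>>>>> theirs"
    return marked, True


def three_way_merge(base, ours, theirs):
    """Merge three equally long lists of cell sources; return (merged, conflict_indices)."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles notebooks with aligned cells")
    merged, conflicts = [], []
    for idx, (b, o, t) in enumerate(zip(base, ours, theirs)):
        text, is_conflict = merge_cell(b, o, t)
        merged.append(text)
        if is_conflict:
            conflicts.append(idx)
    return merged, conflicts
```

A real notebook merge also has to handle inserted, deleted, and reordered cells, which is the part this sketch leaves out.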

nbctl security

Scan notebooks for security vulnerabilities.

nbctl security notebook.ipynb [OPTIONS]

Detects:

  • HIGH: Hardcoded secrets (API keys, passwords, tokens)
  • HIGH: Unsafe pickle deserialization
  • HIGH: SQL injection risks
  • MEDIUM: Command injection (os.system, eval, exec)
  • MEDIUM: Unsafe YAML parsing
  • MEDIUM: Disabled SSL verification
  • LOW: Weak cryptographic algorithms (MD5, SHA1)

Options:

  • --severity - Filter by severity: low, medium, high, all (default: all)
  • --json - Output as JSON
  • --verbose, -v - Show detailed recommendations

Examples:

# Scan for all issues
nbctl security notebook.ipynb

# Only high severity
nbctl security notebook.ipynb --severity high

# With recommendations
nbctl security notebook.ipynb --verbose

# JSON output for CI/CD
nbctl security notebook.ipynb --json
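
Several of the checks listed above can be approximated with line-by-line regular expressions. The sketch below shows that pattern-matching idea in Python; the rule set, the regexes, and the names `RULES` and `scan_source` are illustrative assumptions, and the real scanner's rules and its handling of --severity may differ.

```python
import re

# (severity, description, pattern): a small, illustrative subset of checks.
RULES = [
    ("HIGH", "hardcoded secret",
     re.compile(r"(?i)\w*(api_key|password|secret|token)\w*\s*=\s*[\"'][^\"']+[\"']")),
    ("HIGH", "unsafe pickle deserialization",
     re.compile(r"\bpickle\.loads?\s*\(")),
    ("MEDIUM", "possible command injection",
     re.compile(r"\bos\.system\s*\(|(?<![\w.])(?:eval|exec)\s*\(")),
    ("MEDIUM", "disabled SSL verification",
     re.compile(r"\bverify\s*=\s*False\b")),
    ("LOW", "weak hash algorithm",
     re.compile(r"\bhashlib\.(md5|sha1)\s*\(")),
]


def scan_source(source, severity="all"):
    """Return (severity, description, line_number) tuples for matching lines."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for level, description, pattern in RULES:
            # Assumption: a severity filter selects exactly that level.
            if severity != "all" and level.lower() != severity.lower():
                continue
            if pattern.search(line):
                findings.append((level, description, line_no))
    return findings
```

Regex rules like these are cheap but prone to false positives and false negatives, so findings are best treated as prompts for review rather than verdicts.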

Common Workflows

Setting up a new repository

# 1. Configure git for notebooks
nbctl git-setup

# 2. Clean notebooks before committing
nbctl clean *.ipynb

# 3. Check code quality
nbctl lint notebook.ipynb
nbctl format notebook.ipynb

# 4. Scan for security issues
nbctl security notebook.ipynb

Reviewing notebook changes

# Compare versions
nbctl diff old.ipynb new.ipynb --format unified

# Check what changed (code only)
nbctl diff old.ipynb new.ipynb --code-only

Resolving merge conflicts

# Check if there are conflicts
nbctl resolve base.ipynb ours.ipynb theirs.ipynb --check-conflicts

# Perform merge
nbctl resolve base.ipynb ours.ipynb theirs.ipynb -o merged.ipynb --report

# If conflicts exist, manually resolve in the merged file

Pre-commit checks

# Quality checks
nbctl lint notebook.ipynb
nbctl format notebook.ipynb
nbctl security notebook.ipynb --severity high

# Clean for commit
nbctl clean notebook.ipynb

ML Workflow - From Notebook to Production

# 1. Develop ML model in notebook
# (work on ml_model.ipynb)

# 2. Extract outputs for reports
nbctl extract ml_model.ipynb --images
# → Gets all plots and visualizations

# 3. Split into production pipeline
nbctl ml-split ml_model.ipynb --output ml_pipeline/

# 4. Test the pipeline
cd ml_pipeline
python main.py

# 5. Deploy the pipeline modules
# Each module is a standalone Python file ready for production!

Development

Setup

# Clone repository
git clone https://github.com/VenkatachalamSubramanianPeriyaSubbu/nbctl.git
cd nbctl

# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

Run Tests

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_security.py -v

# With coverage
pytest tests/ --cov=nbutils --cov-report=html

Code Quality

# Format code
black nbutils/ tests/

# Type checking
mypy nbutils/

Why nbctl?

Jupyter notebooks are powerful but have challenges:

Problem               nbctl Solution
Massive git diffs     clean - Remove outputs
Merge conflicts       resolve - Intelligent 3-way merge
Hard to compare       diff - Smart comparison
Code quality issues   lint + format
Security risks        security - Vulnerability scanning
Manual workflows      Comprehensive CLI automation

One tool. All solutions. Production-ready.


Roadmap

  • Basic clean command
  • Info command (statistics, metrics, imports)
  • Export command (HTML, PDF, Markdown, etc.)
  • Extract command (extract outputs, images, data)
  • ML-Split command (ML notebook → Python pipeline)
  • Lint command (code quality)
  • Format command (black auto-format)
  • Git setup (integration)
  • Diff command (intelligent comparison)
  • Combine command (2-way merge)
  • Resolve command (3-way merge with nbdime)
  • Security command (vulnerability scanning)
  • Test runner (execute and validate)
  • Split command (general notebook splitting)
  • Template system
  • Cloud integration

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

MIT License - see LICENSE file for details.


Author

Built for the Jupyter community by Venkatachalam Subramanian Periya Subbu


Status

Version: 0.1.2
Status: Production-ready with comprehensive test coverage
New: Extract outputs & ML pipeline splitting


Star this repo if you find it useful!
