A comprehensive, modern CLI toolkit that solves all major Jupyter notebook pain points in one unified interface.
nbctl
The Swiss Army Knife for Jupyter Notebooks
A comprehensive, production-ready CLI toolkit for Jupyter notebooks that solves all major pain points: version control, collaboration, code quality, security, and workflow automation.
Links
- PyPI: https://pypi.org/project/nbctl/
- Documentation: https://venkatachalamsubramanianperiyasubbu.github.io/nbctl/
- GitHub: https://github.com/VenkatachalamSubramanianPeriyaSubbu/nbctl
Features
- Clean - Remove outputs and metadata for git
- Info - Analyze notebook statistics and dependencies
- Export - Convert to HTML, PDF, Markdown, Python, etc.
- Extract - Extract outputs (images, graphs, data) from notebooks
- ML-Split - Split ML notebooks into production Python pipelines
- Run - Execute notebooks from command line
- Lint - Check code quality and best practices
- Format - Auto-format with black
- Git Setup - Configure git for notebooks
- Diff - Compare notebooks intelligently
- Combine - Concatenate notebooks
- Resolve - 3-way merge with conflict detection (powered by nbdime)
- Security - Find security vulnerabilities
Installation
pip install nbctl
Or install from source:
git clone https://github.com/VenkatachalamSubramanianPeriyaSubbu/nbctl.git
cd nbctl
pip install -e .
Quick Start
Clean notebooks for git
nbctl clean notebook.ipynb
Removes: Outputs, execution counts, metadata
Result: Smaller files, cleaner diffs, fewer conflicts
Get notebook insights
nbctl info notebook.ipynb
Shows: Statistics, code metrics, dependencies, imports
Scan for security issues
nbctl security notebook.ipynb
Detects: Hardcoded secrets, SQL injection, unsafe pickle, and more
Extract outputs from notebooks
nbctl extract notebook.ipynb
Extracts: Images (PNG, JPEG, SVG), data (JSON, CSV, DataFrames)
Saves to: outputs/data/ and outputs/images/
Split ML notebook into Python pipeline
nbctl ml-split ml_notebook.ipynb
cd ml_pipeline && python main.py
Creates: Production-ready Python modules with automatic context passing
Compare notebooks
nbctl diff notebook1.ipynb notebook2.ipynb
Compares: Only source code (ignores outputs/metadata)
Resolve merge conflicts
nbctl resolve base.ipynb ours.ipynb theirs.ipynb -o merged.ipynb
Uses: nbdime's intelligent 3-way merge with conflict detection
Commands Reference
nbutils clean
Remove outputs and metadata from notebooks for version control.
nbutils clean notebook.ipynb [OPTIONS]
Options:
- --output, -o PATH - Save to different file
- --keep-outputs - Preserve cell outputs
- --keep-execution-count - Preserve execution counts
- --keep-metadata - Preserve metadata
- --dry-run - Preview changes without modifying
Examples:
# Clean in place
nbutils clean notebook.ipynb
# Preview changes
nbutils clean notebook.ipynb --dry-run
# Save to new file
nbutils clean notebook.ipynb -o clean.ipynb
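To make the effect of cleaning concrete, here is a minimal sketch of what stripping a notebook involves, written against the plain nbformat v4 JSON layout. The function name `clean_notebook` and its `keep_metadata` parameter are hypothetical and chosen for illustration; this is not the tool's actual implementation, which may, for example, preserve some metadata such as the kernelspec.

```python
import copy
import json


def clean_notebook(nb: dict, keep_metadata: bool = False) -> dict:
    """Return a copy of a notebook dict with outputs and execution counts removed."""
    cleaned = copy.deepcopy(nb)  # never mutate the caller's notebook
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop stored results
            cell["execution_count"] = None  # drop the In[n] counter
        if not keep_metadata:
            cell["metadata"] = {}
    if not keep_metadata:
        # Assumption: this sketch clears everything; a real tool may keep the kernelspec.
        cleaned["metadata"] = {}
    return cleaned


# A tiny in-memory notebook in nbformat v4 layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3"}},
    "cells": [
        {
            "cell_type": "code",
            "source": "print('hi')",
            "execution_count": 7,
            "metadata": {"collapsed": False},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "hi\n"}],
        }
    ],
}

cleaned = clean_notebook(notebook)
print(json.dumps(cleaned["cells"][0], indent=2))
```

Because only the source text survives, two runs of the same notebook produce identical files, which is what keeps diffs small.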
nbutils info
Display comprehensive notebook statistics and analysis.
nbutils info notebook.ipynb [OPTIONS]
Options:
- --code-metrics - Show only code metrics
- --imports - Show only import statements
Shows:
- Cell counts (code, markdown, raw)
- File size
- Code metrics (lines, complexity, empty cells)
- All import statements and dependencies
Examples:
# Full analysis
nbutils info notebook.ipynb
# Just imports
nbutils info notebook.ipynb --imports
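As an illustration of how import statements can be collected from code cells, the sketch below parses each cell with Python's standard `ast` module. The helper name `collect_imports` is hypothetical, and skipping cells that fail to parse (for example, cells containing IPython magics) is a simplification made for this example rather than a description of the tool's behavior.

```python
import ast


def collect_imports(code_cells: list[str]) -> set[str]:
    """Return the top-level module names imported anywhere in the given cells."""
    modules: set[str] = set()
    for source in code_cells:
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # cells with IPython magics are skipped in this sketch
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # level == 0 excludes relative imports such as "from . import x"
                modules.add(node.module.split(".")[0])
    return modules


cells = [
    "import numpy as np\nimport os.path",
    "from sklearn.model_selection import train_test_split",
    "%matplotlib inline",
]
imports = collect_imports(cells)
print(sorted(imports))
```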
nbutils export
Convert notebooks to multiple formats simultaneously.
nbutils export notebook.ipynb --format FORMATS [OPTIONS]
Supported Formats:
- html - HTML document
- pdf - PDF (requires LaTeX)
- markdown, md - Markdown
- python, py - Python script
- latex, tex - LaTeX
- rst - reStructuredText
- slides - Reveal.js presentations
Options:
- --format, -f - Output formats (comma-separated, required)
- --output-dir, -o - Output directory
- --no-input - Exclude input cells
- --no-prompt - Exclude prompts
Examples:
# Export to multiple formats
nbutils export notebook.ipynb -f html,pdf,py
# Export without input cells
nbutils export notebook.ipynb -f html --no-input
# Export presentation
nbutils export notebook.ipynb -f slides
nbutils extract
Extract outputs (images, graphs, data) from notebook cells.
nbutils extract notebook.ipynb [OPTIONS]
Features:
- Extract data: JSON, CSV, HTML tables (DataFrames), text
- Extract images: PNG, JPEG, SVG (matplotlib plots, graphs)
- Organized folders: outputs/data/ and outputs/images/
- Traceable filenames: cell_{idx}_output_{idx}_type_{counter}.ext
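The filename scheme above can be sketched in a few lines. This example walks cell outputs in the nbformat v4 layout, decodes image payloads, and builds names following the `cell_{idx}_output_{idx}_type_{counter}.ext` pattern. The `IMAGE_TYPES` table and the `extract_images` function are assumptions made for illustration; the example returns bytes in memory instead of writing files.

```python
import base64

# Assumption: MIME type to extension mapping used only for this sketch.
IMAGE_TYPES = {"image/png": "png", "image/jpeg": "jpeg", "image/svg+xml": "svg"}


def extract_images(cells: list[dict]) -> dict[str, bytes]:
    """Map generated filenames to image bytes for every image output."""
    files: dict[str, bytes] = {}
    counter = 0  # running count across the whole notebook
    for cell_idx, cell in enumerate(cells):
        for out_idx, output in enumerate(cell.get("outputs", [])):
            for mime, ext in IMAGE_TYPES.items():
                payload = output.get("data", {}).get(mime)
                if payload is None:
                    continue
                if isinstance(payload, list):  # nbformat allows multi-line strings as lists
                    payload = "".join(payload)
                # SVG is stored as text; PNG and JPEG are base64-encoded.
                raw = payload.encode("utf-8") if ext == "svg" else base64.b64decode(payload)
                files[f"cell_{cell_idx}_output_{out_idx}_img_{counter}.{ext}"] = raw
                counter += 1
    return files


fake_png = base64.b64encode(b"\x89PNG fake bytes").decode("ascii")
cells = [
    {"cell_type": "markdown", "source": "# Plots"},
    {
        "cell_type": "code",
        "outputs": [{"output_type": "display_data", "data": {"image/png": fake_png}}],
    },
]
images = extract_images(cells)
print(list(images))
```

Encoding the cell and output index in the name is what makes each file traceable back to the cell that produced it.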
Options:
- --output, -o PATH - Output directory (default: outputs/)
- --data - Extract only data outputs
- --images - Extract only image outputs
- --all - Extract all outputs without prompting
Interactive Mode:
# Prompts you to choose: both/data/images/all
nbutils extract notebook.ipynb
Examples:
# Interactive mode
nbutils extract ml_analysis.ipynb
# Extract everything
nbutils extract ml_analysis.ipynb --all
# Only images (plots, graphs)
nbutils extract ml_analysis.ipynb --images
# Only data (CSV, JSON, DataFrames)
nbutils extract ml_analysis.ipynb --data
# Custom output directory
nbutils extract ml_analysis.ipynb --output my_outputs/
Output Structure:
outputs/
├── data/
│   ├── cell_0_output_0_data_0.json
│   ├── cell_1_output_0_data_1.html    # DataFrame
│   └── cell_2_output_0_data_2.csv
└── images/
    ├── cell_3_output_0_img_0.png      # Matplotlib plot
    ├── cell_4_output_0_img_1.svg      # Vector graphic
    └── cell_5_output_0_img_2.jpeg
nbutils ml-split
Split ML notebooks into production-ready Python pipeline modules.
nbutils ml-split notebook.ipynb [OPTIONS]
Features:
- Intelligent section detection - Recognizes 7 ML workflow patterns
- Context passing - Variables flow between pipeline steps
- Complete package - Generates __init__.py + main.py runner
- Auto-dependencies - Creates requirements.txt from imports
Detected Sections:
- Data Collection
- Data Preprocessing/Cleaning
- Feature Engineering
- Data Splitting (train/test)
- Model Training
- Model Evaluation
- Model Saving
Options:
- --output, -o PATH - Output directory (default: ml_pipeline/)
- --create-main - Create main.py runner (default: True)
Examples:
# Split ML notebook into pipeline
nbutils ml-split ml_notebook.ipynb
# Custom output directory
nbutils ml-split ml_notebook.ipynb --output src/ml/
# Run the generated pipeline
cd ml_pipeline
python main.py
Generated Structure:
ml_pipeline/
├── data_collection.py        # Module for each section
├── data_preprocessing.py
├── feature_engineering.py
├── data_splitting.py
├── model_training.py
├── model_evaluation.py
├── model_saving.py
├── __init__.py               # Package init
├── main.py                   # Pipeline runner
└── requirements.txt          # Auto-generated deps
How It Works:
- Analyzes markdown headers in your notebook
- Groups code cells by ML workflow section
- Generates Python modules with run(context) functions
- Creates main.py that executes the entire pipeline
- Variables pass automatically between steps
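The first two steps above, reading markdown headers and grouping code cells by section, could look roughly like the following. The seven section names come from the Detected Sections list, but the `SECTION_KEYWORDS` table, the keyword choices, and both function names are hypothetical; the tool's real detection rules are not documented here.

```python
from typing import Optional

# Hypothetical keyword table: section names follow the documented list,
# the keywords are assumptions made for this illustration.
SECTION_KEYWORDS = {
    "data_collection": ("load", "collect"),
    "data_preprocessing": ("preprocess", "clean"),
    "feature_engineering": ("feature",),
    "data_splitting": ("split",),
    "model_training": ("train",),
    "model_evaluation": ("evaluat", "metric"),
    "model_saving": ("save", "export"),
}


def detect_section(header: str) -> Optional[str]:
    """Match a markdown header against the keyword table (first match wins)."""
    text = header.lstrip("#").strip().lower()
    for section, keywords in SECTION_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return section
    return None


def group_cells(cells: list[dict]) -> dict[str, list[str]]:
    """Assign each code cell to the most recent recognized section header."""
    groups: dict[str, list[str]] = {}
    current: Optional[str] = None
    for cell in cells:
        if cell["cell_type"] == "markdown":
            lines = cell["source"].splitlines()
            if lines and lines[0].startswith("#"):
                current = detect_section(lines[0]) or current
        elif cell["cell_type"] == "code" and current is not None:
            groups.setdefault(current, []).append(cell["source"])
    return groups


cells = [
    {"cell_type": "markdown", "source": "## Load Data"},
    {"cell_type": "code", "source": "df = read()"},
    {"cell_type": "markdown", "source": "## Model Training"},
    {"cell_type": "code", "source": "model.fit(X, y)"},
    {"cell_type": "code", "source": "print(model)"},
]
groups = group_cells(cells)
print(groups)
```

Note that keyword order matters in this sketch: a header such as "Train/Test Split" matches data_splitting because that entry is checked before model_training.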
Each Module:
def run(context=None):
    """Execute pipeline step with context from previous steps"""
    # Your notebook code here
    return locals()  # Pass variables to next step
Main Pipeline:
# Executes all steps in sequence
context = data_collection.run()
context = data_preprocessing.run(context) # Gets 'df' from step 1
context = feature_engineering.run(context) # Gets 'df' from step 2
# ... and so on
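The context-passing pattern can be tried in isolation with stand-in steps. In this sketch each step is a plain function rather than a generated module, and each one copies and extends an explicit dictionary instead of returning `locals()`; the step bodies, the `run_pipeline` helper, and the variable names are invented for illustration only.

```python
def data_collection(context=None):
    """Stand-in for a generated module's run(): produces the raw data."""
    context = dict(context or {})
    context["rows"] = [1.0, 2.0, 3.0, 4.0]
    return context


def feature_engineering(context=None):
    """Reads 'rows' from the previous step and adds 'features'."""
    context = dict(context or {})
    context["features"] = [value * 2 for value in context["rows"]]
    return context


def model_training(context=None):
    """Reads 'features' and stores a trivial 'model' (their mean)."""
    context = dict(context or {})
    context["model_mean"] = sum(context["features"]) / len(context["features"])
    return context


def run_pipeline(steps):
    """Call each step in order, feeding it the context returned by the previous one."""
    context = None
    for step in steps:
        context = step(context)
    return context


result = run_pipeline([data_collection, feature_engineering, model_training])
print(result["model_mean"])
```

An explicit dictionary makes it obvious which names cross step boundaries, whereas `locals()` passes every local variable along, including temporaries.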
nbutils run
Execute Jupyter notebooks from the command line.
nbutils run notebook1.ipynb notebook2.ipynb [OPTIONS]
Features:
- Execute notebooks in specified or alphabetical order
- No timeout by default (perfect for long ML training)
- Save executed notebooks with all outputs
- Detailed execution summary
- Error handling and reporting
Options:
- --order - Run notebooks in alphabetical order
- --timeout, -t INT - Timeout per cell in seconds (default: None)
- --allow-errors - Continue execution even if cells fail
- --save-output, -o PATH - Directory to save executed notebooks
- --kernel, -k TEXT - Kernel name to use (default: python3)
Examples:
# Run single notebook
nbutils run analysis.ipynb
# Run multiple notebooks in specified order
nbutils run 01_load.ipynb 02_process.ipynb 03_analyze.ipynb
# Run all notebooks alphabetically
nbutils run *.ipynb --order
# Save executed notebooks to directory
nbutils run *.ipynb --save-output executed/
# Continue on errors
nbutils run notebook.ipynb --allow-errors
# Set timeout for safety (e.g., prevent infinite loops)
nbutils run notebook.ipynb --timeout 600
Execution Summary:
Execution Summary
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ Notebook         ┃ Status  ┃ Time  ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ 01_load.ipynb    │ Success │ 2.3s  │
│ 02_process.ipynb │ Success │ 5.1s  │
│ 03_analyze.ipynb │ Success │ 3.7s  │
└──────────────────┴─────────┴───────┘
Total: 3 notebooks | Successful: 3 | Total time: 11.1s
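For readers who want to post-process run results in their own scripts, the totals line above can be reproduced from per-notebook records. The `RunResult` dataclass and `summarize` function below are hypothetical names for this sketch; they only show how the counts and the total time are derived, not how the tool stores its results.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One row of the summary table (hypothetical structure)."""
    notebook: str
    success: bool
    seconds: float


def summarize(results: list[RunResult]) -> str:
    """Build the totals line from individual notebook results."""
    successful = sum(1 for result in results if result.success)
    total_time = sum(result.seconds for result in results)
    return (
        f"Total: {len(results)} notebooks | "
        f"Successful: {successful} | "
        f"Total time: {total_time:.1f}s"
    )


results = [
    RunResult("01_load.ipynb", True, 2.3),
    RunResult("02_process.ipynb", True, 5.1),
    RunResult("03_analyze.ipynb", True, 3.7),
]
summary = summarize(results)
print(summary)
```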
Use Cases:
- Execute ML training notebooks overnight
- Run data pipelines in sequence
- Automate report generation
- Batch process multiple notebooks
- CI/CD notebook testing
nbutils lint
Check code quality and identify issues.
nbutils lint notebook.ipynb [OPTIONS]
Checks:
- Unused imports
- Overly long cells
- Empty code cells
- Code quality issues
Options:
- --max-cell-length INT - Max lines per cell (default: 100)
Examples:
# Standard linting
nbutils lint notebook.ipynb
# Custom cell length limit
nbutils lint notebook.ipynb --max-cell-length 150
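Three of the checks listed above (unused imports, overly long cells, empty cells) can be approximated with the standard `ast` module. The sketch below is an assumption about how such checks might work, not the tool's code: `find_unused_imports` and `lint_notebook` are hypothetical names, and the unused-import check looks at the whole notebook at once because an import in one cell is usually used in a later one.

```python
import ast


def find_unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in the given source."""
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the name "os"
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


def lint_notebook(cells: list[str], max_cell_length: int = 100) -> list[str]:
    """Run the per-cell checks, then the notebook-wide unused-import check."""
    issues: list[str] = []
    for index, source in enumerate(cells):
        if not source.strip():
            issues.append(f"cell {index}: empty code cell")
        elif len(source.splitlines()) > max_cell_length:
            issues.append(f"cell {index}: longer than {max_cell_length} lines")
    for name in find_unused_imports("\n".join(cells)):
        issues.append(f"unused import: {name}")
    return issues


cells = ["import os\nimport numpy as np", "", "x = np.arange(3)"]
issues = lint_notebook(cells)
print(issues)
```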
nbutils format
Auto-format code cells with black.
nbutils format notebook.ipynb [OPTIONS]
Options:
- --output-dir, -o - Output directory
- --line-length INT - Max line length (default: 88)
Examples:
# Format in place
nbutils format notebook.ipynb
# Custom line length
nbutils format notebook.ipynb --line-length 100
nbutils git-setup
Configure git for optimal notebook workflows.
nbutils git-setup
Configures:
- .gitattributes for notebook handling
- .gitignore for Python projects
- Custom diff driver using nbutils
- Custom merge driver using nbutils
Run once per repository to enable git integration.
nbutils diff
Compare notebooks intelligently (ignores outputs and metadata).
nbutils diff notebook1.ipynb notebook2.ipynb [OPTIONS]
Options:
- --format, -f - Output format: table, unified, json (default: table)
- --code-only - Show only code cell changes
- --stats - Show only statistics
Features:
- Ignores outputs and metadata
- Focuses on actual code changes
- Multiple output formats
Examples:
# Table view (default)
nbutils diff old.ipynb new.ipynb
# Unified diff format
nbutils diff old.ipynb new.ipynb --format unified
# Show only code changes
nbutils diff old.ipynb new.ipynb --code-only
# JSON output for automation
nbutils diff old.ipynb new.ipynb --format json
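The idea of a diff that ignores outputs and metadata can be sketched with the standard `difflib` module: flatten only the code-cell sources into lines, then compare those. The helper names `code_lines` and `diff_code` are hypothetical, and this line-based comparison is a simplification; it is not how the tool or nbdime aligns cells.

```python
import difflib


def code_lines(nb: dict) -> list[str]:
    """Flatten code-cell sources into lines, ignoring outputs and metadata."""
    lines: list[str] = []
    code_index = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat may store source as a list of lines
            source = "".join(source)
        lines.append(f"# code cell {code_index}")
        lines.extend(source.splitlines())
        code_index += 1
    return lines


def diff_code(old: dict, new: dict) -> list[str]:
    """Unified diff over code sources only; empty when the code is identical."""
    return list(
        difflib.unified_diff(
            code_lines(old),
            code_lines(new),
            fromfile="old.ipynb",
            tofile="new.ipynb",
            lineterm="",
        )
    )


old = {"cells": [{"cell_type": "code", "source": "x = 1", "outputs": [{"text": "1"}]}]}
same_code = {"cells": [{"cell_type": "code", "source": "x = 1", "outputs": []}]}
changed = {"cells": [{"cell_type": "code", "source": "x = 2", "outputs": []}]}

print(diff_code(old, same_code))  # outputs differ, code does not
print("\n".join(diff_code(old, changed)))
```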
nbutils combine
Concatenate or combine two notebooks.
nbutils combine notebook1.ipynb notebook2.ipynb -o output.ipynb [OPTIONS]
Strategies:
- append - Concatenate all cells from both (default)
- first - Keep only first notebook
- second - Keep only second notebook
Options:
- --output, -o - Output file (required)
- --strategy - Combine strategy
- --report - Show detailed report
Examples:
# Concatenate notebooks
nbutils combine analysis1.ipynb analysis2.ipynb -o full.ipynb
# Keep only first notebook (copy)
nbutils combine nb1.ipynb nb2.ipynb -o output.ipynb --strategy first
Note: For true merging with conflict detection, use nbutils resolve.
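The three strategies amount to simple operations on the notebooks' cell lists. A minimal sketch, assuming notebooks are plain dicts in nbformat v4 layout and that the combined notebook keeps the first notebook's metadata (that last point is an assumption, as is the function name `combine`):

```python
import copy


def combine(first: dict, second: dict, strategy: str = "append") -> dict:
    """Combine two notebook dicts according to the named strategy."""
    if strategy == "first":
        return copy.deepcopy(first)
    if strategy == "second":
        return copy.deepcopy(second)
    if strategy != "append":
        raise ValueError(f"unknown strategy: {strategy}")
    combined = copy.deepcopy(first)  # assumption: metadata is taken from the first notebook
    combined["cells"] = combined["cells"] + copy.deepcopy(second["cells"])
    return combined


nb1 = {"metadata": {}, "cells": [{"cell_type": "code", "source": "a = 1"}]}
nb2 = {"metadata": {}, "cells": [{"cell_type": "code", "source": "b = 2"}]}

full = combine(nb1, nb2)
print([cell["source"] for cell in full["cells"]])
```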
nbutils resolve
Intelligent 3-way merge with conflict detection (powered by nbdime).
nbutils resolve base.ipynb ours.ipynb theirs.ipynb -o merged.ipynb [OPTIONS]
Arguments:
- BASE - Common ancestor (before changes)
- OURS - Your version (local changes)
- THEIRS - Other version (remote changes)
Options:
- --output, -o - Output file (required unless --check-conflicts)
- --strategy - Merge strategy: auto, ours, theirs, cell-append
- --check-conflicts - Check for conflicts only (no output file needed)
- --report - Show detailed merge report
Features:
- Production-grade merging with nbdime
- Automatic conflict detection
- Conflict markers for manual resolution
- Multiple merge strategies
Examples:
# Check for conflicts first
nbutils resolve base.ipynb ours.ipynb theirs.ipynb --check-conflicts
# Perform merge
nbutils resolve base.ipynb ours.ipynb theirs.ipynb -o merged.ipynb
# Use with Git
git show :1:notebook.ipynb > base.ipynb
git show :2:notebook.ipynb > ours.ipynb
git show :3:notebook.ipynb > theirs.ipynb
nbutils resolve base.ipynb ours.ipynb theirs.ipynb -o notebook.ipynb
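To show what a 3-way merge decides for each cell, here is a deliberately simplified sketch. It assumes all three versions have the same number of cells aligned by position, which real notebooks often violate; nbdime handles insertions, deletions, and reordering, and its algorithm is considerably more involved. The function names and the conflict-marker format are illustrative assumptions.

```python
def merge_cell(base: str, ours: str, theirs: str) -> tuple[str, bool]:
    """Return (merged_source, conflicted) for one cell using the 3-way rule."""
    if ours == theirs:
        return ours, False    # both sides agree (or neither changed)
    if ours == base:
        return theirs, False  # only their side changed
    if theirs == base:
        return ours, False    # only our side changed
    # Both sides changed the cell differently: keep both with markers.
    marked = f"<<<<<<< ours\n{ours}\n=======\n{theirs}\n>>>>>>> theirs"
    return marked, True


def merge_sources(base: list[str], ours: list[str], theirs: list[str]):
    """Merge position-aligned cell sources; returns (merged, conflict_indices)."""
    merged: list[str] = []
    conflicts: list[int] = []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        source, conflicted = merge_cell(b, o, t)
        merged.append(source)
        if conflicted:
            conflicts.append(index)
    return merged, conflicts


base = ["a = 1", "b = 2", "c = 3"]
ours = ["a = 10", "b = 2", "c = 30"]
theirs = ["a = 1", "b = 20", "c = 31"]

merged, conflicts = merge_sources(base, ours, theirs)
print(merged[0], "|", merged[1], "| conflicts at cells:", conflicts)
```

The common ancestor is what lets the merge tell "only one side changed" apart from a genuine conflict, which a 2-way comparison cannot do.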
nbutils security
Scan notebooks for security vulnerabilities.
nbutils security notebook.ipynb [OPTIONS]
Detects:
- HIGH: Hardcoded secrets (API keys, passwords, tokens)
- HIGH: Unsafe pickle deserialization
- HIGH: SQL injection risks
- MEDIUM: Command injection (os.system, eval, exec)
- MEDIUM: Unsafe YAML parsing
- MEDIUM: Disabled SSL verification
- LOW: Weak cryptographic algorithms (MD5, SHA1)
Options:
- --severity - Filter by severity: low, medium, high, all (default: all)
- --json - Output as JSON
- --verbose, -v - Show detailed recommendations
Examples:
# Scan for all issues
nbutils security notebook.ipynb
# Only high severity
nbutils security notebook.ipynb --severity high
# With recommendations
nbutils security notebook.ipynb --verbose
# JSON output for CI/CD
nbutils security notebook.ipynb --json
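A pattern-based scanner of this kind can be sketched with regular expressions. The rules below are hypothetical and cover only some of the categories listed above; the actual patterns, severities per pattern, and output fields used by the tool are not documented here, and simple regexes like these will produce both false positives and false negatives.

```python
import re

# Hypothetical rules: (severity, description, compiled pattern).
RULES = [
    ("high", "hardcoded secret",
     re.compile(r"(?i)\b(api_key|password|token|secret)\s*=\s*['\"][^'\"]+['\"]")),
    ("high", "unsafe pickle deserialization", re.compile(r"\bpickle\.loads?\(")),
    ("medium", "command or code injection", re.compile(r"\b(os\.system|eval|exec)\(")),
    ("medium", "disabled SSL verification", re.compile(r"verify\s*=\s*False")),
    ("low", "weak hash algorithm", re.compile(r"\bhashlib\.(md5|sha1)\(")),
]


def scan(cells: list[str], severity: str = "all") -> list[dict]:
    """Return one finding per (line, rule) match, optionally filtered by severity."""
    findings: list[dict] = []
    for cell_idx, source in enumerate(cells):
        for line_no, line in enumerate(source.splitlines(), start=1):
            for rule_severity, description, pattern in RULES:
                if severity not in ("all", rule_severity):
                    continue
                if pattern.search(line):
                    findings.append({
                        "cell": cell_idx,
                        "line": line_no,
                        "severity": rule_severity,
                        "issue": description,
                    })
    return findings


cells = [
    "API_KEY = 'sk-123'\nimport pickle\nmodel = pickle.load(open('m.pkl', 'rb'))",
    "requests.get(url, verify=False)",
]
findings = scan(cells)
for finding in findings:
    print(finding)
```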
Common Workflows
Setting up a new repository
# 1. Configure git for notebooks
nbutils git-setup
# 2. Clean notebooks before committing
nbutils clean *.ipynb
# 3. Check code quality
nbutils lint notebook.ipynb
nbutils format notebook.ipynb
# 4. Scan for security issues
nbutils security notebook.ipynb
Reviewing notebook changes
# Compare versions
nbutils diff old.ipynb new.ipynb --format unified
# Check what changed (code only)
nbutils diff old.ipynb new.ipynb --code-only
Resolving merge conflicts
# Check if there are conflicts
nbutils resolve base.ipynb ours.ipynb theirs.ipynb --check-conflicts
# Perform merge
nbutils resolve base.ipynb ours.ipynb theirs.ipynb -o merged.ipynb --report
# If conflicts exist, manually resolve in the merged file
Pre-commit checks
# Quality checks
nbutils lint notebook.ipynb
nbutils format notebook.ipynb
nbutils security notebook.ipynb --severity high
# Clean for commit
nbutils clean notebook.ipynb
ML Workflow - From Notebook to Production
# 1. Develop ML model in notebook
# (work on ml_model.ipynb)
# 2. Extract outputs for reports
nbutils extract ml_model.ipynb --images
# → Gets all plots and visualizations
# 3. Split into production pipeline
nbutils ml-split ml_model.ipynb --output ml_pipeline/
# 4. Test the pipeline
cd ml_pipeline
python main.py
# 5. Deploy the pipeline modules
# Each module is a standalone Python file ready for production!
Development
Setup
# Clone repository
git clone https://github.com/VenkatachalamSubramanianPeriyaSubbu/nbctl.git
cd nbctl
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
Run Tests
# Run all tests
pytest tests/ -v
# Run specific test file
pytest tests/test_security.py -v
# With coverage
pytest tests/ --cov=nbutils --cov-report=html
Code Quality
# Format code
black nbutils/ tests/
# Type checking
mypy nbutils/
Why nbutils?
Jupyter notebooks are powerful but have challenges:
| Problem | nbutils Solution |
|---|---|
| Massive git diffs | clean - Remove outputs |
| Merge conflicts | resolve - Intelligent 3-way merge |
| Hard to compare | diff - Smart comparison |
| Code quality issues | lint + format |
| Security risks | security - Vulnerability scanning |
| Manual workflows | Comprehensive CLI automation |
One tool. All solutions. Production-ready.
Roadmap
- Basic clean command
- Info command (statistics, metrics, imports)
- Export command (HTML, PDF, Markdown, etc.)
- Extract command (extract outputs, images, data)
- ML-Split command (ML notebook → Python pipeline)
- Lint command (code quality)
- Format command (black auto-format)
- Git setup (integration)
- Diff command (intelligent comparison)
- Combine command (2-way merge)
- Resolve command (3-way merge with nbdime)
- Security command (vulnerability scanning)
- Test runner (execute and validate)
- Split command (general notebook splitting)
- Template system
- Cloud integration
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
MIT License - see LICENSE file for details.
Author
Built for the Jupyter community by Venkatachalam Subramanian Periya Subbu
Status
Version: 0.1.2
Status: Production-ready with comprehensive test coverage
New: Extract outputs & ML pipeline splitting
**Star this repo if you find it useful!**