A package manager for Jupyter notebook templates

notebookpkg

A notebook template manager for ML students.
One command installs a ready-to-run Jupyter notebook — already wired to your dataset, with your column names, your target, and your drop columns injected automatically.

No more writing the same boilerplate for every assignment. Just pick a template, point it to your CSV, and open Jupyter.


Installation

pip install notebookpkg

Requirements: Python 3.7+, pandas, numpy, scikit-learn, matplotlib, seaborn, nbformat, click


How It Works

  1. You run one command with your CSV file
  2. The tool reads your dataset and detects all column names and types
  3. It injects your dataset path, column names, target column, and drop columns into the template
  4. A .ipynb file is created in your current folder
  5. Open it in Jupyter and run all cells — everything is pre-filled
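The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's actual code: it uses only the standard library (the real tool depends on pandas), the helper names profile_csv and inject are hypothetical, and the {{TARGET_COL}} token is invented for the example. Only {{DATASET_PATH}} appears in the documented template output.

```python
import csv
import io


def profile_csv(handle):
    """Return (column_names, numeric_columns) for an open CSV file."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    # A column counts as numeric when every non-empty value parses as a float.
    numeric = []
    for col in columns:
        values = [row[col] for row in rows if row[col]]
        if values and all(is_number(v) for v in values):
            numeric.append(col)
    return columns, numeric


def inject(source, values):
    """Replace each {{TOKEN}} in a cell's source with its value."""
    for token, value in values.items():
        source = source.replace("{{" + token + "}}", value)
    return source


# In-memory stand-in for a CSV file on disk.
sample = io.StringIO("YearsExperience,Salary,Dept\n1.1,39343,A\n2.0,43525,B\n")
columns, numeric = profile_csv(sample)

# {{TARGET_COL}} is a hypothetical token used only in this sketch.
cell = "df = pd.read_csv('{{DATASET_PATH}}')\ny = df['{{TARGET_COL}}']"
result = inject(cell, {"DATASET_PATH": "Salary_Data.csv", "TARGET_COL": "Salary"})
print(result)
```

Plain string replacement keeps the templates readable as ordinary notebooks, which is probably why a {{TOKEN}} convention suits this kind of tool.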

Quick Start

# Step 1: See all available templates
notebookpkg list

# Step 2: Install a template for your CSV
notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary

# Step 3: Open the notebook
jupyter notebook linear-regression_notebook.ipynb

Commands

notebookpkg list

Lists all available templates with their descriptions.

notebookpkg list

Output:

📦 Available Templates:

  decision-tree                       Decision Tree: criterion=entropy, max_depth=5, plot_tree, accuracy, report
  eda-basic                           Basic EDA: head, shape, info, describe, nulls, dtypes, nunique
  eda-full                            Full EDA: visual + outliers, skewness, duplicates, value counts
  eda-visual                          Visual EDA: pairplot, heatmap, distributions
  kmeans-clustering                   KMeans Clustering: StandardScaler, elbow method, silhouette score, cluster plot
  knn-classifier                      KNN Classifier: StandardScaler, fit, accuracy, confusion matrix, report
  lasso-ridge                         Linear + Lasso + Ridge Regression with StandardScaler and coefficient plots
  linear-regression                   Linear Regression: EDA, fit, predict, visualize, MSE, R²
  logistic-regression                 Logistic Regression: StandardScaler, fit, accuracy, confusion matrix, report
  multi-model-compare                 LR + KNN + Naive Bayes on same dataset with accuracy comparison
  naive-bayes                         Gaussian Naive Bayes: StandardScaler, fit, accuracy, confusion matrix heatmap
  polynomial-regression               Polynomial Regression: PolynomialFeatures, smooth curve plot, MSE, R²
  random-forest-classifier            Random Forest Classifier: model1, accuracy, confusion matrix, feature importance
  random-forest-regressor             Random Forest Regressor: RFR, fit, MSE, R², Actual vs Predicted scatter
  svm-classifier                      SVM: Linear kernel, then RBF kernel with AgeSalary feature engineering

notebookpkg install

Installs a template wired to your dataset.

notebookpkg install <template-name> --dataset <path-to-csv> [options]

All options:

Option      Required  Default                    Description
--dataset   Yes       (required)                 Path to your CSV file
--target    No        Last column                Target/label column name
--drop      No        None                       Columns to drop, comma-separated
--degree    No        2                          Polynomial degree (polynomial-regression only)
--clusters  No        3                          Number of clusters (kmeans-clustering only)
--output    No        <template>_notebook.ipynb  Custom output filename
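To make the defaults concrete, here is a hedged sketch of how they might be resolved. The function resolve_options is hypothetical and the column list is illustrative. One assumption is labeled in the code: the default target is taken as the last column that remains after dropping, which the documentation does not confirm.

```python
def resolve_options(template, columns, target=None, drop=None, output=None):
    """Apply the documented defaults to the optional install flags."""
    # --drop is a comma-separated string; stripping spaces is an assumption.
    drop_cols = [c.strip() for c in drop.split(",") if c.strip()] if drop else []
    remaining = [c for c in columns if c not in drop_cols]
    return {
        # Assumption: "last column" means the last column left after the drop.
        "target": target if target is not None else remaining[-1],
        "drop": drop_cols,
        # Documented default: <template>_notebook.ipynb
        "output": output if output is not None else f"{template}_notebook.ipynb",
    }


# Illustrative column names, not taken from a real dataset.
columns = ["User ID", "Gender", "Age", "EstimatedSalary", "Purchased"]
options = resolve_options("logistic-regression", columns, drop="User ID,Gender")
print(options)
```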


notebookpkg syntax

Prints the complete code of a template — every cell in order — directly in your terminal. Use this to preview exactly what will be generated before installing.

notebookpkg syntax <template-name>

Example:

notebookpkg syntax logistic-regression

Output:

============================================================
  Template : logistic-regression
  Logistic Regression: StandardScaler, fit, accuracy, confusion matrix, report
  Total cells: 16
============================================================

── Cell 1 ──────────────────────────────────────────────────
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

── Cell 2 ──────────────────────────────────────────────────
df = pd.read_csv('{{DATASET_PATH}}')
df.head()

── Cell 3 ──────────────────────────────────────────────────
{{DROP_CODE}}

... (all remaining cells shown in full)

============================================================
  Install this template:
  notebookpkg install logistic-regression --dataset yourdata.csv
============================================================
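A listing like the one above can be produced by walking the cells of a notebook file, since an .ipynb is plain JSON. The sketch below is an assumption about the approach, not the package's code: format_cells is a hypothetical name, the header style is simplified, and the standard json module stands in for nbformat.

```python
import json

# A tiny stand-in for template.ipynb; real notebooks carry more metadata.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "source": ["import pandas as pd\n", "import numpy as np"]},
        {"cell_type": "code", "source": "df = pd.read_csv('{{DATASET_PATH}}')\ndf.head()"},
    ],
})


def format_cells(notebook):
    """Return every cell's source, in order, under a numbered header."""
    lines = []
    for number, cell in enumerate(notebook["cells"], start=1):
        source = cell["source"]
        # The notebook format allows source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        lines.append(f"-- Cell {number} --")
        lines.append(source)
        lines.append("")
    return "\n".join(lines)


listing = format_cells(json.loads(notebook_json))
print(listing)
```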

You can run syntax for any of the 15 templates:

notebookpkg syntax eda-basic
notebookpkg syntax eda-visual
notebookpkg syntax eda-full
notebookpkg syntax linear-regression
notebookpkg syntax polynomial-regression
notebookpkg syntax logistic-regression
notebookpkg syntax knn-classifier
notebookpkg syntax naive-bayes
notebookpkg syntax lasso-ridge
notebookpkg syntax decision-tree
notebookpkg syntax random-forest-regressor
notebookpkg syntax random-forest-classifier
notebookpkg syntax svm-classifier
notebookpkg syntax kmeans-clustering
notebookpkg syntax multi-model-compare

Templates

EDA Templates

eda-basic

Basic Exploratory Data Analysis. Covers the essential checks every notebook needs.

Cells generated:

  1. Imports
  2. pd.read_csv() + df.head()
  3. Drop columns cell (optional)
  4. df.shape
  5. df.info()
  6. df.describe()
  7. df.isnull().sum()
  8. df.dtypes
  9. df.nunique()

notebookpkg install eda-basic --dataset data.csv

eda-visual

EDA with all key visualizations.

Cells generated: Everything in eda-basic, plus:

  • sns.pairplot(df)
  • Correlation heatmap (df.corr() + sns.heatmap())
  • Histogram for each numeric column

notebookpkg install eda-visual --dataset data.csv

eda-full

Complete EDA including outlier detection and categorical analysis.

Cells generated: Everything in eda-visual, plus:

  • df.duplicated().sum()
  • Boxplot for each numeric column
  • Skewness: df.skew(numeric_only=True)
  • IQR outlier count for each numeric column
  • value_counts() for each categorical column

notebookpkg install eda-full --dataset data.csv
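For readers unfamiliar with the IQR outlier count, the sketch below shows the usual rule on a single column. It assumes the conventional 1.5 × IQR fences; the template's exact thresholds are not documented here. The function name iqr_outlier_count and the sample values are illustrative.

```python
import statistics


def iqr_outlier_count(values):
    """Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    # method="inclusive" uses linear interpolation, like pandas' default quantile.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in values if v < lower or v > upper)


# Made-up sample column with one obvious outlier.
sample_column = [10, 12, 11, 13, 12, 11, 100]
outliers = iqr_outlier_count(sample_column)
print(outliers)  # prints 1
```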

Regression Templates

linear-regression

Standard Linear Regression pipeline on your CSV.

Cells generated:

  1. Imports
  2. Load dataset + head
  3. Drop columns cell
  4. shape, info, describe, isnull
  5. pairplot
  6. Correlation heatmap
  7. X / y split (iloc)
  8. train_test_split (test_size=0.2, random_state=0)
  9. regressor = LinearRegression() + fit
  10. Predict
  11. Visualize training data (scatter + regression line)
  12. Visualize testing data
  13. Coefficient and intercept
  14. MSE
  15. R²

notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary
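The last two cells report MSE and R². The generated notebook presumably computes them with scikit-learn; the plain-Python sketch below only shows what the two numbers mean. The function names mse and r_squared and the sample values are illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values: two exact predictions and one that is off by 1.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(mse(actual, predicted))
print(r_squared(actual, predicted))
```

An R² of 1 means the predictions match the targets exactly; 0 means the model does no better than always predicting the mean.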

polynomial-regression

Polynomial Regression with smooth curve visualization.

Cells generated:

  1. Imports (includes PolynomialFeatures)
  2. Load dataset + head
  3. Drop columns cell
  4. info, describe, pairplot, heatmap
  5. X / y split
  6. PolynomialFeatures(degree=N) + transform
  7. train_test_split
  8. plr = LinearRegression() + fit
  9. Smooth curve plot using X_gride
  10. Predict
  11. MSE
  12. R²

notebookpkg install polynomial-regression --dataset hw.csv --target Price
notebookpkg install polynomial-regression --dataset hw.csv --target Price --degree 3

lasso-ridge

Linear Regression + Lasso + Ridge, all on the same dataset with comparison.

Cells generated:

  1. Imports
  2. Load + EDA (info, describe, columns, shape)
  3. Drop columns cell
  4. X / y split
  5. train_test_split
  6. StandardScaler
  7. Linear Regression (lm) + coefficient barh plot
  8. Lasso (alpha=0.1) + MSE + R² + coefficient barh plot
  9. Ridge (alpha=0.1) + MSE + R²

notebookpkg install lasso-ridge --dataset BostonHousing.csv --target medv

Classification Templates

logistic-regression

Logistic Regression with StandardScaler.

Cells generated:

  1. Imports
  2. Load dataset + head
  3. Drop columns cell
  4. shape, info, describe, isnull
  5. Correlation heatmap
  6. X / y split
  7. train_test_split (test_size=0.3, random_state=0)
  8. sc = StandardScaler() + fit_transform / transform
  9. lr = LogisticRegression() + fit
  10. Predict
  11. Accuracy score
  12. Confusion matrix
  13. Classification report

notebookpkg install logistic-regression --dataset Day5.csv --target Purchased
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"

knn-classifier

K-Nearest Neighbors Classifier with StandardScaler.

Cells generated:

  1. Imports
  2. Load dataset + head
  3. Drop columns cell
  4. shape, info, describe, isnull, duplicated
  5. Correlation heatmap + pairplot
  6. X / y split
  7. train_test_split (test_size=0.2, random_state=42)
  8. StandardScaler
  9. knn = KNeighborsClassifier() + fit
  10. Predict
  11. Accuracy, confusion matrix, classification report

notebookpkg install knn-classifier --dataset Day5.csv --target Purchased

naive-bayes

Gaussian Naive Bayes with StandardScaler and confusion matrix heatmap.

Cells generated:

  1. Imports
  2. Load + shape, describe, isnull
  3. Drop columns cell
  4. Correlation heatmap
  5. X / y split
  6. train_test_split with stratify=y
  7. StandardScaler (fit only on train)
  8. nb = GaussianNB() + fit
  9. Predict
  10. Accuracy
  11. Classification report
  12. Confusion matrix as sns.heatmap

notebookpkg install naive-bayes --dataset Day5.csv --target Purchased
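Step 7 fits the scaler on the training split only, which keeps test-set statistics from leaking into preprocessing. The sketch below illustrates the idea for one feature in plain Python. It is not the template's code, and fit_scaler and transform are hypothetical names. Like scikit-learn's StandardScaler, it uses the population standard deviation.

```python
import statistics


def fit_scaler(train_values):
    """Learn the mean and population standard deviation from training data only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def transform(values, mean, std):
    """Standardize values with statistics learned elsewhere."""
    return [(v - mean) / std for v in values]


# Made-up single-feature splits.
train = [20.0, 30.0, 40.0]
test = [30.0, 50.0]

mean, std = fit_scaler(train)               # statistics come from train only
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)    # test reuses the train statistics
print(train_scaled, test_scaled)
```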

decision-tree

Decision Tree Classifier with tree visualization.

Cells generated:

  1. Imports (includes from sklearn import tree)
  2. Load + EDA
  3. Drop columns cell
  4. Distribution plot + heatmap + pairplot
  5. X / y split
  6. train_test_split
  7. StandardScaler
  8. DecisionTreeClassifier(criterion='entropy', max_depth=5, random_state=0)
  9. Predict
  10. Accuracy score
  11. Confusion matrix
  12. Classification report
  13. tree.plot_tree(): full visual tree diagram

notebookpkg install decision-tree --dataset SNP.csv --target Purchased

svm-classifier

SVM with both Linear and RBF kernels, plus feature engineering.

Cells generated:

  1. Imports (includes SVC)
  2. Load + EDA (info, describe, isnull, value_counts)
  3. Drop columns cell
  4. Scatter plot of features
  5. X / y split
  6. train_test_split
  7. StandardScaler
  8. model = SVC(kernel='linear') + fit + predict + accuracy + CM + heatmap
  9. Feature engineering: df['AgeSalary'] = df['Age'] * df['EstimatedSalary']
  10. Re-split with new feature
  11. model1 = SVC(kernel='rbf') + fit + predict + accuracy + CM + heatmap

notebookpkg install svm-classifier --dataset SNP.csv --target Purchased

multi-model-compare

Runs Logistic Regression, KNN, and Naive Bayes on the same dataset and compares accuracy.

Cells generated:

  1. Imports
  2. Load + EDA
  3. Drop columns cell
  4. X / y split
  5. train_test_split
  6. model_lr = LogisticRegression() → fit → predict → accuracy → report
  7. model_knn = KNeighborsClassifier() → fit → predict → accuracy → report
  8. model_nb = GaussianNB() → fit → predict → accuracy → report
  9. Comparison dict with all three accuracy scores printed together

notebookpkg install multi-model-compare --dataset Day5.csv --target Purchased

Ensemble Templates

random-forest-regressor

Random Forest Regressor with actual vs predicted scatter plot.

Cells generated:

  1. Imports
  2. Load + isnull, duplicated, info, describe
  3. Drop columns cell
  4. Correlation heatmap
  5. X / y split
  6. train_test_split (test_size=0.2, random_state=42)
  7. RFR = RandomForestRegressor(n_estimators=100, random_state=42) + fit
  8. Predict
  9. MSE
  10. R²
  11. Scatter plot: Actual vs Predicted

notebookpkg install random-forest-regressor --dataset housing.csv --target Price

random-forest-classifier

Random Forest Classifier with feature importance bar chart.

Cells generated:

  1. Imports
  2. Load + EDA
  3. Drop columns cell
  4. X / y split
  5. train_test_split
  6. StandardScaler
  7. model1 = RandomForestClassifier(n_estimators=100, random_state=42) + fit
  8. Predict
  9. Accuracy
  10. Classification report
  11. Confusion matrix heatmap
  12. Feature importance: model1.feature_importances_
  13. Bar chart of feature importance

notebookpkg install random-forest-classifier --dataset iris.csv --target species

Clustering Templates

kmeans-clustering

KMeans Clustering with elbow method and silhouette score. No target column needed.

Cells generated:

  1. Imports (includes KMeans, silhouette_score)
  2. Load + shape, info, describe, isnull, duplicated
  3. Drop columns cell
  4. pairplot
  5. Correlation heatmap
  6. StandardScaler on numeric columns
  7. Elbow method loop (k=1 to 9) + inertia plot
  8. KMeans(n_clusters=N) + fit
  9. Cluster labels added to df
  10. Cluster scatter plot with centroids marked in red
  11. Silhouette score

notebookpkg install kmeans-clustering --dataset Mall_Customers.csv
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv --clusters 5

The --drop Option

Many real datasets have ID columns, name columns, or other columns that should not go into the model. Use --drop to remove them before anything is processed.

With --drop, the generated notebook gets:

df = df.drop(columns=['User ID', 'Gender'], axis=1)
df.head()

Without --drop, the cell appears as a comment so you can still do it manually:

# No columns dropped
# To drop columns use: df = df.drop(columns=['col1','col2'], axis=1)

The profiler also respects the drop: NUMERIC_COLS, CAT_COLS, and FEATURE_COLS are all detected after the columns are dropped, so the rest of the notebook stays consistent.

# Drop one column
notebookpkg install knn-classifier --dataset Day5.csv --target Purchased --drop "User ID"

# Drop multiple columns
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"
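The two forms of the drop cell shown in this section can be generated by one small function. The sketch below is an assumption about how the {{DROP_CODE}} token might be filled, not the package's implementation; build_drop_cell is a hypothetical name.

```python
def build_drop_cell(drop=None):
    """Return the source of the drop cell for a comma-separated --drop value."""
    columns = [c.strip() for c in drop.split(",") if c.strip()] if drop else []
    if not columns:
        # Without --drop the cell is a comment the user can edit by hand.
        return (
            "# No columns dropped\n"
            "# To drop columns use: df = df.drop(columns=['col1','col2'], axis=1)"
        )
    # repr() of a list of strings yields the ['a', 'b'] form shown above.
    return f"df = df.drop(columns={columns!r}, axis=1)\ndf.head()"


with_drop = build_drop_cell("User ID,Gender")
without_drop = build_drop_cell()
print(with_drop)
print(without_drop)
```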

All Usage Examples

# ── EDA ──────────────────────────────────────────────────────────────────
notebookpkg install eda-basic   --dataset data.csv
notebookpkg install eda-visual  --dataset data.csv
notebookpkg install eda-full    --dataset data.csv

# ── Regression ───────────────────────────────────────────────────────────
notebookpkg install linear-regression      --dataset Salary_Data.csv --target Salary
notebookpkg install polynomial-regression  --dataset hw.csv --target Price --degree 3
notebookpkg install lasso-ridge            --dataset BostonHousing.csv --target medv

# ── Classification ────────────────────────────────────────────────────────
notebookpkg install logistic-regression    --dataset Day5.csv --target Purchased
notebookpkg install knn-classifier         --dataset Day5.csv --target Purchased
notebookpkg install naive-bayes            --dataset Day5.csv --target Purchased
notebookpkg install decision-tree          --dataset SNP.csv  --target Purchased
notebookpkg install svm-classifier         --dataset SNP.csv  --target Purchased
notebookpkg install multi-model-compare    --dataset Day5.csv --target Purchased

# ── Ensemble ─────────────────────────────────────────────────────────────
notebookpkg install random-forest-regressor   --dataset housing.csv --target Price
notebookpkg install random-forest-classifier  --dataset iris.csv    --target species

# ── Clustering ────────────────────────────────────────────────────────────
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv --clusters 5

# ── With drop ─────────────────────────────────────────────────────────────
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"

# ── Custom output filename ────────────────────────────────────────────────
notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary --output my_analysis.ipynb

Project Structure

notebookpkg/
├── notebookpkg/
│   ├── cli.py          # CLI commands: install, list, syntax
│   ├── profiler.py     # Reads CSV, detects column types
│   ├── injector.py     # Replaces tokens in notebook cells
│   ├── registry.py     # Finds templates by name
│   └── templates/
│       ├── eda-basic/
│       ├── eda-visual/
│       ├── eda-full/
│       ├── linear-regression/
│       ├── polynomial-regression/
│       ├── logistic-regression/
│       ├── knn-classifier/
│       ├── naive-bayes/
│       ├── lasso-ridge/
│       ├── decision-tree/
│       ├── random-forest-regressor/
│       ├── random-forest-classifier/
│       ├── svm-classifier/
│       ├── kmeans-clustering/
│       └── multi-model-compare/
├── build_templates.py  # Regenerates all .ipynb template files
├── setup.py
├── MANIFEST.in
└── README.md

Each template folder contains:

  • template.ipynb — the notebook with {{TOKEN}} placeholders
  • meta.json — name, description, and whether a target column is needed
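Given that layout, looking up a template by name amounts to checking for the folder and reading its two files. The sketch below is a guess at what registry.py might do, not its actual code. find_template is a hypothetical name, and the meta.json keys (name, description, needs_target) are assumptions based on the fields listed above.

```python
import json
import tempfile
from pathlib import Path


def find_template(templates_dir, name):
    """Return (meta, notebook_path) for a template folder, or raise KeyError."""
    folder = Path(templates_dir) / name
    meta_path = folder / "meta.json"
    notebook_path = folder / "template.ipynb"
    if not (meta_path.is_file() and notebook_path.is_file()):
        raise KeyError(f"unknown template: {name}")
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    return meta, notebook_path


# Build a throwaway templates directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    folder = Path(root) / "kmeans-clustering"
    folder.mkdir()
    (folder / "template.ipynb").write_text("{}", encoding="utf-8")
    (folder / "meta.json").write_text(
        json.dumps({
            "name": "kmeans-clustering",
            "description": "KMeans Clustering",
            "needs_target": False,  # hypothetical key name
        }),
        encoding="utf-8",
    )
    meta, notebook_path = find_template(root, "kmeans-clustering")
    found_file = notebook_path.name

print(meta["name"], found_file)
```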

Dependencies

pandas
numpy
scikit-learn
matplotlib
seaborn
nbformat
click

These are installed automatically when you run pip install notebookpkg.


Author

Priyansu Pattanaik
B.Tech — Electronics & Telecommunication
PG Diploma in AI — CDAC Kharghar
priyansupattanaikwork@gmail.com


License

MIT License. Free to use, modify, and distribute.
