A package manager for Jupyter notebook templates

notebookpkg

A notebook template manager for ML students.
One command installs a ready-to-run Jupyter notebook — already wired to your dataset, with your column names, your target, and your drop columns injected automatically.

No more rewriting the same boilerplate for every assignment. Pick a template, point it at your CSV, and open Jupyter.


Installation

pip install notebookpkg

Requirements: Python 3.7+, pandas, numpy, scikit-learn, matplotlib, seaborn, nbformat, click


How It Works

  1. You run one command with your CSV file
  2. The tool reads your dataset and detects all column names and types
  3. It injects your dataset path, column names, target column, and drop columns into the template
  4. A .ipynb file is created in your current folder
  5. Open it in Jupyter and run all cells — everything is pre-filled
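
Step 2 can be pictured with a small, dependency-free sketch. This is not notebookpkg's actual profiler (which presumably relies on pandas). The function name `profile_csv`, the rule used to tell numeric from categorical columns, and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def _is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile_csv(text, drop=()):
    """Split the columns of CSV text into numeric and categorical names.

    Hypothetical helper: a column counts as numeric when every non-empty
    value parses as a float. Columns listed in `drop` are ignored.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = [c for c in (reader.fieldnames or []) if c not in drop]
    numeric, categorical = [], []
    for col in columns:
        values = [row[col] for row in rows if row[col] not in ("", None)]
        if values and all(_is_number(v) for v in values):
            numeric.append(col)
        else:
            categorical.append(col)
    return {"columns": columns, "numeric": numeric, "categorical": categorical}


# Made-up sample rows, not taken from any real dataset.
sample = (
    "User ID,Gender,Age,EstimatedSalary,Purchased\n"
    "1,Male,19,19000,0\n"
    "2,Female,35,20000,1\n"
)
profile = profile_csv(sample, drop=("User ID",))
print(profile["numeric"])      # ['Age', 'EstimatedSalary', 'Purchased']
print(profile["categorical"])  # ['Gender']
```

The resulting column lists are the kind of values that step 3 would inject into the template.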

Quick Start

# Step 1: See all available templates
notebookpkg list

# Step 2: Install a template for your CSV
notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary

# Step 3: Open the notebook
jupyter notebook linear-regression_notebook.ipynb

Commands

notebookpkg list

Lists all available templates with their descriptions.

notebookpkg list

Output:

📦 Available Templates:

  decision-tree                       Decision Tree: criterion=entropy, max_depth=5, plot_tree, accuracy, report
  eda-basic                           Basic EDA: head, shape, info, describe, nulls, dtypes, nunique
  eda-full                            Full EDA: visual + outliers, skewness, duplicates, value counts
  eda-visual                          Visual EDA: pairplot, heatmap, distributions
  kmeans-clustering                   KMeans Clustering: StandardScaler, elbow method, silhouette score, cluster plot
  knn-classifier                      KNN Classifier: StandardScaler, fit, accuracy, confusion matrix, report
  lasso-ridge                         Linear + Lasso + Ridge Regression with StandardScaler and coefficient plots
  linear-regression                   Linear Regression: EDA, fit, predict, visualize, MSE, R²
  logistic-regression                 Logistic Regression: StandardScaler, fit, accuracy, confusion matrix, report
  multi-model-compare                 LR + KNN + Naive Bayes on same dataset with accuracy comparison
  naive-bayes                         Gaussian Naive Bayes: StandardScaler, fit, accuracy, confusion matrix heatmap
  polynomial-regression               Polynomial Regression: PolynomialFeatures, smooth curve plot, MSE, R²
  random-forest-classifier            Random Forest Classifier: model1, accuracy, confusion matrix, feature importance
  random-forest-regressor             Random Forest Regressor: RFR, fit, MSE, R², Actual vs Predicted scatter
  svm-classifier                      SVM: Linear kernel, then RBF kernel with AgeSalary feature engineering

notebookpkg install

Installs a template wired to your dataset.

notebookpkg install <template-name> --dataset <path-to-csv> [options]

All options:

Option      Required  Default                    Description
--dataset   Yes       -                          Path to your CSV file
--target    No        Last column                Target/label column name
--drop      No        None                       Columns to drop, comma-separated
--degree    No        2                          Polynomial degree (polynomial-regression only)
--clusters  No        3                          Number of clusters (kmeans-clustering only)
--output    No        <template>_notebook.ipynb  Custom output filename
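
The defaults in the table can be read as a small resolution step. The sketch below is only an illustration: `resolve_options` is a hypothetical helper, the column names are made up, and it assumes that "last column" is evaluated after any dropped columns are removed, which the table does not state.

```python
def resolve_options(columns, template, target=None, drop=None, output=None):
    """Apply the documented defaults to the install options (hypothetical helper)."""
    # --drop is a comma-separated string such as "User ID,Gender".
    drop_cols = [c.strip() for c in drop.split(",") if c.strip()] if drop else []
    remaining = [c for c in columns if c not in drop_cols]
    return {
        "drop": drop_cols,
        # Assumption: the default target is the last column left after the drop.
        "target": target if target is not None else remaining[-1],
        # Default filename pattern from the table: <template>_notebook.ipynb
        "output": output if output is not None else f"{template}_notebook.ipynb",
    }


opts = resolve_options(
    ["User ID", "Gender", "Age", "EstimatedSalary", "Purchased"],  # illustrative columns
    "logistic-regression",
    drop="User ID,Gender",
)
print(opts["target"])  # Purchased
print(opts["output"])  # logistic-regression_notebook.ipynb
```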


notebookpkg syntax

Prints the complete code of a template — every cell in order — directly in your terminal. Use this to preview exactly what will be generated before installing.

notebookpkg syntax <template-name>

Example:

notebookpkg syntax logistic-regression

Output:

============================================================
  Template : logistic-regression
  Logistic Regression: StandardScaler, fit, accuracy, confusion matrix, report
  Total cells: 16
============================================================

── Cell 1 ──────────────────────────────────────────────────
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

── Cell 2 ──────────────────────────────────────────────────
df = pd.read_csv('{{DATASET_PATH}}')
df.head()

── Cell 3 ──────────────────────────────────────────────────
{{DROP_CODE}}

... (all remaining cells shown in full)

============================================================
  Install this template:
  notebookpkg install logistic-regression --dataset yourdata.csv
============================================================
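
The `{{DATASET_PATH}}` and `{{DROP_CODE}}` placeholders in the preview above are replaced at install time. A minimal sketch of that substitution, assuming plain string replacement over each cell's source, could look like the following. `inject_tokens` is a hypothetical name, and the notebook is handled as a plain dict (an .ipynb file is JSON) rather than through nbformat.

```python
import copy


def inject_tokens(notebook, values):
    """Return a copy of a notebook dict with {{TOKEN}} placeholders filled in."""
    filled = copy.deepcopy(notebook)  # leave the template itself untouched
    for cell in filled.get("cells", []):
        source = cell.get("source", "")
        # In .ipynb files the source may be one string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for token, value in values.items():
            text = text.replace("{{" + token + "}}", value)
        cell["source"] = text
    return filled


template = {
    "cells": [
        {"cell_type": "code", "source": "df = pd.read_csv('{{DATASET_PATH}}')\ndf.head()"},
        {"cell_type": "code", "source": ["{{DROP_CODE}}"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}
filled = inject_tokens(
    template,
    {"DATASET_PATH": "yourdata.csv", "DROP_CODE": "# No columns dropped"},
)
print(filled["cells"][0]["source"])
```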

You can run syntax for any of the 15 templates:

notebookpkg syntax eda-basic
notebookpkg syntax eda-visual
notebookpkg syntax eda-full
notebookpkg syntax linear-regression
notebookpkg syntax polynomial-regression
notebookpkg syntax logistic-regression
notebookpkg syntax knn-classifier
notebookpkg syntax naive-bayes
notebookpkg syntax lasso-ridge
notebookpkg syntax decision-tree
notebookpkg syntax random-forest-regressor
notebookpkg syntax random-forest-classifier
notebookpkg syntax svm-classifier
notebookpkg syntax kmeans-clustering
notebookpkg syntax multi-model-compare

Templates

EDA Templates

eda-basic

Basic Exploratory Data Analysis. Covers the essential checks every notebook needs.

Cells generated:

  1. Imports
  2. pd.read_csv() + df.head()
  3. Drop columns cell (optional)
  4. df.shape
  5. df.info()
  6. df.describe()
  7. df.isnull().sum()
  8. df.dtypes
  9. df.nunique()

notebookpkg install eda-basic --dataset data.csv

eda-visual

EDA with all key visualizations.

Cells generated: Everything in eda-basic, plus:

  • sns.pairplot(df)
  • Correlation heatmap (df.corr() + sns.heatmap())
  • Histogram for each numeric column

notebookpkg install eda-visual --dataset data.csv

eda-full

Complete EDA including outlier detection and categorical analysis.

Cells generated: Everything in eda-visual, plus:

  • df.duplicated().sum()
  • Boxplot for each numeric column
  • Skewness: df.skew(numeric_only=True)
  • IQR outlier count for each numeric column
  • value_counts() for each categorical column

notebookpkg install eda-full --dataset data.csv
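
For readers new to the IQR outlier count listed above, here is a standard-library sketch of the usual rule. It is not the template's own cell (which presumably uses pandas). The 1.5 multiplier and the helper name `iqr_outlier_count` are assumptions, since the list does not say which threshold the template uses.

```python
import statistics


def iqr_outlier_count(values, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < lower or v > upper)


# Made-up numbers: only 100 lies far outside the bulk of the data.
print(iqr_outlier_count([1, 2, 3, 4, 5, 100]))  # 1
```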

Regression Templates

linear-regression

Standard Linear Regression pipeline on your CSV.

Cells generated:

  1. Imports
  2. Load dataset + head
  3. Drop columns cell
  4. shape, info, describe, isnull
  5. pairplot
  6. Correlation heatmap
  7. X / y split (iloc)
  8. train_test_split (test_size=0.2, random_state=0)
  9. regressor = LinearRegression() + fit
  10. Predict
  11. Visualize training data (scatter + regression line)
  12. Visualize testing data
  13. Coefficient and intercept
  14. MSE
  15. R²

notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary
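
The last two cells report MSE and R². As a reminder of what those numbers mean, the sketch below computes both by hand using the standard definitions. The generated notebook most likely calls scikit-learn for this, and the values here are made up.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0]  # illustrative targets
y_pred = [2.5, 5.0, 7.5]  # illustrative predictions
print(mse(y_true, y_pred))
print(r2(y_true, y_pred))  # 0.9375
```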

polynomial-regression

Polynomial Regression with smooth curve visualization.

Cells generated:

  1. Imports (includes PolynomialFeatures)
  2. Load dataset + head
  3. Drop columns cell
  4. info, describe, pairplot, heatmap
  5. X / y split
  6. PolynomialFeatures(degree=N) + transform
  7. train_test_split
  8. plr = LinearRegression() + fit
  9. Smooth curve plot using X_gride
  10. Predict
  11. MSE
  12. R²

notebookpkg install polynomial-regression --dataset hw.csv --target Price
notebookpkg install polynomial-regression --dataset hw.csv --target Price --degree 3

lasso-ridge

Linear Regression + Lasso + Ridge, all on the same dataset with comparison.

Cells generated:

  1. Imports
  2. Load + EDA (info, describe, columns, shape)
  3. Drop columns cell
  4. X / y split
  5. train_test_split
  6. StandardScaler
  7. Linear Regression (lm) + coefficient barh plot
  8. Lasso (alpha=0.1) + MSE + R² + coefficient barh plot
  9. Ridge (alpha=0.1) + MSE + R²

notebookpkg install lasso-ridge --dataset BostonHousing.csv --target medv

Classification Templates

logistic-regression

Logistic Regression with StandardScaler.

Cells generated:

  1. Imports
  2. Load dataset + head
  3. Drop columns cell
  4. shape, info, describe, isnull
  5. Correlation heatmap
  6. X / y split
  7. train_test_split (test_size=0.3, random_state=0)
  8. sc = StandardScaler() + fit_transform / transform
  9. lr = LogisticRegression() + fit
  10. Predict
  11. Accuracy score
  12. Confusion matrix
  13. Classification report

notebookpkg install logistic-regression --dataset Day5.csv --target Purchased
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"
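
The accuracy and confusion matrix cells are easier to read with the definitions in mind. The sketch below recomputes both in plain Python on made-up labels. It follows the scikit-learn layout (rows are true labels, columns are predicted labels) but is not the template's code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] counts samples with true label i predicted as label j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


y_true = [0, 1, 1, 0, 1]  # illustrative labels
y_pred = [0, 1, 0, 0, 1]  # one positive sample is missed
print(accuracy(y_true, y_pred))                  # 0.8
print(confusion_matrix(y_true, y_pred, [0, 1]))  # [[2, 0], [1, 2]]
```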

knn-classifier

K-Nearest Neighbors Classifier with StandardScaler.

Cells generated:

  1. Imports
  2. Load dataset + head
  3. Drop columns cell
  4. shape, info, describe, isnull, duplicated
  5. Correlation heatmap + pairplot
  6. X / y split
  7. train_test_split (test_size=0.2, random_state=42)
  8. StandardScaler
  9. knn = KNeighborsClassifier() + fit
  10. Predict
  11. Accuracy, confusion matrix, classification report

notebookpkg install knn-classifier --dataset Day5.csv --target Purchased

naive-bayes

Gaussian Naive Bayes with StandardScaler and confusion matrix heatmap.

Cells generated:

  1. Imports
  2. Load + shape, describe, isnull
  3. Drop columns cell
  4. Correlation heatmap
  5. X / y split
  6. train_test_split with stratify=y
  7. StandardScaler (fit only on train)
  8. nb = GaussianNB() + fit
  9. Predict
  10. Accuracy
  11. Classification report
  12. Confusion matrix as sns.heatmap

notebookpkg install naive-bayes --dataset Day5.csv --target Purchased
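
Step 7 fits the scaler on the training split only. The sketch below shows the idea for a single feature: the mean and standard deviation come from the training values and are then reused unchanged for the test values. The helper names and numbers are illustrative; the notebook itself uses scikit-learn's StandardScaler, which likewise uses the population standard deviation.

```python
import statistics


def fit_scaler(train):
    """Learn mean and population standard deviation from training data only."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train) or 1.0  # avoid dividing by zero on constant data
    return mean, std


def transform(values, mean, std):
    """Standardize values with statistics learned elsewhere."""
    return [(v - mean) / std for v in values]


train, test = [1.0, 2.0, 3.0], [4.0]   # made-up feature values
mean, std = fit_scaler(train)          # the test data is never used here
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)
print(train_scaled, test_scaled)
```

Fitting on the full dataset instead would leak information about the test rows into the features the model is trained on.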

decision-tree

Decision Tree Classifier with tree visualization.

Cells generated:

  1. Imports (includes from sklearn import tree)
  2. Load + EDA
  3. Drop columns cell
  4. Distribution plot + heatmap + pairplot
  5. X / y split
  6. train_test_split
  7. StandardScaler
  8. DecisionTreeClassifier(criterion='entropy', max_depth=5, random_state=0)
  9. Predict
  10. Accuracy score
  11. Confusion matrix
  12. Classification report
  13. tree.plot_tree() — full visual tree diagram

notebookpkg install decision-tree --dataset SNP.csv --target Purchased

svm-classifier

SVM with both Linear and RBF kernels, plus feature engineering.

Cells generated:

  1. Imports (includes SVC)
  2. Load + EDA (info, describe, isnull, value_counts)
  3. Drop columns cell
  4. Scatter plot of features
  5. X / y split
  6. train_test_split
  7. StandardScaler
  8. model = SVC(kernel='linear') + fit + predict + accuracy + CM + heatmap
  9. Feature engineering: df['AgeSalary'] = df['Age'] * df['EstimatedSalary']
  10. Re-split with new feature
  11. model1 = SVC(kernel='rbf') + fit + predict + accuracy + CM + heatmap

notebookpkg install svm-classifier --dataset SNP.csv --target Purchased

multi-model-compare

Runs Logistic Regression, KNN, and Naive Bayes on the same dataset and compares accuracy.

Cells generated:

  1. Imports
  2. Load + EDA
  3. Drop columns cell
  4. X / y split
  5. train_test_split
  6. model_lr = LogisticRegression() → fit → predict → accuracy → report
  7. model_knn = KNeighborsClassifier() → fit → predict → accuracy → report
  8. model_nb = GaussianNB() → fit → predict → accuracy → report
  9. Comparison dict with all three accuracy scores printed together

notebookpkg install multi-model-compare --dataset Day5.csv --target Purchased

Ensemble Templates

random-forest-regressor

Random Forest Regressor with actual vs predicted scatter plot.

Cells generated:

  1. Imports
  2. Load + isnull, duplicated, info, describe
  3. Drop columns cell
  4. Correlation heatmap
  5. X / y split
  6. train_test_split (test_size=0.2, random_state=42)
  7. RFR = RandomForestRegressor(n_estimators=100, random_state=42) + fit
  8. Predict
  9. MSE
  10. R²
  11. Scatter plot: Actual vs Predicted

notebookpkg install random-forest-regressor --dataset housing.csv --target Price

random-forest-classifier

Random Forest Classifier with feature importance bar chart.

Cells generated:

  1. Imports
  2. Load + EDA
  3. Drop columns cell
  4. X / y split
  5. train_test_split
  6. StandardScaler
  7. model1 = RandomForestClassifier(n_estimators=100, random_state=42) + fit
  8. Predict
  9. Accuracy
  10. Classification report
  11. Confusion matrix heatmap
  12. Feature importance: model1.feature_importances_
  13. Bar chart of feature importance

notebookpkg install random-forest-classifier --dataset iris.csv --target species

Clustering Templates

kmeans-clustering

KMeans Clustering with elbow method and silhouette score. No target column needed.

Cells generated:

  1. Imports (includes KMeans, silhouette_score)
  2. Load + shape, info, describe, isnull, duplicated
  3. Drop columns cell
  4. pairplot
  5. Correlation heatmap
  6. StandardScaler on numeric columns
  7. Elbow method loop (k=1 to 9) + inertia plot
  8. KMeans(n_clusters=N) + fit
  9. Cluster labels added to df
  10. Cluster scatter plot with centroids marked in red
  11. Silhouette score

notebookpkg install kmeans-clustering --dataset Mall_Customers.csv
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv --clusters 5

The --drop Option

Many real datasets have ID columns, name columns, or other columns that should not go into the model. Use --drop to remove them before anything is processed.

With --drop, the generated notebook gets:

df = df.drop(columns=['User ID', 'Gender'], axis=1)
df.head()

Without --drop, the cell appears as a comment so you can still do it manually:

# No columns dropped
# To drop columns use: df = df.drop(columns=['col1','col2'], axis=1)

The profiler also respects the drop: detection of NUMERIC_COLS, CAT_COLS, and FEATURE_COLS happens after the drop, so the rest of the notebook stays consistent.

# Drop one column
notebookpkg install knn-classifier --dataset Day5.csv --target Purchased --drop "User ID"

# Drop multiple columns
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"
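
The two variants of the drop cell shown above could be produced by a small string builder. This is a sketch under the assumption that the cell text is generated from the parsed --drop list. `build_drop_code` is a hypothetical name, not a documented notebookpkg function.

```python
def build_drop_code(drop_cols):
    """Return the source of the drop cell for a list of column names."""
    if drop_cols:
        # repr() of a list of strings gives the ['a', 'b'] literal used in the cell.
        return f"df = df.drop(columns={drop_cols!r}, axis=1)\ndf.head()"
    return (
        "# No columns dropped\n"
        "# To drop columns use: df = df.drop(columns=['col1','col2'], axis=1)"
    )


print(build_drop_code(["User ID", "Gender"]))
print(build_drop_code([]))
```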

All Usage Examples

# ── EDA ──────────────────────────────────────────────────────────────────
notebookpkg install eda-basic   --dataset data.csv
notebookpkg install eda-visual  --dataset data.csv
notebookpkg install eda-full    --dataset data.csv

# ── Regression ───────────────────────────────────────────────────────────
notebookpkg install linear-regression      --dataset Salary_Data.csv --target Salary
notebookpkg install polynomial-regression  --dataset hw.csv --target Price --degree 3
notebookpkg install lasso-ridge            --dataset BostonHousing.csv --target medv

# ── Classification ────────────────────────────────────────────────────────
notebookpkg install logistic-regression    --dataset Day5.csv --target Purchased
notebookpkg install knn-classifier         --dataset Day5.csv --target Purchased
notebookpkg install naive-bayes            --dataset Day5.csv --target Purchased
notebookpkg install decision-tree          --dataset SNP.csv  --target Purchased
notebookpkg install svm-classifier         --dataset SNP.csv  --target Purchased
notebookpkg install multi-model-compare    --dataset Day5.csv --target Purchased

# ── Ensemble ─────────────────────────────────────────────────────────────
notebookpkg install random-forest-regressor   --dataset housing.csv --target Price
notebookpkg install random-forest-classifier  --dataset iris.csv    --target species

# ── Clustering ────────────────────────────────────────────────────────────
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv --clusters 5

# ── With drop ─────────────────────────────────────────────────────────────
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"

# ── Custom output filename ────────────────────────────────────────────────
notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary --output my_analysis.ipynb

Project Structure

notebookpkg/
├── notebookpkg/
│   ├── cli.py          # CLI commands: install, list, syntax
│   ├── profiler.py     # Reads CSV, detects column types
│   ├── injector.py     # Replaces tokens in notebook cells
│   ├── registry.py     # Finds templates by name
│   └── templates/
│       ├── eda-basic/
│       ├── eda-visual/
│       ├── eda-full/
│       ├── linear-regression/
│       ├── polynomial-regression/
│       ├── logistic-regression/
│       ├── knn-classifier/
│       ├── naive-bayes/
│       ├── lasso-ridge/
│       ├── decision-tree/
│       ├── random-forest-regressor/
│       ├── random-forest-classifier/
│       ├── svm-classifier/
│       ├── kmeans-clustering/
│       └── multi-model-compare/
├── build_templates.py  # Regenerates all .ipynb template files
├── setup.py
├── MANIFEST.in
└── README.md

Each template folder contains:

  • template.ipynb — the notebook with {{TOKEN}} placeholders
  • meta.json — name, description, and whether a target column is needed
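
Given that layout, looking up a template by name amounts to reading its meta.json. The sketch below shows one way this could work. It is not the contents of registry.py: `find_template` and the key `needs_target` are hypothetical, since the exact field names in meta.json are not listed here.

```python
import json
import tempfile
from pathlib import Path


def find_template(templates_dir, name):
    """Load meta.json for a template folder, or raise KeyError if it is missing."""
    folder = Path(templates_dir) / name
    meta_path = folder / "meta.json"
    if not meta_path.is_file():
        raise KeyError(f"Unknown template: {name}")
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    meta["notebook_path"] = str(folder / "template.ipynb")
    return meta


# Build a throwaway templates folder so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "kmeans-clustering"
    folder.mkdir()
    (folder / "meta.json").write_text(
        json.dumps({
            "name": "kmeans-clustering",
            "description": "KMeans Clustering",
            "needs_target": False,  # hypothetical key name
        }),
        encoding="utf-8",
    )
    meta = find_template(tmp, "kmeans-clustering")

print(meta["name"], meta["needs_target"])  # kmeans-clustering False
```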

Dependencies

pandas
numpy
scikit-learn
matplotlib
seaborn
nbformat
click

These are installed automatically when you run pip install notebookpkg.


Author

Priyansu Pattanaik
B.Tech — Electronics & Telecommunication
PG Diploma in AI — CDAC Kharghar
priyansupattanaikwork@gmail.com


License

MIT License. Free to use, modify, and distribute.
