A package manager for Jupyter notebook templates
notebookpkg
A notebook template manager for ML students.
One command installs a ready-to-run Jupyter notebook — already wired to your dataset, with your column names, your target, and your drop columns injected automatically.
No more writing the same boilerplate for every assignment. Just pick a template, point it to your CSV, and open Jupyter.
Installation
pip install notebookpkg
Requirements: Python 3.7+, pandas, numpy, scikit-learn, matplotlib, seaborn, nbformat, click
How It Works
- You run one command with your CSV file
- The tool reads your dataset and detects all column names and types
- It injects your dataset path, column names, target column, and drop columns into the template
- A `.ipynb` file is created in your current folder
- Open it in Jupyter and run all cells; everything is pre-filled
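The steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the package's actual code: `profile_csv`, `inject`, and the `{{TARGET}}` token are hypothetical names, while `{{DATASET_PATH}}` is a token that appears in the template output shown later in this README.

```python
import csv
import io


def profile_csv(handle, target=None, drop=()):
    """Read a CSV and return its target, feature columns, and rough column types."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    columns = [c for c in reader.fieldnames if c not in drop]
    target = target or columns[-1]  # documented default: the last column

    def is_numeric(col):
        # A column counts as numeric if every value parses as a float.
        try:
            for row in rows:
                float(row[col])
            return True
        except ValueError:
            return False

    numeric = [c for c in columns if is_numeric(c)]
    return {
        "target": target,
        "features": [c for c in columns if c != target],
        "numeric": numeric,
        "categorical": [c for c in columns if c not in numeric],
    }


def inject(cell_source, values):
    """Replace each {{TOKEN}} placeholder in a cell with its value."""
    for token, value in values.items():
        cell_source = cell_source.replace("{{" + token + "}}", value)
    return cell_source


sample = io.StringIO(
    "YearsExperience,City,Salary\n1.1,Pune,39343\n2.0,Mumbai,43525\n"
)
profile = profile_csv(sample)
cell = inject(
    "df = pd.read_csv('{{DATASET_PATH}}')  # target: {{TARGET}}",
    {"DATASET_PATH": "Salary_Data.csv", "TARGET": profile["target"]},
)
print(profile)
print(cell)
```

The real profiler is described as reading the CSV with pandas; the float-parsing check here is only a stand-in for dtype detection.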
Quick Start
# Step 1: See all available templates
notebookpkg list
# Step 2: Install a template for your CSV
notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary
# Step 3: Open the notebook
jupyter notebook linear-regression_notebook.ipynb
Commands
notebookpkg list
Lists all available templates with their descriptions.
notebookpkg list
Output:
📦 Available Templates:
decision-tree Decision Tree: criterion=entropy, max_depth=5, plot_tree, accuracy, report
eda-basic Basic EDA: head, shape, info, describe, nulls, dtypes, nunique
eda-full Full EDA: visual + outliers, skewness, duplicates, value counts
eda-visual Visual EDA: pairplot, heatmap, distributions
kmeans-clustering KMeans Clustering: StandardScaler, elbow method, silhouette score, cluster plot
knn-classifier KNN Classifier: StandardScaler, fit, accuracy, confusion matrix, report
lasso-ridge Linear + Lasso + Ridge Regression with StandardScaler and coefficient plots
linear-regression Linear Regression: EDA, fit, predict, visualize, MSE, R²
logistic-regression Logistic Regression: StandardScaler, fit, accuracy, confusion matrix, report
multi-model-compare LR + KNN + Naive Bayes on same dataset with accuracy comparison
naive-bayes Gaussian Naive Bayes: StandardScaler, fit, accuracy, confusion matrix heatmap
polynomial-regression Polynomial Regression: PolynomialFeatures, smooth curve plot, MSE, R²
random-forest-classifier Random Forest Classifier: model1, accuracy, confusion matrix, feature importance
random-forest-regressor Random Forest Regressor: RFR, fit, MSE, R², Actual vs Predicted scatter
svm-classifier SVM: Linear kernel, then RBF kernel with AgeSalary feature engineering
notebookpkg install
Installs a template wired to your dataset.
notebookpkg install <template-name> --dataset <path-to-csv> [options]
All options:
| Option | Required | Default | Description |
|---|---|---|---|
| `--dataset` | Yes | - | Path to your CSV file |
| `--target` | No | Last column | Target/label column name |
| `--drop` | No | None | Columns to drop, comma-separated |
| `--degree` | No | 2 | Polynomial degree (only for polynomial-regression) |
| `--clusters` | No | 3 | Number of clusters (only for kmeans-clustering) |
| `--output` | No | `<template>_notebook.ipynb` | Custom output filename |
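As a rough sketch of how the defaults in the table could be resolved, the function below fills in any option the user leaves out. `resolve_options` is a hypothetical helper, not part of the package, and it assumes the default target is the last column remaining after the drop. It also ignores the fact that kmeans-clustering needs no target.

```python
def resolve_options(template, columns, target=None, drop=None,
                    degree=2, clusters=3, output=None):
    """Fill in documented defaults for options the user did not pass."""
    # --drop is a comma-separated string such as "User ID,Gender".
    drop_cols = [c.strip() for c in drop.split(",")] if drop else []
    remaining = [c for c in columns if c not in drop_cols]
    return {
        "target": target or remaining[-1],  # default: last column
        "drop": drop_cols,
        "degree": degree,                   # only used by polynomial-regression
        "clusters": clusters,               # only used by kmeans-clustering
        "output": output or f"{template}_notebook.ipynb",
    }


opts = resolve_options("linear-regression", ["YearsExperience", "Salary"])
print(opts["target"], opts["output"])
```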
notebookpkg syntax
Prints the complete code of a template — every cell in order — directly in your terminal. Use this to preview exactly what will be generated before installing.
notebookpkg syntax <template-name>
Example:
notebookpkg syntax logistic-regression
Output:
============================================================
Template : logistic-regression
Logistic Regression: StandardScaler, fit, accuracy, confusion matrix, report
Total cells: 16
============================================================
── Cell 1 ──────────────────────────────────────────────────
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
── Cell 2 ──────────────────────────────────────────────────
df = pd.read_csv('{{DATASET_PATH}}')
df.head()
── Cell 3 ──────────────────────────────────────────────────
{{DROP_CODE}}
... (all remaining cells shown in full)
============================================================
Install this template:
notebookpkg install logistic-regression --dataset yourdata.csv
============================================================
You can run syntax for any of the 15 templates:
notebookpkg syntax eda-basic
notebookpkg syntax eda-visual
notebookpkg syntax eda-full
notebookpkg syntax linear-regression
notebookpkg syntax polynomial-regression
notebookpkg syntax logistic-regression
notebookpkg syntax knn-classifier
notebookpkg syntax naive-bayes
notebookpkg syntax lasso-ridge
notebookpkg syntax decision-tree
notebookpkg syntax random-forest-regressor
notebookpkg syntax random-forest-classifier
notebookpkg syntax svm-classifier
notebookpkg syntax kmeans-clustering
notebookpkg syntax multi-model-compare
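For readers curious how such a preview can be produced, here is a minimal sketch that walks the cells of a notebook's JSON and prints each one under a numbered header. `render_cells` is a hypothetical name and the two-cell notebook is made up for illustration; the only assumption about the file format is the standard notebook layout, where `cells` is a list and each cell's `source` is a string or a list of strings.

```python
import json


def render_cells(notebook):
    """Return every cell's source, in order, under a numbered header."""
    lines = []
    for number, cell in enumerate(notebook["cells"], start=1):
        source = cell["source"]
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        lines.append(f"── Cell {number} " + "─" * 50)
        lines.append(source)
    return "\n".join(lines)


raw = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code",
         "source": ["import pandas as pd\n", "import numpy as np"]},
        {"cell_type": "code",
         "source": "df = pd.read_csv('{{DATASET_PATH}}')\ndf.head()"},
    ],
})
preview = render_cells(json.loads(raw))
print(preview)
```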
Templates
EDA Templates
eda-basic
Basic Exploratory Data Analysis. Covers the essential checks every notebook needs.
Cells generated:
- Imports
- `pd.read_csv()` + `df.head()`
- Drop columns cell (optional)
- `df.shape`
- `df.info()`
- `df.describe()`
- `df.isnull().sum()`
- `df.dtypes`
- `df.nunique()`
notebookpkg install eda-basic --dataset data.csv
eda-visual
EDA with all key visualizations.
Cells generated:
Everything in eda-basic, plus:
- `sns.pairplot(df)`
- Correlation heatmap (`df.corr()` + `sns.heatmap()`)
- Histogram for each numeric column
notebookpkg install eda-visual --dataset data.csv
eda-full
Complete EDA including outlier detection and categorical analysis.
Cells generated:
Everything in eda-visual, plus:
- `df.duplicated().sum()`
- Boxplot for each numeric column
- Skewness: `df.skew(numeric_only=True)`
- IQR outlier count for each numeric column
- `value_counts()` for each categorical column
notebookpkg install eda-full --dataset data.csv
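To make the IQR outlier count concrete, here is a small standard-library sketch. It assumes the common 1.5 × IQR rule; the template's exact thresholds are not stated here, and `iqr_outlier_count` is a hypothetical name.

```python
import statistics


def iqr_outlier_count(values):
    """Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    # method="inclusive" interpolates linearly between the min and max.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in values if v < lower or v > upper)


print(iqr_outlier_count([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

In this sample Q1 is 3 and Q3 is 7, so the bounds are -3 and 13 and only the value 100 falls outside them.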
Regression Templates
linear-regression
Standard Linear Regression pipeline on your CSV.
Cells generated:
- Imports
- Load dataset + head
- Drop columns cell
- shape, info, describe, isnull
- pairplot
- Correlation heatmap
- X / y split (iloc)
- train_test_split (test_size=0.2, random_state=0)
- `regressor = LinearRegression()` + fit
- Predict
- Visualize training data (scatter + regression line)
- Visualize testing data
- Coefficient and intercept
- MSE
- R²
notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary
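The MSE and R² cells report standard regression metrics. As a dependency-free illustration of what those two numbers measure (the notebook itself presumably uses scikit-learn's metric functions; the helper names below are hypothetical):

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


actual = [3, 5, 7]
predicted = [2, 5, 8]
print(mse(actual, predicted), r_squared(actual, predicted))
```

An R² of 1 means the predictions match the actual values exactly; a value near 0 means the model does no better than always predicting the mean.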
polynomial-regression
Polynomial Regression with smooth curve visualization.
Cells generated:
- Imports (includes `PolynomialFeatures`)
- Load dataset + head
- Drop columns cell
- info, describe, pairplot, heatmap
- X / y split
- `PolynomialFeatures(degree=N)` + transform
- train_test_split
- `plr = LinearRegression()` + fit
- Smooth curve plot using `X_gride`
- Predict
- MSE
- R²
notebookpkg install polynomial-regression --dataset hw.csv --target Price
notebookpkg install polynomial-regression --dataset hw.csv --target Price --degree 3
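The effect of `--degree` is easiest to see on a single feature. The sketch below mimics what a polynomial expansion produces for one input column: a constant term followed by increasing powers, which is also what scikit-learn's `PolynomialFeatures` yields by default in the single-feature case. `polynomial_features` is a hypothetical stand-in, not the template's code.

```python
def polynomial_features(x_values, degree=2):
    """Expand each x into [1, x, x**2, ..., x**degree]."""
    return [[x ** power for power in range(degree + 1)] for x in x_values]


print(polynomial_features([2.0, 3.0], degree=3))
```

A linear model fitted on these expanded columns is what lets the regression curve bend.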
lasso-ridge
Linear Regression + Lasso + Ridge, all on the same dataset with comparison.
Cells generated:
- Imports
- Load + EDA (info, describe, columns, shape)
- Drop columns cell
- X / y split
- train_test_split
- StandardScaler
- Linear Regression (`lm`) + coefficient barh plot
- Lasso (`alpha=0.1`) + MSE + R² + coefficient barh plot
- Ridge (`alpha=0.1`) + MSE + R²
notebookpkg install lasso-ridge --dataset BostonHousing.csv --target medv
Classification Templates
logistic-regression
Logistic Regression with StandardScaler.
Cells generated:
- Imports
- Load dataset + head
- Drop columns cell
- shape, info, describe, isnull
- Correlation heatmap
- X / y split
- train_test_split (test_size=0.3, random_state=0)
- `sc = StandardScaler()` + fit_transform / transform
- `lr = LogisticRegression()` + fit
- Predict
- Accuracy score
- Confusion matrix
- Classification report
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"
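As a rough illustration of what the accuracy and confusion matrix cells report, here is a dependency-free sketch. The helper names are hypothetical and this is not the template's code, which the imports suggest relies on scikit-learn. Rows are true labels and columns are predicted labels, the same layout scikit-learn uses.

```python
def confusion_counts(y_true, y_pred):
    """Build a matrix where entry [i][j] counts true label i predicted as j."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


purchased = [0, 1, 1, 0]
predicted = [0, 1, 0, 0]
print(confusion_counts(purchased, predicted), accuracy(purchased, predicted))
```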
knn-classifier
K-Nearest Neighbors Classifier with StandardScaler.
Cells generated:
- Imports
- Load dataset + head
- Drop columns cell
- shape, info, describe, isnull, duplicated
- Correlation heatmap + pairplot
- X / y split
- train_test_split (test_size=0.2, random_state=42)
- StandardScaler
- `knn = KNeighborsClassifier()` + fit
- Predict
- Accuracy, confusion matrix, classification report
notebookpkg install knn-classifier --dataset Day5.csv --target Purchased
naive-bayes
Gaussian Naive Bayes with StandardScaler and confusion matrix heatmap.
Cells generated:
- Imports
- Load + shape, describe, isnull
- Drop columns cell
- Correlation heatmap
- X / y split
- train_test_split with `stratify=y`
- StandardScaler (fit only on train)
- `nb = GaussianNB()` + fit
- Predict
- Accuracy
- Classification report
- Confusion matrix as `sns.heatmap`
notebookpkg install naive-bayes --dataset Day5.csv --target Purchased
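The "fit only on train" step matters because scaling statistics computed on the full dataset would leak information from the test rows. A minimal standard-library sketch of the idea, with hypothetical helper names, using the population standard deviation as scikit-learn's `StandardScaler` does:

```python
import statistics


def fit_scaler(train_values):
    """Learn the mean and population standard deviation from training data only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def transform(values, mean, std):
    """Standardize values with statistics learned elsewhere."""
    return [(v - mean) / std for v in values]


train, test = [1.0, 2.0, 3.0], [2.0, 4.0]
mean, std = fit_scaler(train)               # statistics come from train only
scaled_train = transform(train, mean, std)
scaled_test = transform(test, mean, std)    # test reuses the train statistics
print(scaled_train, scaled_test)
```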
decision-tree
Decision Tree Classifier with tree visualization.
Cells generated:
- Imports (includes `from sklearn import tree`)
- Load + EDA
- Drop columns cell
- Distribution plot + heatmap + pairplot
- X / y split
- train_test_split
- StandardScaler
- `DecisionTreeClassifier(criterion='entropy', max_depth=5, random_state=0)`
- Predict
- Accuracy score
- Confusion matrix
- Classification report
- `tree.plot_tree()`: full visual tree diagram
notebookpkg install decision-tree --dataset SNP.csv --target Purchased
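The `criterion='entropy'` setting means splits are chosen by how much they reduce label entropy. A small sketch of that quantity, using hypothetical helper names and only the standard library:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
    return entropy(parent) - weighted


print(entropy([0, 0, 1, 1]))
print(information_gain([0, 0, 1, 1], [0, 0], [1, 1]))
```

A perfectly mixed two-class node has entropy 1 bit, and a split that separates the classes completely recovers all of it.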
svm-classifier
SVM with both Linear and RBF kernels, plus feature engineering.
Cells generated:
- Imports (includes `SVC`)
- Load + EDA (info, describe, isnull, value_counts)
- Drop columns cell
- Scatter plot of features
- X / y split
- train_test_split
- StandardScaler
- `model = SVC(kernel='linear')` + fit + predict + accuracy + CM + heatmap
- Feature engineering: `df['AgeSalary'] = df['Age'] * df['EstimatedSalary']`
- Re-split with new feature
- `model1 = SVC(kernel='rbf')` + fit + predict + accuracy + CM + heatmap
notebookpkg install svm-classifier --dataset SNP.csv --target Purchased
multi-model-compare
Runs Logistic Regression, KNN, and Naive Bayes on the same dataset and compares accuracy.
Cells generated:
- Imports
- Load + EDA
- Drop columns cell
- X / y split
- train_test_split
- `model_lr = LogisticRegression()` → fit → predict → accuracy → report
- `model_knn = KNeighborsClassifier()` → fit → predict → accuracy → report
- `model_nb = GaussianNB()` → fit → predict → accuracy → report
- Comparison dict with all three accuracy scores printed together
notebookpkg install multi-model-compare --dataset Day5.csv --target Purchased
Ensemble Templates
random-forest-regressor
Random Forest Regressor with actual vs predicted scatter plot.
Cells generated:
- Imports
- Load + isnull, duplicated, info, describe
- Drop columns cell
- Correlation heatmap
- X / y split
- train_test_split (test_size=0.2, random_state=42)
- `RFR = RandomForestRegressor(n_estimators=100, random_state=42)` + fit
- Predict
- MSE
- R²
- Scatter plot: Actual vs Predicted
notebookpkg install random-forest-regressor --dataset housing.csv --target Price
random-forest-classifier
Random Forest Classifier with feature importance bar chart.
Cells generated:
- Imports
- Load + EDA
- Drop columns cell
- X / y split
- train_test_split
- StandardScaler
- `model1 = RandomForestClassifier(n_estimators=100, random_state=42)` + fit
- Predict
- Accuracy
- Classification report
- Confusion matrix heatmap
- Feature importance: `model1.feature_importances_`
- Bar chart of feature importance
notebookpkg install random-forest-classifier --dataset iris.csv --target species
Clustering Templates
kmeans-clustering
KMeans Clustering with elbow method and silhouette score. No target column needed.
Cells generated:
- Imports (includes `KMeans`, `silhouette_score`)
- Load + shape, info, describe, isnull, duplicated
- Drop columns cell
- pairplot
- Correlation heatmap
- StandardScaler on numeric columns
- Elbow method loop (k=1 to 9) + inertia plot
- `KMeans(n_clusters=N)` + fit
- Cluster labels added to df
- Cluster scatter plot with centroids marked in red
- Silhouette score
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv --clusters 5
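The elbow plot tracks inertia, the sum of squared distances from each point to its cluster centre, as k grows. Below is a dependency-free sketch of that quantity for a given labelling. The `inertia` helper is hypothetical and uses each cluster's mean as its centre, which is where a converged KMeans fit places its centroids.

```python
def inertia(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        # zip(*members) groups the coordinates axis by axis.
        centroid = [sum(axis) / len(members) for axis in zip(*members)]
        total += sum(
            sum((value - center) ** 2 for value, center in zip(point, centroid))
            for point in members
        )
    return total


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(inertia(points, [0, 0, 0, 0]))  # one cluster
print(inertia(points, [0, 0, 1, 1]))  # two clusters
```

Going from one cluster to two cuts the inertia sharply for this data; the "elbow" is the k after which further increases bring only small improvements.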
The --drop Option
Many real datasets have ID columns, name columns, or other columns that should not go into the model.
Use --drop to remove them before anything is processed.
With --drop, the generated notebook gets:
df = df.drop(columns=['User ID', 'Gender'], axis=1)
df.head()
Without --drop, the cell appears as a comment so you can still do it manually:
# No columns dropped
# To drop columns use: df = df.drop(columns=['col1','col2'], axis=1)
The profiler also respects the drop: NUMERIC_COLS, CAT_COLS, and FEATURE_COLS are all detected after the drop, so the rest of the notebook stays consistent.
# Drop one column
notebookpkg install knn-classifier --dataset Day5.csv --target Purchased --drop "User ID"
# Drop multiple columns
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"
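A minimal sketch of how the drop cell could be generated from the `--drop` string. The two outputs reproduce the cell contents shown above; `build_drop_code` itself is a hypothetical name, and the package's real implementation may differ.

```python
def build_drop_code(drop_arg=None):
    """Return the source of the drop cell for a comma-separated --drop value."""
    if not drop_arg:
        return (
            "# No columns dropped\n"
            "# To drop columns use: df = df.drop(columns=['col1','col2'], axis=1)"
        )
    columns = [name.strip() for name in drop_arg.split(",") if name.strip()]
    return f"df = df.drop(columns={columns!r}, axis=1)\ndf.head()"


print(build_drop_code("User ID,Gender"))
print(build_drop_code())
```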
All Usage Examples
# ── EDA ──────────────────────────────────────────────────────────────────
notebookpkg install eda-basic --dataset data.csv
notebookpkg install eda-visual --dataset data.csv
notebookpkg install eda-full --dataset data.csv
# ── Regression ───────────────────────────────────────────────────────────
notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary
notebookpkg install polynomial-regression --dataset hw.csv --target Price --degree 3
notebookpkg install lasso-ridge --dataset BostonHousing.csv --target medv
# ── Classification ────────────────────────────────────────────────────────
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased
notebookpkg install knn-classifier --dataset Day5.csv --target Purchased
notebookpkg install naive-bayes --dataset Day5.csv --target Purchased
notebookpkg install decision-tree --dataset SNP.csv --target Purchased
notebookpkg install svm-classifier --dataset SNP.csv --target Purchased
notebookpkg install multi-model-compare --dataset Day5.csv --target Purchased
# ── Ensemble ─────────────────────────────────────────────────────────────
notebookpkg install random-forest-regressor --dataset housing.csv --target Price
notebookpkg install random-forest-classifier --dataset iris.csv --target species
# ── Clustering ────────────────────────────────────────────────────────────
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv --clusters 5
# ── With drop ─────────────────────────────────────────────────────────────
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"
# ── Custom output filename ────────────────────────────────────────────────
notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary --output my_analysis.ipynb
Project Structure
notebookpkg/
├── notebookpkg/
│ ├── cli.py # CLI commands: install, list, syntax
│ ├── profiler.py # Reads CSV, detects column types
│ ├── injector.py # Replaces tokens in notebook cells
│ ├── registry.py # Finds templates by name
│ └── templates/
│ ├── eda-basic/
│ ├── eda-visual/
│ ├── eda-full/
│ ├── linear-regression/
│ ├── polynomial-regression/
│ ├── logistic-regression/
│ ├── knn-classifier/
│ ├── naive-bayes/
│ ├── lasso-ridge/
│ ├── decision-tree/
│ ├── random-forest-regressor/
│ ├── random-forest-classifier/
│ ├── svm-classifier/
│ ├── kmeans-clustering/
│ └── multi-model-compare/
├── build_templates.py # Regenerates all .ipynb template files
├── setup.py
├── MANIFEST.in
└── README.md
Each template folder contains:
- `template.ipynb`: the notebook with `{{TOKEN}}` placeholders
- `meta.json`: name, description, and whether a target column is needed
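Given that layout, looking up a template by name can be sketched as below. `find_template` and the `needs_target` key are hypothetical; the README only says that `meta.json` records the name, description, and whether a target column is needed. The example builds a throwaway folder so it runs on its own.

```python
import json
import tempfile
from pathlib import Path


def find_template(templates_dir, name):
    """Return the notebook path and metadata for a template folder."""
    folder = Path(templates_dir) / name
    meta_path = folder / "meta.json"
    if not meta_path.is_file():
        raise KeyError(f"Unknown template: {name}")
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    return folder / "template.ipynb", meta


with tempfile.TemporaryDirectory() as root:
    folder = Path(root) / "eda-basic"
    folder.mkdir()
    (folder / "meta.json").write_text(
        json.dumps({
            "name": "eda-basic",
            "description": "Basic EDA: head, shape, info, describe, nulls, dtypes, nunique",
            "needs_target": False,  # hypothetical key name
        }),
        encoding="utf-8",
    )
    notebook_path, meta = find_template(root, "eda-basic")
    print(notebook_path.name, "-", meta["description"])
```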
Dependencies
pandas
numpy
scikit-learn
matplotlib
seaborn
nbformat
click
These are installed automatically when you run pip install notebookpkg.
Author
Priyansu Pattanaik
B.Tech — Electronics & Telecommunication
PG Diploma in AI — CDAC Kharghar
priyansupattanaikwork@gmail.com
License
MIT License. Free to use, modify, and distribute.
File details
Details for the file notebookpkg-1.3.0.tar.gz.
File metadata
- Download URL: notebookpkg-1.3.0.tar.gz
- Upload date:
- Size: 23.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 78f3a1ea734a8c18216146fd6493aaded2d4342516a8b15cf63e10a908aceb6b |
| MD5 | 00918a615dd54cba3b58b9a13379a12f |
| BLAKE2b-256 | c864776750b7725f9a9ccc08b7b0349aeafb77ad24eb0ab34e529bbed6b55015 |
File details
Details for the file notebookpkg-1.3.0-py3-none-any.whl.
File metadata
- Download URL: notebookpkg-1.3.0-py3-none-any.whl
- Upload date:
- Size: 31.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ec6fd2b2b0733792e2aece686a6e44e7e949107effa57395be70577dc1099b7 |
| MD5 | ca952a47af9ab94028537f6abcf11b23 |
| BLAKE2b-256 | bcaf41ce98e975f8dc1b7680b03708f65b43390649d81bd8cc8432d2a7629ae2 |