Custom data science utilities for model evaluation and data preparation
Shash Package
A custom Python package for data preparation, exploration, splitting, saving/loading datasets, and model evaluation (classification & regression).
✨ Features
🔹 Data Preparation & EDA (dataprep.py)
- datacheck(df): Checks for missing/null values, unique counts, and duplicate rows in a DataFrame.
- dataeda(df): Prints a dataset overview: head, shape, info, and numerical & categorical statistics.
- auto_convert_dates(df): Automatically converts date-like object/string columns to datetime.
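The kind of checks described for datacheck can be sketched with plain pandas. This is a minimal illustration, not the package's implementation: the helper name summarize_quality and its output columns are assumptions made for this example.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing and unique counts (hypothetical helper, not shash code)."""
    summary = pd.DataFrame(
        {
            "missing": df.isna().sum(),   # null/NaN cells per column
            "unique": df.nunique(),       # distinct non-null values per column
        }
    )
    # Share of missing cells per column, as a percentage of the row count.
    summary["missing_pct"] = (summary["missing"] / len(df) * 100).round(2)
    return summary


df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", None, "Rome"],
        "temp": [3.0, 3.0, 9.5, 18.0],
    }
)
report = summarize_quality(df)
# Rows that repeat an earlier row exactly.
n_duplicates = int(df.duplicated().sum())
print(report)
print("duplicate rows:", n_duplicates)
```

Returning a DataFrame rather than printing makes the summary easy to filter or log; whether datacheck itself returns or prints its results is not stated above.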
🔹 Dataset Splitting & Storage (modelprep.py)
- split_sets(features, target, test_val_ratio=0.3, stratify=False): Splits the data into Train, Validation, and Test sets (with optional stratification).
- save_sets_csv(...): Saves the splits to CSV files (in ../data/processed/ by default).
- load_sets_csv(...): Loads the Train/Val/Test sets from CSV files.
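A three-way split like the one described for split_sets is commonly built from two calls to scikit-learn's train_test_split. The sketch below assumes that the held-out fraction is divided evenly between validation and test; that choice, the function name three_way_split, and the seed parameter are assumptions for illustration and may differ from what the package does.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split(X, y, holdout_ratio=0.3, stratify=False, seed=42):
    """Hypothetical sketch: train/val/test split via two train_test_split calls."""
    # First cut: training set vs. a held-out pool of size holdout_ratio.
    X_train, X_hold, y_train, y_hold = train_test_split(
        X,
        y,
        test_size=holdout_ratio,
        random_state=seed,
        stratify=y if stratify else None,
    )
    # Second cut: divide the held-out pool evenly into validation and test
    # (the 50/50 division is an assumption of this sketch).
    X_val, X_test, y_val, y_test = train_test_split(
        X_hold,
        y_hold,
        test_size=0.5,
        random_state=seed,
        stratify=y_hold if stratify else None,
    )
    return X_train, y_train, X_val, y_val, X_test, y_test


X = np.arange(200).reshape(100, 2)   # 100 samples, 2 features
y = np.array([0, 1] * 50)            # balanced binary target
X_train, y_train, X_val, y_val, X_test, y_test = three_way_split(X, y, stratify=True)
print(len(X_train), len(X_val), len(X_test))
```

Stratifying both cuts keeps the class balance roughly equal across all three sets, which matters most when one class is rare.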
🔹 Model Evaluation (evaluation.py)
Classification
- evaluate_classifier(y_true, y_pred_labels, y_pred_proba=None, dataset_name="Dataset"): Prints Accuracy, Precision, Recall, F1, ROC AUC (if probabilities are provided), and a classification report, and displays a confusion matrix.
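For reference, the four label-based metrics listed above can be derived from the confusion-matrix counts of a binary problem. The following is a standard-library sketch of those definitions, not the code evaluate_classifier runs; the helper name binary_metrics and the zero-division convention (returning 0.0) are assumptions of this example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("y_true and y_pred must not be empty")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against empty denominators (assumed convention: report 0.0).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

ROC AUC is omitted here because it needs predicted probabilities rather than labels, which is why it is only reported when y_pred_proba is passed.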
Regression
- evaluate_regressor(y_true, y_pred, dataset_name="Dataset"): Prints MAE, MSE, RMSE, MAPE, and R², and displays residual and true-vs-predicted plots.
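The regression metrics named above follow standard definitions, which can be written out with the standard library alone. This is an illustrative sketch rather than the package's code; the helper name regression_metrics is an assumption, and MAPE is expressed here as a fraction (0.25 means 25%), which may differ from how evaluate_regressor reports it.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, MAPE, and R² from their textbook definitions (illustrative)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when a true value is zero; this sketch assumes none are.
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape, "r2": r2}


scores = regression_metrics([2, 4, 6, 8], [1, 5, 6, 10])
print(scores)
```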
🔹 Model Runner Wrappers (model_runner.py)
- fit_eval_classifier(model, X_train, y_train, X_val=None, y_val=None, X_test=None, y_test=None): Fits a classifier and evaluates it on Train/Val/Test using evaluate_classifier.
- fit_eval_regressor(model, X_train, y_train, X_val=None, y_val=None, X_test=None, y_test=None): Fits a regressor and evaluates it on Train/Val/Test using evaluate_regressor.
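The wrapper pattern described above (fit once, then evaluate on whichever splits were supplied) can be sketched as follows. Everything here is hypothetical: fit_and_score, MajorityClassifier, and accuracy are stand-ins written for this example so that it runs without scikit-learn, and the real wrappers print reports through evaluate_classifier / evaluate_regressor rather than returning a dict.

```python
from collections import Counter


class MajorityClassifier:
    """Toy model with a scikit-learn-style fit/predict interface (illustration only)."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_] * len(X)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_and_score(model, score_fn, X_train, y_train,
                  X_val=None, y_val=None, X_test=None, y_test=None):
    """Fit on the training split, then score every split that was provided."""
    model.fit(X_train, y_train)
    splits = {
        "Train": (X_train, y_train),
        "Validation": (X_val, y_val),
        "Test": (X_test, y_test),
    }
    results = {}
    for name, (X, y) in splits.items():
        if X is None or y is None:
            continue  # optional split not supplied, so skip it
        results[name] = score_fn(y, model.predict(X))
    return results


results = fit_and_score(
    MajorityClassifier(), accuracy,
    X_train=[[0], [1], [2], [3]], y_train=[1, 1, 1, 0],
    X_val=[[4], [5]], y_val=[1, 0],
)
print(results)
```

Making the validation and test splits optional keeps one call signature usable both for quick train-only checks and for a full three-way evaluation.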
🚀 Installation
Install from PyPI:
pip install shash
Or install locally for development:
pip install -e .
📌 Usage Examples
Data Preparation
import pandas as pd
from shash.dataprep import datacheck, dataeda, auto_convert_dates
df = pd.read_csv("data/raw/sample.csv")
# Quick checks
print(datacheck(df))
dataeda(df)
# Convert string dates automatically
df = auto_convert_dates(df)
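One plausible way to detect date-like text columns is to attempt a conversion and keep it only when most values parse. The sketch below illustrates that idea with pandas; the function name convert_date_like and the min_success threshold are assumptions, and the heuristic auto_convert_dates actually uses is not documented here.

```python
import pandas as pd


def convert_date_like(df: pd.DataFrame, min_success: float = 0.9) -> pd.DataFrame:
    """Hypothetical sketch: convert text columns whose values mostly parse as dates."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        is_text = pd.api.types.is_object_dtype(out[col]) or isinstance(
            out[col].dtype, pd.StringDtype
        )
        if not is_text:
            continue
        # Unparseable values become NaT instead of raising.
        parsed = pd.to_datetime(out[col], errors="coerce")
        non_null = out[col].notna().sum()
        # Keep the conversion only if enough of the non-null values parsed.
        if non_null and parsed.notna().sum() / non_null >= min_success:
            out[col] = parsed
    return out


raw = pd.DataFrame(
    {
        "when": ["2024-01-05", "2024-02-10", "2024-03-15"],
        "name": ["alice", "bob", "carol"],
        "n": [1, 2, 3],
    }
)
converted = convert_date_like(raw)
print(converted.dtypes)
```

The success threshold guards against converting free-text columns that happen to contain a few date-like strings; pandas may emit a format-inference warning for columns that do not parse.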
Dataset Splitting
from shash.modelprep import split_sets, save_sets_csv, load_sets_csv
# features and target are assumed to be prepared beforehand (e.g. a DataFrame and a Series)
X_train, y_train, X_val, y_val, X_test, y_test = split_sets(features, target, stratify=True)
save_sets_csv(X_train, y_train, X_val, y_val, X_test, y_test)
# Later...
X_train, y_train, X_val, y_val, X_test, y_test = load_sets_csv()
Model Evaluation
from shash.evaluation import evaluate_classifier, evaluate_regressor
from sklearn.linear_model import LogisticRegression, LinearRegression
# Classification
clf = LogisticRegression()
clf.fit(X_train, y_train)
# Column 1 of predict_proba holds the positive-class probabilities (binary target)
evaluate_classifier(y_val, clf.predict(X_val), clf.predict_proba(X_val)[:, 1], "Validation")
# Regression (here y_train and y_val are assumed to hold a continuous target)
reg = LinearRegression()
reg.fit(X_train, y_train)
evaluate_regressor(y_val, reg.predict(X_val), "Validation")
Model Runner
from shash.model_runner import fit_eval_classifier, fit_eval_regressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
clf = RandomForestClassifier()
fit_eval_classifier(clf, X_train, y_train, X_val, y_val, X_test, y_test)
reg = RandomForestRegressor()
fit_eval_regressor(reg, X_train, y_train, X_val, y_val, X_test, y_test)
✅ Tests
All tests are written with pytest. Run them with:
poetry run pytest -v
Download files
File details
Details for the file shash-0.2.0.tar.gz.
File metadata
- Download URL: shash-0.2.0.tar.gz
- Upload date:
- Size: 6.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.11.4 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 48ef0fa287c5f6a85132edbcbf24aaab2f09c1857f3a340f309e6b7330c3fb52 |
| MD5 | 15d9a562b529c124d8f65609dec8c464 |
| BLAKE2b-256 | 45eb87fd37ef550620b2c09f399858638ca245b257c0f75aef1494cc6e510fff |
File details
Details for the file shash-0.2.0-py3-none-any.whl.
File metadata
- Download URL: shash-0.2.0-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.4 CPython/3.11.4 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5c3fae61dd7ada00d89d05f1649515ca28fa18790c1e2a45fb4a2c57f26670f |
| MD5 | 30523a60fb6cc584961719187f3ace1d |
| BLAKE2b-256 | c7314e86ebd4392e884247887077a7a048de6734ff770f60b23127ffd80aef91 |