A library of helper functions for data science preprocessing and exploratory data analysis.

jan883-eda

A collection of utility functions for data analysis, preprocessing, model evaluation, and clustering in Python. Designed to streamline the workflow of data scientists and machine learning practitioners.

Installation

Install the package via pip:

pip install jan883-eda

For local development from this repository:

uv sync
uv run python -c "import jan883_eda; print(jan883_eda.__all__)"

Usage

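The examples below show how to use some of the key functions in the package. They assume you already have a DataFrame (your_dataframe), or a feature matrix (X) and target vector (y). Other names that appear in the snippets, such as X_train, X_test, y_test, y_pred, fitted_model, estimator, labels, and your_time_series, stand for objects you create yourself.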

Exploratory Data Analysis (EDA)

Inspect DataFrame:

from jan883_eda import inspect_df

inspect_df(your_dataframe)

This displays the head, shape, description, NaN values, and duplicates of the DataFrame.
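
To make the listed checks concrete, here is a minimal plain-Python sketch of the same kind of summary (shape, missing values, duplicate rows). It is not the package's implementation: `basic_inspection` and the sample `records` are hypothetical names used only for illustration, and rows are modelled as dictionaries with `None` marking a missing value. On a real DataFrame the equivalent pandas calls would be `df.shape`, `df.isna().sum()`, and `df.duplicated().sum()`.

```python
from collections import Counter


def basic_inspection(rows):
    """Summarise shape, missing values per column, and duplicate rows.

    `rows` is a list of dicts sharing the same keys; None marks a missing value.
    """
    columns = list(rows[0].keys()) if rows else []
    missing = {col: sum(1 for row in rows if row.get(col) is None) for col in columns}
    # Rows are turned into tuples so identical rows can be counted.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {"shape": (len(rows), len(columns)), "missing": missing, "duplicates": duplicates}


records = [
    {"age": 34, "city": "Berlin"},
    {"age": None, "city": "Paris"},
    {"age": 34, "city": "Berlin"},
]
summary = basic_inspection(records)
print(summary)
```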

Column Summary:

from jan883_eda import column_summary

summary = column_summary(your_dataframe)
print(summary)

Data Quality Report:

from jan883_eda import data_quality_report

quality = data_quality_report(your_dataframe)
print(quality)

Data Preprocessing

Update Column Names:

from jan883_eda import update_column_names

updated_df = update_column_names(your_dataframe)

Label Encoding:

from jan883_eda import label_encode_column

encoded_df = label_encode_column(your_dataframe, 'column_name')

Train-Test Safe Preprocessor:

from jan883_eda import fit_transform_preprocessor

preprocessor, X_train_ready, X_test_ready = fit_transform_preprocessor(X_train, X_test)
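
The point of a train-test safe preprocessor is that scaling statistics are learned from the training split only and then reused on the test split, so no information leaks from test data into the fitted transform. The sketch below illustrates that idea in plain Python on a single numeric column. It is an illustration under that assumption, not the package's code: `fit_scaler` and `apply_scaler` are hypothetical helpers.

```python
from statistics import mean, pstdev


def fit_scaler(train_values):
    """Learn centring and scaling statistics from the training data only."""
    centre = mean(train_values)
    spread = pstdev(train_values) or 1.0  # fall back to 1.0 for constant columns
    return centre, spread


def apply_scaler(values, centre, spread):
    """Standardise values with statistics that were fitted elsewhere."""
    return [(v - centre) / spread for v in values]


train = [10.0, 12.0, 14.0, 16.0]
test = [11.0, 20.0]

centre, spread = fit_scaler(train)  # the test split is never seen here
train_ready = apply_scaler(train, centre, spread)
test_ready = apply_scaler(test, centre, spread)
print(train_ready, test_ready)
```

Note that the transformed test values are not guaranteed to have zero mean or unit variance. That is expected, because they are scaled with the training statistics.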

Model Evaluation

Evaluate Classification Model:

from jan883_eda import evaluate_classification_model
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
evaluate_classification_model(model, X, y)

Test Multiple Regression Models:

from jan883_eda import best_regression_models

results = best_regression_models(X, y)
print(results)

Cross-Validated Model Comparison:

from jan883_eda import compare_classifiers_cv, compare_regressors_cv

classification_results = compare_classifiers_cv(X, y, scoring="f1_weighted")
regression_results = compare_regressors_cv(X, y, scoring="r2")
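
For readers new to cross-validated comparison, the following plain-Python sketch shows the mechanics: split the data into k folds, score each candidate on every held-out fold, and compare the averaged scores. It deliberately uses two trivial constant predictors (mean and median) and mean absolute error so that it stays self-contained. The helper names `k_fold_indices` and `cv_mae` are hypothetical and are not part of the package.

```python
from statistics import mean, median


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def cv_mae(y, predictor, k=3):
    """Average mean absolute error of a constant predictor over k folds."""
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        guess = predictor([y[i] for i in train_idx])  # fitted on the training fold only
        fold_scores.append(mean(abs(y[i] - guess) for i in test_idx))
    return mean(fold_scores)


y = [1.0, 2.0, 2.0, 3.0, 3.0, 50.0]
scores = {"mean_baseline": cv_mae(y, mean), "median_baseline": cv_mae(y, median)}
best = min(scores, key=scores.get)  # lower error is better
print(scores, best)
```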

Diagnostics

Classification and Regression Diagnostics:

from jan883_eda import (
    class_balance_report,
    classification_metrics_table,
    plot_confusion_matrix,
    regression_metrics,
    plot_regression_diagnostics,
)

balance = class_balance_report(y)
metrics = classification_metrics_table(y_test, y_pred)
plot_confusion_matrix(y_test, y_pred)
regression_summary = regression_metrics(y_test, y_pred)
residuals = plot_regression_diagnostics(y_test, y_pred)
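
As a reference for reading regression diagnostics, the sketch below computes three standard metrics (MAE, RMSE, and R²) by hand in plain Python. Which metrics `regression_metrics` actually returns is not stated here, so treat this as a generic illustration. `regression_scores` is a hypothetical name.

```python
from math import sqrt


def regression_scores(y_true, y_pred):
    """Compute MAE, RMSE, and R^2 from paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


scores = regression_scores([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(scores)
```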

Feature Selection

Feature Ranking and Pruning:

from jan883_eda import (
    low_variance_features,
    correlation_prune,
    mutual_information_ranking,
    permutation_importance_table,
)

low_variance = low_variance_features(X)
correlated = correlation_prune(X, threshold=0.9)
mi_scores = mutual_information_ranking(X, y, problem_type="classification")
importance = permutation_importance_table(fitted_model, X_test, y_test)

Clustering

Evaluate and Profile Clusters:

from jan883_eda import evaluate_kmeans_clusters, cluster_profile, pca_cluster_projection

k_scores = evaluate_kmeans_clusters(X_scaled, k_range=range(2, 10))
profiles = cluster_profile(your_dataframe, labels)
projection = pca_cluster_projection(X_scaled, labels)
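
A cluster profile is typically a per-cluster summary of the original features. The sketch below shows one simple form, cluster size plus column means, in plain Python. The exact output of `cluster_profile` is not documented here, so this is only an illustration of the idea. `profile_clusters`, `customers`, and `cluster_labels` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def profile_clusters(rows, labels):
    """Group rows by cluster label and report size plus per-column means."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    profile = {}
    for label, members in sorted(grouped.items()):
        columns = members[0].keys()
        profile[label] = {
            "size": len(members),
            "means": {col: mean(m[col] for m in members) for col in columns},
        }
    return profile


customers = [
    {"spend": 20.0, "visits": 2.0},
    {"spend": 30.0, "visits": 4.0},
    {"spend": 200.0, "visits": 10.0},
]
cluster_labels = [0, 0, 1]
profiles = profile_clusters(customers, cluster_labels)
print(profiles)
```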

Time Series

Analyze Stationarity:

from jan883_eda import analyze_stationarity

stationary_series = your_time_series.diff().dropna()
analyze_stationarity(stationary_series, alpha=0.05, lags=15)

This runs an Augmented Dickey-Fuller test, prints a plain-English stationarity interpretation, and plots ACF/PACF charts to help inspect autoregressive and moving-average structure.
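
The example above differences the series before testing it. The plain-Python sketch below shows why: first differencing turns a series with a linear trend into one with a constant level. `first_difference` mirrors what `.diff().dropna()` does on a pandas Series, and `half_means` is a rough, informal check rather than a statistical test. Both helper names are hypothetical.

```python
from statistics import mean


def first_difference(series):
    """Return consecutive changes; the result is one element shorter than the input."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def half_means(series):
    """Compare the mean of the first and second half as a rough check for a shifting level."""
    middle = len(series) // 2
    return mean(series[:middle]), mean(series[middle:])


trending = [2.0 * t + 5.0 for t in range(8)]  # linear trend, clearly non-stationary
differenced = first_difference(trending)

print(half_means(trending))     # the two halves have very different means
print(half_means(differenced))  # after differencing the level is constant
```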

Forecasting Helpers:

from jan883_eda import (
    stationarity_report,
    plot_rolling_statistics,
    seasonal_decomposition_plot,
    make_lag_features,
    time_series_train_test_split,
    forecast_metrics,
)

report = stationarity_report(your_time_series)
rolling = plot_rolling_statistics(your_time_series, window=12)
decomposition = seasonal_decomposition_plot(your_time_series, period=12)
lagged = make_lag_features(your_time_series, lags=(1, 2, 3), rolling_windows=(7, 14))
train_ts, test_ts = time_series_train_test_split(lagged, test_size=0.2)
scores = forecast_metrics(y_true, y_pred)
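
Two ideas in the snippet above are easy to get wrong: lag features must only look backwards in time, and the train-test split must keep chronological order instead of shuffling. The plain-Python sketch below illustrates both. It does not reproduce the package's functions: `make_lags` and `chronological_split` are hypothetical helpers, and rolling-window features are omitted for brevity.

```python
def make_lags(series, lags=(1, 2)):
    """Build rows of the current value plus earlier values at the given lags."""
    max_lag = max(lags)
    rows = []
    # Start at max_lag so every lag refers to an existing earlier observation.
    for t in range(max_lag, len(series)):
        row = {"y": series[t]}
        for lag in lags:
            row[f"lag_{lag}"] = series[t - lag]
        rows.append(row)
    return rows


def chronological_split(rows, test_size=0.2):
    """Hold out the most recent rows as the test set, without shuffling."""
    n_test = max(1, round(len(rows) * test_size))
    return rows[:-n_test], rows[-n_test:]


series = [10, 11, 12, 13, 14, 15, 16]
lagged_rows = make_lags(series, lags=(1, 2))
train_rows, test_rows = chronological_split(lagged_rows, test_size=0.2)
print(len(train_rows), test_rows)
```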

Drift and Pipelines

Train-Test Drift and Production Pipelines:

from jan883_eda import (
    compare_train_test_distributions,
    build_model_pipeline,
    validate_prediction_columns,
    save_pipeline,
    load_pipeline,
)

drift = compare_train_test_distributions(X_train, X_test)
pipeline = build_model_pipeline(X_train, estimator)
pipeline.fit(X_train, y_train)
validated = validate_prediction_columns(new_data, X_train.columns)
save_pipeline(pipeline, "model.joblib")
loaded_pipeline = load_pipeline("model.joblib")
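
One common way to quantify drift between a training and a test (or production) sample is the Population Stability Index (PSI): bin both samples on the same edges and sum `(actual - expected) * ln(actual / expected)` over the bins. The sketch below is a plain-Python illustration of that formula with equal-width bins taken from the reference sample. It is not the package's `population_stability_index`, whose binning and parameters are not described here. The name `psi` and its parameters are hypothetical.

```python
from math import log


def psi(expected, actual, n_bins=4, eps=1e-6):
    """Population Stability Index using equal-width bins from the expected sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference sample

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # out-of-range values go to the edge bins
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))


reference = [float(v) for v in range(100)]
shifted = [v + 30.0 for v in reference]

print(psi(reference, reference))  # identical samples give 0.0
print(psi(reference, shifted))    # a shifted sample gives a clearly positive value
```

A frequently quoted rule of thumb treats PSI values above roughly 0.25 as a substantial shift, but the appropriate cut-off depends on the application.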

Functions Overview

The package provides a variety of functions grouped by their purpose:

EDA Functions: inspect_df, column_summary, univariate_analysis, and more.
Data Quality: data_quality_report, duplicate_summary.
Data Preprocessing: update_column_names, label_encode_column, one_hot_encode_column, build_preprocessor, fit_transform_preprocessor, and more.
Model Evaluation: evaluate_classification_model, evaluate_regression_model, best_classification_models, best_regression_models, compare_classifiers_cv, compare_regressors_cv, and more.
Diagnostics: class_balance_report, classification_metrics_table, plot_confusion_matrix, regression_metrics, plot_regression_diagnostics, and more.
Feature Selection: low_variance_features, correlation_prune, mutual_information_ranking, permutation_importance_table.
Clustering Analysis: plot_elbow_method, plot_intercluster_distance, plot_silhouette_visualizer, evaluate_kmeans_clusters, cluster_profile, and more.
Time Series: analyze_stationarity, stationarity_report, make_lag_features, forecast_metrics, and more.
Drift and Pipelines: compare_train_test_distributions, population_stability_index, build_model_pipeline, save_pipeline, load_pipeline.

For a complete list of functions and their detailed documentation, refer to the docstrings within the source code.

Requirements

The following dependencies are required to use the package:

Python >= 3.12
pandas >= 2.2.3
numpy >= 2.2.4
matplotlib >= 3.10.1
seaborn >= 0.13.2
scikit-learn >= 1.6.1
setuptools >= 69
statsmodels >= 0.14.4
yellowbrick >= 1.5
imbalanced-learn >= 0.13.0
xgboost >= 3.0.0

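The library dependencies are installed automatically when you install the package with pip. Python itself is not: a Python 3.12 or newer interpreter must already be available.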
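
If an import fails after installation, it can help to check the interpreter version and see which of the listed dependencies are visible in the current environment. The sketch below uses only the standard library. `installed_versions` is a hypothetical helper, and the distribution names are copied from the list above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

DEPENDENCIES = [
    "pandas", "numpy", "matplotlib", "seaborn", "scikit-learn",
    "setuptools", "statsmodels", "yellowbrick", "imbalanced-learn", "xgboost",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


python_ok = sys.version_info >= (3, 12)  # the package requires Python >= 3.12
report = installed_versions(DEPENDENCIES)

print("Python >= 3.12:", python_ok)
for name, installed in report.items():
    print(f"{name}: {installed or 'not installed'}")
```

This only reports what is installed. It does not compare the versions against the minimums listed above.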

License

This package is distributed under the MIT License.

Contact

For questions, bug reports, or contributions, use the project repository where this package is maintained.

Release history

0.2.2 (this version): May 5, 2026
0.2.1: Dec 19, 2025
0.2.0: Oct 9, 2025
0.1.8: Mar 28, 2025
0.1.7: Mar 28, 2025
0.1.6: Mar 24, 2025
0.1.5: Mar 24, 2025
0.1.4: Mar 24, 2025
0.1.3: Mar 24, 2025
0.1.2: Mar 24, 2025
0.1.1: Mar 24, 2025
0.1.0: Mar 24, 2025

Download files

Source Distribution

jan883_eda-0.2.2.tar.gz (67.9 kB)

Upload date: May 5, 2026
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.7 (macOS)

Hashes for jan883_eda-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`714912edd16970ccd1ac5562578ddbf8c72bde14cb530675a84768b2b7099111`
MD5	`21e080a57ee25a3f8273084432b3f665`
BLAKE2b-256	`53698ee8a2c3b73597ab012baf0d62de587cec89091da0c632591823fc29dbe2`

Built Distribution

jan883_eda-0.2.2-py3-none-any.whl (38.0 kB)

Upload date: May 5, 2026
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.7 (macOS)

Hashes for jan883_eda-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f893782c5025766d5a7693dd26d132b422d09a7275d41fdcadbf09dd53eacc7a`
MD5	`441cfb59db37f71a41a16daac3d9aadb`
BLAKE2b-256	`59515aa5e5cd4e507604ed97bf9f70622e3ff748e9266c099362ca806ff65e9f`
