
Library that provides helper functions for data science preprocessing and exploratory data analysis.


jan883-eda

A collection of utility functions for data analysis, preprocessing, model evaluation, and clustering in Python. Designed to streamline the workflow of data scientists and machine learning practitioners.

Installation

Install the package via pip:

pip install jan883-eda

Usage

Below are examples demonstrating how to use some of the key functions in the package. These examples assume you have a DataFrame (your_dataframe) or feature matrix (X) and target vector (y) ready.
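As a minimal sketch of that starting point, the objects could be prepared from a small pandas DataFrame as follows. The column names (`age`, `city`, `income`, `churned`) and all values are invented for illustration and are not part of the package.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
your_dataframe = pd.DataFrame(
    {
        "age": [25, 32, 47, 51, 38, 29],
        "city": ["Berlin", "Paris", "Berlin", "Rome", "Paris", "Rome"],
        "income": [42000.0, 55000.0, None, 61000.0, 58000.0, 39000.0],
        "churned": [0, 1, 0, 1, 0, 1],
    }
)

# Feature matrix: every column except the target.
X = your_dataframe.drop(columns=["churned"])
# Target vector: the column to predict.
y = your_dataframe["churned"]

print(X.shape, y.shape)
```

Note that this toy frame still contains a text column and a missing value, so it would typically need encoding and imputation before being passed to a model.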

Exploratory Data Analysis (EDA)

  • Inspect DataFrame:
from jan883_eda import inspect_df

inspect_df(your_dataframe)

This displays the head, shape, description, NaN values, and duplicates of the DataFrame.
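For orientation, the same pieces of information can be gathered with plain pandas calls. This is a sketch of equivalent checks, not the implementation of inspect_df; the DataFrame `df` is made up for the example.

```python
import pandas as pd

# Hypothetical data with one missing value and one duplicated row.
df = pd.DataFrame(
    {
        "a": [1, 2, 2, 4],
        "b": [0.5, 1.5, 1.5, None],
    }
)

head = df.head()                             # first rows
shape = df.shape                             # (rows, columns)
description = df.describe()                  # summary statistics for numeric columns
nan_counts = df.isna().sum()                 # missing values per column
duplicate_rows = int(df.duplicated().sum())  # rows identical to an earlier row

print(shape)
print(nan_counts)
print(duplicate_rows)
```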

  • Column Summary:
from jan883_eda import column_summary

summary = column_summary(your_dataframe)
print(summary)

Data Preprocessing

  • Update Column Names:
from jan883_eda import update_column_names

updated_df = update_column_names(your_dataframe)

  • Label Encoding:
from jan883_eda import label_encode_column

encoded_df = label_encode_column(your_dataframe, 'column_name')
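Label encoding maps each distinct category to an integer code. The sketch below shows the idea with scikit-learn's LabelEncoder; it illustrates the technique and is not necessarily how label_encode_column is implemented. The DataFrame and the column name `color` are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data; "color" stands in for 'column_name'.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

encoder = LabelEncoder()
encoded_df = df.copy()
# Classes are sorted, then each value is replaced by the index of its class.
encoded_df["color"] = encoder.fit_transform(df["color"])

print(encoder.classes_.tolist())     # ['blue', 'green', 'red']
print(encoded_df["color"].tolist())  # [2, 1, 0, 1]
```

The integer codes imply an ordering that the categories may not have, so for nominal features fed to linear models, one-hot encoding is often the safer choice.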

Model Evaluation

  • Evaluate Classification Model:
from jan883_eda import evaluate_classification_model
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
evaluate_classification_model(model, X, y)

  • Test Multiple Regression Models:
from jan883_eda import best_regression_models

results = best_regression_models(X, y)
print(results)
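Comparing several regressors usually means fitting each candidate on the same data and ranking them by a cross-validated score. The sketch below shows that idea with scikit-learn directly. The chosen models, the synthetic data, and the R² metric are assumptions for illustration; best_regression_models may use a different set of models or metrics.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression problem, used only so the example runs on its own.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate models (an illustrative selection).
models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated R^2 for each model.
rows = []
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    rows.append({"model": name, "mean_r2": scores.mean()})

# Best model first.
results = (
    pd.DataFrame(rows)
    .sort_values("mean_r2", ascending=False)
    .reset_index(drop=True)
)
print(results)
```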

Functions Overview

The package provides a variety of functions grouped by their purpose:

  • EDA Functions: inspect_df, column_summary, univariate_analysis, and more.
  • Data Preprocessing: update_column_names, label_encode_column, one_hot_encode_column, scale_X_train_X_test, and more.
  • Model Evaluation: evaluate_classification_model, evaluate_regression_model, best_classification_models, best_regression_models, and more.
  • Clustering Analysis: plot_elbow_method, plot_intercluster_distance, plot_silhouette_visualizer, and more.
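As background for the clustering helpers, an elbow plot is built from the k-means inertia (within-cluster sum of squared distances) computed for a range of cluster counts. The sketch below computes those values with scikit-learn directly, without plotting. The synthetic data and the range of k are assumptions, and the behavior of plot_elbow_method itself is not reproduced here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data drawn around three centers (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# Inertia for k = 1..6; the "elbow" is where adding clusters stops helping much.
inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertias[k] = km.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting `inertias` against k and looking for the bend in the curve gives a rough estimate of a suitable number of clusters.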

For a complete list of functions and their detailed documentation, refer to the docstrings within the source code or the official documentation.
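Docstrings can be read from an interactive session with Python's built-in help() or the inspect module. The helper below, `list_public_functions`, is a hypothetical name introduced for this sketch. It is demonstrated on the standard-library json module so that it runs without the package installed; the same call should work on an imported package module.

```python
import inspect
import json


def list_public_functions(module):
    """Return {name: first docstring line} for public functions exposed by a module."""
    summary = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = inspect.getdoc(obj) or ""
        summary[name] = doc.splitlines()[0] if doc else ""
    return summary


# Demonstrated on the standard library; pass an imported package module instead.
for name, first_line in list_public_functions(json).items():
    print(f"{name}: {first_line}")
```

For the full text of a single docstring, `help(some_function)` prints it in the interpreter.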

Requirements

The following dependencies are required to use the package:

  • Python >= 3.6
  • pandas >= 1.0.0
  • numpy >= 1.18.0
  • matplotlib >= 3.0.0
  • seaborn >= 0.10.0
  • scikit-learn >= 0.22.0
  • yellowbrick >= 1.0.0
  • imbalanced-learn >= 0.7.0 (imported as imblearn)

pip installs these dependencies automatically, provided they are declared in the package's setup.py or pyproject.toml.
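To check which of these dependencies are present in the current environment, the standard-library importlib.metadata module (Python 3.8 and later) can report installed versions. This is a minimal sketch; the helper name `installed_versions` is hypothetical. Note that the imblearn module is published on PyPI as imbalanced-learn, which is the name used in the lookup.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI.
REQUIRED = [
    "pandas",
    "numpy",
    "matplotlib",
    "seaborn",
    "scikit-learn",
    "yellowbrick",
    "imbalanced-learn",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

The sketch only reports what is installed; it does not compare the versions against the minimums listed above.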

License

This package is distributed under the MIT License. See the LICENSE file for more information.

Contact

For questions, bug reports, or contributions, please visit the GitHub repository or contact the author at email@example.com.


