AutoML with enhanced preprocessing and explainability.


Auto-Prep

Auto-Prep is an automated data preprocessing and analysis pipeline that generates comprehensive LaTeX reports. It handles common preprocessing tasks, creates insightful visualizations, and documents the entire process in a professional PDF report. It focuses on tabular data and supports numerous explainable AI models. Emphasizing interpretability and ease of use, it includes a subsection for each model that explains its strengths and weaknesses and provides usage examples.

For a detailed product description, see this notebook

Docs

Features

  • Automated data cleaning and preprocessing
  • Intelligent feature type detection
  • Advanced categorical encoding with rare category handling
  • Comprehensive exploratory data analysis (EDA)
  • Automated visualization generation
  • Professional LaTeX report generation
  • Modular and extensible design
  • Support for numerous explainable ML models
  • Explainability with model-specific examples
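As an illustration of the rare-category handling mentioned above, here is a minimal stand-alone sketch in plain Python. This is not Auto-Prep's actual implementation; the function name, threshold, and placeholder label are illustrative:

```python
from collections import Counter

def encode_rare_categories(values, min_freq=0.05, placeholder="__rare__"):
    """Group categories whose relative frequency falls below min_freq
    under a single placeholder label."""
    counts = Counter(values)
    n = len(values)
    rare = {cat for cat, c in counts.items() if c / n < min_freq}
    return [placeholder if v in rare else v for v in values]

# "teal" appears once out of 20 values (5%), below the 10% threshold
colors = ["red"] * 10 + ["blue"] * 9 + ["teal"]
encoded = encode_rare_categories(colors, min_freq=0.10)
```

Grouping rare categories like this keeps downstream encodings (e.g. one-hot) from exploding in dimensionality because of seldom-seen labels.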

Report Contents

The generated report includes:

  1. Title page and table of contents
  2. Overview
    • Platform structure
    • Dataset structure
  3. Exploratory Data Analysis
    • Distribution plots
    • Correlation matrix
    • Missing value analysis
  4. Model Performance
    • Accuracy metrics
    • Model details
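Conceptually, the missing value analysis in the EDA section boils down to counting absent entries per column. A minimal stand-alone sketch of that idea (not Auto-Prep's actual code, which operates on pandas DataFrames):

```python
def missing_value_summary(rows):
    """Count missing (None) entries per column in a list-of-dicts table."""
    columns = set().union(*(row.keys() for row in rows))
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

rows = [
    {"age": 29, "fare": 7.25},
    {"age": None, "fare": 8.05},
    {"age": 35, "fare": None},
]
summary = missing_value_summary(rows)
```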

Installation

To use our tool, you need LaTeX installed on your local machine.

Using pip (Recommended)

  1. Install Auto-Prep directly from PyPI:

    pip install auto-prep
    
  2. Run the example usage:

    python example_usage.py
    

Using Poetry

  1. Ensure you have Poetry installed:

    curl -sSL https://install.python-poetry.org | python3 -
    
  2. Clone the repository:

    git clone https://github.com/yourusername/auto-prep.git
    cd auto-prep
    
  3. Install dependencies:

    poetry install
    
  4. Activate the virtual environment:

    poetry shell
    
  5. Run the example usage:

    python example_usage.py
    

Important information

  • Because multiprocessing is enabled, the run method should be called under an if __name__ == "__main__" guard; see the example below. The number of cores used can be set in config.

  • The difference between config.set and config.update: the former can be used to see the default values for each setting, and it will overwrite all non-passed values with their defaults. The latter overwrites only the provided arguments, without validation, and can be used to create new fields in config.

  • config.root_dir, if it exists, is cleared when AutoPrep().run() is called. If log files are stored there, their file handlers will be deleted, causing errors.

  • Logs printed to the console may be hard to read because of the many warnings emitted by dependencies. Refer to the stored log files for clean output.

  • For changes to config to take effect, config.update must be called before any other import from the auto_prep package, for example:

    import logging
    from auto_prep.utils import config
    
    config.update(log_level=logging.DEBUG)
    
    import numpy as np
    
    from auto_prep.prep import AutoPrep
    from sklearn.datasets import fetch_openml
    
    # Load your dataset
    data = fetch_openml(name="titanic", version=1, as_frame=True, parser="auto").frame
    data["survived"] = data["survived"].astype(np.uint8)
    
    # Create and run pipeline
    pipeline = AutoPrep()
    
    if __name__ == "__main__":
        pipeline.run(data, target_column="survived")
    

    For the same reason, AutoPrep is not exported from the top-level package. This is a known implementation flaw.
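The set vs update semantics described above can be illustrated with a toy stand-in. This mimics only the behavior described in this README; it is not the actual auto_prep.utils.config implementation:

```python
class ToyConfig:
    """Stand-in mimicking the documented set/update semantics
    (not the real auto_prep.utils.config)."""
    DEFAULTS = {"log_level": "INFO", "n_jobs": 1}

    def __init__(self):
        self.__dict__.update(self.DEFAULTS)

    def set(self, **kwargs):
        # Resets every non-passed setting back to its default,
        # then applies the passed values.
        self.__dict__.update(self.DEFAULTS)
        self.__dict__.update(kwargs)

    def update(self, **kwargs):
        # Overwrites only the provided arguments; can create new fields.
        self.__dict__.update(kwargs)

cfg = ToyConfig()
cfg.update(n_jobs=4, report_title="Titanic")  # update may add a new field
cfg.set(log_level="DEBUG")                    # n_jobs falls back to its default
```

Note that in this sketch, set resets only the known defaults; fields created via update (here the hypothetical report_title) survive a later set call.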

Examples

Refer to this folder.

Author

  • Paweł Pozorski - GitHub
  • Katarzyna Rogalska
  • Julia Kruk
  • Gaspar Sekula

Notes for Developers

  1. Poetry is used for dependency management and virtual environments. The following commands are available:
    • poetry run format - Format code
    • poetry run lint - Lint code
    • poetry run check - Check code
    • poetry run test - Run tests
    • poetry run pre-commit run --all-files - Run pre-commit hooks
