AutoML with enhanced preprocessing and explainability.


Auto-Prep

Auto-Prep is an automated data preprocessing and analysis pipeline that generates comprehensive LaTeX reports. It handles common preprocessing tasks, creates insightful visualizations, and documents the entire process in a professional PDF report. It focuses on tabular data and supports numerous explainable AI models. Emphasizing interpretability and ease of use, it includes a subsection for each model that explains its strengths and weaknesses and provides usage examples.

For a detailed product description, see this notebook

Docs

Features

  • Automated data cleaning and preprocessing
  • Intelligent feature type detection
  • Advanced categorical encoding with rare category handling
  • Comprehensive exploratory data analysis (EDA)
  • Automated visualization generation
  • Professional LaTeX report generation
  • Modular and extensible design
  • Support for numerous explainable ML models
  • Explainability with model-specific examples
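As an illustration of the rare-category handling mentioned above, here is a minimal stand-alone sketch in plain Python. This is not Auto-Prep's actual implementation; the function name, threshold, and placeholder label are illustrative:

```python
from collections import Counter

def encode_rare_categories(values, min_freq=0.05, placeholder="__rare__"):
    """Group categories whose relative frequency falls below min_freq
    under a single placeholder label."""
    counts = Counter(values)
    n = len(values)
    rare = {cat for cat, c in counts.items() if c / n < min_freq}
    return [placeholder if v in rare else v for v in values]

# "teal" appears once out of 20 values (5%), below the 10% threshold
colors = ["red"] * 10 + ["blue"] * 9 + ["teal"]
encoded = encode_rare_categories(colors, min_freq=0.10)
```

Grouping rare categories like this keeps downstream encodings (e.g. one-hot) from exploding in dimensionality because of seldom-seen labels.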

Report Contents

The generated report includes:

  1. Title page and table of contents
  2. Overview
    • Platform structure
    • Dataset structure
  3. Exploratory Data Analysis
    • Distribution plots
    • Correlation matrix
    • Missing value analysis
  4. Model Performance
    • Accuracy metrics
    • Model details
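Conceptually, the missing value analysis in the EDA section boils down to counting absent entries per column. A minimal stand-alone sketch of that idea (not Auto-Prep's actual code, which operates on pandas DataFrames):

```python
def missing_value_summary(rows):
    """Count missing (None) entries per column in a list-of-dicts table."""
    columns = set().union(*(row.keys() for row in rows))
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

rows = [
    {"age": 29, "fare": 7.25},
    {"age": None, "fare": 8.05},
    {"age": 35, "fare": None},
]
summary = missing_value_summary(rows)
```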

Installation

To use our tool, you need LaTeX installed on your local machine.

Using pip (Recommended)

  1. Install Auto-Prep directly from PyPI:

    pip install auto-prep
    
  2. Run the example usage:

    python example_usage.py
    

Using Poetry

  1. Ensure you have Poetry installed:

    curl -sSL https://install.python-poetry.org | python3 -
    
  2. Clone the repository:

    git clone https://github.com/yourusername/auto-prep.git
    cd auto-prep
    
  3. Install dependencies:

    poetry install
    
  4. Activate the virtual environment:

    poetry shell
    
  5. Run the example usage:

    python example_usage.py
    

Important information

  • Because multiprocessing is enabled, the run method should be called under an if __name__ == "__main__" guard; see the example below. The number of cores used can be set in config.

  • The difference between config.set and config.update: the former can be used to see the default values for each setting, and it will overwrite all non-passed values with their defaults. The latter overwrites only the provided arguments, without validation, and can be used to create new fields in config.

  • config.root_dir, if it exists, is cleared when AutoPrep().run() is called. If log files are stored there, their file handlers will be deleted, causing errors.

  • Logs printed to the console may be hard to read because of the many warnings emitted by dependencies. Refer to the stored log files for clean output.

  • For changes to config to take effect, config.update must be called before any other import from the auto_prep package, for example:

    import logging
    from auto_prep.utils import config
    
    config.update(log_level=logging.DEBUG)
    
    import numpy as np
    
    from auto_prep.prep import AutoPrep
    from sklearn.datasets import fetch_openml
    
    # Load your dataset
    data = fetch_openml(name="titanic", version=1, as_frame=True, parser="auto").frame
    data["survived"] = data["survived"].astype(np.uint8)
    
    # Create and run pipeline
    pipeline = AutoPrep()
    
    if __name__ == "__main__":
        pipeline.run(data, target_column="survived")
    

    For the same reason, AutoPrep is not exported from the top-level package. This is a known implementation flaw.
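The set vs update semantics described above can be illustrated with a toy stand-in. This mimics only the behavior described in this README; it is not the actual auto_prep.utils.config implementation:

```python
class ToyConfig:
    """Stand-in mimicking the documented set/update semantics
    (not the real auto_prep.utils.config)."""
    DEFAULTS = {"log_level": "INFO", "n_jobs": 1}

    def __init__(self):
        self.__dict__.update(self.DEFAULTS)

    def set(self, **kwargs):
        # Resets every non-passed setting back to its default,
        # then applies the passed values.
        self.__dict__.update(self.DEFAULTS)
        self.__dict__.update(kwargs)

    def update(self, **kwargs):
        # Overwrites only the provided arguments; can create new fields.
        self.__dict__.update(kwargs)

cfg = ToyConfig()
cfg.update(n_jobs=4, report_title="Titanic")  # update may add a new field
cfg.set(log_level="DEBUG")                    # n_jobs falls back to its default
```

Note that in this sketch, set resets only the known defaults; fields created via update (here the hypothetical report_title) survive a later set call.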

Examples

Refer to this folder.

Author

  • Paweł Pozorski - GitHub
  • Katarzyna Rogalska
  • Julia Kruk
  • Gaspar Sekula

Notes for Developers

  1. Poetry is used for dependency management and virtual environments. The following commands are available:
    • poetry run format - Format code
    • poetry run lint - Lint code
    • poetry run check - Check code
    • poetry run test - Run tests
    • poetry run pre-commit run --all-files - Run pre-commit hooks
