edatk: Python Exploratory Data Analysis Toolkit

edatk is an open-source project for exploratory data analysis (EDA) in Python. The project is new and its features are still simple, but the goal is to automate and organize as much of the traditional EDA workflow as possible.

Installation

pip install edatk

Examples and Getting Started

# Import library
import edatk as eda

# Load in your dataframe (using seaborn below as an example)
import seaborn as sns
df = sns.load_dataset('iris')

# Run auto EDA; optionally pass a path for saving the HTML report and a target column
eda.auto_eda(df, save_path='C:\\Users\\username\\Documents\\edatk', target_column='species')
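
The save path in the example above is a hard-coded Windows path. A minimal sketch of building an equivalent location portably is shown below. The helpers `default_report_dir` and `run_report` are hypothetical names, not part of edatk. The sketch assumes `save_path` accepts a string like the one above and that only its parent folder needs to exist beforehand.

```python
from pathlib import Path


def default_report_dir(base=None):
    """Return <base>/edatk, defaulting to the user's Documents folder.

    Hypothetical helper; not part of edatk.
    """
    base = Path(base) if base is not None else Path.home() / "Documents"
    return base / "edatk"


def run_report(df, target_column=None, base=None):
    """Call eda.auto_eda with a portable save path (hypothetical wrapper)."""
    import edatk as eda  # imported lazily so the helper above works without edatk

    out = default_report_dir(base)
    # Assumption: only the parent folder must exist before the call.
    out.parent.mkdir(parents=True, exist_ok=True)
    eda.auto_eda(df, save_path=str(out), target_column=target_column)
    return out
```

With the dataframe from the example above, this could be called as `run_report(df, target_column='species')`.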

Feature Overview

Feature [status]

  • Tabular data [partial]
    • Column by column analysis [partial]
      • Basic descriptive statistics (mean, median, min, max, etc.) [completed]
      • Distribution charts (numeric) and most frequent values (categorical) [completed]
      • Normality Tests [planned]
    • Relationships between columns [completed]
    • t-SNE [planned]
    • Basic feature -> target analysis and feature importance [planned]
    • Autofind interesting relationships and features [planned]
    • Basic exploratory NLP for text columns [planned]
  • Exploring Predicted vs. True Results [planned]
    • Classification Results Plots
      • True vs. Predicted Heatmap by Class
      • Mosaic Plot
  • Time Series [planned]
  • Performance Improvements [planned]
    • Operation timeouts

Contributing

If you are interested in contributing, please see the contributing documentation.

Stability

This library is not yet ready for production use. Treat it with caution and use it only for non-production purposes, as an aid to deeper, more formal data analysis.

Author

Download files

Source Distribution

edatk-0.0.8.tar.gz (19.8 kB)

Built Distribution

edatk-0.0.8-py3-none-any.whl (34.3 kB, Python 3)