Skip to main content

Interactive EDA visualizations in your jupyter notebook

Project description

intedact: Interactive EDA

Read the Docs License

Interactive EDA for pandas DataFrames directly in your Jupyter notebook. intedact's goal is to make common, standardized EDA summaries available with one function call. Instead of copying and pasting the same code across projects, just use intedact in any initial exploration notebook and explore the dataset right in the notebook. intedact is interactive so you can adjust these summaries as needed for different datasets, but it strives to produce the best summaries by default.

Full documentation at intedact.readthedocs.io

Try intedact here: Binder

Getting Started

Installation

Install via pip:

pip install intedact

Download the following nltk resources for the ngram text summaries.

import nltk
nltk.download('punkt')
nltk.download('stopwords')

Univariate EDA

Univariate EDA refers to the process of visualizing and summarizing a single variable. intedact's univariate EDA allows you to produce summaries of single columns in a pandas dataframe

For interactive univariate EDA simply import the univariate_eda_interact function in a jupyter notebook:

from intedact import univariate_eda_interact
univarate_eda_interact(
    data, notes_file="optional_file_to_save_notes_to.json", figure_dir="optional_directory_to_save_plots_to"
)

At the top level, one selects the column and the summary type for that column to display. To explore the full dataset, just toggle through each of the column names. Current supported summary types:

  • categorical: Summarize a categorical or low cardinality numerical column
  • numeric: Summarize a high cardinality numerical column
  • datetime: Summarize a datetime column
  • text: Summarize a free form text column
  • collection: Summarize a column with collections of values (i.e. lists, tuples, sets, etc.)
  • url: Summarize a column containing urls

For each column, one can then adjust parameters for the given summary type to fit your particular dataset. These summaries try to automatically set good default parameters, but sometimes you need to make adjustments to get the full picture.

See the documentation for examples of how to statically call the individual univariate summary functions.

Bivariate EDA

Work in progress. Stay tuned!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

intedact-0.0.1.tar.gz (28.4 kB view hashes)

Uploaded Source

Built Distribution

intedact-0.0.1-py3-none-any.whl (34.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page