Interactive EDA visualizations in your jupyter notebook
Project description
intedact: Interactive EDA
Interactive EDA for pandas DataFrames directly in your Jupyter notebook. intedact's goal is to make common, standardized EDA summaries available with one function call. Instead of copying and pasting the same code across projects, just use intedact in any initial exploration notebook and explore the dataset right in the notebook. intedact is interactive so you can adjust these summaries as needed for different datasets, but it strives to produce the best summaries by default.
Full documentation at intedact.readthedocs.io
Getting Started
Installation
Install via pip:
pip install intedact
Download the following nltk resources for the ngram text summaries.
import nltk
nltk.download('punkt')
nltk.download('stopwords')
Univariate EDA
Univariate EDA refers to the process of visualizing and summarizing a single variable. intedact's univariate EDA allows you to produce summaries of single columns in a pandas dataframe
For interactive univariate EDA simply import the univariate_eda_interact
function in a jupyter notebook:
from intedact import univariate_eda_interact
univarate_eda_interact(
data, notes_file="optional_file_to_save_notes_to.json", figure_dir="optional_directory_to_save_plots_to"
)
At the top level, one selects the column and the summary type for that column to display. To explore the full dataset, just toggle through each of the column names. Current supported summary types:
- categorical: Summarize a categorical or low cardinality numerical column
- numeric: Summarize a high cardinality numerical column
- datetime: Summarize a datetime column
- text: Summarize a free form text column
- collection: Summarize a column with collections of values (i.e. lists, tuples, sets, etc.)
- url: Summarize a column containing urls
For each column, one can then adjust parameters for the given summary type to fit your particular dataset. These summaries try to automatically set good default parameters, but sometimes you need to make adjustments to get the full picture.
See the documentation for examples of how to statically call the individual univariate summary functions.
Bivariate EDA
Work in progress. Stay tuned!
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.