A package for simplifying the EDA of different data types!

Project description

pyxplor

License: MIT. Project Status: Active (the project has reached a stable, usable state and is being actively developed). Python 3.12.0.

About

pyxplor is a Python package that automates and streamlines Exploratory Data Analysis (EDA). It provides specialized plotting functions for numeric, categorical, binary, and time series data, reducing the time and complexity of initial data analysis for data scientists and analysts at all levels.

Documentation

The online documentation can be accessed here.

Installation

User Installation

Run the following code in your terminal to install the package from PyPI:

pip install pyxplor

Developer Installation

  1. Clone the repository and move into the cloned directory.
git clone https://github.com/UBC-MDS/PyXplor.git
cd PyXplor
  2. Create an environment with conda and then activate the environment.
conda create -n pyxplor python=3.12 -y
conda activate pyxplor
  3. Install poetry inside the environment.
conda install poetry
  4. Install the package and its dependencies with poetry.
poetry install

Testing

To test that the functions are working properly, run the command below from the root directory.

pytest tests/

To see the coverage of the tests, run the command below instead.

pytest tests/ --cov=pyxplor

Usage

Each function in pyxplor takes a DataFrame and the columns to plot, and returns a figure and its axes. Below is a short demonstration:

from pyxplor.plot_binary import plot_binary
from pyxplor.plot_categorical import plot_categorical
from pyxplor.plot_numeric import plot_numeric
from pyxplor.plot_time_series import plot_time_series
import seaborn as sns

# a dataframe that contains different types of variables
taxi = sns.load_dataset("taxis")
taxi = taxi.dropna()

# different variable types
binary_variables = ['color', 'payment']
categorical_variables = ['passengers', 'pickup_zone']
numeric_variables = ['fare', 'tip']
datetime_variable = 'pickup'

# univariate plots for each variable type
fig, ax = plot_binary(taxi, binary_variables, "count")
fig, ax = plot_categorical(taxi, categorical_variables)
fig, ax = plot_numeric(taxi, numeric_variables, "hist+kde")
fig, ax = plot_time_series(taxi, datetime_variable, numeric_variables, freq='M')

Functions

  • plot_numeric(input_df, list_of_variables, ...): Plots the distribution of numeric variables in a DataFrame, offering options for histograms, KDE plots, or a combination of both.
  • plot_categorical(input_df, list_of_variables, ...): Visualizes categorical data by creating bar plots for each categorical variable specified, aiding in understanding frequency distributions.
  • plot_binary(input_df, list_of_variables, ...): Generates plots for binary variables, either as bar plots or pie charts, to highlight distributions and potential imbalances.
  • plot_time_series(input_df, date_column, value_columns, ...): Specialized in time-series analysis, this function creates line plots for multiple time series variables, allowing for trend analysis and comparison.
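
Each function above expects the caller to already know which columns are binary, categorical, numeric, or datetime. As a rough illustration of how that sorting could be done, here is a minimal sketch in plain Python. The helper name `classify_columns` and its `max_categories` cutoff are hypothetical and are not part of pyxplor; the heuristic (two distinct values means binary, few distinct numbers means categorical) is an assumption that will not suit every dataset.

```python
from datetime import date, datetime
from numbers import Real


def classify_columns(columns, max_categories=10):
    """Group column names by a guessed variable type.

    `columns` maps a column name to a list of its values.
    Hypothetical helper, not part of pyxplor.
    """
    groups = {"binary": [], "categorical": [], "numeric": [], "datetime": []}
    for name, values in columns.items():
        # Ignore missing entries stored as None (drop NaN values beforehand).
        present = [v for v in values if v is not None]
        distinct = set(present)
        if present and all(isinstance(v, (date, datetime)) for v in present):
            groups["datetime"].append(name)
        elif len(distinct) == 2:
            groups["binary"].append(name)
        elif (all(isinstance(v, Real) for v in present)
              and len(distinct) > max_categories):
            groups["numeric"].append(name)
        else:
            # Text columns and numbers with few distinct values land here.
            groups["categorical"].append(name)
    return groups


# Small hand-written sample that mimics a few columns of the taxis data.
sample = {
    "color": ["yellow", "green", "yellow", "green"],
    "passengers": [1, 2, 3, 1],
    "fare": [7.0, 5.5, 27.0, 9.25],
    "pickup": [datetime(2019, 3, 23), datetime(2019, 3, 4),
               datetime(2019, 3, 27), datetime(2019, 3, 10)],
}
groups = classify_columns(sample, max_categories=3)
print(groups)
```

The resulting lists could then be passed to the matching function, for example `plot_categorical(taxi, groups["categorical"])`. Reviewing the guessed groups by hand is still advisable, since a cutoff on distinct values is only a heuristic.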

PyXplor Use in Python Ecosystem

While there are several EDA packages in the Python ecosystem, such as pandas-profiling and sweetviz, pyxplor differentiates itself by offering specialized functions for different data types. This targeted approach gives more relevant insights, particularly for binary and time-series data, which existing tools often serve less well. pyxplor complements those tools by filling these specific gaps.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

Contributors

  • Po-Hsun (Ben) Chen (@phchen5)
  • Rachel Bouwer (@rbouwer)
  • Arturo Boquin (@arturoboquin)
  • Iris Luo (@luonianyi)

License

pyxplor was created by Ben Chen, Rachel Bouwer, Arturo Boquin, and Iris Luo. It is licensed under the terms of the MIT license.

Credits

pyxplor was created with cookiecutter and the py-pkgs-cookiecutter template.

Download files

Source Distribution

pyxplor-2.0.11.tar.gz (11.3 kB)

Uploaded Source

Built Distribution

pyxplor-2.0.11-py3-none-any.whl (12.7 kB)

Uploaded Python 3

File details

Details for the file pyxplor-2.0.11.tar.gz.

File metadata

  • Download URL: pyxplor-2.0.11.tar.gz
  • Upload date:
  • Size: 11.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for pyxplor-2.0.11.tar.gz
Algorithm Hash digest
SHA256 def1422418932db1d65f3bfc37cf0bc29a5afe8a109bdc297107cdbddd84cd8e
MD5 5d374c3bf2979ea4f3d2a7f8f45de5b5
BLAKE2b-256 53c906616c4e6d034fe00c4c794d0eb226b2be5eaf62ba7998d1c62e3d409284

See more details on using hashes here.
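
The published digests can be used to check that a downloaded file is intact. The following is a minimal sketch using only the standard library `hashlib` module; the function names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file path in the commented usage assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the source distribution sits in the current directory):
# expected = "def1422418932db1d65f3bfc37cf0bc29a5afe8a109bdc297107cdbddd84cd8e"
# print(matches_published_hash("pyxplor-2.0.11.tar.gz", expected))
```

A result of `False` suggests the download is corrupted or is not the file that was published, in which case it should be downloaded again rather than installed.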

File details

Details for the file pyxplor-2.0.11-py3-none-any.whl.

File metadata

  • Download URL: pyxplor-2.0.11-py3-none-any.whl
  • Upload date:
  • Size: 12.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for pyxplor-2.0.11-py3-none-any.whl
Algorithm Hash digest
SHA256 8252080de32dcb67d8747b0d9607983f991cfacd01c4298f8c021917b44f2258
MD5 ac43432f351f4f0faa10d7c88fee05b7
BLAKE2b-256 957207baadd6ad09cf7bedbe2cc710674bfbafc72475fbbf4f737f7fc403d02e
