Quick, easy and customizable data analysis with pandas and seaborn
EDAwesome
EDAwesome is a package for quick, easy and customizable data analysis with pandas and seaborn. It automates the routine steps so you can focus on the data, and it offers a clear, customizable workflow.
Installation
EDAwesome's dependencies are generally compatible with a standard Anaconda environment, so you can install it into your environment with pip:
pip install edawesome
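To confirm that the install succeeded, you can ask Python for the installed version. This is a minimal sketch that uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical and is not part of EDAwesome.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# 'edawesome' is the distribution name used in the pip command above.
print(installed_version("edawesome"))
```

If this prints None, the package is not visible to the interpreter you are running, which usually means pip installed it into a different environment.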
You can also install the dependencies from requirements.txt:
pip install -r requirements.txt
If you use Poetry, include the dependencies in your pyproject.toml:
[tool.poetry.dependencies]
python = ">=3.8,<3.11"
seaborn = "^0.12.1"
kaggle = "^1.5.12"
ipython = "^8.5.0"
transitions = "^0.9.0"
patool = "^1.12"
pyspark = "^3.3.1"
pandas = "^1.5.2"
statsmodels = "^0.13.5"
scikit-learn = ">=1.2.0"
scipy = "~1.8.0"
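The python constraint above allows 3.8 up to, but not including, 3.11. A quick way to check whether an interpreter falls inside that range is sketched below; the function name python_supported is hypothetical and is not provided by EDAwesome.

```python
import sys


def python_supported(version_info=None):
    """Check a (major, minor, ...) version tuple against python = ">=3.8,<3.11"."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    # Lower bound is inclusive, upper bound is exclusive.
    return (3, 8) <= major_minor < (3, 11)


# Reports whether the interpreter running this snippet is inside the range.
print(python_supported())
```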
Usage
This package is designed to be used in a Jupyter Notebook. You can follow the step-by-step workflow or import only the functions you need. Below is an example of the step-by-step workflow:
Quick start
from edawesome.eda import EDA

eda = EDA(
    data_dir_path='/home/dreamtim/Desktop/Coding/turing-ds/MachineLearning/tiryko-ML1.4/data',
    archives=['/home/dreamtim/Downloads/home-credit-default-risk.zip'],
    use_pyspark=True,
    pandas_mem_limit=1024**2,
    pyspark_mem_limit='4g'
)
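The paths in the example above are specific to the author's machine. One way to keep a notebook portable is to build them with pathlib, as in this sketch. The folder names under the home directory are hypothetical placeholders, and the paths are converted to strings because the example above passes strings.

```python
from pathlib import Path

# Hypothetical locations: adjust both to where your project and downloads live.
data_dir = Path.home() / "projects" / "my-eda" / "data"
archive = Path.home() / "Downloads" / "home-credit-default-risk.zip"

eda_kwargs = dict(
    data_dir_path=str(data_dir),
    archives=[str(archive)],
    use_pyspark=True,
    pandas_mem_limit=1024**2,  # evaluates to 1048576; the unit is not stated in this README
    pyspark_mem_limit="4g",
)

# With EDAwesome installed, the object would then be created as in the example above:
# from edawesome.eda import EDA
# eda = EDA(**eda_kwargs)
print(eda_kwargs)
```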
This creates the EDA object. Now you can load the data into it:
eda.load_data()
This displays the dataframes and their shapes. You can also inspect them through eda.dataframes. Now you can move on to the next step:
eda.next()
eda.clean_check()
Let's say we don't want to do any cleaning in this case, so we go straight to the next step:
eda.next()
eda.categorize()
Now you can compare a numerical column across categories in a single line:
eda.compare_distributions('application_train', 'ext_source_3', 'target')
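For intuition about what comparing a numerical column by category means, here is a plain pandas sketch on toy data. It is not a description of how compare_distributions is implemented; it only reuses the column names from the call above, and the values are made up for illustration.

```python
import pandas as pd

# Toy data for illustration only; the column names mirror the call above.
application_train = pd.DataFrame(
    {
        "target": [0, 0, 0, 1, 1, 1],
        "ext_source_3": [0.25, 0.5, 0.75, 0.125, 0.25, 0.375],
    }
)

# Summary statistics of the numerical column within each category.
summary = application_train.groupby("target")["ext_source_3"].describe()
print(summary[["count", "mean", "min", "max"]])
```

A clear difference between the per-category summaries suggests that the numerical column carries information about the category.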
Real-world example
The full notebook used for the examples above can be found in one of my real ML projects.
There is also an example notebook, quickstart.ipynb, in this repo.
Documentation
You can find the documentation here.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Hashes for edawesome-0.1.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2a512cc14ddb913b0a25332497692733d77d4de2f175ae27846d1e935413da01
MD5 | bfe88a5cc41118d7d56e449f657824b4
BLAKE2b-256 | 93186f7ebf0bbe6ad2d7707392e145f5256c39935fc1402c6360f7642021d4af