
A Python library for EDA, including visualizations, directory management, data preprocessing, reporting, and more.


Welcome to EDA Toolkit, a collection of utility functions designed to streamline your exploratory data analysis (EDA) tasks. This repository offers tools for directory management, data preprocessing, reporting, visualization, and more, helping you handle the common steps of data manipulation and analysis efficiently.

Prerequisites

Before you install eda_toolkit, ensure your system meets the following requirements:

  • Python: Version 3.7.4 or higher.

Additionally, eda_toolkit depends on the following packages, which will be automatically installed when you install eda_toolkit:

  • jinja2: version 3.1.4 (Exact version required)
  • matplotlib: version 3.5.3 or higher, but capped at 3.9.2
  • nbformat: version 4.2.0 or higher, but capped at 5.10.4
  • numpy: version 1.21.6 or higher, but capped at 2.1.0
  • pandas: version 1.3.5 or higher, but capped at 2.2.3
  • plotly: version 5.18.0 or higher, but capped at 5.24.0
  • scikit-learn: version 1.0.2 or higher, but capped at 1.5.2
  • seaborn: version 0.12.2 or higher, but capped below 0.13.2
  • tqdm: version 4.66.4 or higher, but capped below 4.67.1
  • xlsxwriter: version 3.2.0 (Exact version required)
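You can check an existing environment against these ranges before installing. The script below is a minimal sketch, not part of eda_toolkit. The helper names (`parse_version`, `satisfies`, `check_environment`) are hypothetical. It relies on the standard-library `importlib.metadata` module (Python 3.8+). It reads "capped at" as an inclusive upper bound and "capped below" as an exclusive one, which is an assumption. It also ignores pre-release suffixes such as `rc1`.

```python
"""Compare installed package versions with the ranges eda_toolkit lists.

Hypothetical helper, not shipped with eda_toolkit.
"""
from __future__ import annotations

import re
import sys
from importlib import metadata  # standard library, Python 3.8+

# (lowest allowed, highest allowed, is the upper bound inclusive?)
# Assumption: "capped at" is inclusive, "capped below" is exclusive.
BOUNDS = {
    "jinja2": ("3.1.4", "3.1.4", True),
    "matplotlib": ("3.5.3", "3.9.2", True),
    "nbformat": ("4.2.0", "5.10.4", True),
    "numpy": ("1.21.6", "2.1.0", True),
    "pandas": ("1.3.5", "2.2.3", True),
    "plotly": ("5.18.0", "5.24.0", True),
    "scikit-learn": ("1.0.2", "1.5.2", True),
    "seaborn": ("0.12.2", "0.13.2", False),
    "tqdm": ("4.66.4", "4.67.1", False),
    "xlsxwriter": ("3.2.0", "3.2.0", True),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Keep the leading numeric release ('2.0.0rc1' -> (2, 0, 0, 0)); suffixes are dropped."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    parts = tuple(int(piece) for piece in match.group(0).split("."))
    return parts + (0,) * (4 - len(parts))  # pad so "3.9" and "3.9.0" compare equal


def satisfies(installed: str, low: str, high: str, high_inclusive: bool = True) -> bool:
    """True if installed >= low and installed <= high (or < high when the cap is exclusive)."""
    version = parse_version(installed)
    if version < parse_version(low):
        return False
    upper = parse_version(high)
    return version <= upper if high_inclusive else version < upper


def check_environment() -> dict[str, str]:
    """Report the interpreter and each listed dependency as ok, out of range, or missing."""
    report = {"python": "ok" if sys.version_info >= (3, 7, 4) else "older than 3.7.4"}
    for name, (low, high, inclusive) in BOUNDS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        verdict = "ok" if satisfies(installed, low, high, inclusive) else "outside listed range"
        report[name] = f"{installed} ({verdict})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package:>12}: {status}")
```

The hand-rolled parser keeps the sketch dependency-free. If the third-party `packaging` library is available, `packaging.version.Version` implements full PEP 440 ordering, including pre-releases, and is the sturdier choice.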

💾 Installation

To install eda_toolkit, simply run the following command in your terminal:

pip install eda_toolkit
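You can confirm that the install landed in the interpreter you are actually using by asking the standard library for the installed distribution version. This is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`. The helper name `installed_version` is hypothetical and not part of eda_toolkit.

```python
"""Check whether the eda_toolkit distribution is visible to this interpreter."""
from importlib import metadata  # standard library, Python 3.8+
from typing import Optional


def installed_version(distribution: str = "eda_toolkit") -> Optional[str]:
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"eda_toolkit {version}" if version else "eda_toolkit is not installed here")
```

Run it with the same `python` that backs your `pip`. A `None` result usually means the package was installed into a different environment.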

📄 Official Documentation

https://lshpaner.github.io/eda_toolkit_docs

🌐 Authors' Websites

  1. Leonid Shpaner
  2. Oscar Gil

🙏 Acknowledgements

We would like to express our deepest gratitude to Dr. Ebrahim Tarshizi, our mentor during our time in the University of San Diego M.S. Applied Data Science Program. His unwavering dedication and mentorship played a pivotal role in our academic journey, guiding us to graduate from the program and pursue successful careers as data scientists.

We also extend our thanks to the Shiley-Marcos School of Engineering at the University of San Diego for providing an exceptional learning environment and supporting our educational endeavors.

⚖️ License

eda_toolkit is distributed under the MIT License. See LICENSE for more information.

🛟 Support

If you have any questions or issues with eda_toolkit, please open an issue on this GitHub repository.

📚 Citing eda_toolkit

If you use eda_toolkit in your research or projects, please consider citing it.

@software{shpaner_2024_13162633,
  author       = {Shpaner, Leonid and
                  Gil, Oscar},
  title        = {EDA Toolkit},
  month        = aug,
  year         = 2024,
  publisher    = {Zenodo},
  version      = {0.0.19},
  doi          = {10.5281/zenodo.13162633},
  url          = {https://doi.org/10.5281/zenodo.13162633}
}


Download files

Source Distribution

eda_toolkit-0.0.19.tar.gz (55.0 kB)

Uploaded Source

Built Distribution

eda_toolkit-0.0.19-py3-none-any.whl (53.2 kB)

Uploaded Python 3

File details

Details for the file eda_toolkit-0.0.19.tar.gz.

File metadata

  • Download URL: eda_toolkit-0.0.19.tar.gz
  • Upload date:
  • Size: 55.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for eda_toolkit-0.0.19.tar.gz
Algorithm Hash digest
SHA256 7a1bc516d0a9bbc323bef2298ddf1bea76b23ed1b46ad791b46ec9a496d67613
MD5 ff560e93d342a49d9fd893f818a1b7dd
BLAKE2b-256 b9ba7ef946501198e707d9166d763314f2c74641e927e25553301ff4fa0a9d16

File details

Details for the file eda_toolkit-0.0.19-py3-none-any.whl.

File metadata

  • Download URL: eda_toolkit-0.0.19-py3-none-any.whl
  • Upload date:
  • Size: 53.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for eda_toolkit-0.0.19-py3-none-any.whl
Algorithm Hash digest
SHA256 957a1a1b0a32717e54bd2074ca8158723715b07e226491799071fad75a264d5c
MD5 3da255710d2c2f9efe1b9ca3a75612cb
BLAKE2b-256 dceaa77cda7480cdf96fe5f715413da59f1fad14342b07faacbf91a78f1d5f89
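
If you download one of the files above manually, you can compare its SHA256 digest with the published value before installing it. The script below is a minimal sketch that uses only the standard library's `hashlib`. The helper names (`sha256_of`, `matches`, `main`) and the command-line usage are hypothetical. The expected digests are copied from the two hash tables above.

```python
"""Verify a downloaded eda_toolkit 0.0.19 file against its published SHA256 digest."""
import hashlib
import sys
from pathlib import Path
from typing import List, Optional

# SHA256 digests copied from the "Hashes for ..." tables above.
EXPECTED_SHA256 = {
    "eda_toolkit-0.0.19.tar.gz": "7a1bc516d0a9bbc323bef2298ddf1bea76b23ed1b46ad791b46ec9a496d67613",
    "eda_toolkit-0.0.19-py3-none-any.whl": "957a1a1b0a32717e54bd2074ca8158723715b07e226491799071fad75a264d5c",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in fixed-size chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: Optional[str] = None) -> bool:
    """Compare the file's digest with `expected`, defaulting to the published value."""
    if expected is None:
        expected = EXPECTED_SHA256[path.name]  # KeyError if the file name is not listed
    return sha256_of(path) == expected.lower()


def main(paths: List[str]) -> None:
    """Print OK or MISMATCH for each file given on the command line."""
    for arg in paths:
        file_path = Path(arg)
        if file_path.name not in EXPECTED_SHA256:
            print(f"{file_path.name}: no published digest to compare against")
            continue
        status = "OK" if matches(file_path) else "MISMATCH"
        print(f"{file_path.name}: {status}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python verify_download.py eda_toolkit-0.0.19.tar.gz` (a hypothetical script name) prints `OK` when the digests agree. If you prefer not to script it, `pip hash <file>` prints a file's SHA256 digest for manual comparison.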