A Python library for EDA, including visualizations, directory management, data preprocessing, reporting, and more.

Welcome to EDA Toolkit, a collection of utility functions designed to streamline exploratory data analysis (EDA). The library provides tools for directory management, data preprocessing, reporting, and visualization, so that common data manipulation and analysis steps take less code.

Prerequisites

Before you install eda_toolkit, ensure your system meets the following requirements:

  • Python: Version 3.7.4 or higher.
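
To confirm that an interpreter meets this minimum before installing, a small check like the following may help. This is only a sketch: the helper name `meets_minimum` and the constant `MIN_PYTHON` are illustrative and are not part of eda_toolkit.

```python
import sys

# Minimum interpreter version stated in the prerequisites above.
MIN_PYTHON = (3, 7, 4)


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor, micro) version is at least `minimum`."""
    return tuple(version_info[:3]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old for eda_toolkit"
    print(f"Python {sys.version.split()[0]}: {status}")
```

Comparing tuples rather than version strings avoids mistakes such as treating "3.10" as older than "3.7".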

eda_toolkit also depends on the following packages, which are installed automatically alongside it:

  • jinja2: version 3.1.4 (Exact version required)
  • matplotlib: version 3.5.3 or higher, but capped at 3.9.2
  • nbformat: version 4.2.0 or higher, but capped at 5.10.4
  • numpy: version 1.21.6 or higher, but capped at 2.1.0
  • pandas: version 1.3.5 or higher, but capped at 2.2.3
  • plotly: version 5.18.0 or higher, but capped at 5.24.0
  • scikit-learn: version 1.0.2 or higher, but capped at 1.5.2
  • scipy: version 1.5.4 or higher, but capped at 1.7.3
  • seaborn: version 0.12.2 or higher, but capped below 0.13.2
  • tqdm: version 4.66.4 or higher, but capped below 4.67.1
  • xlsxwriter: version 3.2.0 (Exact version required)
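
If an existing environment already has some of these packages, it can be useful to see which versions are present before installing. The sketch below uses the standard library's `importlib.metadata`, which assumes Python 3.8 or later; the helper name `installed_versions` and the `DEPENDENCIES` list are illustrative, with distribution names copied from the list above.

```python
from importlib import metadata  # standard library in Python 3.8+

# Distribution names copied from the dependency list above.
DEPENDENCIES = [
    "jinja2", "matplotlib", "nbformat", "numpy", "pandas", "plotly",
    "scikit-learn", "scipy", "seaborn", "tqdm", "xlsxwriter",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(DEPENDENCIES).items():
        print(f"{name}: {version or 'not installed'}")
```

The output can then be compared by hand against the version ranges listed above; the sketch only reports versions and does not enforce the ranges.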

💾 Installation

To install eda_toolkit, simply run the following command in your terminal:

pip install eda_toolkit
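
After installation, one quick way to confirm that the package is visible to the current interpreter is to look up its import spec. This sketch assumes the import name matches the distribution name `eda_toolkit`; the helper `is_importable` is illustrative.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the import name is the same as the package name, eda_toolkit.
print("eda_toolkit importable:", is_importable("eda_toolkit"))
```

If this prints `False` right after a successful `pip install`, the usual cause is that `pip` and `python` point at different environments.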

📄 Official Documentation

https://lshpaner.github.io/eda_toolkit_docs

🌐 Authors' Websites

  1. Leonid Shpaner
  2. Oscar Gil

🙏 Acknowledgements

We would like to express our deepest gratitude to Dr. Ebrahim Tarshizi of the Shiley-Marcos School of Engineering at the University of San Diego for his mentorship in the M.S. in Applied Data Science Program. His unwavering dedication and guidance played a pivotal role in our academic journey, supporting our successful completion of the program and our pursuit of careers as data scientists.

We thank Robert Lanzafame, PhD, for his feedback, encouragement, and thoughtful discussion following our presentation at JupyterCon, and Panayiotis Petousis, PhD, and Arthur Funnell from the CTSI UCLA Health data science team for their helpful comments, constructive feedback, and continued encouragement throughout the development of this library.

Finally, Leon Shpaner would like to personally acknowledge his mentor, former manager, and friend, Gustavo Prado, who hired him at the Los Angeles Film School. Gustavo believed in him early on, gave him the opportunity to grow, and was patient as he developed professionally. He saw potential before it was fully formed and sparked an early interest in data by demonstrating the importance of tools like VLOOKUP. His guidance and trust had a lasting impact. May he rest in peace.

⚖️ License

eda_toolkit is distributed under the MIT License. See LICENSE for more information.

🛟 Support

If you have any questions or issues with eda_toolkit, please open an issue on this GitHub repository.

📚 Citing eda_toolkit

If you use eda_toolkit in your research or projects, please consider citing it.

@software{shpaner_2024_13162633,
  author       = {Shpaner, Leonid and
                  Gil, Oscar},
  title        = {EDA Toolkit},
  month        = aug,
  year         = 2024,
  publisher    = {Zenodo},
  version      = {0.0.20},
  doi          = {10.5281/zenodo.13162633},
  url          = {https://doi.org/10.5281/zenodo.13162633}
}

🔖 References

  1. Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9(3), 90-95. https://doi.org/10.1109/MCSE.2007.55

  2. Kohavi, R. (1996). Census Income. UCI Machine Learning Repository. https://doi.org/10.24432/C5GP7S

  3. Pace, R. K., & Barry, R. (1997). Sparse Spatial Autoregressions. Statistics & Probability Letters, 33(3), 291-297. https://doi.org/10.1016/S0167-7152(96)00140-X

  4. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830. http://jmlr.org/papers/v12/pedregosa11a.html

  5. Waskom, M. (2021). Seaborn: Statistical Data Visualization. Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021
