A PySpark companion for data science tasks.
Pyspark DS Toolbox
This package provides a set of tools to support day-to-day data science work with Spark. The documentation can be found here.
Installation
Directly from PyPI:
pip install pyspark-ds-toolbox
or from GitHub:
pip install git+https://github.com/viniciusmsousa/pyspark-ds-toolbox.git
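To confirm that either install command succeeded, a minimal sketch along the following lines can be used. It relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pyspark-ds-toolbox") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not an API of the package.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("pyspark-ds-toolbox is not installed in this environment")
    else:
        print(f"pyspark-ds-toolbox {found} is installed")
```

The check reads the installed distribution metadata only, so it does not import the package or start a Spark session.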
Organization
The package is currently organized by the nature of the task, such as data wrangling or model/prediction evaluation.
pyspark_ds_toolbox # Main Package
├─ causal_inference # Sub-package dedicated to Causal Inference
│  ├─ diff_in_diff.py
│  └─ ps_matching.py
├─ ml # Sub-package dedicated to ML
│  ├─ data_prep.py
│  ├─ classification # Sub-package dedicated to classification tasks
│  │  ├─ eval.py
│  │  └─ baseline_classifiers.py
│  └─ feature_importance
│     ├─ native_spark.py
│     └─ shap_values.py
├─ wrangling.py
└─ stats # Sub-package dedicated to basic statistical functionality
   └─ association.py
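The layout above maps directly onto dotted import paths. The sketch below transcribes those paths from the tree and reports which of them can be located in the current environment. The `MODULES` list and the helper name `locate_modules` are illustrative assumptions; the sketch does not call any function of the package.

```python
from importlib.util import find_spec
from typing import Dict, List

# Dotted paths transcribed from the package tree (file names without ".py").
MODULES: List[str] = [
    "pyspark_ds_toolbox.causal_inference.diff_in_diff",
    "pyspark_ds_toolbox.causal_inference.ps_matching",
    "pyspark_ds_toolbox.ml.data_prep",
    "pyspark_ds_toolbox.ml.classification.eval",
    "pyspark_ds_toolbox.ml.classification.baseline_classifiers",
    "pyspark_ds_toolbox.ml.feature_importance.native_spark",
    "pyspark_ds_toolbox.ml.feature_importance.shap_values",
    "pyspark_ds_toolbox.wrangling",
    "pyspark_ds_toolbox.stats.association",
]


def locate_modules(names: List[str]) -> Dict[str, bool]:
    """Map each dotted module path to True if it can be located, else False.

    `locate_modules` is a hypothetical helper. Note that find_spec imports
    the parent packages of a dotted name, so their import-time code runs.
    """
    found: Dict[str, bool] = {}
    for name in names:
        try:
            found[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing or unimportable parent package lands here.
            found[name] = False
    return found


if __name__ == "__main__":
    for module, available in locate_modules(MODULES).items():
        print(f"{'ok     ' if available else 'missing'}  {module}")
```

Every entry is reported as missing when the package is not installed, which makes the script a quick sanity check after installation.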
Download files
Source Distribution
pyspark-ds-toolbox-0.3.1.tar.gz (29.5 kB)
Built Distribution
Hashes for pyspark_ds_toolbox-0.3.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ed36e2cc52832dd3b6ef8b50c6ea78afdf93b71134f9e4813e8db8d0cc806847
MD5 | e53fc9910b837484a30202b01a08716e
BLAKE2b-256 | eaa04bef09c00d5f46c922524af0f22f8d89daedccc71a0d8f20c43b270807aa