A PySpark companion for data science tasks.
Project description
Pyspark DS Toolbox
The package provides a set of tools that support day-to-day data science work with Spark. The documentation can be found here and notebooks with usage examples here.
Feel free to contribute :)
Installation
Directly from PyPI:
pip install pyspark-ds-toolbox
Or from GitHub (note that this installs the latest development version):
pip install git+https://github.com/viniciusmsousa/pyspark-ds-toolbox.git
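To confirm which version ended up installed, a sketch along the following lines may help. It relies only on the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and not part of the package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pyspark-ds-toolbox" is the distribution name used in the pip command above.
    print(installed_version("pyspark-ds-toolbox"))
```

A `None` result means pip did not register the distribution in the active environment, which is usually a sign that a different interpreter or virtual environment is in use.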
Organization
The package is organized in a structure based on the nature of the task, such as data wrangling, model/prediction evaluation, and so on.
pyspark_ds_toolbox         # Main package
├─ causal_inference        # Sub-package dedicated to causal inference
│  ├─ diff_in_diff.py
│  └─ ps_matching.py
├─ ml                      # Sub-package dedicated to ML
│  ├─ data_prep            # Sub-package with ML data preparation tools
│  │  ├─ class_weights.py
│  │  └─ features_vector.py
│  ├─ classification       # Sub-package dedicated to classification tasks
│  │  ├─ eval.py
│  │  └─ baseline_classifiers.py
│  ├─ feature_importance   # Sub-package with feature importance tools
│  │  ├─ native_spark.py
│  │  └─ shap_values.py
│  └─ feature_selection    # Sub-package with feature selection tools
│     └─ information_value.py
├─ wrangling               # Sub-package dedicated to data wrangling tasks
│  ├─ reshape.py
│  └─ data_quality.py
└─ stats                   # Sub-package dedicated to basic statistical functionality
   └─ association.py
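Assuming the import paths mirror the directory layout above (an assumption drawn from the tree, not from the package documentation), the sketch below maps each file to a dotted module name and checks whether it can be located in the current environment. The helper `module_available` is hypothetical.

```python
import importlib.util

# Dotted module names derived from the tree above (assumed to match the layout).
MODULES = [
    "pyspark_ds_toolbox.causal_inference.diff_in_diff",
    "pyspark_ds_toolbox.causal_inference.ps_matching",
    "pyspark_ds_toolbox.ml.data_prep.class_weights",
    "pyspark_ds_toolbox.ml.data_prep.features_vector",
    "pyspark_ds_toolbox.ml.classification.eval",
    "pyspark_ds_toolbox.ml.classification.baseline_classifiers",
    "pyspark_ds_toolbox.ml.feature_importance.native_spark",
    "pyspark_ds_toolbox.ml.feature_importance.shap_values",
    "pyspark_ds_toolbox.ml.feature_selection.information_value",
    "pyspark_ds_toolbox.wrangling.reshape",
    "pyspark_ds_toolbox.wrangling.data_quality",
    "pyspark_ds_toolbox.stats.association",
]


def module_available(name: str) -> bool:
    """Return True if `name` can be located.

    find_spec imports the parent packages but not the target module itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or fails to import.
        return False


if __name__ == "__main__":
    for name in MODULES:
        print(f"{name}: {'found' if module_available(name) else 'missing'}")
```

Running the script after installation gives a quick view of which modules the installed version actually ships, which can differ between the PyPI release and the development version.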
Download files
Source Distribution
pyspark-ds-toolbox-0.4.3.tar.gz (33.6 kB)
Built Distribution
pyspark_ds_toolbox-0.4.3-py3-none-any.whl (40.4 kB)
File details
Details for the file pyspark-ds-toolbox-0.4.3.tar.gz.
File metadata
- Download URL: pyspark-ds-toolbox-0.4.3.tar.gz
- Upload date:
- Size: 33.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.5 CPython/3.7.10 Linux/4.14.281-212.502.amzn2.x86_64
File hashes
Algorithm | Hash digest
---|---
SHA256 | 980a5778b70a5604623dd6d9a9244e1f5e10c494ba26bcff2c9d00ec083848be
MD5 | aee8147f60e573ce20e5f7cfc6efb30d
BLAKE2b-256 | 2081ded41afc59fa6d9ee827d5b7f6787c179c4fc5b00419278ba2313d6997ad
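A downloaded archive can be checked against the published SHA256 digest before installing it. The following is a minimal sketch using only `hashlib`; the helper names `sha256_of_file` and `matches_digest` and the local file path are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with an expected value, ignoring case and padding."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # SHA256 digest of the source distribution, copied from the table above.
    expected = "980a5778b70a5604623dd6d9a9244e1f5e10c494ba26bcff2c9d00ec083848be"
    # Assumed location of the downloaded file; adjust to the actual path.
    archive = Path("pyspark-ds-toolbox-0.4.3.tar.gz")
    if archive.exists():
        print("digest matches" if matches_digest(archive, expected) else "digest MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use flat regardless of file size. pip can enforce the same check at install time through hash-checking mode with a requirements file.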
File details
Details for the file pyspark_ds_toolbox-0.4.3-py3-none-any.whl.
File metadata
- Download URL: pyspark_ds_toolbox-0.4.3-py3-none-any.whl
- Upload date:
- Size: 40.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.5 CPython/3.7.10 Linux/4.14.281-212.502.amzn2.x86_64
File hashes
Algorithm | Hash digest
---|---
SHA256 | f864c12ddc8e3904277109f870c80a49af394e627726c070b684bc8840149e97
MD5 | 1c5ded34ed7027ced15d70adeebc1bb2
BLAKE2b-256 | 6d3f36b824808dd2843c3c122cafb7d9ece5a3b0b242e8005a35302bf345c41c