A PySpark companion for data science tasks.
Project description
Pyspark DS Toolbox
The package provides a set of tools that help with the day-to-day work of data science in Spark. The documentation can be found here and notebooks with usage examples here.
Feel free to contribute :)
Installation
Directly from PyPI:
pip install pyspark-ds-toolbox
or from GitHub. Note that installing from GitHub installs the latest development version:
pip install git+https://github.com/viniciusmsousa/pyspark-ds-toolbox.git
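After either install command, a quick way to confirm which release ended up in the environment is to query the installed distribution metadata. The snippet below is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pyspark-ds-toolbox") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not part of the toolbox.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "pyspark-ds-toolbox is not installed")
```

A development install from GitHub may report a different version string than the latest PyPI release.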
Organization
The package is organized by the nature of the task, such as data wrangling, model/prediction evaluation, and so on.
pyspark_ds_toolbox         # Main Package
├─ causal_inference        # Sub-package dedicated to Causal Inference
│  ├─ diff_in_diff.py
│  └─ ps_matching.py
├─ ml                      # Sub-package dedicated to ML
│  ├─ data_prep            # Sub-package with ML data preparation tools
│  │  ├─ class_weights.py
│  │  └─ features_vector.py
│  ├─ classification       # Sub-package dedicated to classification tasks
│  │  ├─ eval.py
│  │  └─ baseline_classifiers.py
│  ├─ feature_importance   # Sub-package with feature importance tools
│  │  ├─ native_spark.py
│  │  └─ shap_values.py
│  └─ feature_selection    # Sub-package with feature selection tools
│     └─ information_value.py
├─ wrangling               # Sub-package dedicated to data wrangling tasks
│  ├─ reshape.py
│  └─ data_quality.py
└─ stats                   # Sub-package dedicated to basic statistics functionality
   └─ association.py
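The layout above implies dotted import paths such as `pyspark_ds_toolbox.ml.classification.eval`. As a rough sketch, the following check reports which of those modules can be located in the current environment. The module names are derived from the tree and assume each `.py` file is importable under its parent sub-package; the helper `module_availability` is hypothetical and not part of the package.

```python
from importlib.util import find_spec
from typing import Dict, Iterable

# Dotted paths derived from the directory tree; assumed, not verified
# against the package source.
MODULES = (
    "pyspark_ds_toolbox.causal_inference.diff_in_diff",
    "pyspark_ds_toolbox.causal_inference.ps_matching",
    "pyspark_ds_toolbox.ml.data_prep.class_weights",
    "pyspark_ds_toolbox.ml.data_prep.features_vector",
    "pyspark_ds_toolbox.ml.classification.eval",
    "pyspark_ds_toolbox.ml.classification.baseline_classifiers",
    "pyspark_ds_toolbox.ml.feature_importance.native_spark",
    "pyspark_ds_toolbox.ml.feature_importance.shap_values",
    "pyspark_ds_toolbox.ml.feature_selection.information_value",
    "pyspark_ds_toolbox.wrangling.reshape",
    "pyspark_ds_toolbox.wrangling.data_quality",
    "pyspark_ds_toolbox.stats.association",
)


def module_availability(names: Iterable[str] = MODULES) -> Dict[str, bool]:
    """Map each dotted module name to True if it can be located, else False."""
    status = {}
    for name in names:
        try:
            # find_spec imports parent packages, so a missing parent
            # (or a missing dependency of it) raises ImportError.
            status[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


if __name__ == "__main__":
    for module, ok in module_availability().items():
        print(f"{'ok     ' if ok else 'missing'}  {module}")
```

Because `find_spec` imports parent packages, a `False` result can also mean that a dependency such as PySpark is missing, not only that the module itself is absent.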
File details
Details for the file pyspark-ds-toolbox-0.4.3.tar.gz.
File metadata
- Download URL: pyspark-ds-toolbox-0.4.3.tar.gz
- Upload date:
- Size: 33.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.5 CPython/3.7.10 Linux/4.14.281-212.502.amzn2.x86_64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 980a5778b70a5604623dd6d9a9244e1f5e10c494ba26bcff2c9d00ec083848be |
| MD5 | aee8147f60e573ce20e5f7cfc6efb30d |
| BLAKE2b-256 | 2081ded41afc59fa6d9ee827d5b7f6787c179c4fc5b00419278ba2313d6997ad |
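The digests listed for the source distribution can be used to check a downloaded copy of the file. The sketch below computes a file's SHA256 with the standard library and compares it to the value listed for pyspark-ds-toolbox-0.4.3.tar.gz; the helper names and the local file path are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for pyspark-ds-toolbox-0.4.3.tar.gz.
EXPECTED_SHA256 = "980a5778b70a5604623dd6d9a9244e1f5e10c494ba26bcff2c9d00ec083848be"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the archive was saved.
    archive = Path("pyspark-ds-toolbox-0.4.3.tar.gz")
    if archive.exists():
        print("digest matches" if matches_expected(archive) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same function works for the wheel if its listed SHA256 digest is passed as `expected_hex`.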
File details
Details for the file pyspark_ds_toolbox-0.4.3-py3-none-any.whl.
File metadata
- Download URL: pyspark_ds_toolbox-0.4.3-py3-none-any.whl
- Upload date:
- Size: 40.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.5 CPython/3.7.10 Linux/4.14.281-212.502.amzn2.x86_64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f864c12ddc8e3904277109f870c80a49af394e627726c070b684bc8840149e97 |
| MD5 | 1c5ded34ed7027ced15d70adeebc1bb2 |
| BLAKE2b-256 | 6d3f36b824808dd2843c3c122cafb7d9ece5a3b0b242e8005a35302bf345c41c |