A Pyspark companion for data science tasks.

Pyspark DS Toolbox

Lifecycle: experimental

The goal of this package is to provide a set of tools that help with the day-to-day work of data science in Spark. The documentation can be found here and notebooks with usage examples here.

Feel free to contribute :)

Installation

Directly from PyPI:

pip install pyspark-ds-toolbox

Or from GitHub. Note that this installs the latest development version:

pip install git+https://github.com/viniciusmsousa/pyspark-ds-toolbox.git
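To confirm which of the two installs ended up in the active environment, a minimal sketch using only the standard library is shown below. The helper name `installed_version` is hypothetical and not part of the package; the distribution name is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="pyspark-ds-toolbox"):
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper for illustration; it only queries pip metadata and
    does not import the package itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("pyspark-ds-toolbox is not installed in this environment")
    else:
        print(f"pyspark-ds-toolbox {found} is installed")
```

Because the check reads distribution metadata only, it works without a running Spark session.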

Organization

The package is organized in a structure based on the nature of the task, such as data wrangling, model/prediction evaluation, and so on.

pyspark_ds_toolbox         # Main Package
├─ causal_inference           # Sub-package dedicated to Causal Inference
│  ├─ diff_in_diff.py   
│  └─ ps_matching.py    
├─ ml                         # Sub-package dedicated to ML
│  ├─ data_prep                  # Sub-package with ML data preparation tools
│  │  ├─ class_weights.py     
│  │  └─ features_vector.py 
│  ├─ classification             # Sub-package dedicated to classification tasks
│  │  ├─ eval.py
│  │  └─ baseline_classifiers.py 
│  ├─ feature_importance         # Sub-package with feature importance tools
│  │  ├─ native_spark.py
│  │  └─ shap_values.py 
│  └─ feature_selection         # Sub-package with feature selection tools
│     └─ information_value.py    
├─ wrangling                  # Sub-package dedicated to data wrangling tasks
│  ├─ reshape.py               
│  └─ data_quality.py         
└─ stats                      # Sub-package dedicated to basic statistical functionality
   └─ association.py    
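The tree above can be read as a list of dotted import paths. The sketch below assumes the directory layout maps one-to-one onto importable modules (the usual Python convention, not something this page confirms); `LAYOUT` and `module_paths` are hypothetical names used only for illustration, and no function inside any module is assumed.

```python
# Layout copied from the tree above: dicts are sub-packages that contain
# further sub-packages, lists hold the module files (without ".py").
LAYOUT = {
    "causal_inference": ["diff_in_diff", "ps_matching"],
    "ml": {
        "data_prep": ["class_weights", "features_vector"],
        "classification": ["eval", "baseline_classifiers"],
        "feature_importance": ["native_spark", "shap_values"],
        "feature_selection": ["information_value"],
    },
    "wrangling": ["reshape", "data_quality"],
    "stats": ["association"],
}


def module_paths(layout, prefix="pyspark_ds_toolbox"):
    """Yield the dotted path of every module in the layout, depth first."""
    for name, children in layout.items():
        path = f"{prefix}.{name}"
        if isinstance(children, dict):
            # Nested sub-packages: recurse with the extended prefix.
            yield from module_paths(children, path)
        else:
            for module in children:
                yield f"{path}.{module}"


if __name__ == "__main__":
    for dotted in module_paths(LAYOUT):
        print(dotted)
```

Under that assumption, a module such as `class_weights.py` would be reached as `pyspark_ds_toolbox.ml.data_prep.class_weights`; the documentation remains the reference for what each module exports.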
