A Pyspark companion for data science tasks.

Pyspark DS Toolbox

Lifecycle: experimental

The package provides a set of tools that help with the daily work of data science in Spark. The documentation can be found here.

Installation

Directly from PyPI:

pip install pyspark-ds-toolbox

Or from GitHub:

pip install git+https://github.com/viniciusmsousa/pyspark-ds-toolbox.git
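
After either install route, it can help to confirm which release ended up in the environment. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of the package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "pyspark-ds-toolbox") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not a function from
    pyspark-ds-toolbox itself.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version())
```

A `None` result means pip did not register the distribution in the active environment, which usually points to a different interpreter or virtual environment being in use.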

Organization

The package is currently organized by the nature of the task, such as data wrangling or model/prediction evaluation.

pyspark_ds_toolbox         # Main Package
├─ causal_inference           # Sub-package dedicated to Causal Inference
│  ├─ diff_in_diff.py   
│  └─ ps_matching.py    
├─ ml                         # Sub-package dedicated to ML
│  ├─ data_prep.py      
│  ├─ classification          # Sub-package dedicated to classification tasks
│  │  ├─ eval.py
│  │  └─ baseline_classifiers.py 
│  └─ feature_importance 
│     ├─ native_spark.py
│     └─ shap_values.py    
├─ wrangling.py        
└─ stats                      # Sub-package dedicated to basic statistical functionality
   └─ association.py    
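
The tree above maps directly onto dotted import paths. As a rough sketch, the snippet below derives those paths from the tree and reports which ones can be located in the current environment. The module list is an assumption read off the tree and may differ between releases; `is_importable` is a hypothetical helper, not part of the package.

```python
import importlib.util

# Dotted module paths inferred from the directory tree (assumed, not verified
# against any particular release of the package).
MODULES = [
    "pyspark_ds_toolbox.causal_inference.diff_in_diff",
    "pyspark_ds_toolbox.causal_inference.ps_matching",
    "pyspark_ds_toolbox.ml.data_prep",
    "pyspark_ds_toolbox.ml.classification.eval",
    "pyspark_ds_toolbox.ml.classification.baseline_classifiers",
    "pyspark_ds_toolbox.ml.feature_importance.native_spark",
    "pyspark_ds_toolbox.ml.feature_importance.shap_values",
    "pyspark_ds_toolbox.wrangling",
    "pyspark_ds_toolbox.stats.association",
]


def is_importable(name: str) -> bool:
    """Return True if a module spec can be found for `name`.

    Looking up a dotted name imports its parent packages, so a missing
    parent (or a failing import inside it) is treated as "not importable".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


for module in MODULES:
    status = "ok" if is_importable(module) else "missing"
    print(f"{module}: {status}")
```

If every line reports `missing`, the package itself is probably not installed; isolated `missing` entries suggest the installed release uses a different layout than the tree shown here.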

Download files

Source Distribution

pyspark-ds-toolbox-0.3.0.tar.gz (29.3 kB)

Built Distribution

pyspark_ds_toolbox-0.3.0-py3-none-any.whl (34.2 kB, Python 3)
