A PySpark companion for data science tasks.

Pyspark DS Toolbox

The objective of the package is to provide a set of tools that help with the daily work of data science in Spark. The documentation can be found here.

Installation

Directly from PyPI:

pip install pyspark-ds-toolbox

or from GitHub:

pip install git+https://github.com/viniciusmsousa/pyspark-ds-toolbox.git
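To confirm which release ended up in the environment after either command, a minimal sketch using only the standard library's importlib.metadata (Python 3.8+). It assumes the distribution name pyspark-ds-toolbox from the pip commands above; the helper name installed_version is hypothetical and not part of the package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "pyspark-ds-toolbox") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    installed_version is a hypothetical helper for illustration only;
    it is not part of pyspark-ds-toolbox.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # pip has not installed this distribution in the current environment
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "pyspark-ds-toolbox is not installed")
```

Note that the distribution name uses hyphens while the importable package, per the layout listed under Organization, is pyspark_ds_toolbox.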

Organization

The package is currently organized by the nature of the task, such as data wrangling, model/prediction evaluation, and so on:

pyspark_ds_toolbox     # Main Package
├─ causal_inference    # Sub-package dedicated to Causal Inference
│  ├─ diff_in_diff.py   # Module Diff in Diff
│  └─ ps_matching.py    # Module Propensity Score Matching
├─ ml                  # Sub-package dedicated to ML
│  ├─ data_prep.py      # Module for Data Preparation
│  ├─ eval.py           # Module for model/prediction evaluation
│  └─ shap_values.py    # Module for estimating SHAP values
├─ wrangling.py        # Module for general Data Wrangling
└─ stats               # Sub-package dedicated to basic statistical functionality
   └─ association.py    # Association metrics module
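As an illustration of the quantity the causal_inference/diff_in_diff.py module is named after, the canonical 2x2 difference-in-differences estimate can be computed from four group means: the change in the treated group minus the change in the control group. This is a plain-Python sketch, not the module's API; the function name did_estimate and the (treated, post, outcome) row layout are assumptions made for the example, and the package itself works on Spark DataFrames.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Tuple

# (treated, post, outcome): hypothetical row layout for this sketch
Row = Tuple[bool, bool, float]


def did_estimate(rows: Iterable[Row]) -> float:
    """Canonical 2x2 difference-in-differences estimate from group means.

    Hypothetical helper for illustration; not the diff_in_diff module's API.
    """
    cells = defaultdict(list)
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    # change over time in the treated group
    treated_change = m[(True, True)] - m[(True, False)]
    # change over time in the control group
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


rows = [
    (True, False, 10.0), (True, False, 12.0),   # treated, before
    (True, True, 20.0), (True, True, 22.0),     # treated, after
    (False, False, 8.0), (False, False, 10.0),  # control, before
    (False, True, 12.0), (False, True, 14.0),   # control, after
]
print(did_estimate(rows))  # prints 6.0
```

Here the treated group rises by 10 and the control group by 4, so, under the usual parallel-trends assumption, 6.0 is read as the effect of the treatment. With data in a Spark DataFrame, the same four means could come from a groupBy on the two indicator columns.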

Download files

Source distribution: pyspark-ds-toolbox-0.1.3.tar.gz (26.5 kB)

Built distribution: pyspark_ds_toolbox-0.1.3-py3-none-any.whl (29.6 kB), Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page