PySpark Helpers

A collection of tools to help when developing PySpark applications

Installation

With pip

pip install pyspark_helpers

With poetry

poetry add pyspark_helpers

Usage

Auto Generate PySpark Schemas from JSON examples

Through the CLI:

python -m pyspark_helpers.schema
# OR with script
psh-schema-from-json --path ./tests/data/schema/array.json --output ./results/array_schema.json

Or programmatically:

from pyspark_helpers.schema import schema_from_json, bulk_schema_from_jsom
from pathlib import Path

data_dir = "data/json"


## One file
schema = schema_from_json(f"{data_dir}/file.json")

print(schema)

## A whole directory
# Path.glob is an instance method, so call it on a Path built from data_dir
files = sorted(Path(data_dir).glob("*.json"))
schemas = bulk_schema_from_jsom(files)

for _file, schema in zip(files, schemas):
    print(_file.name, schema)
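
To make the idea of deriving a schema from a JSON example more concrete, here is a minimal sketch in plain Python. It is not the implementation used by pyspark_helpers; the function name infer_type and the sample record are hypothetical, and the fallback rules (nulls become strings, arrays take the type of their first element) are assumptions made for illustration. The output follows the JSON layout that PySpark uses to serialize a StructType.

```python
import json


def infer_type(value):
    """Map a parsed JSON value to a Spark-style data type in JSON form.

    Hypothetical helper for illustration only, not part of pyspark_helpers.
    """
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return {
            "type": "struct",
            "fields": [
                {"name": k, "type": infer_type(v), "nullable": True, "metadata": {}}
                for k, v in value.items()
            ],
        }
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element decides the type.
        element = infer_type(value[0]) if value else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    # Assumption: strings and nulls both fall back to "string".
    return "string"


# Hypothetical sample record
example = json.loads('{"id": 1, "name": "a", "tags": ["x", "y"], "score": 0.5}')
schema = infer_type(example)
print(json.dumps(schema, indent=2))
```

If PySpark is installed, a dictionary of this shape can be turned into a schema object with StructType.fromJson.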

Guidelines for Contributing

Use Conventional Commit messages.

To help with this, I encourage you to use commitizen when making your commits. The process is straightforward:

# Checkout a new branch
git checkout -b <my-new-branch>

# Make changes to the code....

# Add your changes
git add <changed-file-1> <changed-file-2> ...

# Run commitizen commit and follow prompts
cz commit # `cz c` for short

# Push branch
git push --set-upstream origin <my-new-branch>

# Open a Pull Request
## This is done on GitHub
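
For readers unfamiliar with the Conventional Commits format, the sketch below checks the first line of a commit message against the general shape "type(scope)!: description". It is only an illustration: the parse_header helper and the sample messages are hypothetical, and commitizen performs its own, more complete validation.

```python
import re

# Header shape: <type>[(scope)][!]: <description>
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"            # e.g. feat, fix, docs
    r"(\((?P<scope>[^()\s]+)\))?"   # optional scope in parentheses
    r"(?P<breaking>!)?"             # optional breaking-change marker
    r": (?P<description>\S.*)$"     # colon, space, then the summary
)


def parse_header(line):
    """Return the parts of a commit header, or None if it does not match."""
    match = HEADER.match(line)
    return match.groupdict() if match else None


# Hypothetical commit messages
print(parse_header("feat(schema): add bulk generation"))
print(parse_header("fix!: handle empty arrays"))
print(parse_header("updated stuff"))
```

The last message has no type prefix followed by a colon, so the helper returns None for it.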
