PySpark Helpers

A collection of tools to help when developing PySpark applications

Installation

With pip

pip install pyspark_helpers

With poetry

poetry add pyspark_helpers

Usage

Auto Generate PySpark Schemas from JSON examples

Through the CLI:

python -m pyspark_helpers.schema
# OR with script
psh-schema-from-json --path ./tests/data/schema/array.json --output ./results/array_schema.json

Or programmatically:

from pyspark_helpers.schema import schema_from_json, bulk_schema_from_jsom
from pathlib import Path

data_dir = "data/json"


## One file
schema = schema_from_json(f"{data_dir}/file.json")

print(schema)

## A whole directory
# Path.glob is an instance method, so build the directory Path first
files = list(Path(data_dir).glob("*.json"))
schemas = bulk_schema_from_jsom(files)

for _file, schema in zip(files, schemas):
    print(_file.name, schema)
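To make the idea of deriving a schema from a JSON example concrete, here is a minimal, self-contained sketch. It is not the implementation used by pyspark_helpers: the function name `infer_spark_type` is hypothetical, and the rules it applies (first array element is representative, nulls and empty arrays fall back to string, every field is nullable) are assumptions chosen for illustration. The output follows the JSON layout Spark uses to serialise a `StructType`.

```python
import json
from typing import Any


def infer_spark_type(value: Any) -> Any:
    """Map a parsed JSON value to a Spark schema type in its JSON form (hypothetical helper)."""
    # bool must be tested before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: the first element is representative; empty lists fall back to string
        element = infer_spark_type(value[0]) if value else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    if isinstance(value, dict):
        # Assumption: every field is treated as nullable
        fields = [
            {"name": key, "type": infer_spark_type(item), "nullable": True, "metadata": {}}
            for key, item in value.items()
        ]
        return {"type": "struct", "fields": fields}
    # JSON null carries no type information; string is an arbitrary fallback
    return "string"


example = json.loads('{"id": 1, "name": "a", "tags": ["x", "y"], "score": 0.5}')
schema = infer_spark_type(example)
print(json.dumps(schema, indent=2))
```

If PySpark is installed, a dictionary of this shape can be turned into a schema object with `pyspark.sql.types.StructType.fromJson(schema)`.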

Guidelines for Contributing

Use Conventional Commit messages.

To help with this, I encourage you to use commitizen when making your commits. The process is straightforward:

# Checkout a new branch
git checkout -b <my-new-branch>

# Make changes to the code....

# Add your changes
git add <changed-file-1> <changed-file-2> ...

# Run commitizen commit and follow prompts
cz commit # `cz c` for short

# Push branch
git push --set-upstream origin <my-new-branch>

# Open a Pull Request
## This is done on GitHub
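For reference, a Conventional Commit header has the shape `type(scope)!: description`, where the scope and the `!` breaking-change marker are optional. The sketch below checks a message against that shape. It is an illustration only, not part of this project or of commitizen: the name `is_conventional` is hypothetical, and the list of allowed types is an assumption based on commonly used conventions (the specification itself only defines `feat` and `fix`).

```python
import re

# Header shape: type, optional (scope), optional "!", then ": " and a description
HEADER = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<description>\S.*)$"
)

# Assumption: a commonly used set of types; adjust to your project's rules
ALLOWED_TYPES = {
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
}


def is_conventional(message: str) -> bool:
    """Return True if the first line of the message looks like a Conventional Commit header."""
    lines = message.splitlines()
    if not lines:
        return False
    match = HEADER.match(lines[0])
    return match is not None and match.group("type") in ALLOWED_TYPES


for msg in ["feat(schema): add bulk inference", "fix!: handle empty arrays", "Updated stuff"]:
    print(f"{msg!r}: {is_conventional(msg)}")
```

In practice commitizen builds the message for you through its prompts, so a check like this is mainly useful for understanding what a valid message looks like.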
