A collection of tools to help when developing PySpark applications

PySpark Helpers

Installation

With pip

pip install pyspark_helpers

With poetry

poetry add pyspark_helpers

Usage

Auto Generate PySpark Schemas from JSON examples

Through the CLI:

python -m pyspark_helpers.schema
# OR with script
psh-schema-from-json --path ./tests/data/schema/array.json --output ./results/array_schema.json

Or programmatically:

from pyspark_helpers.schema import schema_from_json, bulk_schema_from_jsom
from pathlib import Path

data_dir = "data/json"


## One file
schema = schema_from_json(f"{data_dir}/file.json")

print(schema)

## A whole directory
# Path.glob is an instance method, so call it on the directory path
files = sorted(Path(data_dir).glob("*.json"))
schemas = bulk_schema_from_jsom(files)

for _file, schema in zip(files, schemas):
    print(_file.name, schema)
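
To give a feel for what generating a schema from a JSON example involves, here is a minimal standalone sketch. It is not the implementation used by pyspark_helpers: the function name `infer_spark_type` is hypothetical, and the choices for empty lists and null values are assumptions made for illustration. The output follows the JSON representation that Spark uses for schemas (struct, array, and primitive type names).

```python
import json
from typing import Any


def infer_spark_type(value: Any) -> Any:
    """Map one parsed JSON value to a Spark type in its JSON representation.

    Hypothetical helper for illustration only, not part of pyspark_helpers.
    """
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: the first element is representative of the whole array;
        # an empty array carries no type information, so fall back to string
        element = infer_spark_type(value[0]) if value else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    if isinstance(value, dict):
        return {
            "type": "struct",
            "fields": [
                {
                    "name": key,
                    "type": infer_spark_type(item),
                    "nullable": True,
                    "metadata": {},
                }
                for key, item in value.items()
            ],
        }
    # Assumption: JSON null is treated as a nullable string
    return "string"


example = json.loads('{"id": 1, "name": "a", "tags": ["x", "y"], "score": 0.5}')
schema = infer_spark_type(example)
print(json.dumps(schema, indent=2))
```

A dictionary in this shape can typically be turned into a PySpark schema object with `StructType.fromJson`, provided PySpark is installed.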

Guidelines for Contributing

Use Conventional Commit messages.

To help with this, I encourage you to use commitizen when making your commits. The process is straightforward:

# Checkout a new branch
git checkout -b <my-new-branch>

# Make changes to the code....

# Add your changes
git add <changed-file-1> <changed-file-2> ...

# Run commitizen commit and follow prompts
cz commit # or `cz c` for short

# Push branch
git push --set-upstream origin <my-new-branch>

# Open a Pull Request
## This is done on GitHub
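
If you write a commit message by hand, the sketch below shows roughly what a Conventional Commit header looks like and how it could be checked. The pattern is a simplified assumption covering only the header line (`type(scope)!: description`); the full specification also defines bodies and footers, and commitizen applies its own rules. The names `HEADER` and `is_conventional` are hypothetical.

```python
import re

# Assumption: simplified header pattern, not the full Conventional Commits grammar
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"              # type, e.g. feat, fix, docs
    r"(?:\((?P<scope>[^()\s]+)\))?"   # optional scope in parentheses
    r"(?P<breaking>!)?"               # optional breaking-change marker
    r": (?P<description>\S.*)$"       # colon, one space, non-empty description
)


def is_conventional(header: str) -> bool:
    """Return True if the first line of a commit message matches the pattern."""
    return HEADER.match(header) is not None


for message in [
    "feat(schema): add bulk schema generation",
    "fix: handle empty arrays",
    "Updated some stuff",
]:
    print(f"{message!r}: {is_conventional(message)}")
```

The first two messages match the pattern and the third does not, because it lacks a type prefix followed by a colon and a space.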
