PySpark Helpers
A collection of tools to help when developing PySpark applications
Installation
With pip
pip install pyspark_helpers
With poetry
poetry add pyspark_helpers
Usage
Auto Generate PySpark Schemas from JSON examples
Through the CLI:
python -m pyspark_helpers.schema
# OR with script
psh-schema-from-json --path ./tests/data/schema/array.json --output ./results/array_schema.json
Or programmatically:
from pyspark_helpers.schema import schema_from_json, bulk_schema_from_jsom
from pathlib import Path
data_dir = "data/json"
## One file
schema = schema_from_json(f"{data_dir}/file.json")
print(schema)
## A whole directory
files = sorted(Path(data_dir).glob("*.json"))
schemas = bulk_schema_from_jsom(files)
for _file, schema in zip(files, schemas):
    print(_file.name, schema)
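To give a feel for what generating a schema from a JSON example involves, here is a minimal standard-library sketch. It is not the package's implementation: `infer_type` is a hypothetical helper, and the rules it applies (the first array element decides the element type, nulls and empty arrays fall back to string, every field is nullable) are simplifying assumptions. The output follows the dictionary layout PySpark uses when a `StructType` is serialized to JSON.

```python
import json
from typing import Any


def infer_type(value: Any) -> Any:
    """Map one parsed JSON value to a PySpark-style type description (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return {
            "type": "struct",
            "fields": [
                {"name": key, "type": infer_type(item), "nullable": True, "metadata": {}}
                for key, item in value.items()
            ],
        }
    if isinstance(value, list):
        # Assumption: the first element decides the element type;
        # an empty list falls back to string.
        element = infer_type(value[0]) if value else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    # Strings and nulls both fall back to string in this sketch.
    return "string"


example = json.loads(
    '{"id": 1, "tags": ["a", "b"], "owner": {"name": "x", "active": true}}'
)
schema = infer_type(example)
print(json.dumps(schema, indent=2))
```

A dictionary in this layout can be passed to `StructType.fromJson` from `pyspark.sql.types` to obtain a schema object. Whether the files written by `psh-schema-from-json` use exactly this layout is not confirmed here.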
Guidelines for Contributing
Use Conventional Commit messages.
To help with this, I encourage you to use commitizen when making your commits. The process is straightforward:
# Checkout a new branch
git checkout -b <my-new-branch>
# Make changes to the code....
# Add your changes
git add <changed-file-1> <changed-file-2> ...
# Run commitizen commit and follow prompts
cz commit # or `cz c` for short
# Push branch
git push
# Open a Pull Request
## This is done on GitHub
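For readers new to Conventional Commits, the header line has the shape `type(scope)!: description`, where the scope and the `!` breaking-change marker are optional. The sketch below checks a header against that shape. The `TYPES` tuple and the `is_conventional` function are illustrative assumptions, not part of this project or of commitizen, and the project may accept a different set of types.

```python
import re

# Commonly used types; this list is an assumption and projects may allow others.
TYPES = ("build", "chore", "ci", "docs", "feat", "fix",
         "perf", "refactor", "revert", "style", "test")

# type, optional (scope), optional "!", then ": " and a non-empty description.
HEADER = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(header: str) -> bool:
    """Return True if the first line of a commit message matches the pattern."""
    return HEADER.match(header) is not None


print(is_conventional("feat(schema): support nested arrays"))
print(is_conventional("fixed some stuff"))
```

The first header matches the pattern and the second does not, because it lacks a recognized type followed by a colon and a space.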