A collection of tools to help when developing PySpark applications
PySpark Helpers
Installation
With pip
pip install pyspark_helpers
With poetry
poetry add pyspark_helpers
Usage
Auto Generate PySpark Schemas from JSON examples
Through the CLI:
python -m pyspark_helpers.schema
# OR with script
psh-schema-from-json --path ./tests/data/schema/array.json --output ./results/array_schema.json
Or programmatically:
from pyspark_helpers.schema import schema_from_json, bulk_schema_from_jsom
from pathlib import Path
data_dir = "data/json"
## One file
schema = schema_from_json(f"{data_dir}/file.json")
print(schema)
## A whole directory
files = sorted(Path(data_dir).glob("*.json"))
schemas = bulk_schema_from_jsom(files)
for _file, schema in zip(files, schemas):
    print(_file.name, schema)
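To give a feel for what inferring a schema from a JSON example involves, here is a minimal standard-library sketch. It is not the implementation used by pyspark_helpers: the function name `infer_spark_type` is hypothetical, and the fallback rules (nulls and empty arrays become strings, the first array element is taken as representative) are assumptions made for illustration. The output follows the JSON layout Spark uses to serialize a `StructType`.

```python
import json
from typing import Any


def infer_spark_type(value: Any) -> Any:
    """Map one parsed JSON value to a Spark schema JSON fragment (illustrative only)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return {
            "type": "struct",
            "fields": [
                {"name": key, "type": infer_spark_type(val), "nullable": True, "metadata": {}}
                for key, val in value.items()
            ],
        }
    if isinstance(value, list):
        # Assumption: the first element is representative; empty arrays fall back to string.
        element = infer_spark_type(value[0]) if value else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    # Assumption: strings and nulls both map to string.
    return "string"


example = json.loads('{"id": 1, "name": "a", "tags": ["x", "y"], "score": 0.5}')
schema = infer_spark_type(example)
print(json.dumps(schema, indent=2))
```

With PySpark installed, a dictionary in this layout can be turned into a schema object with `StructType.fromJson(schema)`.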
Guidelines for Contributing
Use Conventional Commit messages.
To help with this, I encourage you to use commitizen when making your commits. The process is straightforward:
# Checkout a new branch
git checkout -b <my-new-branch>
# Make changes to the code....
# Add your changes
git add <changed-file-1> <changed-file-2> ...
# Run commitizen commit and follow prompts
commitizen commit # `cz c` in short
# Push branch
git push --set-upstream origin <my-new-branch>
# Open a Pull Request
## This is done on GitHub
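As a rough illustration of the commit message format, the sketch below checks whether a commit header looks like a Conventional Commit (`type(scope): description`, with an optional `!` for breaking changes). The pattern is a simplified assumption covering the header line only, not the full specification, and the name `is_conventional` is hypothetical; commitizen performs its own validation.

```python
import re

# Assumption: simplified header pattern; the full spec also covers bodies and footers.
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"          # commit type, e.g. feat, fix, docs
    r"(\((?P<scope>[^()\s]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"           # optional breaking-change marker
    r": (?P<description>\S.*)$"   # colon, space, then a non-empty description
)


def is_conventional(header: str) -> bool:
    """Return True if the header line matches the simplified pattern."""
    return HEADER.match(header) is not None


for message in ["feat(schema): add bulk inference", "fix: handle empty arrays", "Updated stuff"]:
    print(message, "->", is_conventional(message))
```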