PySpark Helpers
A collection of tools to help when developing PySpark applications
Installation
With pip
pip install pyspark_helpers
With poetry
poetry add pyspark_helpers
Usage
Auto Generate PySpark Schemas from JSON examples
Through the CLI:
python -m pyspark_helpers.schema
# OR with script
psh-schema-from-json --path ./tests/data/schema/array.json --output ./results/array_schema.json
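To make the idea of generating a schema from a JSON example concrete, here is a minimal sketch of the kind of inference involved. It is not the implementation used by pyspark_helpers: the function name `infer_type`, the fallback rules for nulls and empty arrays, and the choice to treat every field as nullable are all assumptions made for illustration. The output uses the JSON notation that Spark itself uses for schemas.

```python
import json
from typing import Any, Dict, Union

SparkType = Union[str, Dict[str, Any]]


def infer_type(value: Any) -> SparkType:
    """Map one parsed JSON value to a Spark type in Spark's JSON schema notation.

    Hypothetical helper for illustration only, not part of pyspark_helpers.
    """
    # bool must be tested before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return {
            "type": "struct",
            "fields": [
                # Assumption: every field is treated as nullable
                {"name": key, "type": infer_type(item), "nullable": True, "metadata": {}}
                for key, item in value.items()
            ],
        }
    if isinstance(value, list):
        # Assumption: the first element is representative of the whole array;
        # an empty array falls back to string elements
        element = infer_type(value[0]) if value else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    # Assumption: strings and nulls both fall back to string in this sketch
    return "string"


example = json.loads(
    '{"id": 1, "tags": ["a", "b"], "owner": {"name": "x", "active": true}}'
)
schema = infer_type(example)
print(json.dumps(schema, indent=2))
```

A dictionary in this shape can be passed to `StructType.fromJson` from `pyspark.sql.types` to obtain a schema object, assuming PySpark is installed.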
Or programmatically:
from pyspark_helpers.schema import schema_from_json, bulk_schema_from_jsom
from pathlib import Path
data_dir = "data/json"
## One file
schema = schema_from_json(f"{data_dir}/file.json")
print(schema)
## A whole directory
files = list(Path(data_dir).glob("*.json"))
schemas = bulk_schema_from_jsom(files)
for _file, schema in zip(files, schemas):
    print(_file.name, schema)
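When processing a whole directory, it may help to collect the input files in a deterministic order, since `Path.glob` is an instance method and makes no promise about ordering. The sketch below uses only the standard library; `collect_json_files` is a hypothetical helper name, not something provided by pyspark_helpers.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_json_files(data_dir: str) -> List[Path]:
    """Return the .json files directly inside data_dir, sorted by path."""
    # glob is called on a Path instance; sorted() makes the order reproducible
    return sorted(Path(data_dir).glob("*.json"))


# Small demonstration against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.json", "a.json", "notes.txt"):
        (Path(tmp) / name).write_text("{}")
    found = [p.name for p in collect_json_files(tmp)]

print(found)
```

The resulting list could then be handed to the bulk function from the usage example, so that each schema lines up with a known file name.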
Guidelines for Contributing
Use Conventional Commit messages.
To help with this, I encourage you to use commitizen when making your commits. The process is straightforward:
# Checkout a new branch
git checkout -b <my-new-branch>
# Make changes to the code....
# Add your changes
git add <changed-file-1> <changed-file-2> ...
# Run commitizen commit and follow prompts
cz commit # or `cz c` for short
# Push branch
git push --set-upstream origin <my-new-branch>
# Open a Pull Request
## This is done on GitHub
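For readers unfamiliar with the Conventional Commit format, the header line follows the pattern `<type>(<optional scope>): <description>`, with an optional `!` before the colon to flag a breaking change. The check below is a rough sketch of that pattern, not a replacement for commitizen. The helper name `is_conventional` and the exact list of accepted types are assumptions; projects may allow a different set.

```python
import re

# Assumption: a commonly used set of types; adjust to the project's own rules
CONVENTIONAL_HEADER = re.compile(
    r"^(?P<type>build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)"
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"              # optional breaking-change marker
    r": (?P<description>\S.*)$"      # colon, one space, non-empty description
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of message looks like a Conventional Commit header."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    return CONVENTIONAL_HEADER.match(header) is not None


print(is_conventional("feat(schema): add array support"))
print(is_conventional("updated stuff"))
```

Only the first line is checked here; the body and footers of the commit message are left unvalidated in this sketch.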
Download files
File details
Details for the file pyspark_helpers-0.2.1.tar.gz.
File metadata
- Download URL: pyspark_helpers-0.2.1.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.0 CPython/3.8.10 Darwin/22.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e510bf3e93c8a49203705af95ad0b2ab7f6236ce2044cbc3b9a6ae84fdc7b05a |
| MD5 | ffc33ee7948ee336bffa7152d7ec3618 |
| BLAKE2b-256 | 7ae6bc3ef7719d4db9a1cbdb1a47cd5d69334fa87c02e06dd5f61a875ecb8bf7 |
File details
Details for the file pyspark_helpers-0.2.1-py3-none-any.whl.
File metadata
- Download URL: pyspark_helpers-0.2.1-py3-none-any.whl
- Upload date:
- Size: 5.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.0 CPython/3.8.10 Darwin/22.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 340c21e3cdb1441f496bba049163a38d6dee19007e441259109ce9d55b69ab6d |
| MD5 | 2f0307db1e4eb558ad76c364853bfcff |
| BLAKE2b-256 | ce51b807c8adfb493833603fe7a107bb852e0ca7f067971911f0953156ae8f88 |