A collection of tools to help when developing PySpark applications
PySpark Helpers
Installation
With pip
pip install pyspark_helpers
With poetry
poetry add pyspark_helpers
Usage
Auto Generate PySpark Schemas from JSON examples
Through the CLI:
python -m pyspark_helpers.schema
# OR with script
psh-schema-from-json --path ./tests/data/schema/array.json --output ./results/array_schema.json
Or programmatically:
from pyspark_helpers.schema import schema_from_json, bulk_schema_from_jsom
from pathlib import Path
data_dir = "data/json"
## One file
schema = schema_from_json(f"{data_dir}/file.json")
print(schema)
## A whole directory
files = list(Path(data_dir).glob("*.json"))
schemas = bulk_schema_from_jsom(files)
for _file, schema in zip(files, schemas):
    print(_file.name, schema)
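To make the idea of deriving a schema from a JSON example more concrete, here is a minimal standalone sketch. It is not the implementation used by pyspark_helpers; the function name `infer_type` and the fallback rules (first array element is representative, nulls and empty arrays become strings, every field is nullable) are assumptions made for illustration. The output follows the dictionary layout that Spark uses when a schema is serialized to JSON.

```python
import json
from typing import Any


def infer_type(value: Any) -> Any:
    """Map a decoded JSON value to a Spark-style type description (hypothetical helper)."""
    # bool must be checked before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return {
            "type": "struct",
            "fields": [
                {"name": key, "type": infer_type(item), "nullable": True, "metadata": {}}
                for key, item in value.items()
            ],
        }
    if isinstance(value, list):
        # Assumption: the first element is representative; empty lists fall back to string
        element = infer_type(value[0]) if value else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    # Assumption: strings and nulls are both treated as "string" in this sketch
    return "string"


example = json.loads('{"id": 1, "tags": ["a", "b"], "user": {"name": "x", "active": true}}')
sketch_schema = infer_type(example)
print(json.dumps(sketch_schema, indent=2))
```

If PySpark is installed, a dictionary in this shape can be turned into a schema object with `StructType.fromJson(sketch_schema)`.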
Guidelines for Contributing
Use Conventional Commit messages.
To help with this, I encourage you to use commitizen when making your commits. The process is straightforward:
# Checkout a new branch
git checkout -b <my-new-branch>
# Make changes to the code....
# Add your changes
git add <changed-file-1> <changed-file-2> ...
# Run commitizen commit and follow prompts
cz commit # `cz c` for short
# Push branch
git push -u origin <my-new-branch>
# Open a Pull Request
## This is done on GitHub
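For readers unfamiliar with the format, a Conventional Commit header looks like `type(scope): description`, for example `feat(schema): add bulk generation`. The sketch below checks a header line against that shape. It is only an illustration, not part of this project or of commitizen: the name `is_conventional` is hypothetical, and the list of allowed types is an assumption based on the commonly used Angular set (the specification itself only defines `feat` and `fix`).

```python
import re

# Assumption: commonly used commit types; the spec only mandates `feat` and `fix`
TYPES = ("build", "chore", "ci", "docs", "feat", "fix",
         "perf", "refactor", "revert", "style", "test")

# type, optional (scope), optional "!" for breaking changes, then ": description"
HEADER = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(header: str) -> bool:
    """Return True if the first line of a commit message matches the pattern above."""
    return HEADER.match(header) is not None


for message in ("feat(schema): add bulk generation", "Updated stuff"):
    print(message, "->", is_conventional(message))
```

Running `cz commit` builds a message in this format interactively, so a check like this is mainly useful for understanding what the prompts produce.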