A collection of tools to help when developing PySpark applications
PySpark Helpers
Installation
With pip
pip install pyspark_helpers
With poetry
poetry add pyspark_helpers
Usage
Auto Generate PySpark Schemas from JSON examples
Through the CLI:
python -m pyspark_helpers.schema
# OR with script
psh-schema-from-json --path ./tests/data/schema/array.json --output ./results/array_schema.json
Or programmatically:
from pyspark_helpers.schema import schema_from_json, bulk_schema_from_jsom
from pathlib import Path
data_dir = "data/json"
## One file
schema = schema_from_json(f"{data_dir}/file.json")
print(schema)
## A whole directory
# glob() is an instance method, so call it on the directory path
files = list(Path(data_dir).glob("*.json"))
schemas = bulk_schema_from_jsom(files)
for _file, schema in zip(files, schemas):
    print(_file.name, schema)
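To illustrate the general idea behind generating a schema from a JSON example, here is a minimal standard-library sketch. It is not the implementation used by pyspark_helpers: the helper name `infer_spark_type` is hypothetical, and the type mapping (integers to `long`, nulls and empty arrays to `string`, first array element taken as representative) is an assumption made for this example. The output follows the JSON layout PySpark uses to serialise a `StructType`.

```python
import json
from typing import Any


def infer_spark_type(value: Any) -> Any:
    """Map a decoded JSON value to a Spark schema-JSON type (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return {
            "type": "struct",
            "fields": [
                {
                    "name": key,
                    "type": infer_spark_type(item),
                    "nullable": True,
                    "metadata": {},
                }
                for key, item in value.items()
            ],
        }
    if isinstance(value, list):
        # Assumption: the first element is representative of the whole array;
        # empty arrays fall back to string elements.
        element = infer_spark_type(value[0]) if value else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    # Strings and nulls both fall back to "string" in this sketch.
    return "string"


example = json.loads(
    '{"id": 1, "tags": ["a", "b"], "user": {"name": "x", "active": true}}'
)
schema = infer_spark_type(example)
print(json.dumps(schema, indent=2))
```

A dictionary in this layout can be turned into a schema object with PySpark's `StructType.fromJson`. Whether the files written by `psh-schema-from-json` use exactly the same layout is not stated here, so check the generated output before relying on it.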