PySpark Helpers
A collection of tools to help when developing PySpark applications
Installation
With pip
pip install pyspark_helpers
With poetry
poetry add pyspark_helpers
Usage
Auto Generate PySpark Schemas from JSON examples
Through the CLI:
python -m pyspark_helpers.schema
# OR with script
psh-schema-from-json --path ./tests/data/schema/array.json --output ./results/array_schema.json
Or programmatically:
from pyspark_helpers.schema import schema_from_json, bulk_schema_from_jsom
from pathlib import Path
data_dir = "data/json"
## One file
schema = schema_from_json(f"{data_dir}/file.json")
print(schema)
## A whole directory
files = list(Path(data_dir).glob("*.json"))
schemas = bulk_schema_from_jsom(files)
for _file, schema in zip(files, schemas):
    print(_file.name, schema)
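To give a feel for what deriving a schema from a JSON example involves, here is a minimal, standard-library-only sketch. It is not the implementation used by pyspark_helpers: the function name `infer_spark_type` is hypothetical, and the choices to treat integers as `long`, to take the first array element as representative, and to fall back to `string` for nulls and empty arrays are assumptions made for illustration. The output follows the JSON layout PySpark uses for serialized schemas.

```python
import json
from typing import Any


def infer_spark_type(value: Any) -> Any:
    """Map one decoded JSON value to a Spark schema type in its JSON form.

    Hypothetical helper for illustration only, not part of pyspark_helpers.
    """
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        # JSON objects become structs with one field per key
        return {
            "type": "struct",
            "fields": [
                {
                    "name": key,
                    "type": infer_spark_type(item),
                    "nullable": True,
                    "metadata": {},
                }
                for key, item in value.items()
            ],
        }
    if isinstance(value, list):
        # Assumption: the first element is representative of the whole array;
        # empty arrays fall back to string
        element = infer_spark_type(value[0]) if value else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    # Strings, and None as an arbitrary fallback, are treated as strings
    return "string"


example = json.loads(
    '{"id": 1, "name": "a", "score": 0.5, "active": true, "tags": ["x"]}'
)
schema = infer_spark_type(example)
print(json.dumps(schema, indent=2))
```

With PySpark installed, a dict in this layout can be turned into a schema object with `StructType.fromJson(schema)`.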