PySpark Helpers

A collection of tools to help when developing PySpark applications

Installation

With pip

pip install pyspark_helpers

With poetry

poetry add pyspark_helpers

Usage

Auto Generate PySpark Schemas from JSON examples

Through the CLI:

python -m pyspark_helpers.schema
# OR with script
psh-schema-from-json --path ./tests/data/schema/array.json --output ./results/array_schema.json

Or programmatically:

from pyspark_helpers.schema import schema_from_json, bulk_schema_from_jsom
from pathlib import Path

data_dir = "data/json"


## One file
schema = schema_from_json(f"{data_dir}/file.json")

print(schema)

## A whole directory
# Path.glob is an instance method and already yields Path objects
files = sorted(Path(data_dir).glob("*.json"))
schemas = bulk_schema_from_jsom(files)

for _file, schema in zip(files, schemas):
    print(_file.name, schema)
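
To make the idea of deriving a schema from a JSON example concrete, here is a minimal standalone sketch. It is not the implementation used by pyspark_helpers: the function name `infer_spark_type` is hypothetical, and the rules it applies (first array element decides the element type, nulls and empty arrays fall back to string, every field is nullable) are assumptions chosen for illustration. The output follows the JSON schema layout that Spark itself uses for struct and array types.

```python
import json
from typing import Any, Dict, Union

SparkType = Union[str, Dict[str, Any]]


def infer_spark_type(value: Any) -> SparkType:
    """Map one parsed JSON value to a Spark JSON-schema type (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        # Each key becomes a nullable struct field (assumption: all fields nullable).
        return {
            "type": "struct",
            "fields": [
                {
                    "name": key,
                    "type": infer_spark_type(val),
                    "nullable": True,
                    "metadata": {},
                }
                for key, val in value.items()
            ],
        }
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element decides the type;
        # an empty array falls back to string.
        element = infer_spark_type(value[0]) if value else "string"
        return {"type": "array", "elementType": element, "containsNull": True}
    # Strings and nulls both fall back to string in this sketch.
    return "string"


example = json.loads('{"id": 1, "name": "a", "tags": ["x", "y"], "active": true}')
schema = infer_spark_type(example)
print(json.dumps(schema, indent=2))
```

A dictionary in this shape can be turned into a PySpark schema object with `pyspark.sql.types.StructType.fromJson`. Whether the schemas returned by `schema_from_json` use this exact layout is not stated here, so check the library's output before relying on it.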
