# SparkDantic

> A pydantic -> spark schema library

1️⃣ version: 0.1.0

✍️ author: Mitchell Lisle

## PySpark Model Conversion Tool
This Python module provides a utility for converting Pydantic models to PySpark schemas. It's implemented as a class named `SparkModel` that extends Pydantic's `BaseModel`.
## Features

- Conversion from a Pydantic model to a PySpark schema
- Determination of nullable types
- Customizable type mapping between Python and PySpark data types
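The nullable-type determination can be pictured with plain `typing` introspection: an `Optional[X]` annotation (which is `Union[X, None]`) is treated as a nullable field of underlying type `X`. The sketch below only illustrates the idea with the standard library; the `is_nullable` helper is hypothetical and is not sparkdantic's actual implementation:

```python
from typing import Optional, Union, get_args, get_origin

def is_nullable(annotation):
    """Sketch (not sparkdantic's real code): report whether an annotation
    is Optional, and unwrap the inner type if so."""
    if get_origin(annotation) is Union and type(None) in get_args(annotation):
        inner = next(a for a in get_args(annotation) if a is not type(None))
        return True, inner
    return False, annotation

print(is_nullable(Optional[int]))  # (True, <class 'int'>)
print(is_nullable(str))            # (False, <class 'str'>)
```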
## Dependencies

This module aims to have a small dependency footprint:

- `pydantic`
- `pyspark`
- Python's built-in `datetime`, `decimal`, `types`, and `typing` modules
## Usage

### Creating a new SparkModel

A `SparkModel` is a Pydantic model, and you can define one by simply inheriting from `SparkModel` and defining some fields:

```python
from sparkdantic import SparkModel
from typing import List

class MyModel(SparkModel):
    name: str
    age: int
    hobbies: List[str]
```
### Generating a PySpark Schema

Pydantic has existing methods for generating JSON schemas (with `model_json_schema`). With a `SparkModel` you can generate a PySpark schema from the model fields using the `model_spark_schema()` method:

```python
# All three fields are required, so the instance must be created with values
# (the ones below are just examples):
my_model = MyModel(name='Jane', age=30, hobbies=['reading'])
spark_schema = my_model.model_spark_schema()
```
Provides this schema:

```python
StructType([
    StructField('name', StringType(), True),
    StructField('age', IntegerType(), True),
    StructField('hobbies', ArrayType(StringType(), False), True)
])
```
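Conceptually, the conversion walks the model's field annotations and maps each Python type to a PySpark type, handling containers like `List[X]` along the way. A rough stdlib-only sketch of that mapping step follows; the `TYPE_MAP` table and `to_field` helper are hypothetical stand-ins (the real library builds actual `pyspark.sql.types` objects, and its mapping is customizable and covers many more types):

```python
from typing import List, get_args, get_origin

# Hypothetical mapping table; spark types are shown here as plain strings
TYPE_MAP = {str: 'StringType()', int: 'IntegerType()'}

def to_field(name, annotation, nullable=True):
    """Sketch: render one (name, spark_type, nullable) triple."""
    if get_origin(annotation) is list:  # List[X] -> ArrayType(X, containsNull)
        inner = TYPE_MAP[get_args(annotation)[0]]
        return (name, f'ArrayType({inner}, False)', nullable)
    return (name, TYPE_MAP[annotation], nullable)

fields = [to_field(n, t) for n, t in
          {'name': str, 'age': int, 'hobbies': List[str]}.items()]
print(fields)
# [('name', 'StringType()', True), ('age', 'IntegerType()', True),
#  ('hobbies', 'ArrayType(StringType(), False)', True)]
```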