A pydantic -> spark schema library
SparkDantic
1️⃣ version: 0.2.1
✍️ author: Mitchell Lisle
PySpark Model Conversion Tool
This Python module provides a utility for converting Pydantic models to PySpark schemas. It is implemented as a class named SparkModel that extends Pydantic's BaseModel.
Features
- Conversion from Pydantic model to PySpark schema.
- Determination of nullable types.
- Customizable type mapping between Python and PySpark data types.
Dependencies
This module aims to have a small dependency footprint:
- pydantic
- pyspark
- Python's built-in datetime, decimal, types, and typing modules
Usage
Creating a new SparkModel
A SparkModel is a Pydantic model; you can define one by simply inheriting from SparkModel and defining some fields:
```python
from typing import List

from sparkdantic import SparkModel

class MyModel(SparkModel):
    name: str
    age: int
    hobbies: List[str]
```
Generating a PySpark Schema
Pydantic can already generate JSON schemas from a model (via model_json_schema). With a SparkModel you can generate a PySpark schema from the model fields using the model_spark_schema() method:
```python
# Required fields must be supplied when instantiating the model
my_model = MyModel(name='Jane', age=30, hobbies=['kayaking'])
spark_schema = my_model.model_spark_schema()
```
This produces the following schema:
```python
StructType([
    StructField('name', StringType(), True),
    StructField('age', IntegerType(), True),
    StructField('hobbies', ArrayType(StringType(), False), True)
])
```
Hashes for sparkdantic-0.2.1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | df08587fe21907289e54b7bbac779d61193f172acfe7de21d8d120b8697b994a |
| MD5 | df486ed5b511fc20bb145645aeacc5ec |
| BLAKE2b-256 | 9daee38697b9c44985c07be3ead9746fb3512c1a252965bbbad61267714dfa38 |