Pydantic BaseModel extension that emits PySpark schemas

Project description

pydantic-pyspark

A tiny package that extends Pydantic v2's BaseModel with a pyspark_schema() classmethod. Define your data contract once with Pydantic and get a matching PySpark StructType for free.

Install

pip install pydantic-pyspark

Usage

from typing import Optional
from pydantic_pyspark import SparkModel

class Address(SparkModel):
    street: str
    zip_code: str

class User(SparkModel):
    id: int
    name: str
    email: Optional[str] = None
    tags: list[str] = []
    address: Address

print(User.pyspark_schema())
# StructType([
#     StructField('id', LongType(), False),
#     StructField('name', StringType(), False),
#     StructField('email', StringType(), True),
#     StructField('tags', ArrayType(StringType(), False), False),
#     StructField('address', StructType([...]), False),
# ])

Type mapping

Python / Pydantic               PySpark
str, uuid.UUID                  StringType
int                             LongType
float                           DoubleType
bool                            BooleanType
bytes                           BinaryType
datetime.datetime               TimestampType
datetime.date                   DateType
datetime.timedelta              DayTimeIntervalType
decimal.Decimal                 DecimalType(38, 18)
list[T] / set[T]                ArrayType(T)
dict[K, V]                      MapType(K, V)
nested SparkModel / BaseModel   StructType
Optional[T] / T | None          T with nullable=True
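
To make the table concrete, here is a minimal sketch of how a mapping like this could be resolved from Python annotations. It is an illustration only, not the package's implementation: the helper spark_type_name and the SCALARS dict are hypothetical names, and the sketch returns Spark SQL type names as plain strings so that it runs with the standard library alone. Nested models and Optional are deliberately left out.

```python
import datetime
import decimal
import uuid
from typing import get_args, get_origin

# Scalar rows of the table, written as Spark SQL type names.
# SCALARS is a hypothetical name used only for this sketch.
SCALARS = {
    str: "string",
    uuid.UUID: "string",
    int: "bigint",              # LongType
    float: "double",
    bool: "boolean",
    bytes: "binary",
    datetime.datetime: "timestamp",
    datetime.date: "date",
    datetime.timedelta: "interval day to second",  # DayTimeIntervalType
    decimal.Decimal: "decimal(38,18)",
}


def spark_type_name(annotation) -> str:
    """Hypothetical helper: map a Python annotation to a Spark SQL type name."""
    if annotation in SCALARS:
        return SCALARS[annotation]
    origin, args = get_origin(annotation), get_args(annotation)
    # list[T] and set[T] both become arrays of the element type.
    if origin in (list, set) and len(args) == 1:
        return f"array<{spark_type_name(args[0])}>"
    # dict[K, V] becomes a map; key and value are resolved recursively.
    if origin is dict and len(args) == 2:
        key, value = (spark_type_name(a) for a in args)
        return f"map<{key},{value}>"
    raise TypeError(f"no Spark type for {annotation!r}")


print(spark_type_name(dict[str, list[int]]))  # map<string,array<bigint>>
```

The lookup is by exact type, so bool does not fall through to the int row even though bool subclasses int in Python.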

Unions other than Optional[T] (or T | None) are not supported, because Spark has no sum types.

Download files

Source Distribution

pydantic_pyspark-0.1.0.tar.gz (2.9 kB view details)

Uploaded Source

Built Distribution

pydantic_pyspark-0.1.0-py3-none-any.whl (4.4 kB view details)

Uploaded Python 3

File details

Details for the file pydantic_pyspark-0.1.0.tar.gz.

File metadata

  • Download URL: pydantic_pyspark-0.1.0.tar.gz
  • Upload date:
  • Size: 2.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.2

File hashes

Hashes for pydantic_pyspark-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a85bb0e1e3117bfb5d8589c817d2ad5cc935d8c9e5e1ac93903179cf56c16db3
MD5 dd3ebb31eabc4831c8d7391d3d923c79
BLAKE2b-256 f8a001e3329a84c2f2219a73a17d2819579935e2e3689305528e5d530c0963b6

File details

Details for the file pydantic_pyspark-0.1.0-py3-none-any.whl.

File hashes

Hashes for pydantic_pyspark-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d5a26adb10d9e72037fa0db254c76a89aece422379f2c3f2944bd99201d84095
MD5 9683b01fa075be74203a9e51d9cff093
BLAKE2b-256 233b10bd110dd43ccb44cee493fc79de396244fe54888a4e51f105417f705d90
