Convert between Polars schemas and PySpark schemas

These details have not been verified by PyPI

Project description

Charmander

Cross-platform Handling of Array, Recursive, Mapping, And Nested Data Exchange Runtime

Convert between Polars schemas and PySpark schemas with ease.

Charmander provides simple, bidirectional conversion functions to transform schemas between Polars and PySpark, supporting all complex types including nested structures, arrays, and maps.

Installation

pip install charmander

Requirements

Python >= 3.8
polars >= 0.19.0
pyspark >= 3.0.0

Quick Start

Converting Polars Schema to PySpark

Charmander supports three Polars schema formats - use whichever is most convenient:

import polars as pl
from charmander import to_pyspark_schema

# Format 1: Dictionary
polars_schema_dict = {
    "name": pl.String,
    "age": pl.Int32,
    "score": pl.Float64,
    "tags": pl.List(pl.String),
}

# Format 2: pl.Schema object
polars_schema_schema = pl.Schema({
    "name": pl.String,
    "age": pl.Int32,
    "score": pl.Float64,
    "tags": pl.List(pl.String),
})

# Format 3: List of tuples
polars_schema_list = [
    ("name", pl.String),
    ("age", pl.Int32),
    ("score", pl.Float64),
    ("tags", pl.List(pl.String)),
]

# All three formats work identically!
pyspark_schema = to_pyspark_schema(polars_schema_dict)
# or: to_pyspark_schema(polars_schema_schema)
# or: to_pyspark_schema(polars_schema_list)

print(pyspark_schema)
# StructType([StructField('name', StringType(), True),
#             StructField('age', IntegerType(), True),
#             StructField('score', DoubleType(), True),
#             StructField('tags', ArrayType(StringType(), True), True)])

Converting PySpark Schema to Polars

from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType, ArrayType
from charmander import to_polars_schema

# Define a PySpark schema
pyspark_schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType()),
    StructField("score", DoubleType()),
    StructField("tags", ArrayType(StringType())),
])

# Convert to Polars schema
polars_schema = to_polars_schema(pyspark_schema)
print(polars_schema)
# Schema({'name': <class 'polars.datatypes.String'>, 'age': <class 'polars.datatypes.Int32'>, ...})

# Use directly with Polars DataFrame
df = pl.DataFrame({}, schema=polars_schema)

Features

Bidirectional Conversion: Convert schemas in both directions (Polars ↔ PySpark)
Multiple Schema Formats: Supports pl.Schema, dict[str, pl.DataType], and Iterable[tuple[str, pl.DataType]] formats
Native Polars Integration: Returns pl.Schema objects from to_polars_schema for seamless DataFrame integration
Comprehensive Type Support: Supports all primitive and complex types
Nested Structures: Handles deeply nested structs, arrays, and maps
Type Safety: Clear error messages for unsupported types
Simple API: Functional, stateless functions - easy to use and understand

Supported Types

Primitive Types

Polars	PySpark
`Int8`	`ByteType`
`Int16`	`ShortType`
`Int32`	`IntegerType`
`Int64`	`LongType`
`UInt8`	`ShortType`
`UInt16`	`IntegerType`
`UInt32`	`LongType`
`Float32`	`FloatType`
`Float64`	`DoubleType`
`Boolean`	`BooleanType`
`String` / `Utf8`	`StringType`
`Date`	`DateType`
`Datetime`	`TimestampType`
`Decimal`	`DecimalType`
`Binary`	`BinaryType`
`Null`	`NullType`
`Categorical`	`StringType`
`Enum`	`StringType`
`Int128`	`DecimalType`

PySpark Types:

PySpark	Polars
`ByteType`	`Int8`
`ShortType`	`Int32`
`IntegerType`	`Int32`
`LongType`	`Int64`
`FloatType`	`Float32`
`DoubleType`	`Float64`
`BooleanType`	`Boolean`
`StringType`	`String`
`VarcharType`	`String`
`CharType`	`String`
`DateType`	`Date`
`TimestampType`	`Datetime`
`TimestampNTZType`	`Datetime`
`DecimalType`	`Decimal`
`BinaryType`	`Binary`
`NullType`	`Null`

Complex Types

Arrays/Lists: Fully supported with nested arrays
Structs: Fully supported with nested structs
Maps: PySpark MapType converts to Polars Struct (with key and value fields)

Limitations

Type Conversions with Information Loss

Some type conversions result in information loss or semantic changes:

UInt64 → LongType: PySpark doesn't support unsigned 64-bit integers, so UInt64 maps to signed LongType. Values greater than 2^63 - 1 may cause issues.
Duration → StringType: Polars Duration types are converted to PySpark StringType as PySpark doesn't have a native duration type. The semantic meaning is lost.
Time → TimestampType: Polars Time types are converted to PySpark TimestampType, which may not be the ideal representation.
Decimal precision/scale: When converting Polars Decimal to PySpark DecimalType, default precision (10) and scale (0) are used. Precision and scale information is not preserved when converting from PySpark to Polars.
MapType → Struct: PySpark MapType is converted to a Polars Struct with key and value fields. This changes the data structure from a map to a struct representation.

Nullability

Polars → PySpark: All fields are created with nullable=True, as Polars schemas don't explicitly track nullability at the schema definition level.
PySpark → Polars: The nullable attribute from PySpark StructField is not preserved, as Polars schemas don't track nullability per field. All Polars fields can contain nulls by default.

Input Validation

Charmander validates schemas before conversion:

Duplicate field names: Raises SchemaError if duplicate field names are detected
Empty field names: Raises SchemaError if any field name is an empty string
Invalid field types: Raises SchemaError if field types are None
Invalid field name types: Raises SchemaError if field names are not strings

Datetime Timezone Handling

Polars Datetime types can have timezone information (e.g., pl.Datetime(time_unit="ms", time_zone="UTC"))
When converting to PySpark TimestampType, timezone information is not preserved
TimestampNTZType (PySpark 3.4+) is converted to Polars Datetime without timezone information
The timezone metadata is lost in conversion, but the timestamp value is preserved

Advanced Examples

Nested Structures

import polars as pl
from charmander import to_pyspark_schema

# Define a nested Polars schema
polars_schema = {
    "user": pl.Struct([
        pl.Field("name", pl.String),
        pl.Field("address", pl.Struct([
            pl.Field("street", pl.String),
            pl.Field("city", pl.String),
            pl.Field("zip", pl.Int32),
        ])),
    ]),
}

pyspark_schema = to_pyspark_schema(polars_schema)

Arrays with Nested Types

import polars as pl
from charmander import to_pyspark_schema

# Nested arrays
polars_schema = {
    "matrix": pl.List(pl.List(pl.Float64)),
    "tags": pl.List(pl.String),
}

pyspark_schema = to_pyspark_schema(polars_schema)

Round-Trip Conversion

import polars as pl
from charmander import to_pyspark_schema, to_polars_schema

# Start with Polars schema (any format works)
original = {
    "name": pl.String,
    "age": pl.Int32,
    "scores": pl.List(pl.Float64),
}

# Convert to PySpark and back
pyspark = to_pyspark_schema(original)
converted_back = to_polars_schema(pyspark)  # Returns pl.Schema

# Verify types match (pl.Schema supports dict-like access)
assert converted_back["name"] == original["name"]
assert converted_back["age"] == original["age"]
assert isinstance(converted_back, pl.Schema)

Error Handling

Charmander provides clear error messages through custom exceptions. All exceptions inherit from ConversionError, so you can catch all conversion errors at once or handle them individually:

from charmander import ConversionError, UnsupportedTypeError, SchemaError

# Example 1: Handle specific error types
try:
    schema = to_pyspark_schema(invalid_schema)
except SchemaError as e:
    print(f"Invalid schema structure: {e}")
    # Handles: duplicate field names, empty field names, invalid field types, etc.
except UnsupportedTypeError as e:
    print(f"Unsupported type: {e}")
    # Handles: types that cannot be converted between Polars and PySpark
except ConversionError as e:
    print(f"General conversion error: {e}")
    # Catches all conversion-related errors (base class)

# Example 2: Catch all conversion errors
try:
    schema = to_pyspark_schema(invalid_schema)
except ConversionError as e:
    print(f"Conversion failed: {e}")
    # This will catch SchemaError, UnsupportedTypeError, and any future error types

# Example 3: Common error scenarios
try:
    # Invalid iterable format
    schema = to_pyspark_schema([("name", pl.String), "invalid"])
except SchemaError as e:
    print(f"Schema validation failed: {e}")
    # Output: "Invalid schema format: <class 'list'>. Expected iterable of (field_name, type) tuples. Item at index 1 is not a tuple: 'invalid'"

try:
    # Duplicate field names
    schema = to_pyspark_schema([("name", pl.String), ("name", pl.Int32)])
except SchemaError as e:
    print(f"Duplicate field: {e}")
    # Output: "Invalid schema format: <class 'list'>. Duplicate field name found: 'name'"

try:
    # Unsupported type
    schema = to_pyspark_schema({"field": some_unsupported_type})
except UnsupportedTypeError as e:
    print(f"Unsupported type: {e}")
    # Output includes list of supported types

API Reference

`to_pyspark_schema(polars_schema)`

Convert a Polars schema to a PySpark StructType.

Parameters:

polars_schema: Polars schema in any supported format:
- pl.Schema object
- dict[str, pl.DataType]: Dictionary mapping field names to types
- Iterable[tuple[str, pl.DataType]]: Iterable of (field_name, type) tuples (e.g., list or tuple of tuples)

Returns:

pyspark.sql.types.StructType: PySpark schema

Raises:

SchemaError: If the schema structure is invalid
UnsupportedTypeError: If a type cannot be converted

Example:

import polars as pl
from charmander import to_pyspark_schema

# All three formats work:
schema1 = {"name": pl.String, "age": pl.Int32}
schema2 = pl.Schema({"name": pl.String, "age": pl.Int32})
schema3 = [("name", pl.String), ("age", pl.Int32)]

pyspark_schema = to_pyspark_schema(schema1)  # or schema2, or schema3

`to_polars_schema(pyspark_schema)`

Convert a PySpark StructType to a Polars schema.

Parameters:

pyspark_schema (pyspark.sql.types.StructType): PySpark schema

Returns:

pl.Schema: Polars Schema object mapping field names to Polars types

Raises:

SchemaError: If the schema structure is invalid
UnsupportedTypeError: If a type cannot be converted

Example:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from charmander import to_polars_schema

pyspark_schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType())
])

polars_schema = to_polars_schema(pyspark_schema)
# Returns pl.Schema object - use directly with Polars DataFrames
df = pl.DataFrame({}, schema=polars_schema)

Development

Running Tests

pip install -e ".[dev]"
pytest

Project Structure

charmander/
├── charmander/
│   ├── __init__.py          # Public API
│   ├── converters.py         # Core conversion functions
│   ├── type_mappings.py      # Type mapping dictionaries
│   └── errors.py             # Custom exceptions
├── tests/
│   ├── test_converters.py    # Conversion tests
│   └── test_type_mappings.py # Type mapping tests
└── pyproject.toml            # Package configuration

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Inspiration

This project is inspired by poldantic, which provides similar functionality for converting between Pydantic models and Polars schemas.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.2.0

Dec 8, 2025

0.1.0

Dec 6, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

charmander-0.2.0.tar.gz (19.5 kB view details)

Uploaded Dec 8, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

charmander-0.2.0-py3-none-any.whl (12.0 kB view details)

Uploaded Dec 8, 2025 Python 3

File details

Details for the file charmander-0.2.0.tar.gz.

File metadata

Download URL: charmander-0.2.0.tar.gz
Upload date: Dec 8, 2025
Size: 19.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for charmander-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`426b5dd3431dd9b01b11fd5df9dc6bbe9f5ed3a7a7194d06113db6345f37d4dd`
MD5	`be5a7eeb6578c3e3b2f263be09d4f429`
BLAKE2b-256	`72465b37beadeb5aa083724e8c4d71c6217356bd06d3b023e2e122c4cccb1de5`

See more details on using hashes here.

File details

Details for the file charmander-0.2.0-py3-none-any.whl.

File metadata

Download URL: charmander-0.2.0-py3-none-any.whl
Upload date: Dec 8, 2025
Size: 12.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for charmander-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6600c2166f7d400e7a06cddff927c9a58f73ca1e7a28293dac1f3fef77752918`
MD5	`2054294424c352dcc8c19d78892be044`
BLAKE2b-256	`16900b88cfe12d7ec91e0c9d4d3f52bc56beed47e0f192dda5a67e25733e729f`

See more details on using hashes here.

charmander 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

Charmander

Installation

Requirements

Quick Start

Converting Polars Schema to PySpark

Converting PySpark Schema to Polars

Features

Supported Types

Primitive Types

Complex Types

Limitations

Type Conversions with Information Loss

Nullability

Input Validation

Datetime Timezone Handling

Advanced Examples

Nested Structures

Arrays with Nested Types

Round-Trip Conversion

Error Handling

API Reference

to_pyspark_schema(polars_schema)

to_polars_schema(pyspark_schema)

Development

Running Tests

Project Structure

License

Contributing

Inspiration

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`to_pyspark_schema(polars_schema)`

`to_polars_schema(pyspark_schema)`