Convert between Pandas and Polars schemas with ease
Cubchoo
Convert Universal Between Common Handling of Object Organization
🐻 A lightweight Python library for seamless schema conversion between Pandas and Polars
Cubchoo provides simple, bidirectional conversion functions for transforming schemas between Pandas and Polars. It covers the basic types, nullable types, and categorical data; nested Polars types (lists, structs, maps) map to the pandas object dtype. It suits projects that work with both libraries or migrate between them.
Table of Contents
- Features
- Installation
- Quick Start
- Supported Type Mappings
- Advanced Examples
- Error Handling
- Limitations and Notes
- API Reference
- Development
- Contributing
- License
Features
Key Features:
- Bidirectional Conversion: Convert schemas in both directions (Pandas ↔ Polars)
- Multiple Input Formats: Support for dictionaries, lists, DataFrames, Series, and Schema objects
- Comprehensive Type Support: All basic types (integers, floats, strings, booleans, datetimes, etc.)
- Nullable Types: Full support for nullable integer and float types (Int64, Float64, etc.)
- Categorical Data: Support for categorical data types
- Error Handling: Clear error messages with custom exceptions
- Input Validation: Validates schemas before conversion
- No Extra Dependencies: Requires only pandas and polars
- PyArrow Support: Optional PyArrow type support (install with pip install cubchoo[pyarrow])
Installation
From PyPI
pip install cubchoo
With PyArrow Support (Optional)
PyArrow support is available as an optional dependency:
pip install cubchoo[pyarrow]
From Source
git clone https://github.com/eddiethedean/cubchoo.git
cd cubchoo
pip install -e .
Development Installation
pip install -e ".[dev]"
All Dependencies
pip install cubchoo[all]
Requirements
- Python: >= 3.8
- pandas: >= 1.3.0
- polars: >= 0.19.0
- pyarrow: >= 10.0.0 (optional, for PyArrow type support)
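To compare an environment against the requirements above, the installed versions can be listed with the standard library. This is a minimal sketch; `installed_versions` is a hypothetical helper and not part of Cubchoo.

```python
from importlib import metadata


def installed_versions(packages=("pandas", "polars", "pyarrow")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

A missing pyarrow entry is fine unless PyArrow types are used in schemas.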
Quick Start
Get started with Cubchoo in just a few lines of code!
Converting Pandas Schema to Polars
Cubchoo supports multiple Pandas schema formats - use whichever is most convenient:
import pandas as pd
from cubchoo import to_polars_schema
# Format 1: Dictionary with string dtype names
pandas_schema_dict = {
"name": "string",
"age": "Int64",
"score": "Float64",
"tags": "object", # Lists/arrays in pandas are typically object type
}
# Format 2: Dictionary with pandas dtype objects
pandas_schema_dtypes = {
"name": pd.StringDtype(),
"age": pd.Int64Dtype(),
"score": pd.Float64Dtype(),
}
# Format 3: pandas DataFrame (schema inferred from dtypes)
df = pd.DataFrame({
"name": ["Alice", "Bob"],
"age": [30, 25],
"score": [95.5, 87.0],
})
# Format 4: List of tuples
pandas_schema_list = [
("name", "string"),
("age", "Int64"),
("score", "Float64"),
]
# All formats work identically!
polars_schema = to_polars_schema(pandas_schema_dict)
# or: to_polars_schema(pandas_schema_dtypes)
# or: to_polars_schema(df)
# or: to_polars_schema(pandas_schema_list)
print(polars_schema)
# Schema({'name': <class 'polars.datatypes.String'>,
# 'age': <class 'polars.datatypes.Int64'>,
# 'score': <class 'polars.datatypes.Float64'>, ...})
# Use directly with Polars DataFrame
import polars as pl
df_polars = pl.DataFrame({}, schema=polars_schema)
Converting Polars Schema to Pandas
import polars as pl
from cubchoo import to_pandas_schema
# Define a Polars schema
polars_schema = pl.Schema({
"name": pl.String,
"age": pl.Int32,
"score": pl.Float64,
"tags": pl.List(pl.String),
})
# Convert to Pandas schema (returns dict of dtype strings)
pandas_schema = to_pandas_schema(polars_schema)
print(pandas_schema)
# {'name': 'string', 'age': 'Int32', 'score': 'Float64', 'tags': 'object'}
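# Use with pandas: build an empty DataFrame with one typed column per field
# (the DataFrame constructor's dtype argument takes a single dtype, not a dict)
import pandas as pd
df = pd.DataFrame({name: pd.Series(dtype=dtype) for name, dtype in pandas_schema.items()})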
Using PyArrow Types (Optional)
With PyArrow support installed, you can use PyArrow types (pa.DataType) directly in your schemas:
import pyarrow as pa
from cubchoo import to_polars_schema, to_pandas_schema
# Use PyArrow types in a schema dictionary
schema_with_pa = {
"name": pa.string(),
"age": pa.int32(),
"score": pa.float64(),
}
# Convert PyArrow types to Polars
polars_schema = to_polars_schema(schema_with_pa)
# Convert PyArrow types to Pandas
pandas_schema = to_pandas_schema(schema_with_pa)
# You can also mix Pandas and PyArrow types
mixed_schema = {
"name": "string", # Pandas dtype string
"age": pa.int32(), # PyArrow type
"score": "Float64", # Pandas dtype string
}
polars_schema = to_polars_schema(mixed_schema)
Note: PyArrow support requires installing with pip install cubchoo[pyarrow].
Supported Type Mappings
Cubchoo supports comprehensive type mappings between Pandas and Polars. Below are the supported conversions:
Pandas → Polars
| Pandas Type | Polars Type |
|---|---|
| int8, Int8 | pl.Int8 |
| int16, Int16 | pl.Int16 |
| int32, Int32 | pl.Int32 |
| int64, Int64 | pl.Int64 |
| uint8, UInt8 | pl.UInt8 |
| uint16, UInt16 | pl.UInt16 |
| uint32, UInt32 | pl.UInt32 |
| uint64, UInt64 | pl.UInt64 |
| float32, Float32 | pl.Float32 |
| float64, Float64 | pl.Float64 |
| object, string, str | pl.String |
| bool, boolean | pl.Boolean |
| datetime64[ns] | pl.Datetime |
| datetime64[us] | pl.Datetime(time_unit="us") |
| datetime64[ms] | pl.Datetime(time_unit="ms") |
| datetime64[s] | pl.Datetime(time_unit="ms") * |
| date | pl.Date |
| timedelta64[ns] | pl.Duration |
| timedelta64[us] | pl.Duration(time_unit="us") |
| timedelta64[ms] | pl.Duration(time_unit="ms") |
| timedelta64[s] | pl.Duration(time_unit="ms") * |
| category | pl.Categorical |
* Note: Polars doesn't support seconds (s) time unit, so datetime64[s] and timedelta64[s] are converted to milliseconds (ms) as the closest equivalent.
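The seconds fallback described in the note can be sketched as a small lookup on the dtype string. This is an illustration of the rule only; `polars_time_unit` is a hypothetical helper, not Cubchoo's implementation, and it returns the unit name rather than a Polars type so that it runs without pandas or polars installed.

```python
import re

# Time units that Polars Datetime/Duration accept.
POLARS_TIME_UNITS = {"ns", "us", "ms"}


def polars_time_unit(pandas_dtype: str) -> str:
    """Return the Polars time unit for a datetime64[...] or timedelta64[...] string."""
    match = re.fullmatch(r"(?:datetime64|timedelta64)\[(\w+)\]", pandas_dtype)
    if match is None:
        raise ValueError(f"not a datetime64/timedelta64 dtype: {pandas_dtype!r}")
    unit = match.group(1)
    if unit in POLARS_TIME_UNITS:
        return unit
    if unit == "s":
        # Seconds have no Polars counterpart; fall back to milliseconds.
        return "ms"
    raise ValueError(f"unsupported time unit: {unit!r}")


print(polars_time_unit("datetime64[s]"))    # ms
print(polars_time_unit("timedelta64[us]"))  # us
```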
Polars → Pandas
| Polars Type | Pandas Type |
|---|---|
| pl.Int8 | Int8 |
| pl.Int16 | Int16 |
| pl.Int32 | Int32 |
| pl.Int64 | Int64 |
| pl.UInt8 | UInt8 |
| pl.UInt16 | UInt16 |
| pl.UInt32 | UInt32 |
| pl.UInt64 | UInt64 |
| pl.Float32 | Float32 |
| pl.Float64 | Float64 |
| pl.String, pl.Utf8 | string |
| pl.Boolean | boolean |
| pl.Datetime | datetime64[ns] |
| pl.Date | date |
| pl.Duration | timedelta64[ns] |
| pl.Categorical | category |
| pl.List(...) | object |
| pl.Struct(...) | object |
| pl.Map(...) | object |
Limitations and Notes
Complex Types
- List/Array Types: Polars List types are converted to the pandas object dtype, as pandas doesn't have native array/list dtypes. The semantic meaning is preserved but the structure information is lost.
- Struct Types: Polars Struct types are converted to the pandas object dtype, as pandas doesn't have native struct types.
- Map Types: Polars Map types are converted to the pandas object dtype, as pandas doesn't have native map types.
Datetime Handling
- Pandas datetime64 types with different time units are converted to Polars Datetime with the corresponding time units.
- Timezone information in Polars Datetime types is not preserved when converting to pandas (pandas uses timezone-naive by default).
- Polars Time types are converted to the pandas object dtype, as pandas doesn't have a native time type.
Nullability
- Pandas → Polars: Nullable types (Int64, Float64, etc.) are properly converted to their Polars equivalents.
- Polars → Pandas: All Polars types can contain nulls, so nullable pandas dtypes (Int64, Float64, etc.) are used when appropriate.
Input Validation
Cubchoo validates schemas before conversion:
- Duplicate field names: Raises SchemaError if duplicate field names are detected
- Empty field names: Raises SchemaError if any field name is an empty string
- Invalid field types: Raises SchemaError if field types are None
- Invalid field name types: Raises SchemaError if field names are not strings
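The four checks above can be sketched as a single pass over (name, dtype) pairs. This is a minimal illustration under stated assumptions, not Cubchoo's code: `validate_fields` is a hypothetical helper, and `SchemaError` is redefined locally as a stand-in so the example runs on its own.

```python
from typing import Any, Iterable, Tuple


class SchemaError(ValueError):
    """Stand-in for cubchoo's SchemaError, defined here so the sketch runs alone."""


def validate_fields(fields: Iterable[Tuple[Any, Any]]) -> None:
    """Apply the four documented checks to (name, dtype) pairs."""
    seen = set()
    for name, dtype in fields:
        if not isinstance(name, str):
            raise SchemaError(f"field name must be a string, got {type(name).__name__}")
        if name == "":
            raise SchemaError("field name must not be empty")
        if name in seen:
            raise SchemaError(f"duplicate field name: {name!r}")
        if dtype is None:
            raise SchemaError(f"field {name!r} has no type")
        seen.add(name)


validate_fields([("name", "string"), ("age", "Int64")])  # passes silently

try:
    validate_fields([("age", "Int64"), ("age", "Float64")])
except SchemaError as exc:
    print(exc)  # duplicate field name: 'age'
```

Duplicates can only be detected in list-of-tuples input, since a Python dict cannot hold the same key twice.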
Advanced Examples
Working with DataFrames
Easily convert schemas between DataFrames:
import pandas as pd
import polars as pl
from cubchoo import to_polars_schema, to_pandas_schema
# Start with a pandas DataFrame
df_pandas = pd.DataFrame({
"name": ["Alice", "Bob", "Charlie"],
"age": [30, 25, 35],
"score": [95.5, 87.0, 92.3],
})
# Convert schema to Polars
polars_schema = to_polars_schema(df_pandas)
# Create empty Polars DataFrame with same schema
df_polars = pl.DataFrame({}, schema=polars_schema)
# Convert back to pandas schema
pandas_schema = to_pandas_schema(polars_schema)
print(pandas_schema)
# {'name': 'string', 'age': 'Int64', 'score': 'Float64'}
Categorical Data
Handle categorical data seamlessly:
import pandas as pd
import polars as pl
from cubchoo import to_polars_schema, to_pandas_schema
# Pandas categorical
pandas_schema = {"category": "category"}
polars_schema = to_polars_schema(pandas_schema)
assert polars_schema["category"] == pl.Categorical()
# Polars categorical
polars_schema = pl.Schema({"category": pl.Categorical()})
pandas_schema = to_pandas_schema(polars_schema)
assert pandas_schema["category"] == "category"
Series Conversion
Convert single-column schemas from pandas Series:
import pandas as pd
from cubchoo import to_polars_schema
# Named Series
series = pd.Series([1, 2, 3], name="numbers", dtype="Int64")
polars_schema = to_polars_schema(series)
# Returns: Schema({'numbers': <class 'polars.datatypes.Int64'>})
Round-Trip Conversion
Verify schema integrity with round-trip conversions:
import pandas as pd
import polars as pl
from cubchoo import to_polars_schema, to_pandas_schema
# Start with Pandas schema
original = {
"name": "string",
"age": "Int64",
"score": "Float64",
}
# Convert to Polars and back
polars_schema = to_polars_schema(original)
converted_back = to_pandas_schema(polars_schema)
# Verify types match
assert converted_back["name"] == original["name"]
assert converted_back["age"] == original["age"]
assert converted_back["score"] == original["score"]
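Not every dtype name survives a round trip unchanged. According to the mapping tables in this README, non-nullable names such as int64 and object come back as their nullable counterparts. The sketch below replays an excerpt of those tables with Polars types written as plain strings, so it runs without pandas or polars; `round_trip` and `is_stable` are hypothetical helpers.

```python
# Excerpts of the two mapping tables, with Polars types written as strings.
PANDAS_TO_POLARS = {
    "int64": "pl.Int64", "Int64": "pl.Int64",
    "float64": "pl.Float64", "Float64": "pl.Float64",
    "object": "pl.String", "string": "pl.String",
    "bool": "pl.Boolean", "boolean": "pl.Boolean",
}
POLARS_TO_PANDAS = {
    "pl.Int64": "Int64",
    "pl.Float64": "Float64",
    "pl.String": "string",
    "pl.Boolean": "boolean",
}


def round_trip(pandas_dtype: str) -> str:
    """Map a pandas dtype name to Polars and back again."""
    return POLARS_TO_PANDAS[PANDAS_TO_POLARS[pandas_dtype]]


def is_stable(pandas_dtype: str) -> bool:
    """True when the dtype name survives the round trip unchanged."""
    return round_trip(pandas_dtype) == pandas_dtype


for dtype in ("Int64", "int64", "string", "object"):
    status = "stable" if is_stable(dtype) else "changed"
    print(f"{dtype} -> {round_trip(dtype)} ({status})")
```

Round-trip assertions are therefore safest when the original schema already uses the nullable names (Int64, Float64, string, boolean).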
Datetime with Time Units
Handle different datetime precision:
import pandas as pd
import polars as pl
from cubchoo import to_polars_schema, to_pandas_schema
# Pandas with different time units
pandas_schema = {
"ns_timestamp": "datetime64[ns]",
"us_timestamp": "datetime64[us]",
"ms_timestamp": "datetime64[ms]",
}
polars_schema = to_polars_schema(pandas_schema)
# Polars with time units
polars_schema = {
"ns": pl.Datetime(time_unit="ns"),
"us": pl.Datetime(time_unit="us"),
"ms": pl.Datetime(time_unit="ms"),
}
pandas_schema = to_pandas_schema(polars_schema)
Error Handling
Cubchoo provides clear error messages through custom exceptions. All exceptions inherit from ConversionError, so you can catch all conversion errors at once or handle them individually:
from cubchoo import ConversionError, SchemaError, UnsupportedTypeError, to_polars_schema

# An empty field name fails validation (see Input Validation)
invalid_schema = {"": "Int64"}

# Example 1: Handle specific error types
try:
    schema = to_polars_schema(invalid_schema)
except SchemaError as e:
    print(f"Invalid schema structure: {e}")
    # Handles: duplicate field names, empty field names, invalid field types, etc.
except UnsupportedTypeError as e:
    print(f"Unsupported type: {e}")
    # Handles: types that cannot be converted between Pandas and Polars
except ConversionError as e:
    print(f"General conversion error: {e}")
    # Catches all conversion-related errors (base class)

# Example 2: Catch all conversion errors
try:
    schema = to_polars_schema(invalid_schema)
except ConversionError as e:
    print(f"Conversion failed: {e}")
    # This will catch SchemaError, UnsupportedTypeError, and any future error types
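The order of the except clauses matters because the specific exceptions inherit from ConversionError: a base-class handler placed first would catch everything. The sketch below uses locally defined stand-in classes that mirror the documented hierarchy. Deriving the base from Exception is an assumption, and `classify` is a hypothetical helper.

```python
class ConversionError(Exception):
    """Stand-in base class for all conversion failures."""


class SchemaError(ConversionError):
    """Stand-in: the schema structure is invalid."""


class UnsupportedTypeError(ConversionError):
    """Stand-in: a type has no counterpart in the target library."""


def classify(exc: Exception) -> str:
    """Mirror the except-clause order: specific handlers first, base class last."""
    try:
        raise exc
    except SchemaError:
        return "schema"
    except UnsupportedTypeError:
        return "unsupported"
    except ConversionError:
        return "conversion"


print(classify(SchemaError("duplicate field")))      # schema
print(classify(UnsupportedTypeError("complex128")))  # unsupported
print(classify(ConversionError("other")))            # conversion
```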
API Reference
to_polars_schema(pandas_schema)
Convert a Pandas schema to a Polars Schema.
Parameters:
pandas_schema: Pandas schema in any supported format:
- dict[str, dtype]: Dictionary mapping field names to dtypes
- pandas.DataFrame: DataFrame (schema inferred from dtypes)
- pandas.Series: Series (for single column)
- list[tuple[str, dtype]]: List of (field_name, dtype) tuples
Returns:
polars.Schema: Polars Schema object mapping field names to Polars types
Raises:
SchemaError: If the schema structure is invalid
UnsupportedTypeError: If a type cannot be converted
Example:
import pandas as pd
from cubchoo import to_polars_schema
# All formats work:
schema1 = {"name": "string", "age": "Int64"}
schema2 = pd.DataFrame({"name": ["Alice"], "age": [30]})
schema3 = [("name", "string"), ("age", "Int64")]
polars_schema = to_polars_schema(schema1) # or schema2, or schema3
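One way to picture how several input formats can feed the same conversion is to normalize them into ordered (name, dtype) pairs first. This is a sketch of the idea only, covering the dict and list-of-tuples formats. `normalize_schema` is a hypothetical helper, and nothing here is taken from Cubchoo's source.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

SchemaLike = Union[Dict[str, Any], Sequence[Tuple[str, Any]]]


def normalize_schema(schema: SchemaLike) -> List[Tuple[str, Any]]:
    """Turn a dict or a list of (name, dtype) tuples into ordered pairs."""
    if isinstance(schema, dict):
        return list(schema.items())
    if isinstance(schema, (list, tuple)):
        pairs = []
        for item in schema:
            if not (isinstance(item, tuple) and len(item) == 2):
                raise TypeError(f"expected (name, dtype) tuple, got {item!r}")
            pairs.append(item)
        return pairs
    raise TypeError(f"unsupported schema format: {type(schema).__name__}")


as_dict = normalize_schema({"name": "string", "age": "Int64"})
as_list = normalize_schema([("name", "string"), ("age", "Int64")])
print(as_dict == as_list)  # True
print(as_dict)             # [('name', 'string'), ('age', 'Int64')]
```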
to_pandas_schema(polars_schema)
Convert a Polars Schema to a Pandas schema.
Parameters:
polars_schema: Polars schema in any supported format:
- pl.Schema object
- dict[str, pl.DataType]: Dictionary mapping field names to types
- list[tuple[str, pl.DataType]]: List of (field_name, type) tuples
Returns:
dict[str, str]: Dictionary mapping column names to pandas dtype strings
Raises:
SchemaError: If the schema structure is invalid
UnsupportedTypeError: If a type cannot be converted
Example:
import polars as pl
from cubchoo import to_pandas_schema
polars_schema = pl.Schema({
"name": pl.String,
"age": pl.Int32
})
pandas_schema = to_pandas_schema(polars_schema)
# Returns: {'name': 'string', 'age': 'Int32'}
PyArrow Type Support
Both to_polars_schema() and to_pandas_schema() support PyArrow types (pa.DataType) as input. When PyArrow types are detected in a schema, they are automatically converted to the target format.
Example:
import pyarrow as pa
from cubchoo import to_polars_schema, to_pandas_schema
# Schema with PyArrow types
schema = {
"name": pa.string(),
"age": pa.int32(),
"score": pa.float64(),
}
# Convert to Polars
polars_schema = to_polars_schema(schema)
# Convert to Pandas
pandas_schema = to_pandas_schema(schema)
Note: PyArrow support requires installing with pip install cubchoo[pyarrow].
Development
Setup
- Clone the repository:
git clone https://github.com/eddiethedean/cubchoo.git
cd cubchoo
- Install in development mode:
pip install -e ".[dev]"
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=cubchoo --cov-report=html
# Run specific test file
pytest tests/test_converters.py
Code Quality
# Format code
ruff format .
# Lint code
ruff check .
# Type checking
mypy cubchoo
Project Structure
cubchoo/
├── cubchoo/
│   ├── __init__.py            # Public API
│   ├── converters.py          # Core conversion functions
│   ├── type_mappings.py       # Type mapping dictionaries
│   └── errors.py              # Custom exceptions
├── tests/
│   ├── __init__.py
│   ├── test_converters.py     # Conversion tests
│   └── test_type_mappings.py  # Type mapping tests
├── .gitignore
├── LICENSE
├── README.md
├── CHANGELOG.md
└── pyproject.toml             # Package configuration
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Odos Matthews
- Email: odosmatthews@gmail.com
- GitHub: @eddiethedean
Made with ❤️ for the Python data science community
Contributing
Contributions are welcome and greatly appreciated!
How to Contribute
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for new functionality
- Ensure all tests pass (pytest)
- Ensure code quality (ruff format . && ruff check . && mypy cubchoo)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Reporting Issues
If you find a bug or have a feature request, please open an issue on GitHub.
Related Projects
- charmander - Convert between Polars and PySpark schemas
- poldantic - Convert between Pydantic models and Polars schemas
Inspiration
This project is inspired by charmander, which provides similar functionality for converting between Polars and PySpark schemas.