Generate Polars DataFrames using polyfactory for testing and development
Polypolars
Generate type-safe Polars DataFrames effortlessly using polyfactory
Inspired by polyspark, polypolars lets you create realistic test DataFrames from your Python data models—with automatic schema inference for Polars.
Docs: See the docs/ folder and run mkdocs serve for the full API reference and examples.
Example
from dataclasses import dataclass
from polypolars import polars_factory
@polars_factory
@dataclass
class User:
    id: int
    name: str
    email: str
# Generate 1000 rows instantly:
df = User.build_dataframe(size=1000)
print(df.head())
Example output (data varies per run):
shape: (5, 3)
┌──────┬──────────────────────┬──────────────────────┐
│ id   ┆ name                 ┆ email                │
│ ---  ┆ ---                  ┆ ---                  │
│ i64  ┆ str                  ┆ str                  │
╞══════╪══════════════════════╪══════════════════════╡
│ 3167 ┆ QmYHeLMDMxWChjihAFxU ┆ vHGMKHjXsMBlxLuhqpUE │
│ 1028 ┆ hvLXPtlqURtwzqeyJruo ┆ ePDAdtelIEiRfEuAgoPz │
│ 9048 ┆ NhnyGGQsTjxPEndxaOCt ┆ znmByWtpwofUGKolkJrs │
│ 971  ┆ ZlkxcjcVAZfLUkCwHRFG ┆ PTtzmMHcvLQPcOrAgFpl │
│ 3813 ┆ tIqqrgyYjULzdyRKkMKK ┆ tMAFeQewaQFtRGEvOdqW │
└──────┴──────────────────────┴──────────────────────┘
Contents
- Why Polypolars? · Installation · Quick Start · Schema inference · Type mapping · CLI · I/O and testing
Why Polypolars?
- Factory pattern: Leverage polyfactory for data generation
- Type-safe schema: Python types become Polars dtypes automatically
- Nullable handling: Optional[T] and defaults are reflected in the schema
- Complex types: Nested structs, lists, and dicts (as list-of-structs)
- Multiple models: Dataclasses, Pydantic, and TypedDict
Installation
pip install polypolars
For development:
pip install "polypolars[dev]"
Quick Start
Decorator (recommended)
from dataclasses import dataclass
from typing import Optional
from polypolars import polars_factory
@polars_factory
@dataclass
class Product:
    product_id: int
    name: str
    price: float
    description: Optional[str] = None
    in_stock: bool = True
# Build Polars DataFrame
df = Product.build_dataframe(size=100)
print(df.head())
# Or get dicts
dicts = Product.build_dicts(size=50)
Example output (first 5 rows; data varies per run):
shape: (5, 5)
┌────────────┬──────────────────────┬──────────────┬──────────────────────┬──────────┐
│ product_id ┆ name                 ┆ price        ┆ description          ┆ in_stock │
│ ---        ┆ ---                  ┆ ---          ┆ ---                  ┆ ---      │
│ i64        ┆ str                  ┆ f64          ┆ str                  ┆ bool     │
╞════════════╪══════════════════════╪══════════════╪══════════════════════╪══════════╡
│ 5582       ┆ hKJsoOOXlwgLIiiWOCJP ┆ 2.2760e8     ┆ rTUACBLlGBlHXIjzVvPt ┆ false    │
│ 7099       ┆ ZgUiDVJirxAYRrWIPnpS ┆ 274887.17671 ┆ bHGMXNFRLSDifpywMZrY ┆ true     │
│ 5372       ┆ MTtVHJkqneaCkoyZNgio ┆ 1.5195e7     ┆ HsAmRwgaphvQxOCJwjSr ┆ false    │
│ 8650       ┆ fTBYFPiWMFCKauieEXlu ┆ -7.8765e8    ┆ UAnyfVhTUmvcjtzbCufq ┆ true     │
│ 1023       ┆ MCtTOwvJTjfbpPELcFKm ┆ -97.933431   ┆ PMEHaEOGaoJiDaomXdVX ┆ false    │
└────────────┴──────────────────────┴──────────────┴──────────────────────┴──────────┘
Classic factory class
from polypolars import PolarsFactory
class ProductFactory(PolarsFactory[Product]):
    __model__ = Product
df = ProductFactory.build_dataframe(size=100)
Convenience function
from polypolars import build_polars_dataframe
df = build_polars_dataframe(Product, size=100)
Schema inference
Schema is inferred from your type hints, so all-null columns still get the correct type:
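from dataclasses import dataclass
from typing import Optional
from polypolars import polars_factory
@polars_factory
@dataclass
class User:
    id: int
    email: Optional[str]  # nullable string in Polars
df = User.build_dataframe(size=100)  # schema: id Int64, email String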
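One way to see why type hints are enough to fix the schema, even when every value in a column is None, is to read the hints directly. The sketch below uses only the standard library. `describe_columns` is a hypothetical helper for illustration and is not part of polypolars, whose internal inference may work differently.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args, get_origin, get_type_hints


@dataclass
class User:
    id: int
    email: Optional[str]


def describe_columns(model):
    """Return {column: (base_type, nullable)} using type hints only.

    Hypothetical helper: no row data is inspected at any point.
    """
    hints = get_type_hints(model)
    columns = {}
    for field in fields(model):
        hint = hints[field.name]
        args = get_args(hint)
        # Optional[T] is Union[T, None]; strip NoneType to find the base type.
        if get_origin(hint) is Union and type(None) in args:
            base = next(a for a in args if a is not type(None))
            columns[field.name] = (base, True)
        else:
            columns[field.name] = (hint, False)
    return columns


schema = describe_columns(User)
print(schema)
```

Because the column type comes from the annotation rather than from generated values, an email column that happens to contain only None still resolves to a string type.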
From dicts
dicts = Product.build_dicts(size=1000)
# Convert to DataFrame when needed:
df = Product.create_dataframe_from_dicts(dicts)
Pydantic
from pydantic import BaseModel, Field
from polypolars import polars_factory
@polars_factory
class User(BaseModel):
    id: int = Field(gt=0)
    username: str = Field(min_length=3, max_length=20)
    email: str
    is_active: bool = True
df = User.build_dataframe(size=500)
Type mapping
| Python | Polars |
|---|---|
| str | String |
| int | Int64 |
| float | Float64 |
| bool | Boolean |
| datetime | Datetime |
| date | Date |
| List[T] | List(T) |
| Dict[K,V] | List(Struct(key, value)) |
| Optional[T] | T (nullable) |
| Tuple[T, ...] | List(T) |
| Tuple[T, T, ...] (fixed) | Array(T, n) |
| Dataclass / Pydantic | Struct(...) |
Use schema_overrides (e.g. {"col": pl.Categorical}) to override inferred types.
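The mapping rules in the table can be read as a small recursive function over type hints. The following is a minimal standard-library sketch that returns dtype names as plain strings. `dtype_name` and `SCALARS` are hypothetical names used only for illustration, the Dataclass / Pydantic row is left out for brevity, and none of this is taken from polypolars' actual implementation.

```python
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple, Union, get_args, get_origin

# Scalar rows of the table, as Python type -> Polars dtype name.
SCALARS = {
    str: "String",
    int: "Int64",
    float: "Float64",
    bool: "Boolean",
    datetime: "Datetime",
    date: "Date",
}


def dtype_name(hint):
    """Map a type hint to a Polars dtype name (hypothetical sketch)."""
    if hint in SCALARS:
        return SCALARS[hint]
    origin, args = get_origin(hint), get_args(hint)
    if origin is Union and type(None) in args:
        inner = [a for a in args if a is not type(None)]
        if len(inner) == 1:  # Optional[T]
            return dtype_name(inner[0]) + " (nullable)"
    if origin is list:
        return f"List({dtype_name(args[0])})"
    if origin is dict:
        key, value = dtype_name(args[0]), dtype_name(args[1])
        return f"List(Struct(key: {key}, value: {value}))"
    if origin is tuple:
        if len(args) == 2 and args[1] is Ellipsis:  # Tuple[T, ...]
            return f"List({dtype_name(args[0])})"
        if args and len(set(args)) == 1:  # fixed-length, homogeneous
            return f"Array({dtype_name(args[0])}, {len(args)})"
    raise TypeError(f"no mapping for {hint!r}")


print(dtype_name(Optional[str]))               # String (nullable)
print(dtype_name(Dict[str, float]))            # List(Struct(key: String, value: Float64))
print(dtype_name(Tuple[float, float, float]))  # Array(Float64, 3)
```

The dict case shows why a list of key/value structs is a natural target: each entry becomes one struct, so the key and value types stay explicit in the schema.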
LazyFrame and chunked building
# LazyFrame
lf = Product.build_lazy_dataframe(size=10_000)
# Chunked building for very large size (lower memory)
df = Product.build_dataframe(size=1_000_000, chunk_size=10_000)
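The general idea behind chunked building is to generate rows in batches, convert each batch, and concatenate the results, so that only one batch of Python objects needs to be alive at a time. How polypolars does this internally is not shown here. The sketch below only illustrates how a total size could be split into batch sizes, and `chunk_sizes` is a hypothetical helper.

```python
from typing import Iterator


def chunk_sizes(size: int, chunk_size: int) -> Iterator[int]:
    """Yield batch sizes that sum to `size`, each at most `chunk_size`."""
    if size < 0 or chunk_size <= 0:
        raise ValueError("size must be >= 0 and chunk_size must be > 0")
    remaining = size
    while remaining > 0:
        batch = min(chunk_size, remaining)
        yield batch
        remaining -= batch


sizes = list(chunk_sizes(25, 10))
print(sizes)  # [10, 10, 5]
```

With size=1_000_000 and chunk_size=10_000, this split gives 100 batches of 10,000 rows each.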
CLI
# Export schema
polypolars schema export myapp.models:User --output schema.txt
# Validate a file against a model
polypolars schema validate myapp.models:User data.parquet
# Generate sample data
polypolars generate myapp.models:User --size 1000 --output users.parquet --format parquet
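The commands above address a model as module.path:ClassName. A common way to resolve such a string in Python is shown in this standard-library sketch. `resolve_model` is a hypothetical helper, and it is an assumption that the polypolars CLI resolves its argument in a similar way.

```python
from importlib import import_module


def resolve_model(spec: str):
    """Resolve a 'package.module:Name' string to the object it names."""
    module_path, sep, attr = spec.partition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"expected 'module:Name', got {spec!r}")
    # Import the module, then look up the attribute on it.
    return getattr(import_module(module_path), attr)


# Demonstrated with a standard-library class instead of myapp.models:User.
cls = resolve_model("collections:OrderedDict")
print(cls.__name__)  # OrderedDict
```

For this kind of lookup to work, the module must be importable from the directory where the command runs, for example because the package is installed or sits on the Python path.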
I/O and testing
from polypolars import (
    save_as_parquet,
    load_parquet,
    load_and_validate,
    infer_schema,
    assert_dataframe_equal,
    assert_schema_equal,
)
df = User.build_dataframe(size=1000)
save_as_parquet(df, "users.parquet")
# Load and validate
schema = infer_schema(User)
df2 = load_and_validate("users.parquet", expected_schema=schema)
assert_dataframe_equal(df, df2, check_order=False)
License
MIT
Related
- polyspark – inspiration for this library
- polyfactory – factory library for mock data
- Polars – fast DataFrame library