Generate Polars DataFrames using polyfactory for testing and development

Polypolars

Python 3.8+ | License: MIT | Code style: ruff

Generate type-safe Polars DataFrames effortlessly using polyfactory

Inspired by polyspark, polypolars lets you create test DataFrames from your Python data models, with the Polars schema inferred automatically from the models' type hints.

Docs: See the docs/ folder and run mkdocs serve for the full API reference and examples.

Example

from dataclasses import dataclass
from polypolars import polars_factory

@polars_factory
@dataclass
class User:
    id: int
    name: str
    email: str

# Generate 1000 rows instantly:
df = User.build_dataframe(size=1000)
print(df.head())

Example output (data varies per run):

shape: (5, 3)
┌──────┬──────────────────────┬──────────────────────┐
│ id   ┆ name                 ┆ email                │
│ ---  ┆ ---                  ┆ ---                  │
│ i64  ┆ str                  ┆ str                  │
╞══════╪══════════════════════╪══════════════════════╡
│ 3167 ┆ QmYHeLMDMxWChjihAFxU ┆ vHGMKHjXsMBlxLuhqpUE │
│ 1028 ┆ hvLXPtlqURtwzqeyJruo ┆ ePDAdtelIEiRfEuAgoPz │
│ 9048 ┆ NhnyGGQsTjxPEndxaOCt ┆ znmByWtpwofUGKolkJrs │
│  971 ┆ ZlkxcjcVAZfLUkCwHRFG ┆ PTtzmMHcvLQPcOrAgFpl │
│ 3813 ┆ tIqqrgyYjULzdyRKkMKK ┆ tMAFeQewaQFtRGEvOdqW │
└──────┴──────────────────────┴──────────────────────┘

Why Polypolars?

  • Factory pattern: Leverage polyfactory for data generation
  • Type-safe schema: Python types become Polars dtypes automatically
  • Nullable handling: Optional[T] and defaults are reflected in the schema
  • Complex types: Nested structs, lists, and dicts (as list-of-structs)
  • Multiple models: Dataclasses, Pydantic, and TypedDict

Installation

pip install polypolars

For development:

pip install "polypolars[dev]"

Quick Start

Decorator (recommended)

from dataclasses import dataclass
from typing import Optional
from polypolars import polars_factory

@polars_factory
@dataclass
class Product:
    product_id: int
    name: str
    price: float
    description: Optional[str] = None
    in_stock: bool = True

# Build Polars DataFrame
df = Product.build_dataframe(size=100)
print(df.head())

# Or get dicts
dicts = Product.build_dicts(size=50)

Example output (first 5 rows; data varies per run):

shape: (5, 5)
┌────────────┬──────────────────────┬──────────────┬──────────────────────┬──────────┐
│ product_id ┆ name                 ┆ price        ┆ description          ┆ in_stock │
│ ---        ┆ ---                  ┆ ---          ┆ ---                  ┆ ---      │
│ i64        ┆ str                  ┆ f64          ┆ str                  ┆ bool     │
╞════════════╪══════════════════════╪══════════════╪══════════════════════╪══════════╡
│ 5582       ┆ hKJsoOOXlwgLIiiWOCJP ┆ 2.2760e8     ┆ rTUACBLlGBlHXIjzVvPt ┆ false    │
│ 7099       ┆ ZgUiDVJirxAYRrWIPnpS ┆ 274887.17671 ┆ bHGMXNFRLSDifpywMZrY ┆ true     │
│ 5372       ┆ MTtVHJkqneaCkoyZNgio ┆ 1.5195e7     ┆ HsAmRwgaphvQxOCJwjSr ┆ false    │
│ 8650       ┆ fTBYFPiWMFCKauieEXlu ┆ -7.8765e8    ┆ UAnyfVhTUmvcjtzbCufq ┆ true     │
│ 1023       ┆ MCtTOwvJTjfbpPELcFKm ┆ -97.933431   ┆ PMEHaEOGaoJiDaomXdVX ┆ false    │
└────────────┴──────────────────────┴──────────────┴──────────────────────┴──────────┘

Classic factory class

from polypolars import PolarsFactory

class ProductFactory(PolarsFactory[Product]):
    __model__ = Product

df = ProductFactory.build_dataframe(size=100)

Convenience function

from polypolars import build_polars_dataframe

df = build_polars_dataframe(Product, size=100)

Schema inference

Schema is inferred from your type hints, so all-null columns still get the correct type:

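from dataclasses import dataclass
from typing import Optional
from polypolars import polars_factory

@polars_factory
@dataclass
class User:
    id: int
    email: Optional[str]  # nullable string in Polars

df = User.build_dataframe(size=100)  # schema: id Int64, email String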
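
To illustrate why type hints alone are enough to type a column that happens to be all null, here is a minimal sketch that reads the base type and nullability from a dataclass's annotations using only the standard library. The helper describe_fields is hypothetical and is not part of polypolars; the library's own inference may work differently.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args, get_origin, get_type_hints


@dataclass
class User:
    id: int
    email: Optional[str]


def describe_fields(model):
    """Return {field_name: (base_type, nullable)} read from type hints only.

    Hypothetical helper for illustration; not a polypolars function.
    """
    hints = get_type_hints(model)
    described = {}
    for f in fields(model):
        hint = hints[f.name]
        nullable = False
        # Optional[T] is Union[T, None]; strip the None member to find T.
        if get_origin(hint) is Union:
            members = [a for a in get_args(hint) if a is not type(None)]
            if len(members) < len(get_args(hint)) and len(members) == 1:
                hint, nullable = members[0], True
        described[f.name] = (hint, nullable)
    return described


print(describe_fields(User))
```

No data is inspected here, so the result is the same whether the generated values are populated or entirely null.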

From dicts

dicts = Product.build_dicts(size=1000)
# Convert to DataFrame when needed:
df = Product.create_dataframe_from_dicts(dicts)

Pydantic

from pydantic import BaseModel, Field
from polypolars import polars_factory

@polars_factory
class User(BaseModel):
    id: int = Field(gt=0)
    username: str = Field(min_length=3, max_length=20)
    email: str
    is_active: bool = True

df = User.build_dataframe(size=500)

Type mapping

Python                      Polars
str                         String
int                         Int64
float                       Float64
bool                        Boolean
datetime                    Datetime
date                        Date
List[T]                     List(T)
Dict[K,V]                   List(Struct(key, value))
Optional[T]                 T (nullable)
Tuple[T, ...]               List(T)
Tuple[T, T, ...] (fixed)    Array(T, n)
Dataclass / Pydantic        Struct(...)

Use schema_overrides (e.g. {"col": pl.Categorical}) to override inferred types.
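
The mapping in the table above can be read as a small recursive rule over type hints. The sketch below expresses it with the standard library only, returning dtype names as plain strings rather than Polars objects. The function dtype_name is hypothetical, it omits the dataclass / Pydantic row, and it is not how polypolars is necessarily implemented.

```python
import datetime as dt
from typing import Dict, List, Optional, Tuple, Union, get_args, get_origin

# Scalar rows of the table, as Polars dtype names (plain strings here).
SCALARS = {
    str: "String",
    int: "Int64",
    float: "Float64",
    bool: "Boolean",
    dt.datetime: "Datetime",
    dt.date: "Date",
}


def dtype_name(hint):
    """Translate a type hint into the dtype name the table lists for it.

    Hypothetical helper for illustration; not a polypolars function.
    """
    if hint in SCALARS:
        return SCALARS[hint]
    origin, args = get_origin(hint), get_args(hint)
    if origin is Union:
        # Optional[T]: nullability does not change the dtype itself.
        inner = [a for a in args if a is not type(None)]
        if len(inner) == 1:
            return dtype_name(inner[0])
    if origin is list:
        return f"List({dtype_name(args[0])})"
    if origin is dict:
        key, value = args
        return f"List(Struct(key: {dtype_name(key)}, value: {dtype_name(value)}))"
    if origin is tuple:
        if len(args) == 2 and args[1] is Ellipsis:
            # Tuple[T, ...]: variable length, so a List.
            return f"List({dtype_name(args[0])})"
        if args and len(set(args)) == 1:
            # Fixed-length homogeneous tuple: Array of width n.
            return f"Array({dtype_name(args[0])}, {len(args)})"
    raise TypeError(f"no mapping in this sketch for {hint!r}")


print(dtype_name(Optional[str]))
print(dtype_name(Dict[str, float]))
print(dtype_name(Tuple[float, float, float]))
```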

LazyFrame and chunked building

# LazyFrame
lf = Product.build_lazy_dataframe(size=10_000)

# Chunked building for very large sizes (lower peak memory)
df = Product.build_dataframe(size=1_000_000, chunk_size=10_000)
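
How polypolars splits the work internally is not described here. As a rough sketch of the arithmetic behind size and chunk_size, assuming rows are generated in batches of at most chunk_size, the hypothetical generator below yields the row count of each batch:

```python
def chunk_sizes(size, chunk_size):
    """Yield the row count of each chunk; the last chunk may be smaller.

    Hypothetical helper for illustration; not a polypolars function.
    """
    if size < 0 or chunk_size <= 0:
        raise ValueError("size must be >= 0 and chunk_size must be > 0")
    remaining = size
    while remaining > 0:
        step = min(chunk_size, remaining)
        yield step
        remaining -= step


sizes = list(chunk_sizes(1_000_000, 10_000))
print(len(sizes), sum(sizes))     # 100 chunks covering all 1,000,000 rows
print(list(chunk_sizes(25, 10)))  # [10, 10, 5]
```

Under that assumption, only one batch of generated Python objects needs to be alive at a time, which is where the memory saving would come from.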

CLI

# Export schema
polypolars schema export myapp.models:User --output schema.txt

# Validate a file against a model
polypolars schema validate myapp.models:User data.parquet

# Generate sample data
polypolars generate myapp.models:User --size 1000 --output users.parquet --format parquet
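
The CLI commands above name a model as module:attribute (for example myapp.models:User). A common way to resolve such a string is sketched below with importlib. The function resolve_target is hypothetical and only illustrates the convention; it is not taken from the polypolars source.

```python
import importlib


def resolve_target(spec):
    """Import 'package.module:Name' and return the object called Name.

    Hypothetical helper for illustration; not a polypolars function.
    """
    module_path, sep, attr = spec.partition(":")
    if not sep or not module_path or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


# A stdlib target stands in for myapp.models:User so the sketch runs anywhere.
print(resolve_target("collections:OrderedDict"))
```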

I/O and testing

from polypolars import (
    save_as_parquet,
    load_parquet,
    load_and_validate,
    infer_schema,
    assert_dataframe_equal,
    assert_schema_equal,
)

df = User.build_dataframe(size=1000)
save_as_parquet(df, "users.parquet")

# Load and validate
schema = infer_schema(User)
df2 = load_and_validate("users.parquet", expected_schema=schema)

assert_dataframe_equal(df, df2, check_order=False)
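
The README does not say whether check_order refers to row order or column order. Assuming it means row order, an order-insensitive comparison can be pictured as comparing rows as multisets. The sketch below does this for plain lists of row dicts (the shape build_dicts is described as returning) and uses the hypothetical helper same_rows_any_order; it is not the library's implementation and requires hashable cell values.

```python
from collections import Counter


def same_rows_any_order(left, right):
    """Compare two lists of row dicts as multisets, ignoring row order.

    Hypothetical helper for illustration; not a polypolars function.
    """
    def key(row):
        # Sort items so key order inside a row does not matter either.
        return tuple(sorted(row.items()))

    return Counter(key(r) for r in left) == Counter(key(r) for r in right)


a = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]
b = [{"name": "y", "id": 2}, {"id": 1, "name": "x"}]
print(same_rows_any_order(a, b))      # True
print(same_rows_any_order(a, a[:1]))  # False
```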

License

MIT
