Generate Polars DataFrames using polyfactory for testing and development

Polypolars

Python 3.8+ | License: MIT | Code style: ruff

Generate type-safe Polars DataFrames effortlessly using polyfactory

Inspired by polyspark, polypolars lets you create realistic test DataFrames from your Python data models—with automatic schema inference for Polars.

Docs: See the docs/ folder and run mkdocs serve for the full API reference and examples.

Example

from dataclasses import dataclass
from polypolars import polars_factory

@polars_factory
@dataclass
class User:
    id: int
    name: str
    email: str

# Generate 1000 rows instantly:
df = User.build_dataframe(size=1000)
print(df.head())

Example output (data varies per run):

shape: (5, 3)
┌──────┬──────────────────────┬──────────────────────┐
│ id   ┆ name                 ┆ email                │
│ ---  ┆ ---                  ┆ ---                  │
│ i64  ┆ str                  ┆ str                  │
╞══════╪══════════════════════╪══════════════════════╡
│ 3167 ┆ QmYHeLMDMxWChjihAFxU ┆ vHGMKHjXsMBlxLuhqpUE │
│ 1028 ┆ hvLXPtlqURtwzqeyJruo ┆ ePDAdtelIEiRfEuAgoPz │
│ 9048 ┆ NhnyGGQsTjxPEndxaOCt ┆ znmByWtpwofUGKolkJrs │
│  971 ┆ ZlkxcjcVAZfLUkCwHRFG ┆ PTtzmMHcvLQPcOrAgFpl │
│ 3813 ┆ tIqqrgyYjULzdyRKkMKK ┆ tMAFeQewaQFtRGEvOdqW │
└──────┴──────────────────────┴──────────────────────┘

Why Polypolars?

  • Factory pattern: Leverage polyfactory for data generation
  • Type-safe schema: Python types become Polars dtypes automatically
  • Nullable handling: Optional[T] and defaults are reflected in the schema
  • Complex types: Nested structs, lists, and dicts (as list-of-structs)
  • Multiple models: Dataclasses, Pydantic, and TypedDict

Installation

pip install polypolars

For development:

pip install "polypolars[dev]"

Quick Start

Decorator (recommended)

from dataclasses import dataclass
from typing import Optional
from polypolars import polars_factory

@polars_factory
@dataclass
class Product:
    product_id: int
    name: str
    price: float
    description: Optional[str] = None
    in_stock: bool = True

# Build Polars DataFrame
df = Product.build_dataframe(size=100)
print(df.head())

# Or get dicts
dicts = Product.build_dicts(size=50)

Example output (first 5 rows; data varies per run):

shape: (5, 5)
┌────────────┬──────────────────────┬──────────────┬──────────────────────┬──────────┐
│ product_id ┆ name                 ┆ price        ┆ description          ┆ in_stock │
│ ---        ┆ ---                  ┆ ---          ┆ ---                  ┆ ---      │
│ i64        ┆ str                  ┆ f64          ┆ str                  ┆ bool     │
╞════════════╪══════════════════════╪══════════════╪══════════════════════╪══════════╡
│ 5582       ┆ hKJsoOOXlwgLIiiWOCJP ┆ 2.2760e8     ┆ rTUACBLlGBlHXIjzVvPt ┆ false    │
│ 7099       ┆ ZgUiDVJirxAYRrWIPnpS ┆ 274887.17671 ┆ bHGMXNFRLSDifpywMZrY ┆ true     │
│ 5372       ┆ MTtVHJkqneaCkoyZNgio ┆ 1.5195e7     ┆ HsAmRwgaphvQxOCJwjSr ┆ false    │
│ 8650       ┆ fTBYFPiWMFCKauieEXlu ┆ -7.8765e8    ┆ UAnyfVhTUmvcjtzbCufq ┆ true     │
│ 1023       ┆ MCtTOwvJTjfbpPELcFKm ┆ -97.933431   ┆ PMEHaEOGaoJiDaomXdVX ┆ false    │
└────────────┴──────────────────────┴──────────────┴──────────────────────┴──────────┘

Classic factory class

from polypolars import PolarsFactory

class ProductFactory(PolarsFactory[Product]):
    __model__ = Product

df = ProductFactory.build_dataframe(size=100)

Convenience function

from polypolars import build_polars_dataframe

df = build_polars_dataframe(Product, size=100)

Schema inference

Schema is inferred from your type hints, so all-null columns still get the correct type:

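from dataclasses import dataclass
from typing import Optional
from polypolars import polars_factory

@polars_factory
@dataclass
class User:
    id: int
    email: Optional[str]  # nullable string in Polars

df = User.build_dataframe(size=100)  # schema: id Int64, email String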
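
The difference between value-based and hint-based inference can be illustrated with the standard library alone. The following is a minimal sketch, not polypolars' implementation; `Account`, `type_from_values`, and `type_from_hint` are hypothetical names introduced here. It assumes `typing.Optional` hints (the `T | None` syntax is not handled).

```python
from dataclasses import dataclass
from typing import Optional, Union, get_args, get_origin, get_type_hints


@dataclass
class Account:  # hypothetical model, mirrors the User example
    id: int
    email: Optional[str]


def type_from_values(values):
    """Value-based inference: the first non-null value decides the type.

    Returns None when every value is null, i.e. the type is unknown.
    """
    for value in values:
        if value is not None:
            return type(value)
    return None


def type_from_hint(hint):
    """Hint-based inference: unwrap Optional[T] to T, independent of the data."""
    if get_origin(hint) is Union:
        non_none = [arg for arg in get_args(hint) if arg is not type(None)]
        if len(non_none) == 1:
            return non_none[0]
    return hint


# Every email is null, so the values alone carry no type information.
rows = [Account(id=1, email=None), Account(id=2, email=None)]
from_values = type_from_values([row.email for row in rows])
from_hint = type_from_hint(get_type_hints(Account)["email"])

print(from_values)  # unknown type
print(from_hint)    # the declared type survives
```

Reading the hint keeps the column typed as a string even though no string value was ever generated.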

From dicts

dicts = Product.build_dicts(size=1000)
# Convert to DataFrame when needed:
df = Product.create_dataframe_from_dicts(dicts)

Pydantic

from pydantic import BaseModel, Field
from polypolars import polars_factory

@polars_factory
class User(BaseModel):
    id: int = Field(gt=0)
    username: str = Field(min_length=3, max_length=20)
    email: str
    is_active: bool = True

df = User.build_dataframe(size=500)

Type mapping

Python                      Polars
--------------------------  --------------------------
str                         String
int                         Int64
float                       Float64
bool                        Boolean
datetime                    Datetime
date                        Date
List[T]                     List(T)
Dict[K, V]                  List(Struct(key, value))
Optional[T]                 T (nullable)
Tuple[T, ...]               List(T)
Tuple[T, T, ...] (fixed)    Array(T, n)
Dataclass / Pydantic        Struct(...)

Use schema_overrides (e.g. {"col": pl.Categorical}) to override inferred types.
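
As a rough illustration of how such a mapping can be walked recursively, the sketch below turns type hints into dtype names as plain strings. It uses only the standard library and is not polypolars' code: `dtype_name`, `SCALARS`, and `Point` are hypothetical, only dataclasses are handled for the struct case, and fixed-length tuples are assumed to be homogeneous.

```python
import dataclasses
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple, Union, get_args, get_origin, get_type_hints

# Scalar rows of the type-mapping table, as dtype names.
SCALARS = {
    str: "String",
    int: "Int64",
    float: "Float64",
    bool: "Boolean",
    datetime: "Datetime",
    date: "Date",
}


def dtype_name(hint) -> str:
    """Return a Polars-style dtype name for a Python type hint (sketch)."""
    if hint in SCALARS:
        return SCALARS[hint]
    origin, args = get_origin(hint), get_args(hint)
    if origin is Union:
        inner = [arg for arg in args if arg is not type(None)]
        if len(inner) == 1:
            # Optional[T] maps to T; nullability is tracked outside the dtype.
            return dtype_name(inner[0])
        raise TypeError(f"unsupported union: {hint!r}")
    if origin is list:
        return f"List({dtype_name(args[0])})"
    if origin is dict:
        key, value = args
        return f"List(Struct(key: {dtype_name(key)}, value: {dtype_name(value)}))"
    if origin is tuple:
        if len(args) == 2 and args[1] is Ellipsis:
            return f"List({dtype_name(args[0])})"  # variable length
        return f"Array({dtype_name(args[0])}, {len(args)})"  # fixed length
    if dataclasses.is_dataclass(hint):
        fields = ", ".join(
            f"{name}: {dtype_name(field_hint)}"
            for name, field_hint in get_type_hints(hint).items()
        )
        return f"Struct({fields})"
    raise TypeError(f"unsupported type: {hint!r}")


@dataclass
class Point:  # hypothetical nested model
    x: float
    y: float


print(dtype_name(Dict[str, Optional[int]]))
print(dtype_name(List[Point]))
```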

LazyFrame and chunked building

# LazyFrame
lf = Product.build_lazy_dataframe(size=10_000)

# Chunked building for very large size (lower memory)
df = Product.build_dataframe(size=1_000_000, chunk_size=10_000)
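
The idea behind chunked building can be sketched without any dependencies: rows are produced in batches so that only one batch of Python objects is alive at a time. This is an illustration of the general technique, not how polypolars implements `chunk_size`; `make_row` and `iter_chunks` are hypothetical helpers.

```python
import random
from typing import Iterator, List


def make_row(rng: random.Random) -> dict:
    """Produce one fake row (stand-in for a factory-generated record)."""
    return {"product_id": rng.randint(1, 9999), "price": rng.uniform(0.0, 100.0)}


def iter_chunks(size: int, chunk_size: int, seed: int = 0) -> Iterator[List[dict]]:
    """Yield lists of at most chunk_size rows until size rows were produced."""
    rng = random.Random(seed)
    remaining = size
    while remaining > 0:
        n = min(chunk_size, remaining)
        yield [make_row(rng) for _ in range(n)]
        remaining -= n


# The last chunk is smaller when size is not a multiple of chunk_size.
sizes = [len(chunk) for chunk in iter_chunks(size=25, chunk_size=10)]
print(sizes)  # → [10, 10, 5]
```

In a Polars pipeline, each batch would typically be converted to a DataFrame and the pieces combined with `pl.concat`, so the intermediate Python dicts can be released after each step.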

CLI

# Export schema
polypolars schema export myapp.models:User --output schema.txt

# Validate a file against a model
polypolars schema validate myapp.models:User data.parquet

# Generate sample data
polypolars generate myapp.models:User --size 1000 --output users.parquet --format parquet
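
The `myapp.models:User` argument follows the common `module:attribute` convention. A minimal sketch of how such a target string is usually resolved is shown below; `resolve_target` is a hypothetical helper and the polypolars CLI may parse its argument differently.

```python
import importlib


def resolve_target(spec: str):
    """Resolve a 'package.module:Attr' string to the object it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj = importlib.import_module(module_name)
    # Support dotted attribute paths such as 'Outer.Inner'.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


# A stdlib class is used here so the example runs without a project module.
cls = resolve_target("collections:OrderedDict")
print(cls.__name__)  # → OrderedDict
```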

I/O and testing

from polypolars import (
    save_as_parquet,
    load_parquet,
    load_and_validate,
    infer_schema,
    assert_dataframe_equal,
    assert_schema_equal,
)

df = User.build_dataframe(size=1000)
save_as_parquet(df, "users.parquet")

# Load and validate
schema = infer_schema(User)
df2 = load_and_validate("users.parquet", expected_schema=schema)

assert_dataframe_equal(df, df2, check_order=False)
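
An order-insensitive comparison can be pictured as comparing multisets of rows. The sketch below assumes that `check_order=False` refers to row order, which is not stated explicitly here; `rows_equal_unordered` is a hypothetical helper operating on lists of dicts, such as the output of `build_dicts`, and it requires hashable cell values.

```python
from collections import Counter
from typing import Dict, List


def rows_equal_unordered(left: List[Dict], right: List[Dict]) -> bool:
    """Compare two row collections as multisets, ignoring row order."""

    def key(row: Dict):
        # Sort items by column name so column order does not matter either.
        return tuple(sorted(row.items()))

    return Counter(map(key, left)) == Counter(map(key, right))


a = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]
b = [{"name": "y", "id": 2}, {"name": "x", "id": 1}]
print(rows_equal_unordered(a, b))  # → True
```

Using a `Counter` rather than a `set` keeps duplicate rows significant, so two frames with different duplicate counts are not reported as equal.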

License

MIT
