Generate pandas DataFrames using polyfactory for testing and development
Project description
Polypandas
Generate type-safe pandas DataFrames effortlessly using polyfactory.
Inspired by polyspark.
Why Polypandas?
Creating test data for pandas applications is tedious. Polypandas generates realistic test DataFrames from your Python data models, with automatic schema inference so columns get the right dtypes even when values are null.
```python
from dataclasses import dataclass

from polypandas import pandas_factory

@pandas_factory
@dataclass
class User:
    id: int
    name: str
    email: str

# Generate 1000 rows instantly
df = User.build_dataframe(size=1000)
```
Installation
Base install (pandas + polyfactory):

```shell
pip install polypandas
```

Optional: PyArrow for proper nested struct columns (otherwise nested fields are object columns of dicts):

```shell
pip install "polypandas[pyarrow]"
```

Development (tests, lint, type-checking):

```shell
pip install "polypandas[dev]"
```
Requirements: Python 3.8+, pandas ≥1.3, polyfactory ≥2.0.
Quick start
Decorator (recommended)
```python
from dataclasses import dataclass
from typing import Optional

from polypandas import pandas_factory

@pandas_factory
@dataclass
class Product:
    product_id: int
    name: str
    price: float
    description: Optional[str] = None
    in_stock: bool = True

df = Product.build_dataframe(size=100)
print(df.head())
```

Generate dicts, then convert to DataFrame

```python
dicts = Product.build_dicts(size=1000)
df = Product.create_dataframe_from_dicts(dicts)
```
Classic factory pattern
```python
from polypandas import PandasFactory

class ProductFactory(PandasFactory[Product]):
    __model__ = Product

df = ProductFactory.build_dataframe(size=100)
```
Convenience function (no factory class)
```python
from polypandas import build_pandas_dataframe

df = build_pandas_dataframe(Product, size=100)
```
Pydantic models
```python
from pydantic import BaseModel

from polypandas import pandas_factory

@pandas_factory
class Order(BaseModel):
    order_id: int
    customer_id: int
    total: float

df = Order.build_dataframe(size=500)
```
Nested structs (optional PyArrow)
With `pip install "polypandas[pyarrow]"`, nested dataclasses become proper struct columns (PyArrow-backed). Without PyArrow they are object columns of dicts.
```python
from dataclasses import dataclass

from polypandas import pandas_factory

@dataclass
class Address:
    street: str
    city: str
    zipcode: str

@pandas_factory
@dataclass
class Person:
    id: int
    name: str
    address: Address

# Auto: use PyArrow when available and the model has nested structs
df = Person.build_dataframe(size=50)

# Force PyArrow (when installed)
df = Person.build_dataframe(size=50, use_pyarrow=True)

# Force the standard path (nested column = object column of dicts)
df = Person.build_dataframe(size=50, use_pyarrow=False)
```
Helpers:
- `has_nested_structs(Model)` — `True` if the model has any nested struct or list-of-struct field.
- `infer_pyarrow_schema(Model)` — returns a `pyarrow.Schema` when PyArrow is installed, else `None`.
- `is_pyarrow_available()` — runtime check for PyArrow.
Key features
- Factory pattern — Uses polyfactory for data generation.
- Type-safe schema — Python types become pandas dtypes automatically.
- Robust null handling — Schema from types avoids dtype issues with all-null columns.
- Nested structs — Optional PyArrow support for proper struct columns; otherwise object columns of dicts.
- Complex types — Lists and dicts as object columns; nested models as structs (with PyArrow) or dicts.
- Flexible models — Dataclasses, Pydantic v2 models, TypedDicts.
- Testing utilities — `assert_dataframe_equal`, `assert_schema_equal`, `assert_column_exists`, and more.
- Data I/O — Save/load Parquet, JSON, CSV; JSON lines for dicts.
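The "robust null handling" point refers to a real pandas pitfall: without an explicit dtype, an all-null column gives pandas nothing to infer from. The following pandas-only sketch demonstrates the underlying behavior (it does not use polypandas itself; deriving the dtype from type hints is what the library's schema inference automates):

```python
import pandas as pd

# A list of plain Nones gives pandas nothing to infer from, so the
# column falls back to object dtype.
naive = pd.DataFrame({"score": [None, None, None]})

# Declaring the dtype up front (as schema inference from type hints does)
# keeps the column typed even though every value is null.
typed = pd.DataFrame({"score": pd.array([None, None, None], dtype="Int64")})

print(naive["score"].dtype)  # object
print(typed["score"].dtype)  # Int64
```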
Type mapping
| Python type | Pandas dtype |
|---|---|
| `str` | `object` |
| `int` | `int64` |
| `float` | `float64` |
| `bool` | `bool` |
| `datetime` | `datetime64[ns]` |
| `date` | `datetime64[ns]` |
| `Optional[T]` | same as `T` |
| `List[T]` | `object` |
| `Dict[K, V]` | `object` |
| Nested model | `object`, or PyArrow struct (with `[pyarrow]`) |
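As a rough illustration of the table above, the scalar and container rules can be reproduced with stdlib `typing` introspection. This is a simplified sketch, not polypandas's actual implementation (the library exposes `python_type_to_pandas_dtype` for the real mapping), and the helper name `to_pandas_dtype` is made up here:

```python
from datetime import date, datetime
from typing import Dict, List, Optional, Union, get_args, get_origin

_BASE = {
    str: "object",
    int: "int64",
    float: "float64",
    bool: "bool",
    datetime: "datetime64[ns]",
    date: "datetime64[ns]",
}

def to_pandas_dtype(tp) -> str:
    origin = get_origin(tp)
    if origin is Union:  # Optional[T] is Union[T, None]: unwrap to T
        args = [a for a in get_args(tp) if a is not type(None)]
        if len(args) == 1:
            return to_pandas_dtype(args[0])
    if origin in (list, dict):  # List[T] / Dict[K, V] stay object columns
        return "object"
    return _BASE[tp]  # KeyError here ~ an unsupported type

print(to_pandas_dtype(Optional[int]))  # int64
print(to_pandas_dtype(List[str]))      # object
```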
API reference
Factory
| API | Description |
|---|---|
| `@pandas_factory` | Decorator: adds `build_dataframe`, `build_dicts`, `create_dataframe_from_dicts` to the model. |
| `PandasFactory[Model]` | Base factory class; set `__model__ = Model`. |
| `build_dataframe(size=10, schema=None, use_pyarrow=None, **kwargs)` | Build a pandas DataFrame. |
| `build_dicts(size=10, **kwargs)` | Build a list of dicts (no DataFrame). |
| `create_dataframe_from_dicts(data, schema=None)` | Turn a list of dicts into a DataFrame. |
| `build_pandas_dataframe(model, size=10, schema=None, use_pyarrow=None, **kwargs)` | One-off build without a factory class. |
Schema
| API | Description |
|---|---|
| `infer_schema(model, schema=None)` | Infer a dict of column name → pandas dtype. |
| `python_type_to_pandas_dtype(python_type)` | Map a Python type to a pandas dtype string. |
| `has_nested_structs(model)` | Whether the model has nested struct/list-of-struct fields. |
| `infer_pyarrow_schema(model)` | PyArrow schema for the model, or `None` if PyArrow is not installed. |
Runtime
| API | Description |
|---|---|
| `is_pandas_available()` | Whether pandas can be imported. |
| `is_pyarrow_available()` | Whether PyArrow can be imported. |
Testing
| API | Description |
|---|---|
| `assert_dataframe_equal(df1, df2, ...)` | Compare DataFrames (optional order, dtypes, tolerances). |
| `assert_schema_equal(df1, df2, ...)` | Compare column dtypes. |
| `assert_dtypes_equal(df1, df2, ...)` | Alias for schema/dtype comparison. |
| `assert_approx_count(df, expected_count, tolerance=0.1)` | Assert row count within tolerance. |
| `assert_column_exists(df, *columns)` | Assert columns exist. |
| `assert_no_duplicates(df, columns=None)` | Assert no duplicate rows. |
| `get_column_stats(df, column)` | Basic stats (count, nulls, distinct, min/max/mean for numeric). |
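For intuition, the fields listed for `get_column_stats` can be computed with plain pandas. This is an equivalent sketch of those stats, not the library's code:

```python
import pandas as pd

df = pd.DataFrame({"amount": [10.0, 20.0, None, 20.0]})
col = df["amount"]

# The same fields get_column_stats reports, computed by hand.
stats = {
    "count": int(col.count()),       # non-null values
    "nulls": int(col.isna().sum()),
    "distinct": int(col.nunique()),  # distinct non-null values
    "min": col.min(),
    "max": col.max(),
    "mean": col.mean(),
}
print(stats)
```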
I/O
| API | Description |
|---|---|
| `save_as_parquet(df, path, **kwargs)` | Save DataFrame as Parquet. |
| `save_as_json(df, path, **kwargs)` | Save as JSON. |
| `save_as_csv(df, path, header=True, **kwargs)` | Save as CSV. |
| `load_parquet(path, **kwargs)` | Load Parquet into a DataFrame. |
| `load_json(path, **kwargs)` | Load JSON. |
| `load_csv(path, **kwargs)` | Load CSV. |
| `load_and_validate(path, expected_schema=None, ...)` | Load and optionally validate columns/dtypes. |
| `save_dicts_as_json(data, path)` | Save a list of dicts as JSON lines. |
| `load_dicts_from_json(path)` | Load JSON lines into a list of dicts. |
Exceptions
- `PolypandasError` — base exception.
- `PandasNotAvailableError` — pandas required but not installed.
- `SchemaInferenceError` — schema cannot be inferred.
- `UnsupportedTypeError` — type has no pandas/PyArrow mapping.
- `DataIOError` — I/O failure.
- `DataFrameComparisonError` — assertion failure in testing helpers.
Testing utilities
```python
from polypandas import (
    assert_dataframe_equal,
    assert_schema_equal,
    assert_approx_count,
    assert_column_exists,
    assert_no_duplicates,
    get_column_stats,
)

assert_dataframe_equal(df1, df2, check_order=False, rtol=1e-5)
assert_schema_equal(df1, df2)
assert_column_exists(df, "user_id", "name", "email")
assert_no_duplicates(df, columns=["user_id"])
stats = get_column_stats(df, "amount")
```
Data I/O
DataFrames:
```python
from polypandas import (
    save_as_parquet,
    save_as_json,
    save_as_csv,
    load_parquet,
    load_json,
    load_csv,
    load_and_validate,
    infer_schema,
)

save_as_parquet(df, "users.parquet")
save_as_csv(df, "users.csv", header=True)
df = load_parquet("users.parquet")
df = load_and_validate("users.parquet", expected_schema=infer_schema(User))
```
JSON lines (list of dicts):
```python
from polypandas import save_dicts_as_json, load_dicts_from_json

dicts = User.build_dicts(size=100)
save_dicts_as_json(dicts, "users.jsonl")
loaded = load_dicts_from_json("users.jsonl")
```
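The JSON-lines format these helpers use is simply one JSON object per line. A stdlib-only sketch of that round trip (illustrative; not the library's implementation, and it writes to a string rather than a file):

```python
import json

records = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

# Write: serialize each dict as one JSON object per line.
text = "\n".join(json.dumps(r) for r in records) + "\n"

# Read: parse each non-empty line back into a dict.
loaded = [json.loads(line) for line in text.splitlines() if line]
print(loaded == records)  # True
```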
License & related
- License: MIT — see LICENSE.
- Docs: `docs/roadmap.md` for roadmap and ideas.
- Related: polyspark, polyfactory.
Project details
File details
Details for the file polypandas-0.1.0.tar.gz.
File metadata
- Download URL: polypandas-0.1.0.tar.gz
- Upload date:
- Size: 68.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9ffe748ec3ea38899127b961ad216fb1df6753384c98f09cbb6d6db0fdd55b05` |
| MD5 | `ac6dac271f8f106551735bbe408022d4` |
| BLAKE2b-256 | `40215e2003a5ad5095ec32c174c17d6eb36b89d83e9fea36a1f629a19188546f` |
File details
Details for the file polypandas-0.1.0-py3-none-any.whl.
File metadata
- Download URL: polypandas-0.1.0-py3-none-any.whl
- Upload date:
- Size: 15.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9b16fd0119d5379673af6b47a83a7c1838d79409940ac8f6b3477da7e08421ef` |
| MD5 | `c8de30b1b260fb37969aa15108d265a4` |
| BLAKE2b-256 | `fced92660f0bec86f7993891cd8c6fe038318cf98d2bd11389c1c8a3aa171bb4` |