Type-safe DataFrame library with schema validation for pandas

Maintainers

gzocche

PandasSchemaster

Type-safe DataFrame library with schema validation for pandas.

Overview

PandasSchemaster provides a strongly typed interface to pandas DataFrames with automatic validation, type conversion, and schema-based column access. Write df[MySchema.COLUMN] instead of df['column'] to get type-safe, IDE-friendly column access. SchemaDataFrame subclasses pandas.DataFrame, so the usual DataFrame methods remain available.

Key Features

🛡️ Type Safety: Schema-based column access prevents runtime errors
🔧 IDE Support: Autocompletion and error detection for column names
✅ Validation: Automatic data validation based on schema definitions
🔄 Auto-casting: Seamless data type conversions
🐼 Full DataFrame Compatibility: Inherits from pandas.DataFrame, so all methods work
📖 Self-documenting: Clear, readable code with schema column references

Quick Start

Installation

pip install pandasschemaster

Basic Usage

import pandas as pd
import numpy as np
from pandasschemaster import SchemaColumn, SchemaDataFrame, BaseSchema

# Define your schema
class SensorSchema(BaseSchema):
    TIMESTAMP = SchemaColumn("timestamp", np.datetime64, nullable=False)
    TEMPERATURE = SchemaColumn("temperature", np.float64)
    HUMIDITY = SchemaColumn("humidity", np.float64)
    SENSOR_ID = SchemaColumn("sensor_id", np.int64, nullable=False)

# Create data
data = {
    'timestamp': [pd.Timestamp.now()],
    'temperature': [23.5],
    'humidity': [45.2],
    'sensor_id': [1001]
}

# Create validated DataFrame
df = SchemaDataFrame(data, schema_class=SensorSchema, validate=True, auto_cast=True)

# Use schema columns for type-safe operations
temperature = df[SensorSchema.TEMPERATURE]  # Instead of df['temperature']
fahrenheit = df[SensorSchema.TEMPERATURE] * 9/5 + 32
hot_readings = df[df[SensorSchema.TEMPERATURE] > 25]

# Multi-column selection
subset = df[[SensorSchema.TEMPERATURE, SensorSchema.HUMIDITY]]

# Assignment with automatic type casting
df[SensorSchema.TEMPERATURE] = [24.1]

Schema Column Benefits

✅ Type-Safe Access

# Type-safe schema column access
temperature = df[SensorSchema.TEMPERATURE]

# vs traditional string access (error-prone)
temperature = df['temperature']  # Typos not caught until runtime

🔧 IDE Support

Autocompletion: SensorSchema. shows available columns
Error Detection: Invalid column names highlighted
Go-to-Definition: Jump to schema definition
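
The reason an editor can offer these features is that a column becomes an ordinary class attribute instead of a string literal. The following is a minimal, standard-library-only sketch of that idea. `Col` and `Sensor` are hypothetical stand-ins invented for illustration; they are not PandasSchemaster's classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for a schema column: a named attribute."""
    name: str


class Sensor:
    """Hypothetical schema: one class attribute per column."""
    TEMPERATURE = Col("temperature")
    HUMIDITY = Col("humidity")


row = {"temperature": 23.5, "humidity": 45.2}

# Attribute access is something an IDE or type checker can resolve statically.
value = row[Sensor.TEMPERATURE.name]

# A misspelled attribute fails at the attribute lookup itself, and the error
# names the schema class, instead of surfacing later as a bare KeyError.
try:
    Sensor.TEMPERATUER
except AttributeError as exc:
    typo_error = str(exc)
```

A static checker can flag `Sensor.TEMPERATUER` before the code runs, which it cannot do for a misspelled string key.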

🔄 Refactoring Safety

# Change a column's name in one place; every df[SensorSchema.TEMP_CELSIUS]
# lookup picks up the new name without further edits
class SensorSchema(BaseSchema):
    TEMP_CELSIUS = SchemaColumn("temperature_celsius", np.float64)  # Renamed column
    # Renaming the attribute itself is a single IDE rename refactoring
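
The same single-point-of-change effect can be seen with plain pandas and a named constant. This sketch assumes nothing about PandasSchemaster itself; the `Sensor` class and `to_fahrenheit` function are hypothetical names used only for illustration.

```python
import pandas as pd


class Sensor:
    # Hypothetical plain-constant schema; a rename touches only this line.
    TEMP_CELSIUS = "temperature_celsius"


def to_fahrenheit(df):
    """Convert the Celsius column, looked up through the constant."""
    return df[Sensor.TEMP_CELSIUS] * 9 / 5 + 32


df = pd.DataFrame({Sensor.TEMP_CELSIUS: [0.0, 100.0]})
result = to_fahrenheit(df)
```

Because `to_fahrenheit` never spells out the column string, it keeps working if the string behind `TEMP_CELSIUS` changes again.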

🐼 Full DataFrame Compatibility

SchemaDataFrame inherits directly from pandas.DataFrame, so all DataFrame methods work seamlessly:

# Create schema-validated DataFrame
df = SchemaDataFrame(data, schema_class=SensorSchema)

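# Use all pandas DataFrame methods directly
print(df.shape)  # (rows, columns); (1, 4) for the Quick Start data
print(df.head())  # First 5 rows
summary = df.describe()  # Statistical summary
grouped = df.groupby(SensorSchema.SENSOR_ID.name).mean()

# Mathematical operations (numeric columns only; the timestamp column cannot be scaled)
df_scaled = df[[SensorSchema.TEMPERATURE, SensorSchema.HUMIDITY]] * 2
df_filtered = df[df[SensorSchema.TEMPERATURE] > 25]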

# All pandas operations work while maintaining schema validation

Advanced Features

Schema Column Types and Validation

class AdvancedSchema(BaseSchema):
    # Basic column with nullable control
    PRESSURE = SchemaColumn("pressure", np.float64, nullable=False)
    
    # Column with default value
    STATUS = SchemaColumn("status", np.dtype('object'), 
                         default="UNKNOWN", nullable=True)
    
    # Column with description
    MACHINE_ID = SchemaColumn("machine_id", np.int64, 
                             description="Unique machine identifier")

Data Type Casting and Conversion

# Auto-casting handles string to numeric conversion
data = {
    'temperature': ["23.5", "24.1"],  # String values
    'sensor_id': ["1001", "1002"]     # String values  
}

df = SchemaDataFrame(data, schema_class=SensorSchema, 
                    validate=True, auto_cast=True)

# Values are automatically cast to schema types
print(df.dtypes)
# temperature    float64
# sensor_id      Int64
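
For comparison, here is roughly what string-to-numeric casting looks like in plain pandas. This is a sketch of the general technique, not the library's implementation; the `target_dtypes` mapping is a hypothetical stand-in for the dtypes a schema would declare.

```python
import pandas as pd

raw = pd.DataFrame({
    "temperature": ["23.5", "24.1"],  # String values
    "sensor_id": ["1001", "1002"],    # String values
})

# Hypothetical target dtypes, mirroring what a schema would declare.
target_dtypes = {"temperature": "float64", "sensor_id": "int64"}

cast = raw.copy()
for column, dtype in target_dtypes.items():
    # to_numeric parses the strings; astype then pins the exact dtype.
    cast[column] = pd.to_numeric(cast[column]).astype(dtype)
```

Note that this sketch produces NumPy `int64`, whereas the capitalised `Int64` shown in the output above is the name of pandas' nullable integer extension dtype.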

Real-World Example

# Industrial IoT sensor data processing
class IndustrialSchema(BaseSchema):
    TIMESTAMP = SchemaColumn("timestamp", np.datetime64, nullable=False)
    MACHINE_ID = SchemaColumn("machine_id", np.int64, nullable=False)
    TEMPERATURE = SchemaColumn("temperature", np.float64)
    PRESSURE = SchemaColumn("pressure", np.float64)
    STATUS = SchemaColumn("status", np.dtype('object'))

# Load and validate data (sensor_data is a placeholder for your own input, e.g. a dict of lists)
df = SchemaDataFrame(sensor_data, schema_class=IndustrialSchema, validate=True)

# Type-safe analysis using schema columns
avg_temp_by_machine = df.groupby(IndustrialSchema.MACHINE_ID.name)[
    IndustrialSchema.TEMPERATURE.name
].mean()

overheating = df[df[IndustrialSchema.TEMPERATURE] > 150]
efficiency = df[IndustrialSchema.PRESSURE] / df[IndustrialSchema.TEMPERATURE]

# Filter by status using schema column
running_machines = df[df[IndustrialSchema.STATUS] == 'RUNNING']

# Multi-column selection via the schema-specific helper
subset = df.select_columns([IndustrialSchema.TEMPERATURE, IndustrialSchema.PRESSURE])

Key Features Demonstrated in Tests

Column Resolution and Access

# The library handles both string and SchemaColumn access
temp1 = df['temperature']                    # Traditional string access
temp2 = df[SensorSchema.TEMPERATURE]         # Schema column access
assert temp1.equals(temp2)                   # Both work identically

# Multi-column selection with mixed types
subset = df[[SensorSchema.TEMPERATURE, 'humidity']]  # Mixed access works
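
One way such dual access can be built is a DataFrame subclass that translates column objects into names before delegating to pandas. The sketch below illustrates that technique under stated assumptions: `Col`, `ColFrame`, and `_resolve` are hypothetical names, and nothing here is taken from PandasSchemaster's source.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Col:
    """Hypothetical column object; only its name matters here."""
    name: str


def _resolve(key):
    """Turn Col objects (or lists containing them) into plain column names."""
    if isinstance(key, Col):
        return key.name
    if isinstance(key, list):
        return [_resolve(item) for item in key]
    return key  # strings, boolean masks, etc. pass through unchanged


class ColFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass accepting Col objects as keys."""

    @property
    def _constructor(self):
        # Keep the subclass type when pandas builds new frames from this one.
        return ColFrame

    def __getitem__(self, key):
        return super().__getitem__(_resolve(key))


TEMP = Col("temperature")
frame = ColFrame({"temperature": [23.5, 26.0], "humidity": [45.2, 50.0]})

by_object = frame[TEMP]                # Col access
by_string = frame["temperature"]       # string access
mixed = frame[[TEMP, "humidity"]]      # mixed list
hot = frame[frame[TEMP] > 25]          # boolean mask passes through
```

Overriding `_constructor` is the documented pandas hook for making operations such as filtering return the subclass rather than a plain DataFrame.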

Schema Validation

# Validation catches missing required columns
class StrictSchema(BaseSchema):
    REQUIRED_COL = SchemaColumn("required", np.float64, nullable=False)

# A DataFrame that lacks the required column
incomplete_df = pd.DataFrame({"other": [1.0]})

# validate_dataframe returns a list of error messages
errors = StrictSchema.validate_dataframe(incomplete_df)
print(errors)  # ['Required column required is missing']
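
The pattern of collecting problems into a list, instead of stopping at the first one, can be sketched in plain pandas as follows. `find_schema_errors` and its parameters are hypothetical; the dtype message format is invented for illustration, and only the missing-column message mirrors the output shown above.

```python
import pandas as pd


def find_schema_errors(df, required, dtypes):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name in required:
        if name not in df.columns:
            errors.append(f"Required column {name} is missing")
    for name, expected in dtypes.items():
        if name in df.columns and str(df[name].dtype) != expected:
            errors.append(
                f"Column {name} has dtype {df[name].dtype}, expected {expected}"
            )
    return errors


incomplete = pd.DataFrame({"other": [1.0, 2.0]})
complete = pd.DataFrame({"required": [1.0, 2.0]})

missing_errors = find_schema_errors(
    incomplete, required=["required"], dtypes={"required": "float64"}
)
no_errors = find_schema_errors(
    complete, required=["required"], dtypes={"required": "float64"}
)
```

Returning a list lets the caller decide whether to log, raise, or repair, and reports every problem in one pass.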

Mathematical Operations

# All mathematical operations work with schema columns
celsius = df[SensorSchema.TEMPERATURE]
fahrenheit = celsius * 9/5 + 32
hot_mask = celsius > 25
comfort_index = celsius + df[SensorSchema.HUMIDITY] / 10

Core Components

SchemaColumn

Defines a typed column with validation and transformation capabilities.

# Basic column definition
temp_col = SchemaColumn("temperature", np.float64, nullable=True)

# Column with all options
advanced_col = SchemaColumn(
    name="pressure",
    dtype=np.float64,
    nullable=False,
    default=0.0,
    description="Atmospheric pressure in hPa"
)

BaseSchema

Abstract base class for defining DataFrame schemas with class methods for validation.

class MySchema(BaseSchema):
    COL1 = SchemaColumn("col1", np.float64)
    COL2 = SchemaColumn("col2", np.int64)

# Get schema information
columns = MySchema.get_columns()          # Dict of column definitions
names = MySchema.get_column_names()       # List of column names
errors = MySchema.validate_dataframe(df)  # Validation error list
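
Class-level helpers like these are typically built on attribute introspection. A minimal sketch of that technique using only the standard library follows; `Col`, `SchemaBase`, and `SketchSchema` are hypothetical, and keying the dictionary by column name is an assumption, since the README does not say how `get_columns()` is keyed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    """Hypothetical column definition."""
    name: str
    dtype: str


class SchemaBase:
    """Hypothetical base class that discovers Col attributes on subclasses."""

    @classmethod
    def get_columns(cls):
        found = {}
        # Walk the MRO from base to subclass so inherited columns come first.
        for klass in reversed(cls.__mro__):
            for value in vars(klass).values():
                if isinstance(value, Col):
                    found[value.name] = value  # assumption: keyed by column name
        return found

    @classmethod
    def get_column_names(cls):
        return list(cls.get_columns())


class SketchSchema(SchemaBase):
    COL1 = Col("col1", "float64")
    COL2 = Col("col2", "int64")


names = SketchSchema.get_column_names()
columns = SketchSchema.get_columns()
```

Class bodies preserve definition order, so the names come back in the order the columns were declared.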

SchemaDataFrame

pandas DataFrame subclass with schema validation and type-safe column access.

# All pandas DataFrame methods work
df = SchemaDataFrame(data, schema_class=MySchema)
print(df.shape)                    # Shape
print(df.head())                   # First rows
summary = df.describe()            # Statistics
filtered = df[df['col1'] > 5]      # Filtering

# Plus schema-specific features
subset = df.select_columns([MySchema.COL1])  # Schema-based selection
print(df.schema)                             # Access to schema class

Requirements

Python 3.8+
pandas >= 2.0.0
numpy >= 1.24.0

License

MIT License. See LICENSE for details.

Contributing

Contributions welcome! Please read our contributing guidelines and submit pull requests.

Support

🐛 Issues: GitHub Issues
💡 Questions: Use GitHub Discussions

Testing

The library includes comprehensive tests covering:

Basic SchemaColumn functionality and type casting
BaseSchema validation and column management
SchemaDataFrame operations and pandas compatibility
Mathematical operations and filtering with schema columns
Column access resolution and multi-column selection

Run tests with:

python -m pytest tests/

Use df[MySchema.COLUMN] for type-safe DataFrame operations! 🚀

Release history

1.0.2: Jul 1, 2025
1.0.1: Jun 30, 2025
1.0.0: Jun 17, 2025 (this version)

Download files

Source Distribution

pandasschemaster-1.0.0.tar.gz (19.4 kB)

Uploaded Jun 17, 2025 Source

Built Distribution

pandasschemaster-1.0.0-py3-none-any.whl (9.5 kB)

Uploaded Jun 17, 2025 Python 3

File details

Details for the file pandasschemaster-1.0.0.tar.gz.

File metadata

Download URL: pandasschemaster-1.0.0.tar.gz
Upload date: Jun 17, 2025
Size: 19.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pandasschemaster-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`fed026207c350e76d1454e3667872f0f23a93681da7f0d99c0e710383c0a0044`
MD5	`2a6b3693524aaae379bf201f1f99ddf1`
BLAKE2b-256	`aa9890fc24f52184c9c47226e6f20d80ec7f66f1378c7ac0f62e8e5412a8afd7`
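
A downloaded file can be checked against the SHA256 digest above with the standard library. This is a minimal sketch; the helper names `sha256_of` and `matches_published_hash` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("pandasschemaster-1.0.0.tar.gz", "fed026207c350e76d1454e3667872f0f23a93681da7f0d99c0e710383c0a0044")` should return True for an intact download. pip can also enforce this itself through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).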

Provenance

The following attestation bundles were made for pandasschemaster-1.0.0.tar.gz:

Publisher: python-publish.yml on gzocche/PandasSchemaster

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pandasschemaster-1.0.0.tar.gz
- Subject digest: fed026207c350e76d1454e3667872f0f23a93681da7f0d99c0e710383c0a0044
- Sigstore transparency entry: 241566425
- Sigstore integration time: Jun 17, 2025
Source repository:
- Permalink: gzocche/PandasSchemaster@faaa04c71b66f53aeabd7832d915b0b9c3f93885
- Branch / Tag: refs/tags/v0.0.2
- Owner: https://github.com/gzocche
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@faaa04c71b66f53aeabd7832d915b0b9c3f93885
- Trigger Event: release

File details

Details for the file pandasschemaster-1.0.0-py3-none-any.whl.

File metadata

Download URL: pandasschemaster-1.0.0-py3-none-any.whl
Upload date: Jun 17, 2025
Size: 9.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pandasschemaster-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`67d2df667e26ebd3b4e272d9cd2a6c721a0753d86b7ff6bdc2e3ab1b2f85c541`
MD5	`1f591f482d726d191a467a9a2fd0d134`
BLAKE2b-256	`1175d43303ef0b12113a9c2a29e65cf5c17da7e91f20dd04602ef7e6e8944f14`

Provenance

The following attestation bundles were made for pandasschemaster-1.0.0-py3-none-any.whl:

Publisher: python-publish.yml on gzocche/PandasSchemaster

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pandasschemaster-1.0.0-py3-none-any.whl
- Subject digest: 67d2df667e26ebd3b4e272d9cd2a6c721a0753d86b7ff6bdc2e3ab1b2f85c541
- Sigstore transparency entry: 241566435
- Sigstore integration time: Jun 17, 2025
Source repository:
- Permalink: gzocche/PandasSchemaster@faaa04c71b66f53aeabd7832d915b0b9c3f93885
- Branch / Tag: refs/tags/v0.0.2
- Owner: https://github.com/gzocche
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@faaa04c71b66f53aeabd7832d915b0b9c3f93885
- Trigger Event: release
