df-eval

A lightweight expression evaluation engine for pandas DataFrames, supporting schema-driven derived columns and external lookups.

Overview

df-eval is a Python library that provides a flexible and efficient way to evaluate expressions on pandas DataFrames. It's designed for scenarios where you need to:

  • Apply complex transformations to DataFrames using string expressions
  • Define schemas of derived columns that depend on existing columns
  • Register custom functions (UDFs) and constants for use in expressions
  • Use safe, allow-listed functions (such as abs, log, exp, sqrt, clip, where, isna, fillna)
  • Handle dependencies between derived columns with automatic topological ordering
  • Perform lookups from external data sources (files, databases, HTTP APIs)
  • Track provenance of derived columns
  • Maintain clean, readable code for data transformations

Features

  • Safe Expression Evaluation: Allow-listed vectorized functions for secure evaluation
  • UDF and Constant Registry: Register custom functions and constants
  • Schema-Driven Columns: Define multiple derived columns with automatic dependency resolution
  • Topological Ordering: Automatically resolve dependencies between columns
  • Cycle Detection: Detect and report circular dependencies
  • Dtype Casting: Specify output types for derived columns
  • Provenance Tracking: Track the origin and dependencies of derived columns
  • Lookup Functionality: Resolve values from external sources with caching
  • Type-Safe: Built with Python 3.11+ type hints
  • Well-Tested: Comprehensive test suite with 95%+ coverage
  • Well-Documented: Full documentation with Sphinx
  • Backend Seam: Designed for future Arrow/Polars support
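
The dependency resolution and cycle detection listed above can be illustrated with the standard library alone. This is a minimal sketch of the general technique, not df-eval's actual implementation: `column_dependencies` and `evaluation_order` are hypothetical helpers, and the sketch assumes every bare name in an expression that is not called as a function refers to a column.

```python
import ast
from graphlib import CycleError, TopologicalSorter


def column_dependencies(expr: str) -> set[str]:
    """Return the bare names used in an expression, excluding called functions."""
    tree = ast.parse(expr, mode="eval")
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - called


def evaluation_order(schema: dict[str, str]) -> list[str]:
    """Order derived columns so each comes after the derived columns it uses."""
    # Only dependencies on other derived columns matter for ordering;
    # names that are not schema keys are assumed to be existing columns.
    graph = {
        name: column_dependencies(expr).intersection(schema)
        for name, expr in schema.items()
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the cycle
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


# Hypothetical schema, deliberately written out of order
schema = {"ratio_pct": "ratio * 100", "ratio": "total / b", "total": "a + b"}
order = evaluation_order(schema)
```

Here `order` places `total` before `ratio` and `ratio` before `ratio_pct`, regardless of the order in which the schema was written. A schema such as `{"x": "y + 1", "y": "x + 1"}` raises `ValueError` instead.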

Installation

pip install df-eval

For development:

git clone https://github.com/elphick/df-eval.git
cd df-eval
uv sync

Quick Start

Basic Expression Evaluation

import pandas as pd
from df_eval import Engine

# Create a DataFrame
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6]
})

# Create an engine
engine = Engine()

# Evaluate an expression
result = engine.evaluate(df, "a + b")
print(result)  # values: 5, 7, 9

Schema-Driven Derived Columns

# Define a schema of derived columns; each value is an expression over existing columns
schema = {
    "sum": "a + b",
    "product": "a * b",
    "ratio": "a / b",
    "ratio_2dp": "round(a / b, 2)",
    "ratio_bucket": "floor((a / b) * 10)"
}

df_with_derived = engine.apply_schema(df, schema)
print(df_with_derived)

You can also use a mapping spec for richer per-column options, including decimals (rounding) and alias (rename from an incoming source column):

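# The incoming frame must contain the alias source column, here legacy_price
prices = pd.DataFrame({"legacy_price": [1.234, 5.678]})

schema = {
    "price_2dp": {"expr": "price", "decimals": 2},  # round price to 2 decimal places
    "price": {"alias": "legacy_price"},             # expose legacy_price as price
}

rounded = engine.apply_schema(prices, schema)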
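
In plain pandas terms, the decimals and alias options described above presumably correspond to a round and a rename. The sketch below is an interpretation of that description, not df-eval's implementation. Whether the library keeps or drops the source column after aliasing is not stated here; `rename` drops it. The frame `raw` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical incoming data with the legacy column name
raw = pd.DataFrame({"legacy_price": [1.239, 2.5, 10.004]})

# "alias": expose the incoming legacy_price column under the name price
out = raw.rename(columns={"legacy_price": "price"})

# "decimals": round the evaluated expression to two decimal places
out["price_2dp"] = out["price"].round(2)
```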

Using Allow-Listed Safe Functions

# Use safe, allow-listed functions
schema = {
    "abs_a": "abs(a)",
    "log_b": "log(b)",
    "sqrt_sum": "sqrt(a + b)",
    "clipped": "clip(a, 0, 2)"
}

result = engine.apply_schema(df, schema)
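
One common way to restrict string expressions to an allow-list is to parse them and reject any call to a name that is not explicitly permitted. The sketch below shows that general technique with Python's `ast` module. It is an assumption about the approach, not a description of how df-eval validates expressions, and `ALLOWED_FUNCTIONS` and `check_expression` are hypothetical names.

```python
import ast

# Hypothetical allow-list, taken from the function names mentioned in this README
ALLOWED_FUNCTIONS = {"abs", "log", "exp", "sqrt", "clip", "where", "isna", "fillna"}


def check_expression(expr: str) -> None:
    """Raise ValueError if the expression calls anything off the allow-list."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        # Attribute access (e.g. obj.method) is a common escape hatch, so reject it
        if isinstance(node, ast.Attribute):
            raise ValueError("attribute access is not allowed")
        if isinstance(node, ast.Call):
            if not isinstance(node.func, ast.Name):
                raise ValueError("only direct calls to named functions are allowed")
            if node.func.id not in ALLOWED_FUNCTIONS:
                raise ValueError(f"function not allowed: {node.func.id}")


check_expression("sqrt(a + b)")  # passes silently
```

An expression such as `__import__('os')` fails the check because `__import__` is not on the list. Validation of this kind only inspects the expression's structure; the evaluation step would still need to run it in a restricted namespace.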

Register Custom Functions (UDFs)

# Register a custom function
def custom_transform(x):
    return x ** 2 + 10

engine.register_function("transform", custom_transform)

# Use it in expressions
result = engine.evaluate(df, "transform(a)")

Built-in Functions

The library provides several allow-listed safe functions:

  • abs(x): Absolute value
  • log(x): Natural logarithm (handles negative values safely)
  • exp(x): Exponential function (handles overflow safely)
  • sqrt(x): Square root (handles negative values safely)
  • round(x, decimals=0): Round to a fixed number of decimal places
  • ceil(x): Ceiling value
  • floor(x): Floor value
  • clip(x, min, max): Clip values to a range
  • where(condition, x, y): Conditional selection
  • isna(x): Check for NaN/None values
  • fillna(x, value): Fill NaN/None with a value
  • safe_divide(a, b): Division with NaN for divide-by-zero
  • coalesce(*args): Return first non-null value
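
To make the semantics of the last two entries concrete, here is a minimal sketch of how `safe_divide` and `coalesce` might behave on pandas Series. These are stand-in implementations written for illustration, assuming the behaviour described in the list; the library's own versions may differ in edge cases.

```python
import numpy as np
import pandas as pd


def safe_divide(a: pd.Series, b: pd.Series) -> pd.Series:
    """Divide a by b, yielding NaN wherever b is zero."""
    return a / b.where(b != 0)  # zeros in b become NaN before dividing


def coalesce(*args: pd.Series) -> pd.Series:
    """Take, row by row, the first value that is not null."""
    result = args[0]
    for candidate in args[1:]:
        # Keep existing non-null values; fill the gaps from the next candidate
        result = result.where(result.notna(), candidate)
    return result


num = pd.Series([1.0, 2.0, 3.0])
den = pd.Series([2.0, 0.0, 4.0])
ratio = safe_divide(num, den)  # second row has a zero denominator

primary = pd.Series([np.nan, 5.0, np.nan])
fallback = pd.Series([1.0, 9.0, np.nan])
filled = coalesce(primary, fallback, pd.Series([0.0, 0.0, 0.0]))
```

In this example `ratio` holds 0.5, NaN, 0.75 and `filled` holds 1.0, 5.0, 0.0.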

Documentation

For comprehensive documentation including advanced usage, API reference, and more examples, visit the full documentation.

Requirements

  • Python 3.11 or higher
  • pandas >= 2.0.0
  • numpy >= 1.26.0

Development

Running Tests

uv run pytest

Building Documentation

cd docs
uv run sphinx-build -b html . _build/html

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

