df-eval
A lightweight expression evaluation engine for pandas DataFrames, supporting schema-driven derived columns and external lookups.
Overview
df-eval is a Python library that provides a flexible and efficient way to evaluate expressions on pandas DataFrames. It's designed for scenarios where you need to:
- Apply complex transformations to DataFrames using string expressions
- Define schemas of derived columns that depend on existing columns
- Register custom functions (UDFs) and constants for use in expressions
- Use safe, allow-listed functions (abs, log, exp, sqrt, clip, where, isna, fillna, safe_divide, coalesce)
- Handle dependencies between derived columns with automatic topological ordering
- Perform lookups from external data sources (files, databases, HTTP APIs)
- Track provenance of derived columns
- Maintain clean, readable code for data transformations
Features
- Safe Expression Evaluation: Allow-listed vectorized functions for secure evaluation
- UDF and Constant Registry: Register custom functions and constants
- Schema-Driven Columns: Define multiple derived columns with automatic dependency resolution
- Topological Ordering: Automatically resolve dependencies between columns
- Cycle Detection: Detect and report circular dependencies
- Dtype Casting: Specify output types for derived columns
- Provenance Tracking: Track the origin and dependencies of derived columns
- Lookup Functionality: Resolve values from external sources with caching
- Type-Safe: Built with Python 3.11+ type hints
- Well-Tested: Comprehensive test suite with 95%+ coverage
- Well-Documented: Full documentation with Sphinx
- Backend Seam: Designed for future Arrow/Polars support
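The "lookup with caching" feature can be pictured with a small standard-library sketch. This is not df-eval's lookup API, which this page does not show; `PRICE_TABLE` and `lookup_price` are hypothetical names, and an in-memory dict stands in for a file, database, or HTTP source. The point is only that repeated keys reach the source once.

```python
from functools import lru_cache

# Hypothetical stand-in for an external source (file, database, HTTP API).
PRICE_TABLE = {"apple": 1.20, "pear": 0.95}

# Records every key that actually reached the "external" source.
fetches = []


@lru_cache(maxsize=None)
def lookup_price(key):
    """Resolve a key against the source; repeated keys are served from cache."""
    fetches.append(key)
    return PRICE_TABLE.get(key)


keys = ["apple", "pear", "apple", "apple"]
prices = [lookup_price(key) for key in keys]

print(prices)        # one price per input row
print(len(fetches))  # number of real lookups performed
```

Four rows are resolved here with only two real lookups, which is the benefit caching brings when the source is slow or rate-limited.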
Installation
pip install df-eval
For development:
git clone https://github.com/elphick/df-eval.git
cd df-eval
uv sync
Quick Start
Basic Expression Evaluation
import pandas as pd
from df_eval import Engine
# Create a DataFrame
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
})
# Create an engine
engine = Engine()
# Evaluate an expression
result = engine.evaluate(df, "a + b")
print(result) # [5, 7, 9]
Schema-Driven Derived Columns
# Define a schema with dependent columns
schema = {
    "sum": "a + b",
    "product": "a * b",
    "ratio": "a / b",
}
df_with_derived = engine.apply_schema(df, schema)
print(df_with_derived)
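The automatic dependency ordering and cycle detection mentioned in this README can be illustrated with the standard library alone. The sketch below is not df-eval's implementation; `referenced_names` and `evaluation_order` are hypothetical helpers, and the schema (in which `share` uses the derived column `total`) is made up for illustration. It parses each expression with `ast`, keeps only the names that are themselves derived columns, and lets `graphlib.TopologicalSorter` either produce an evaluation order or raise `CycleError`.

```python
import ast
from graphlib import CycleError, TopologicalSorter


def referenced_names(expression: str) -> set[str]:
    """Return every bare identifier that appears in an expression string."""
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def evaluation_order(schema: dict[str, str]) -> list[str]:
    """Order derived columns so each one comes after the columns it uses."""
    derived = set(schema)
    # Map each derived column to the derived columns it depends on;
    # base DataFrame columns (not in the schema) are ignored.
    graph = {
        column: referenced_names(expression) & derived
        for column, expression in schema.items()
    }
    return list(TopologicalSorter(graph).static_order())


# "share" is declared first but depends on "total", so it must run second.
schema = {
    "share": "a / total",
    "total": "a + b",
}
order = evaluation_order(schema)
print(order)

# A circular definition is reported instead of being evaluated.
cycle_found = False
try:
    evaluation_order({"x": "y + 1", "y": "x * 2"})
except CycleError as exc:
    cycle_found = True
    print("cycle detected:", exc.args[1])
```

Declaration order in the schema does not matter in this sketch: the sorter places `total` before `share` because of the dependency, not because of where each appears in the dict.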
Using Allow-Listed Safe Functions
# Use safe, allow-listed functions
schema = {
    "abs_a": "abs(a)",
    "log_b": "log(b)",
    "sqrt_sum": "sqrt(a + b)",
    "clipped": "clip(a, 0, 2)",
}
result = engine.apply_schema(df, schema)
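To make the idea of an allow-list concrete, here is a minimal sketch of how an expression string could be screened before evaluation. It is an assumption about the general technique, not a description of how df-eval validates expressions; `ALLOWED_FUNCTIONS` is copied from the function names this README lists, and `disallowed_calls` is a hypothetical helper.

```python
import ast

# Assumed allow-list, taken from the function names named in this README.
ALLOWED_FUNCTIONS = {
    "abs", "log", "exp", "sqrt", "clip",
    "where", "isna", "fillna", "safe_divide", "coalesce",
}


def disallowed_calls(expression: str) -> set[str]:
    """Return the source text of every called function not on the allow-list."""
    tree = ast.parse(expression, mode="eval")
    rejected = set()
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only plain names such as sqrt(...) can match the allow-list;
        # attribute calls such as os.system(...) are always rejected.
        if isinstance(func, ast.Name) and func.id in ALLOWED_FUNCTIONS:
            continue
        rejected.add(ast.unparse(func))
    return rejected


print(disallowed_calls("where(a > 1, sqrt(b), 0)"))
print(disallowed_calls("clip(a, 0, 2) + eval('1')"))
```

An engine built this way would refuse to evaluate any expression for which the returned set is non-empty. Registered UDFs would need to be added to the allowed names first.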
Register Custom Functions (UDFs)
# Register a custom function
def custom_transform(x):
    return x ** 2 + 10
engine.register_function("transform", custom_transform)
# Use it in expressions
result = engine.evaluate(df, "transform(a)")
Built-in Functions
The library provides several allow-listed safe functions:
- abs(x): Absolute value
- log(x): Natural logarithm (handles negative values safely)
- exp(x): Exponential function (handles overflow safely)
- sqrt(x): Square root (handles negative values safely)
- clip(x, min, max): Clip values to a range
- where(condition, x, y): Conditional selection
- isna(x): Check for NaN/None values
- fillna(x, value): Fill NaN/None with a value
- safe_divide(a, b): Division with NaN for divide-by-zero
- coalesce(*args): Return first non-null value
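The semantics of `safe_divide` and `coalesce` as described in the list above can be shown with plain Python lists. This is only an element-wise illustration under the stated descriptions; the library's own functions are vectorized over pandas data, and the helper `is_null` is a hypothetical name.

```python
import math


def is_null(value) -> bool:
    """Treat None and float NaN as null."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def safe_divide(numerators, denominators):
    """Element-wise division that yields NaN where the denominator is zero."""
    return [
        n / d if d != 0 else math.nan
        for n, d in zip(numerators, denominators)
    ]


def coalesce(*columns):
    """For each row, return the first non-null value, or NaN if all are null."""
    return [
        next((value for value in row if not is_null(value)), math.nan)
        for row in zip(*columns)
    ]


ratios = safe_divide([1.0, 2.0, 3.0], [2.0, 0.0, 4.0])
filled = coalesce(ratios, [9.0, 9.0, 9.0])

print(ratios)  # the middle entry is NaN because its denominator is zero
print(filled)  # the NaN is replaced by the fallback column's value
```

Chaining the two mirrors a common pattern: divide without raising on zero, then fall back to a default where the result is missing.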
Documentation
For comprehensive documentation including advanced usage, API reference, and more examples, visit the full documentation.
Requirements
- Python 3.11 or higher
- pandas >= 2.0.0
- numpy >= 1.26.0
Development
Running Tests
uv run pytest
Building Documentation
cd docs
uv run sphinx-build -b html . _build/html
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.