A Python library for modeling queries, filters, expressions, grouping, and aggregations as object structures

Therismos

θερισμός

Greek; noun

Harvest.

A Python library for modeling queries, filters, expressions, grouping, and aggregations as object structures.

Features

  • Backend-agnostic modeling: Build expressions, filters, and aggregations independent of any specific backend
  • Declarative DSL: Natural Python syntax for building complex queries
  • Type safety: Optional field type declarations with automatic casting
  • Immutable structures: All nodes are immutable and thread-safe
  • Automatic normalization: Compound expressions are automatically flattened
  • Powerful optimizer: Detects contradictions, tautologies, and simplification opportunities
  • Visitor pattern: Extensible architecture for converting to any backend format
  • Optimization tracking: Optional tracking of all optimization transformations
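To make "immutable" and "automatically flattened" concrete, here is a minimal sketch that does not use therismos. The `Leaf` and `All` classes are hypothetical stand-ins, not the library's classes. The sketch only illustrates one way that combining AND nodes can produce a single flat node whose contents cannot be reassigned afterwards.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Leaf:
    """Hypothetical atomic condition, identified only by a label."""
    label: str

    def __and__(self, other):
        return All.of(self, other)


@dataclass(frozen=True)
class All:
    """Hypothetical AND node holding an immutable tuple of operands."""
    operands: tuple

    @classmethod
    def of(cls, *exprs):
        flat = []
        for e in exprs:
            # Splice the operands of nested AND nodes into the parent.
            if isinstance(e, All):
                flat.extend(e.operands)
            else:
                flat.append(e)
        return cls(tuple(flat))

    def __and__(self, other):
        return All.of(self, other)


a, b, c = Leaf("a"), Leaf("b"), Leaf("c")
nested = (a & b) & c          # written as AND(AND(a, b), c) ...
print(len(nested.operands))   # ... but stored flat: 3

try:
    nested.operands = ()      # frozen dataclasses reject reassignment
except FrozenInstanceError:
    print("nodes cannot be modified after construction")
```

Because nodes never change after construction, the same subexpression can be shared between several filters or threads without copying.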

Installation

pip install therismos

Or using uv:

uv pip install therismos

Expressions

Therismos models filters and conditions as an abstract syntax tree (AST): every comparison, membership test, and logical operator is represented by a node object.

Quick Start

from therismos import F, optimize

# Define fields
age = F("age", int)
name = F("name")
status = F("status")

# Build expressions using natural Python syntax
expr = (age > 18) & (name == "Alice") | (status == "admin")

# Optimize the expression
optimized, records = optimize(expr)

# More complex example: detect contradictions
contradiction = (age < 30) & (age > 40)
result, _ = optimize(contradiction)
# result is FALSE

# Aggregate OR equality chains
multi_status = (status == "active") | (status == "pending") | (status == "completed")
result, _ = optimize(multi_status)
# result is: status IN ("active", "pending", "completed")

Expression Types

Atomic Expressions

  • Comparisons: ==, !=, <, <=, >, >=
  • Regex matching: field.matches(pattern, flags=None)
  • Membership: field.is_in(*values) or field.is_one_of(iterable)
  • Null checking: field.is_null(), field.is_not_null()
  • Constants: TRUE, FALSE

Compound Expressions

  • AND: expr1 & expr2 or AllExpr(expr1, expr2, ...)
  • OR: expr1 | expr2 or AnyExpr(expr1, expr2, ...)
  • NOT: ~expr or NotExpr(expr)
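The examples in this document always wrap each comparison in parentheses before combining it with & or |. That is required by Python's own operator precedence, where & and | bind more tightly than comparison operators. The sketch below uses only the standard `ast` module to show how the two spellings parse; it does not import therismos.

```python
import ast

# Parse both spellings without evaluating them.
without_parens = ast.parse('age > 18 & name == "Alice"', mode="eval").body
with_parens = ast.parse('(age > 18) & (name == "Alice")', mode="eval").body

# Without parentheses Python sees a chained comparison:
#   age > (18 & name) == "Alice"
print(type(without_parens).__name__)   # Compare

# With parentheses the top-level node is the & operator, as intended.
print(type(with_parens).__name__)      # BinOp
print(type(with_parens.op).__name__)   # BitAnd
```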

Type Casting

Fields can declare expected types for automatic value casting:

age = F("age", int)
price = F("price", float)

# Values are automatically cast
expr = age == "42"  # value is stored as string
casted = expr.casted_value()  # returns integer 42

Custom cast functions are also supported:

def normalize_email(value):
    return str(value).strip().lower()

email = F("email", normalize_email)
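Both `int` and `normalize_email` are plain callables that take a raw value and return the cast value, which is presumably why either can be passed as the second argument to F. The following sketch shows that shared shape in plain Python. `apply_cast` is a hypothetical helper written for illustration and is not part of therismos.

```python
def normalize_email(value):
    """Custom cast from the example above: trim and lowercase."""
    return str(value).strip().lower()


def apply_cast(cast, raw):
    """Hypothetical helper: apply a cast callable if one was declared."""
    return raw if cast is None else cast(raw)


print(apply_cast(int, "42"))                                # 42
print(apply_cast(normalize_email, "  Alice@Example.COM "))  # alice@example.com
print(apply_cast(None, "left as-is"))                       # left as-is

# A failed cast surfaces as an ordinary Python exception.
try:
    apply_cast(int, "not_a_number")
except ValueError as exc:
    print(f"cast failed: {exc}")
```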

Optimization

The optimizer rewrites expressions into simpler equivalents and detects contradictions (conditions that are always false) and tautologies (conditions that are always true).

Basic Examples

from therismos import optimize, F, TRUE, FALSE, AllExpr, AnyExpr

age = F("age")
name = F("name")

# Identity elimination
expr = AllExpr(age > 18, TRUE, age < 65)
result, _ = optimize(expr)
# result is: AllExpr(age > 18, age < 65)

# Contradiction detection
expr = (age == 25) & (age != 25)
result, _ = optimize(expr)
# result is: FALSE

# Tautology detection
expr = (age < 30) | (age >= 30)
result, _ = optimize(expr)
# result is: TRUE

# NOT simplification (De Morgan's laws)
expr = ~((age > 18) & (name == "Alice"))
result, _ = optimize(expr)
# result is: (age <= 18) OR (name != "Alice")
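The De Morgan rewrite is safe because it never changes the truth value of an expression. The sketch below checks this exhaustively with ordinary Python booleans and also checks the comparison flip used in the last example. It is independent of therismos, the function names are made up for illustration, and it assumes ordinary comparable values; how the library treats null values in this rewrite is not covered here.

```python
from itertools import product


def de_morgan_holds():
    """Check both De Morgan identities over every truth assignment."""
    return all(
        (not (a and b)) == ((not a) or (not b))
        and (not (a or b)) == ((not a) and (not b))
        for a, b in product([False, True], repeat=2)
    )


def negated_gt_is_le(threshold, samples):
    """Check that NOT(x > threshold) agrees with x <= threshold."""
    return all((not (x > threshold)) == (x <= threshold) for x in samples)


print(de_morgan_holds())                   # True
print(negated_gt_is_le(18, range(0, 40)))  # True
```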

Optimization Rules Reference

The optimizer implements the following transformation rules:

Atomic Expression Simplifications

| Rule | Before | After |
|------|--------|-------|
| Empty IN to FALSE | f IN () | FALSE |
| Single-value IN to Eq | f IN (v) | f == v |

NOT Expression Simplifications

| Rule | Before | After |
|------|--------|-------|
| NOT of TRUE | NOT(TRUE) | FALSE |
| NOT of FALSE | NOT(FALSE) | TRUE |
| Double negation | NOT(NOT(x)) | x |
| De Morgan's law (AND) | NOT(a AND b) | NOT(a) OR NOT(b) |
| De Morgan's law (OR) | NOT(a OR b) | NOT(a) AND NOT(b) |

AND Expression Simplifications

| Rule | Before | After |
|------|--------|-------|
| Empty AND | AND() | TRUE |
| Single operand | AND(x) | x |
| FALSE propagation | AND(..., FALSE, ...) | FALSE |
| TRUE elimination | AND(..., TRUE, ...) | AND(...) (TRUE removed) |
| All TRUE | AND(TRUE, TRUE, ...) | TRUE |
| Eq/Eq same value | (f == v) AND (f == v) | f == v |
| Eq/Eq different values | (f == v1) AND (f == v2) | FALSE |
| Eq/In intersection (member) | (f == v) AND (f IN (v, ...)) | f == v |
| Eq/In intersection (non-member) | (f == v) AND (f IN (...)) | FALSE (v not in set) |
| In/In intersection (empty) | (f IN (v1, v2)) AND (f IN (v3, v4)) | FALSE (no overlap) |
| In/In intersection (single) | (f IN (v1, v2)) AND (f IN (v2, v3)) | f == v2 |
| In/In intersection (multiple) | (f IN (v1, v2, v3)) AND (f IN (v2, v3, v4)) | f IN (v2, v3) |

OR Expression Simplifications

| Rule | Before | After |
|------|--------|-------|
| Empty OR | OR() | FALSE |
| Single operand | OR(x) | x |
| TRUE propagation | OR(..., TRUE, ...) | TRUE |
| FALSE elimination | OR(..., FALSE, ...) | OR(...) (FALSE removed) |
| All FALSE | OR(FALSE, FALSE, ...) | FALSE |
| Eq/Eq union | (f == v1) OR (f == v2) | f IN (v1, v2) |
| Eq/In union | (f == v) OR (f IN (v2, v3)) | f IN (v, v2, v3) |
| In/In union | (f IN (v1, v2)) OR (f IN (v3, v4)) | f IN (v1, v2, v3, v4) |

Contradiction Detection (AND)

| Pattern | Result |
|---------|--------|
| (f == v) AND (f != v) | FALSE |
| f.is_null() AND f.is_not_null() | FALSE |
| (f < a) AND (f > b) where b >= a | FALSE |
| (f <= a) AND (f > a) | FALSE |
| (f >= b) AND (f < b) | FALSE |

Tautology Detection (OR)

| Pattern | Result |
|---------|--------|
| (f == v) OR (f != v) | TRUE |
| f.is_null() OR f.is_not_null() | TRUE |
| (f < v) OR (f >= v) | TRUE |
| (f <= v) OR (f > v) | TRUE |
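Several of the rules above reduce to plain set arithmetic on the values attached to one field. The sketch below restates the In/In intersection, In/In union, and range-contradiction rules using Python sets. The function names and the tuple results are invented for illustration and say nothing about how the therismos optimizer is implemented.

```python
def and_in(values_a, values_b):
    """AND of two IN sets on the same field: intersect, then simplify."""
    common = set(values_a) & set(values_b)
    if not common:
        return "FALSE"                      # no overlap
    if len(common) == 1:
        return ("eq", next(iter(common)))   # single survivor becomes f == v
    return ("in", common)


def or_in(values_a, values_b):
    """OR of two IN sets on the same field: take the union."""
    return ("in", set(values_a) | set(values_b))


def range_is_contradiction(upper, lower):
    """(f < upper) AND (f > lower) is unsatisfiable when lower >= upper."""
    return lower >= upper


print(and_in({1, 2}, {3, 4}))
print(and_in({1, 2}, {2, 3}))
print(and_in({1, 2, 3}, {2, 3, 4}))
print(or_in({1, 2}, {3, 4}))
print(range_is_contradiction(upper=30, lower=40))
```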

Complex Real-World Example: Detecting Accidental Contradictions

The optimizer is particularly valuable for catching accidentally contradictory conditions in complex business logic. Here's a realistic scenario where multiple nested requirements create an impossible condition:

from therismos import F, optimize, FALSE

# Define fields
user_age = F("age", int)
user_role = F("role")
user_status = F("status")
account_tier = F("account_tier")
dept = F("department")
experience = F("experience_years", int)
available = F("available", bool)

# Complex filter built incrementally by different team members
# Each level seemed reasonable in isolation, but together they create a contradiction
complex_filter = (
    (
        # Level 1: Nested OR conditions for base eligibility
        (
            (
                # Premium account holders
                (account_tier == "premium") &
                (
                    (user_role == "developer") |
                    (user_role == "designer")
                )
            ) |
            (
                # OR enterprise users with experience
                (account_tier == "enterprise") &
                (experience >= 5) &
                (dept.is_in("engineering", "design"))
            )
        ) &
        # Level 2: Status and department requirements with nesting
        (
            (
                (user_status == "active") &
                (
                    # Nested department-specific conditions
                    (
                        (dept == "engineering") &
                        (experience >= 2)
                    ) |
                    (
                        (dept == "design") &
                        (user_role.is_in("designer", "lead_designer"))
                    )
                )
            ) |
            # OR admin override
            (user_role == "admin")
        ) &
        # Level 3: First age requirement
        (user_age >= 25) &
        # Level 4: Second age requirement nested with other conditions
        (
            (user_age <= 50) &
            (
                # More nesting for additional validation
                (account_tier.is_in("premium", "enterprise", "trial")) |
                (user_role == "admin")
            )
        )
    ) & (
        # Level 5: Someone later added "additional validation"
        # without realizing it contradicts the previous age requirements!
        (user_age < 25) &  # Must be under 25
        (user_age > 50)    # AND must be over 50 (impossible!)
    ) &
    (available == True)
)

# The contradiction occurs because:
# - Earlier levels require: 25 <= age <= 50
# - Final level requires: age < 25 AND age > 50
# - These conditions cannot both be true!

result, records = optimize(complex_filter)

print(f"Optimized result: {result}")
# Output: FalseExpr()

print(f"Is FALSE: {result is FALSE}")
# Output: True

print("Optimization steps that revealed the contradiction:")
for i, record in enumerate(records, 1):
    print(f"Step {i}: {record.reason}")
    if "Contradiction" in record.reason:
        print("  *** This step detected the contradiction! ***")

# Example output:
# Step 1: OR equality chain aggregation to IN
# Step 2: Optimize children in AND
# Step 3: Optimize children in OR
# Step 4: Optimize children in AND
# Step 5: Contradiction detected in AND
#   *** This step detected the contradiction! ***

# By examining the 'before' expression in the contradiction record,
# you can identify exactly which requirements conflict with each other
# and trace back through your business logic to find the source.

The optimizer's tracking feature is invaluable for debugging complex business rules, especially when:

  • Multiple developers contribute conditions to the same filter over time
  • Requirements evolve and accidentally introduce conflicts
  • Filters from different parts of the application are combined
  • Legacy filtering logic is migrated or refactored
  • User-facing query builders let users create invalid combinations

Optimization Tracking

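optimize() returns the optimized expression together with a list of records, one per applied transformation. Each record exposes the reason for the change and the expression before and after it: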

result, records = optimize(expr)
for record in records:
    print(f"Applied: {record.reason}")
    print(f"Before: {record.before}")
    print(f"After: {record.after}")

You can also use a collecting parameter to accumulate records across multiple optimizations:

my_records = []
result1, _ = optimize(expr1, my_records)
result2, _ = optimize(expr2, my_records)
# my_records now contains all optimization steps from both calls
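The collecting-parameter idiom itself is independent of therismos. The sketch below shows the pattern with a hypothetical `simplify` function standing in for an optimizer: the caller may pass in a list to append to, and a fresh list is created only when none is supplied.

```python
def simplify(values, records=None):
    """Hypothetical stand-in for an optimizer: drop duplicates, log each drop."""
    # Default to None rather than [] so calls never share a hidden list.
    if records is None:
        records = []
    seen, result = set(), []
    for v in values:
        if v in seen:
            records.append(f"removed duplicate {v!r}")
        else:
            seen.add(v)
            result.append(v)
    return result, records


shared = []
first, _ = simplify([1, 1, 2], shared)
second, _ = simplify(["a", "b", "a"], shared)
print(first, second)
print(shared)   # records from both calls, in call order
```

Returning the list as well as appending to it lets the same function serve callers that do not care about accumulation.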

Expression Evaluation

Expressions can be evaluated against actual data to determine if the data satisfies the filter criteria. This is useful for:

  • In-memory filtering when a database query is not needed
  • Testing and validating filter logic
  • Client-side filtering before sending data to a backend
  • Data validation and access control checks

Basic Evaluation

The eval() method evaluates an expression against a dictionary of field values:

from therismos import F

age = F("age")
status = F("status")

# Build an expression
expr = (age > 18) & (status == "active")

# Evaluate against data
data = {"age": 25, "status": "active"}
result = expr.eval(data)  # Returns True

data = {"age": 15, "status": "active"}
result = expr.eval(data)  # Returns False

Evaluation with Type Casting

When fields have declared types, values are automatically cast during evaluation:

age = F("age", int)
expr = age >= 18

# String values are automatically cast to int
result = expr.eval({"age": "25"})  # Returns True (string "25" cast to int 25)

# This will raise TypeError or ValueError if casting fails
try:
    expr.eval({"age": "not_a_number"})
except (TypeError, ValueError):
    print("Invalid age value")

Evaluating Membership and Regex

All expression types support evaluation:

import re

# IN expressions
status = F("status")
expr = status.is_in("active", "pending", "approved")
result = expr.eval({"status": "active"})  # Returns True

# Regex matching
email = F("email")
expr = email.matches(r".*@example\.com$", re.IGNORECASE)
result = expr.eval({"email": "user@example.com"})  # Returns True

# Null checking
phone = F("phone")
expr = phone.is_null()
result = expr.eval({"phone": None})  # Returns True

Complex Evaluation Examples

Compound expressions evaluate all nested conditions:

age = F("age", int)
country = F("country")
verified = F("verified")
subscription = F("subscription")

# Complex eligibility check
expr = (
    (age >= 18) &
    (country.is_one_of(["US", "UK", "CA"])) &
    ((verified == True) | (subscription.is_in("premium", "enterprise")))
)

# Adult in allowed country with verification
result = expr.eval({
    "age": 25,
    "country": "US",
    "verified": True,
    "subscription": "free"
})  # Returns True

# Adult in allowed country with premium subscription (unverified)
result = expr.eval({
    "age": 30,
    "country": "UK",
    "verified": False,
    "subscription": "premium"
})  # Returns True

# Minor (fails age requirement)
result = expr.eval({
    "age": 16,
    "country": "US",
    "verified": True,
    "subscription": "premium"
})  # Returns False

Evaluation with Optimized Expressions

You can optimize expressions before evaluation for better performance or to catch logical issues:

from therismos import F, optimize

age = F("age", int)
status = F("status")

# Build a complex expression
expr = (
    ((age > 18) | (age > 25)) &  # Redundant condition
    (status == "active") &
    ((age < 30) | (age >= 30))   # Tautology
)

# Optimize first
optimized, _ = optimize(expr)
# optimized is simplified to: (age > 18) AND TRUE AND (status == "active")
# which further simplifies to: (age > 18) AND (status == "active")

# Then evaluate the optimized expression
result = optimized.eval({"age": 25, "status": "active"})  # Returns True

Error Handling

Evaluation raises exceptions for invalid data:

age = F("age")
expr = age > 18

# Missing field raises KeyError
try:
    expr.eval({"name": "Alice"})  # age field is missing
except KeyError:
    print("Required field 'age' not found")

# Invalid type casting raises TypeError or ValueError
age_typed = F("age", int)
expr = age_typed > 18
try:
    expr.eval({"age": "not_a_number"})
except (TypeError, ValueError):
    print("Cannot cast value to required type")

Converting to other formats

Therismos uses the visitor pattern to enable extensible conversions of expressions to any format. You can implement custom visitors or use the built-in ones.

Custom Visitors

Implement custom visitors to convert expressions to any format:

from therismos import ExprVisitor

class SQLVisitor:
    def visit_eq(self, expr):
        return f"{expr.field.name} = ?"

    def visit_all(self, expr):
        parts = [e.accept(self) for e in expr.exprs]
        return " AND ".join(parts)

    # ... implement other visit methods

visitor = SQLVisitor()
sql = expr.accept(visitor)
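The SQLVisitor above emits ? placeholders but does not show where the bound values go. A visitor can carry state, so one option is to collect the values while rendering. The sketch below is self-contained: the `Eq` and `All` node classes are hypothetical miniatures written for this example, not the therismos classes, and field names are inserted without any escaping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eq:
    """Hypothetical equality node (field name and value)."""
    field: str
    value: object

    def accept(self, visitor):
        return visitor.visit_eq(self)


@dataclass(frozen=True)
class All:
    """Hypothetical AND node."""
    exprs: tuple

    def accept(self, visitor):
        return visitor.visit_all(self)


class ParamSQLVisitor:
    """Renders SQL with ? placeholders and collects bound values in order."""

    def __init__(self):
        self.params = []

    def visit_eq(self, expr):
        self.params.append(expr.value)
        return f"{expr.field} = ?"

    def visit_all(self, expr):
        return "(" + " AND ".join(e.accept(self) for e in expr.exprs) + ")"


tree = All((Eq("name", "Alice"), Eq("status", "active")))
visitor = ParamSQLVisitor()
sql = tree.accept(visitor)
print(sql)              # (name = ? AND status = ?)
print(visitor.params)   # ['Alice', 'active']
```

Because the visitor accumulates state, a new instance should be created for each expression that is rendered.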

Built-in Visitors

Therismos provides several built-in visitors for common use cases:

StringVisitor

Converts expressions to a human-readable string representation:

from therismos import F, StringVisitor

age = F("age")
name = F("name")
expr = (age > 18) & (name == "Alice")

visitor = StringVisitor()
result = expr.accept(visitor)
# Output: "(age > 18 AND name = 'Alice')"

CountVisitor

Counts the number of nodes in an expression tree:

from therismos import F, CountVisitor

age = F("age")
name = F("name")
expr = (age > 18) & (name == "Alice")

visitor = CountVisitor()
count = expr.accept(visitor)
# Output: 3 (1 AllExpr + 2 atomic expressions)

DictVisitor

Converts expressions to a dictionary representation for serialization:

from therismos import F, DictVisitor

age = F("age")
expr = age > 18

visitor = DictVisitor()
result = expr.accept(visitor)
# Output: {"type": "gt", "field": "age", "value": 18}
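Since the result is a plain dictionary, it can be passed to any serializer. The sketch below round-trips the dictionary shown above through the standard `json` module, with the dictionary written out literally so the example runs without therismos. Values that are not JSON-native would need a custom encoder, and rebuilding an expression from the dictionary is not covered here.

```python
import json

# The dictionary shape shown in the DictVisitor example above.
as_dict = {"type": "gt", "field": "age", "value": 18}

# sort_keys gives a stable string, which helps when caching or diffing filters.
payload = json.dumps(as_dict, sort_keys=True)
print(payload)   # {"field": "age", "type": "gt", "value": 18}

restored = json.loads(payload)
print(restored == as_dict)   # True
```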

For compound expressions, the dictionary is nested:

age = F("age")
name = F("name")
expr = (age > 18) & (name == "Alice")

visitor = DictVisitor()
result = expr.accept(visitor)
# Output: {
#     "type": "and",
#     "exprs": [
#         {"type": "gt", "field": "age", "value": 18},
#         {"type": "eq", "field": "name", "value": "Alice"}
#     ]
# }

FieldGathererVisitor

Collects all unique field names used in an expression tree:

from therismos import F, FieldGathererVisitor

age = F("age")
name = F("name")
status = F("status")
expr = (age > 18) & (name == "Alice") | (status == "active")

visitor = FieldGathererVisitor()
expr.accept(visitor)
field_names = visitor.field_names
# Output: {"age", "name", "status"}

This is useful for:

  • Analyzing which fields are used in complex filters
  • Validating that all referenced fields exist in your schema
  • Generating documentation or metadata about queries
  • Determining required permissions for a query
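For the schema-validation use case, the gathered names can be compared against the known columns with a set difference. The sketch below starts from the set shown in the example above, written out literally, and checks it against a hypothetical `schema` set.

```python
# The set reported by FieldGathererVisitor in the example above.
field_names = {"age", "name", "status"}

# Hypothetical schema: the fields the backend actually has.
schema = {"age", "name", "email"}

# Sort so the error message is stable regardless of set ordering.
unknown = sorted(field_names - schema)
if unknown:
    print(f"filter references unknown fields: {unknown}")
# filter references unknown fields: ['status']
```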

Backend Converters

MongoVisitor

The MongoVisitor converts therismos expressions to MongoDB query filters compatible with PyMongo and Motor.

Installation:

# For synchronous PyMongo
uv pip install "therismos[mongodb]"

# For asynchronous Motor
uv pip install "therismos[mongodb-async]"

Basic Usage:

from therismos import F, optimize
from therismos.expr.visitors.mongo import MongoVisitor

age = F("age")
status = F("status")
country = F("country")

# Build and optimize expression
expr = (age >= 21) & (status == "active") & (country.is_in("US", "UK", "CA"))
optimized, _ = optimize(expr)

# Convert to MongoDB filter
visitor = MongoVisitor()
mongo_filter = optimized.accept(visitor)

# Result: {
#     "age": {"$gte": 21},
#     "status": "active",
#     "country": {"$in": ["US", "UK", "CA"]}
# }

Using with PyMongo:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["mydb"]
collection = db["users"]

# Use the generated filter
results = collection.find(mongo_filter)
for doc in results:
    print(doc)

Using with Motor (async):

import asyncio
from motor.motor_asyncio import AsyncIOMotorClient

async def find_users():
    client = AsyncIOMotorClient("mongodb://localhost:27017/")
    db = client["mydb"]
    collection = db["users"]

    # Use the generated filter
    cursor = collection.find(mongo_filter)
    results = await cursor.to_list(length=100)
    return results

asyncio.run(find_users())

Advanced Features:

The MongoVisitor handles all therismos expression types:

import re
from therismos import F, TRUE, FALSE, optimize
from therismos.expr.visitors.mongo import MongoVisitor

email = F("email")
age = F("age")
name = F("name")
status = F("status")

visitor = MongoVisitor()

# Regex matching (with case-insensitive flag)
expr = email.matches(r".*@example\.com$", re.IGNORECASE)
mongo_filter = expr.accept(visitor)
# Result: {"email": {"$regex": ".*@example\\.com$", "$options": "i"}}

# Range queries
expr = (age >= 18) & (age <= 65)
mongo_filter = expr.accept(visitor)
# Result: {"age": {"$gte": 18, "$lte": 65}} (optimized)

# Complex OR conditions
expr = (status == "active") | (status == "pending") | (status == "approved")
optimized_expr, _ = optimize(expr)  # Converts to IN
mongo_filter = optimized_expr.accept(visitor)
# Result: {"status": {"$in": ["active", "pending", "approved"]}}

# Null checking
expr = name.is_not_null()
mongo_filter = expr.accept(visitor)
# Result: {"name": {"$ne": None}}

# NOT expressions
expr = ~(age < 18)
mongo_filter = expr.accept(visitor)
# Result: {"$nor": [{"age": {"$lt": 18}}]}

# Constants
true_filter = TRUE.accept(visitor)   # Result: {}
false_filter = FALSE.accept(visitor)  # Result: {"$expr": False}

Optimization Options:

# By default, simple AND expressions are optimized by merging fields
visitor = MongoVisitor(optimize_simple_and=True)
expr = (age > 18) & (name == "Alice")
mongo_filter = expr.accept(visitor)
# Result: {"age": {"$gt": 18}, "name": "Alice"}

# Disable optimization to always use $and
visitor = MongoVisitor(optimize_simple_and=False)
mongo_filter = expr.accept(visitor)
# Result: {"$and": [{"age": {"$gt": 18}}, {"name": "Alice"}]}
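The two output shapes exist because a MongoDB filter document, like a Python dict, cannot hold the same top-level key twice. The sketch below is a guess at the general idea and not the MongoVisitor implementation: `combine_and` is a hypothetical function that merges per-condition filters into one document when no field repeats and otherwise falls back to an explicit $and. The library evidently goes further in some cases, for example by merging range operators on the same field as in the range query shown earlier.

```python
def combine_and(filters, merge_simple=True):
    """Combine per-condition MongoDB filters under AND.

    Hypothetical sketch: merge into one document when no key repeats,
    otherwise keep the conditions apart under $and.
    """
    keys = [k for f in filters for k in f]
    if merge_simple and len(keys) == len(set(keys)):
        merged = {}
        for f in filters:
            merged.update(f)
        return merged
    return {"$and": list(filters)}


parts = [{"age": {"$gt": 18}}, {"name": "Alice"}]
print(combine_and(parts))
print(combine_and(parts, merge_simple=False))

# Two conditions on the same field cannot share one top-level key,
# so this sketch keeps them apart under $and.
print(combine_and([{"age": {"$gte": 18}}, {"age": {"$lte": 65}}]))
```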

Type Casting:

The MongoVisitor respects field type declarations and automatically casts values:

age = F("age", int)
expr = age.is_in(18, 21, 25)  # Values will be cast to int

visitor = MongoVisitor()
mongo_filter = expr.accept(visitor)
# Result: {"age": {"$in": [18, 21, 25]}}

Module Structure

Therismos is organized into the following modules and submodules:

therismos/
├── __init__.py              # Main package exports
└── expr/                    # Expression module
    ├── __init__.py          # Expression module exports
    ├── _expr.py             # Core expression classes (Expr, Field, operators, etc.)
    ├── optimizer.py         # Expression optimization and simplification
    └── visitors/            # Visitor implementations package
        ├── __init__.py      # Core visitor exports
        ├── _visitors.py     # Built-in visitor implementations
        └── mongo.py         # MongoDB query filter converter

Core Modules

  • therismos.expr: Core expression AST implementation

    • Expression types: Eq, Ne, Lt, Le, Gt, Ge, Regex, In, IsNull
    • Compound expressions: AllExpr, AnyExpr, NotExpr
    • Logical constants: TRUE, FALSE
    • Field types: Field, F (helper function)
    • Visitor protocol: ExprVisitor
  • therismos.expr.optimizer: Expression optimization

    • optimize(expr, records=None): Optimize an expression tree
    • OptimizationRecord: Records of optimization transformations
  • therismos.expr.visitors: Built-in visitor implementations

    • StringVisitor: Converts expressions to human-readable strings
    • CountVisitor: Counts nodes in expression trees
    • DictVisitor: Converts expressions to dictionary representation
    • FieldGathererVisitor: Collects all field names used in an expression
  • therismos.expr.visitors.mongo: MongoDB backend converter

    • MongoVisitor: Converts expressions to MongoDB query filters for PyMongo/Motor

Development

Requires Python 3.11 or higher.

Setup

# Install dependencies
uv pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check therismos tests

# Run type checking
mypy therismos

# Run all checks with tox
tox

Testing

The project uses pytest with extensive parametrization for comprehensive test coverage:

# Run all tests
pytest

# Run with coverage
pytest --cov=therismos --cov-report=html

# Run specific test file
pytest tests/test_optimizer.py

License

MIT
