A Python library for modeling queries, filters, expressions, grouping, and aggregations as object structures

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Topic
- Software Development :: Libraries
Typing
- Typed

Project description

Therismos

θερισμός

Greek; noun

Harvest.

A Python library for modeling queries, filters, expressions, grouping, and aggregations as object structures.

Features

Backend-agnostic modeling: Build expressions, filters, sorting, and aggregations independent of any specific backend
Declarative DSL: Natural Python syntax for building complex queries
Type safety: Optional field type declarations with automatic casting
Immutable structures: All nodes are immutable and thread-safe
Automatic normalization: Compound expressions are automatically flattened
Powerful optimizer: Detects contradictions, tautologies, and simplification opportunities in expressions and sorting
Grammar-based serialization: Convert expressions to/from compact strings for URLs and APIs
Visitor pattern: Extensible architecture for converting to any backend format
Optimization tracking: Optional tracking of all optimization transformations
Sorting specifications: Model sort criteria as objects with optimization and visitor support
Grouping and aggregation: Model grouping and aggregation criteria as objects with optimization and visitor support
Expression templates: Parameterized, persistable filter expressions with named placeholders and a transform pipeline DSL
Field pruning and projection: Remove or project field-based constraints from an expression tree with polarity-aware semantics
Structural equality: All expression types support == and hashing for use in sets, dicts, and equality-based testing

Installation

pip install therismos

Or using uv:

uv pip install therismos

Expressions

Therismos provides a comprehensive expression system for modeling filters and conditions as object structures using an Abstract Syntax Tree (AST) approach.

Quick Start

from therismos import F, optimize

# Define fields
age = F("age", int)
name = F("name")
status = F("status")

# Build expressions using natural Python syntax
expr = (age > 18) & (name == "Alice") | (status == "admin")

# Optimize the expression
optimized, records = optimize(expr)

# More complex example: detect contradictions
contradiction = (age < 30) & (age > 40)
result, _ = optimize(contradiction)
# result is FALSE

# Aggregate OR equality chains
multi_status = (status == "active") | (status == "pending") | (status == "completed")
result, _ = optimize(multi_status)
# result is: status IN ("active", "pending", "completed")

Expression Types

Atomic Expressions

Comparisons: ==, !=, <, <=, >, >=
Range: field.between(lower, upper) — half-open range lower <= field < upper
Regex matching: field.matches(pattern, flags=None)
Membership: field.is_in(*values) or field.is_one_of(iterable)
Null checking: field.is_null(), field.is_not_null()
Constants: TRUE, FALSE
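
The half-open convention used by between can be illustrated in plain Python. This is only a sketch of the semantics, not the library's code; in_half_open is a hypothetical helper.

```python
def in_half_open(value, lower, upper):
    """Return True when lower <= value < upper (upper bound excluded)."""
    return lower <= value < upper


# The lower bound is included, the upper bound is not.
checks = [in_half_open(v, 18, 65) for v in (17, 18, 64, 65)]
# checks == [False, True, True, False]
```

A consequence of this convention is that a range whose lower bound is not below its upper bound matches nothing.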

Compound Expressions

AND: expr1 & expr2 or AllExpr(expr1, expr2, ...)
OR: expr1 | expr2 or AnyExpr(expr1, expr2, ...)
NOT: ~expr or NotExpr(expr)
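
For readers curious how operator syntax can produce flattened compound nodes, here is a minimal standalone sketch. The class names (Leaf, AndNode, OrNode, NotNode) are hypothetical and chosen for illustration; they are not the Therismos classes, and the library's internals may differ.

```python
from dataclasses import dataclass


class Node:
    """Base class providing the &, | and ~ operators."""

    def __and__(self, other):
        return AndNode(_flatten(AndNode, (self, other)))

    def __or__(self, other):
        return OrNode(_flatten(OrNode, (self, other)))

    def __invert__(self):
        return NotNode(self)


@dataclass(frozen=True)
class Leaf(Node):
    text: str


@dataclass(frozen=True)
class AndNode(Node):
    operands: tuple


@dataclass(frozen=True)
class OrNode(Node):
    operands: tuple


@dataclass(frozen=True)
class NotNode(Node):
    operand: Node


def _flatten(kind, operands):
    """Inline operands that are already compound nodes of the same kind."""
    flat = []
    for op in operands:
        flat.extend(op.operands if isinstance(op, kind) else (op,))
    return tuple(flat)


a, b, c = Leaf("a"), Leaf("b"), Leaf("c")
chained = a & b & c      # one AndNode with three operands, not a nested pair
mixed = (a & b) | ~c     # OrNode(AndNode(a, b), NotNode(c))
```

Frozen dataclasses keep the nodes immutable and give them value-based equality and hashing for free.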

Type Casting

Fields can declare expected types for automatic value casting:

age = F("age", int)
price = F("price", float)

# Values are automatically cast
expr = age == "42"  # value is stored as string
casted = expr.casted_value()  # returns integer 42

Custom cast functions are also supported:

def normalize_email(value):
    return str(value).strip().lower()

email = F("email", normalize_email)

Expression Equality

All expression types support structural equality via Python's == operator and are hashable, allowing them to be placed in sets or used as dict keys.

from therismos import F, AllExpr

age = F("age", int)

# Structural equality — same tree, same result
assert (age > 18) == (age > 18)

# Order-insensitive for commutative expressions (AND / OR / In)
e1 = AllExpr(age > 18, age < 65)
e2 = AllExpr(age < 65, age > 18)
assert e1 == e2  # order does not matter for AND

# Expressions are hashable — usable in sets and dicts
expr_set = {age > 18, age < 65, age > 18}
assert len(expr_set) == 2

Note: Field.__eq__ returns an Eq expression when compared to a plain value (DSL usage), but returns a bool when compared to another Field or expression object.

f1 = F("age")
f2 = F("age")
assert f1 == f2      # True — same field name and type (bool comparison)
expr = f1 == 18      # Eq(F("age"), 18) — DSL usage
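
One common way to obtain order-insensitive equality together with consistent hashing is to compare an order-free key. The sketch below assumes nothing about how Therismos implements it; AllOf is a hypothetical stand-in class and its operands are plain strings.

```python
class AllOf:
    """Hypothetical AND node whose equality ignores operand order."""

    def __init__(self, *operands):
        self.operands = tuple(operands)

    def _key(self):
        # frozenset discards order (and duplicates) for comparison purposes
        return frozenset(self.operands)

    def __eq__(self, other):
        if not isinstance(other, AllOf):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Equal objects must hash equally, so hash the same order-free key
        return hash(("AllOf", self._key()))


e1 = AllOf("age > 18", "age < 65")
e2 = AllOf("age < 65", "age > 18")
unique = {e1, e2}   # both land in the same set slot
```

Because the key is a frozenset, this sketch also treats repeated operands as a single operand, which is harmless for AND and OR.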

Optimization

The optimizer applies various rules to simplify expressions and detect logical issues.

Basic Examples

from therismos import optimize, F, TRUE, FALSE, AllExpr, AnyExpr

age = F("age")
name = F("name")

# Identity elimination
expr = AllExpr(age > 18, TRUE, age < 65)
result, _ = optimize(expr)
# result is: AllExpr(age > 18, age < 65)

# Contradiction detection
expr = (age == 25) & (age != 25)
result, _ = optimize(expr)
# result is: FALSE

# Tautology detection
expr = (age < 30) | (age >= 30)
result, _ = optimize(expr)
# result is: TRUE

# NOT simplification (De Morgan's laws)
expr = ~((age > 18) & (name == "Alice"))
result, _ = optimize(expr)
# result is: (age <= 18) OR (name != "Alice")

Optimization Rules Reference

The optimizer implements the following transformation rules:

Atomic Expression Simplifications

Rule	Before	After
Empty IN to FALSE	`f IN ()`	`FALSE`
Single-value IN to Eq	`f IN (v)`	`f == v`
Empty Between range	`f.between(a, b)` where `a >= b`	`FALSE`

NOT Expression Simplifications

Rule	Before	After
NOT of TRUE	`NOT(TRUE)`	`FALSE`
NOT of FALSE	`NOT(FALSE)`	`TRUE`
Double negation	`NOT(NOT(x))`	`x`
NOT of equality	`NOT(f == v)`	`f != v`
NOT of inequality	`NOT(f != v)`	`f == v`
NOT of less-than	`NOT(f < v)`	`f >= v`
NOT of less-or-equal	`NOT(f <= v)`	`f > v`
NOT of greater-than	`NOT(f > v)`	`f <= v`
NOT of greater-or-equal	`NOT(f >= v)`	`f < v`
NOT of null check	`NOT(f.is_null())`	`f.is_not_null()`
NOT of not-null check	`NOT(f.is_not_null())`	`f.is_null()`
De Morgan's law (AND)	`NOT(a AND b)`	`NOT(a) OR NOT(b)`
De Morgan's law (OR)	`NOT(a OR b)`	`NOT(a) AND NOT(b)`
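
The NOT rules in the table above can be expressed as one small recursive function. The sketch below works on plain tuples rather than Therismos objects; the tuple layout and the names NEGATED and negate are assumptions made for illustration.

```python
# Negation of each comparison operator, as listed in the table above
NEGATED = {"==": "!=", "!=": "==", "<": ">=", ">=": "<", "<=": ">", ">": "<="}


def negate(node):
    """Return an expression equivalent to NOT(node) using the table's rules.

    Nodes are tuples: ("and", [children]), ("or", [children]),
    ("not", child) or a comparison (op, field, value).
    """
    kind = node[0]
    if kind == "not":    # double negation: NOT(NOT(x)) = x
        return node[1]
    if kind == "and":    # De Morgan: NOT(a AND b) = NOT(a) OR NOT(b)
        return ("or", [negate(child) for child in node[1]])
    if kind == "or":     # De Morgan: NOT(a OR b) = NOT(a) AND NOT(b)
        return ("and", [negate(child) for child in node[1]])
    op, field, value = node
    return (NEGATED[op], field, value)


expr = ("and", [(">", "age", 18), ("==", "name", "Alice")])
result = negate(expr)
# result == ("or", [("<=", "age", 18), ("!=", "name", "Alice")])
```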

AND Expression Simplifications

Rule	Before	After
Empty AND	`AND()`	`TRUE`
Single operand	`AND(x)`	`x`
FALSE propagation	`AND(..., FALSE, ...)`	`FALSE`
TRUE elimination	`AND(..., TRUE, ...)`	`AND(...)` (TRUE removed)
All TRUE	`AND(TRUE, TRUE, ...)`	`TRUE`
Eq/Eq same value	`(f == v) AND (f == v)`	`f == v`
Eq/Eq different values	`(f == v1) AND (f == v2)`	`FALSE`
Eq/In intersection (member)	`(f == v) AND (f IN (v, ...))`	`f == v`
Eq/In intersection (non-member)	`(f == v) AND (f IN (...))`	`FALSE` (v not in set)
In/In intersection (empty)	`(f IN (v1, v2)) AND (f IN (v3, v4))`	`FALSE` (no overlap)
In/In intersection (single)	`(f IN (v1, v2)) AND (f IN (v2, v3))`	`f == v2`
In/In intersection (multiple)	`(f IN (v1, v2, v3)) AND (f IN (v2, v3, v4))`	`f IN (v2, v3)`

OR Expression Simplifications

Rule	Before	After
Empty OR	`OR()`	`FALSE`
Single operand	`OR(x)`	`x`
TRUE propagation	`OR(..., TRUE, ...)`	`TRUE`
FALSE elimination	`OR(..., FALSE, ...)`	`OR(...)` (FALSE removed)
All FALSE	`OR(FALSE, FALSE, ...)`	`FALSE`
Eq/Eq union	`(f == v1) OR (f == v2)`	`f IN (v1, v2)`
Eq/In union	`(f == v) OR (f IN (v2, v3))`	`f IN (v, v2, v3)`
In/In union	`(f IN (v1, v2)) OR (f IN (v3, v4))`	`f IN (v1, v2, v3, v4)`
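
The Eq/In rules in the AND and OR tables reduce to set intersection and set union once `f == v` is treated as a one-element set. A minimal sketch with hypothetical helper names, returning simple markers rather than Therismos expressions:

```python
def and_membership(left, right):
    """Combine two value sets for the same field under AND (intersection)."""
    common = set(left) & set(right)
    if not common:
        return "FALSE"                      # no value can satisfy both
    if len(common) == 1:
        return ("==", next(iter(common)))   # single survivor becomes equality
    return ("IN", common)


def or_membership(left, right):
    """Combine two value sets for the same field under OR (union)."""
    return ("IN", set(left) | set(right))


no_overlap = and_membership({"a", "b"}, {"c", "d"})
single = and_membership({"a", "b"}, {"b", "c"})
multiple = and_membership({"a", "b", "c"}, {"b", "c", "d"})
merged = or_membership({"a"}, {"b", "c"})
```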

Contradiction Detection (AND)

Pattern	Result
`(f == v) AND (f != v)`	`FALSE`
`f.is_null() AND f.is_not_null()`	`FALSE`
`(f < a) AND (f > b)` where `b >= a`	`FALSE`
`(f <= a) AND (f > a)`	`FALSE`
`(f >= b) AND (f < b)`	`FALSE`
`f.between(a, b) AND f.between(c, d)` where `max(a,c) >= min(b,d)`	`FALSE`
`f.between(a, b) AND f.between(c, d)` where ranges overlap	`f.between(max(a,c), min(b,d))`
`f.between(a, b) AND (f > c)` where `c >= b`	`FALSE`
`f.between(a, b) AND (f < c)` where `c <= a`	`FALSE`
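
The two between rows above follow from intersecting half-open ranges. As a sketch (intersect is a hypothetical helper, not part of the library):

```python
def intersect(a, b, c, d):
    """Intersect half-open ranges [a, b) and [c, d); None means no overlap."""
    lower, upper = max(a, c), min(b, d)
    return None if lower >= upper else (lower, upper)


overlapping = intersect(0, 10, 5, 20)   # (5, 10)
touching = intersect(0, 5, 5, 10)       # None: 5 is excluded from [0, 5)
disjoint = intersect(0, 5, 7, 9)        # None
```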

Between Range Union (OR)

Pattern	Result
`f.between(a, b) OR f.between(c, d)` — overlapping (`min(b,d) > max(a,c)`)	`f.between(min(a,c), max(b,d))`
`f.between(a, b) OR f.between(b, d)` — adjacent	`f.between(a, d)`
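
Both union rows can be covered by one check, assuming each input range is non-empty. The helper name union is hypothetical and the function is only a sketch of the arithmetic:

```python
def union(a, b, c, d):
    """Merge half-open ranges [a, b) and [c, d) when they overlap or touch.

    Returns the merged (lower, upper) pair, or None when a gap separates
    the ranges and they cannot be written as a single range.
    """
    if max(a, c) > min(b, d):
        return None
    return (min(a, c), max(b, d))


overlapping = union(0, 10, 5, 20)   # (0, 20)
adjacent = union(0, 5, 5, 10)       # (0, 10)
separated = union(0, 5, 7, 9)       # None
```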

Tautology Detection (OR)

Pattern	Result
`(f == v) OR (f != v)`	`TRUE`
`f.is_null() OR f.is_not_null()`	`TRUE`
`(f < v) OR (f >= v)`	`TRUE`
`(f <= v) OR (f > v)`	`TRUE`

Complex Real-World Example: Detecting Accidental Contradictions

The optimizer is particularly valuable for catching accidentally contradictory conditions in complex business logic. Here's a realistic scenario where multiple nested requirements create an impossible condition:

from therismos import F, optimize, FALSE

# Define fields
user_age = F("age", int)
user_role = F("role")
user_status = F("status")
account_tier = F("account_tier")
dept = F("department")
experience = F("experience_years", int)
available = F("available", bool)

# Complex filter built incrementally by different team members
# Each level seemed reasonable in isolation, but together they create a contradiction
complex_filter = (
    (
        # Level 1: Nested OR conditions for base eligibility
        (
            (
                # Premium account holders
                (account_tier == "premium") &
                (
                    (user_role == "developer") |
                    (user_role == "designer")
                )
            ) |
            (
                # OR enterprise users with experience
                (account_tier == "enterprise") &
                (experience >= 5) &
                (dept.is_in("engineering", "design"))
            )
        ) &
        # Level 2: Status and department requirements with nesting
        (
            (
                (user_status == "active") &
                (
                    # Nested department-specific conditions
                    (
                        (dept == "engineering") &
                        (experience >= 2)
                    ) |
                    (
                        (dept == "design") &
                        (user_role.is_in("designer", "lead_designer"))
                    )
                )
            ) |
            # OR admin override
            (user_role == "admin")
        ) &
        # Level 3: First age requirement
        (user_age >= 25) &
        # Level 4: Second age requirement nested with other conditions
        (
            (user_age <= 50) &
            (
                # More nesting for additional validation
                (account_tier.is_in("premium", "enterprise", "trial")) |
                (user_role == "admin")
            )
        )
    ) & (
        # Level 5: Someone later added "additional validation"
        # without realizing it contradicts the previous age requirements!
        (user_age < 25) &  # Must be under 25
        (user_age > 50)    # AND must be over 50 (impossible!)
    ) &
    (available == True)
)

# The contradiction occurs because:
# - Level 5 requires age < 25 AND age > 50, which no value can satisfy
# - It also conflicts with the earlier levels, which require 25 <= age <= 50
# - Either way, the combined AND can never be true

result, records = optimize(complex_filter)

print(f"Optimized result: {result}")
# Output: FalseExpr()

print(f"Is FALSE: {result is FALSE}")
# Output: True

print("Optimization steps that revealed the contradiction:")
for i, record in enumerate(records, 1):
    print(f"Step {i}: {record.reason}")
    if "Contradiction" in record.reason:
        print("  *** This step detected the contradiction! ***")

# Example output:
# Step 1: OR equality chain aggregation to IN
# Step 2: Optimize children in AND
# Step 3: Optimize children in OR
# Step 4: Optimize children in AND
# Step 5: Contradiction detected in AND
#   *** This step detected the contradiction! ***

# By examining the 'before' expression in the contradiction record,
# you can identify exactly which requirements conflict with each other
# and trace back through your business logic to find the source.

The optimizer's tracking feature is invaluable for debugging complex business rules, especially in these situations:

Multiple developers contribute conditions to the same filter over time
Requirements evolve and accidentally introduce conflicts
Filters from different parts of the application are combined
Legacy filtering logic is migrated or refactored
User-facing query builders let users create invalid combinations

Optimization Tracking

Track optimization changes:

result, records = optimize(expr)
for record in records:
    print(f"Applied: {record.reason}")
    print(f"Before: {record.before}")
    print(f"After: {record.after}")

You can also use a collecting parameter to accumulate records across multiple optimizations:

my_records = []
result1, _ = optimize(expr1, my_records)
result2, _ = optimize(expr2, my_records)
# my_records now contains all optimization steps from both calls
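
The collecting-parameter pattern itself is independent of Therismos. The sketch below shows the general shape with a hypothetical simplify function that logs what it removes; it is not the optimizer's implementation.

```python
def simplify(values, records=None):
    """Drop None entries, logging each removal into `records`.

    `records` is a collecting parameter: pass the same list to several
    calls to accumulate their log entries in one place.
    """
    if records is None:
        records = []          # fresh list per call, never a shared default
    kept = []
    for value in values:
        if value is None:
            records.append("removed None")
        else:
            kept.append(value)
    return kept, records


log = []
first, _ = simplify([1, None, 2], log)
second, _ = simplify([None, None, 3], log)
# log now holds three entries: one from the first call, two from the second
```

Defaulting to None instead of an empty list avoids Python's shared mutable default pitfall.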

Field Pruning

prune_fields removes or projects field-based constraints from an expression tree. It is useful when a stored filter contains constraints on fields that are unavailable or irrelevant in a particular execution context, and you need to decide how to handle the missing constraints conservatively or permissively.

from therismos import F, prune_fields, FieldSelection, PruneMode

age = F("age", int)
status = F("status")
dept = F("department")

expr = (age > 18) & (status == "active") & (dept == "engineering")

# PRUNE mode (default): remove listed fields — age constraint dropped
# RESTRICT mode (default): dropping a constraint excludes non-matching records
result = prune_fields(expr, frozenset({"age"}))
# result: AllExpr(status == "active", dept == "engineering")

# RELAX mode: dropping a constraint lets records pass through
result = prune_fields(expr, frozenset({"age"}), mode=PruneMode.RELAX)
# result: AllExpr(status == "active", dept == "engineering")

# Where RESTRICT vs RELAX differ — single constraint
only_age = age > 18
result_restrict = prune_fields(only_age, frozenset({"age"}))
# result: FALSE  (no constraint left → exclude)

result_relax = prune_fields(only_age, frozenset({"age"}), mode=PruneMode.RELAX)
# result: TRUE   (no constraint left → include)

# KEEP mode: keep only listed fields, prune everything else
result = prune_fields(expr, frozenset({"status"}), selection=FieldSelection.KEEP)
# result: status == "active"

Polarity-aware substitution under NOT

The substitution correctly flips semantics when a pruned leaf appears inside a NOT expression:

from therismos import F, prune_fields, PruneMode

age = F("age", int)
status = F("status")

expr = ~(age > 18) & (status == "active")

# RESTRICT: age is pruned to FALSE at positive polarity
# NOT(FALSE) → TRUE → TRUE & (status == "active") → status == "active"
result = prune_fields(expr, frozenset({"age"}))
# result: status == "active"

Expression Evaluation

Expressions can be evaluated against actual data to determine if the data satisfies the filter criteria. This is useful for:

In-memory filtering when a database query is not needed
Testing and validating filter logic
Client-side filtering before sending data to a backend
Data validation and access control checks

The evaluate() method handles both single-valued and multi-valued (list-like) fields. To support this, all input data must be wrapped with the unwind_data utility function, which flattens nested data structures into a consistent format that the evaluation engine can process.
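
To give an idea of what flattening nested values involves, here is a generic sketch. The output format of unwind_data is not described here, so flatten_values is a hypothetical helper that only illustrates the idea and should not be read as the library's implementation.

```python
def flatten_values(value):
    """Yield the scalar leaves of an arbitrarily nested list/tuple structure."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from flatten_values(item)
    else:
        yield value   # strings and other scalars are yielded whole


nested = list(flatten_values([[60, 75], [90, 40]]))   # [60, 75, 90, 40]
scalar = list(flatten_values(25))                      # [25]
```

Treating a scalar as a one-element sequence lets the same comparison logic serve single-valued and multi-valued fields.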

Basic Evaluation

The evaluate() method evaluates an expression against a dictionary of field values, which must first be wrapped with unwind_data.

from therismos import F, unwind_data

age = F("age")
status = F("status")

# Build an expression
expr = (age > 18) & (status == "active")

# Evaluate against data by wrapping it with unwind_data
data = {"age": 25, "status": "active"}
result = expr.evaluate(unwind_data(data))  # Returns True

data = {"age": 15, "status": "active"}
result = expr.evaluate(unwind_data(data))  # Returns False

Evaluation with Type Casting

When fields have declared types, values are automatically cast during evaluation:

age = F("age", int)
expr = age >= 18

# String values are automatically cast to int
result = expr.evaluate(unwind_data({"age": "25"}))  # Returns True

# This will raise TypeError or ValueError if casting fails
try:
    expr.evaluate(unwind_data({"age": "not_a_number"}))
except (TypeError, ValueError):
    print("Invalid age value")

Multi-Valued Field Evaluation

The evaluation engine seamlessly handles fields that contain lists or nested lists of values.

Comparison operators (==, >, <, etc.) use "any" semantics: the condition is True if any value in the list satisfies the comparison.
Inequality (!=) uses "none" semantics: the condition is True only if no value in the list equals the compared value.

# 'tags' is a multi-valued field
tags = F("tags")
scores = F("scores", int)

# Equality: True if "python" is ANY of the tags
expr_eq = tags == "python"
data_eq = {"tags": ["java", "python", "rust"]}
assert expr_eq.evaluate(unwind_data(data_eq)) is True  # True because "python" is present

# Inequality: True if "python" is NONE of the tags
expr_ne = tags != "python"
data_ne_pass = {"tags": ["java", "rust"]}
data_ne_fail = {"tags": ["java", "python"]}
assert expr_ne.evaluate(unwind_data(data_ne_pass)) is True  # True because "python" is absent
assert expr_ne.evaluate(unwind_data(data_ne_fail)) is False # False because "python" is present

# Greater Than: True if ANY score is > 80
expr_gt = scores > 80
data_gt = {"scores": [60, 75, 90]}
assert expr_gt.evaluate(unwind_data(data_gt)) is True # True because 90 > 80

# The data can even be nested
data_nested = {"scores": [[60, 75], [90, 40]]}
assert expr_gt.evaluate(unwind_data(data_nested)) is True # Still True, as 90 > 80
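
The "any" and "none" semantics can be written directly with Python's built-in any(). This is a standalone sketch over plain lists; matches_any and not_equal_all are hypothetical names.

```python
import operator


def matches_any(values, op, target):
    """True if at least one value satisfies op(value, target)."""
    return any(op(v, target) for v in values)


def not_equal_all(values, target):
    """'None' semantics for !=: True only if no value equals target."""
    return not any(v == target for v in values)


has_python = matches_any(["java", "python", "rust"], operator.eq, "python")
has_high_score = matches_any([60, 75, 90], operator.gt, 80)
python_absent = not_equal_all(["java", "rust"], "python")
python_present = not_equal_all(["java", "python"], "python")
```

With these definitions, != on a multi-valued field is exactly the negation of == on the same field, so the two can never both be True.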

Evaluating Membership and Regex

is_in and matches also work with multi-valued fields, returning True if any value in the field's list satisfies the condition.

import re

# IN expressions
status = F("status")
expr = status.is_in("active", "pending", "approved")
result = expr.evaluate(unwind_data({"status": "active"}))  # Returns True

# Regex matching on a multi-valued field
log_messages = F("logs")
expr = log_messages.matches(r"ERROR:", re.IGNORECASE)
data = {"logs": ["INFO: User logged in", "ERROR: Connection failed"]}
result = expr.evaluate(unwind_data(data))  # Returns True because one message matches

# Null checking
phone = F("phone")
expr = phone.is_null()
result = expr.evaluate(unwind_data({"phone": None}))  # Returns True

Complex Evaluation Examples

Compound expressions evaluate all nested conditions, and each condition applies the same multi-valued semantics as a standalone comparison.

age = F("age", int)
country = F("country")
verified = F("verified")
subscription = F("subscription")

# Complex eligibility check
expr = (
    (age >= 18) &
    (country.is_one_of(["US", "UK", "CA"])) &
    ((verified == True) | (subscription.is_in("premium", "enterprise")))
)

# Adult in allowed country with verification
result = expr.evaluate(unwind_data({
    "age": 25,
    "country": "US",
    "verified": True,
    "subscription": "free"
}))  # Returns True

# Adult in allowed country with premium subscription (unverified)
result = expr.evaluate(unwind_data({
    "age": 30,
    "country": "UK",
    "verified": False,
    "subscription": "premium"
}))  # Returns True

# Minor (fails age requirement)
result = expr.evaluate(unwind_data({
    "age": 16,
    "country": "US",
    "verified": True,
    "subscription": "premium"
}))  # Returns False

Evaluation with Optimized Expressions

You can optimize expressions before evaluation for better performance or to catch logical issues:

from therismos import F, optimize, unwind_data

age = F("age", int)
status = F("status")

# Build a complex expression
expr = (
    ((age > 18) | (age > 25)) &  # Redundant condition
    (status == "active") &
    ((age < 30) | (age >= 30))   # Tautology
)

# Optimize first
optimized, _ = optimize(expr)
# optimized is simplified to: (age > 18) AND (status == "active")

# Then evaluate the optimized expression
result = optimized.evaluate(unwind_data({"age": 25, "status": "active"}))  # Returns True

Error Handling

Evaluation raises exceptions for invalid data:

age = F("age")
expr = age > 18

# Missing field raises KeyError
try:
    expr.evaluate(unwind_data({"name": "Alice"}))  # age field is missing
except KeyError:
    print("Required field 'age' not found")

# Invalid type casting raises TypeError or ValueError
age_typed = F("age", int)
expr = age_typed > 18
try:
    expr.evaluate(unwind_data({"age": "not_a_number"}))
except (TypeError, ValueError):
    print("Cannot cast value to required type")

Converting to other formats

Therismos uses the visitor pattern to enable extensible conversions of expressions to any format. You can implement custom visitors or use the built-in ones.

Custom Visitors

Implement custom visitors to convert expressions to any format:

from therismos import ExprVisitor

class SQLVisitor:
    def visit_eq(self, expr):
        return f"{expr.field.name} = ?"

    def visit_all(self, expr):
        parts = [e.accept(self) for e in expr.exprs]
        return " AND ".join(parts)

    # ... implement other visit methods

visitor = SQLVisitor()
sql = expr.accept(visitor)
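
The double-dispatch mechanism behind accept and visit_* can be shown in a fully self-contained form. The node classes below (EqNode, AllNode) are hypothetical stand-ins rather than the Therismos types, and ParamSQLVisitor is only a sketch of a visitor that also collects bound parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqNode:
    field: str
    value: object

    def accept(self, visitor):
        # Each node calls the visitor method that matches its own type
        return visitor.visit_eq(self)


@dataclass(frozen=True)
class AllNode:
    exprs: tuple

    def accept(self, visitor):
        return visitor.visit_all(self)


class ParamSQLVisitor:
    """Builds a parameterized WHERE fragment and collects bound values."""

    def __init__(self):
        self.params = []

    def visit_eq(self, expr):
        self.params.append(expr.value)
        return f"{expr.field} = ?"

    def visit_all(self, expr):
        return "(" + " AND ".join(e.accept(self) for e in expr.exprs) + ")"


visitor = ParamSQLVisitor()
tree = AllNode((EqNode("name", "Alice"), EqNode("status", "active")))
sql = tree.accept(visitor)
# sql == "(name = ? AND status = ?)"; visitor.params == ["Alice", "active"]
```

Values go into params rather than into the SQL text. Field names are still interpolated directly, so a real visitor should check them against a list of known columns.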

Built-in Visitors

Therismos provides several built-in visitors for common use cases:

StringVisitor

Converts expressions to human-readable string representation:

from therismos import F, StringVisitor

age = F("age")
name = F("name")
expr = (age > 18) & (name == "Alice")

visitor = StringVisitor()
result = expr.accept(visitor)
# Output: "(age > 18 AND name = 'Alice')"

CountVisitor

Counts the number of nodes in an expression tree:

from therismos import F, CountVisitor

age = F("age")
name = F("name")
expr = (age > 18) & (name == "Alice")

visitor = CountVisitor()
count = expr.accept(visitor)
# Output: 3 (1 AllExpr + 2 atomic expressions)

DictVisitor

Converts expressions to dictionary representation for serialization:

from therismos import F, DictVisitor

age = F("age")
expr = age > 18

visitor = DictVisitor()
result = expr.accept(visitor)
# Output: {"type": "gt", "field": "age", "value": 18}

For compound expressions, the dictionary is nested:

age = F("age")
name = F("name")
expr = (age > 18) & (name == "Alice")

visitor = DictVisitor()
result = expr.accept(visitor)
# Output: {
#     "type": "and",
#     "exprs": [
#         {"type": "gt", "field": "age", "value": 18},
#         {"type": "eq", "field": "name", "value": "Alice"}
#     ]
# }

FieldGathererVisitor

Collects all unique field names used in an expression tree:

from therismos import F, FieldGathererVisitor

age = F("age")
name = F("name")
status = F("status")
expr = (age > 18) & (name == "Alice") | (status == "active")

visitor = FieldGathererVisitor()
expr.accept(visitor)
field_names = visitor.field_names
# Output: {"age", "name", "status"}

This is useful for:

Analyzing which fields are used in complex filters
Validating that all referenced fields exist in your schema
Generating documentation or metadata about queries
Determining required permissions for a query
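
Once the field names are available as a set, schema validation is a set difference. A minimal sketch, where unknown_fields and the example schema are hypothetical:

```python
def unknown_fields(used, schema):
    """Return the field names in `used` that the schema does not define."""
    return set(used) - set(schema)


schema = {"age", "name", "status", "email"}
missing = unknown_fields({"age", "name", "nickname"}, schema)
# missing == {"nickname"}
```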

Backend Converters

MongoVisitor

The MongoVisitor converts therismos expressions to MongoDB query filters compatible with PyMongo and Motor.

Installation:

# For synchronous PyMongo (quotes keep shells such as zsh from expanding the brackets)
uv pip install "therismos[mongodb]"

# For asynchronous Motor
uv pip install "therismos[mongodb-async]"

Basic Usage:

from therismos import F, optimize
from therismos.expr.visitors.mongo import MongoVisitor

age = F("age")
status = F("status")
country = F("country")

# Build and optimize expression
expr = (age >= 21) & (status == "active") & (country.is_in("US", "UK", "CA"))
optimized, _ = optimize(expr)

# Convert to MongoDB filter
visitor = MongoVisitor()
mongo_filter = optimized.accept(visitor)

# Result: {
#     "age": {"$gte": 21},
#     "status": "active",
#     "country": {"$in": ["US", "UK", "CA"]}
# }

Using with PyMongo:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["mydb"]
collection = db["users"]

# Use the generated filter
results = collection.find(mongo_filter)
for doc in results:
    print(doc)

Using with Motor (async):

import asyncio
from motor.motor_asyncio import AsyncIOMotorClient

async def find_users():
    client = AsyncIOMotorClient("mongodb://localhost:27017/")
    db = client["mydb"]
    collection = db["users"]

    # Use the generated filter
    cursor = collection.find(mongo_filter)
    results = await cursor.to_list(length=100)
    return results

asyncio.run(find_users())

Advanced Features:

The MongoVisitor handles all therismos expression types:

import re
from therismos import F, TRUE, FALSE, optimize
from therismos.expr.visitors.mongo import MongoVisitor

email = F("email")
age = F("age")
name = F("name")
status = F("status")

visitor = MongoVisitor()

# Regex matching (with case-insensitive flag)
expr = email.matches(r".*@example\.com$", re.IGNORECASE)
mongo_filter = expr.accept(visitor)
# Result: {"email": {"$regex": ".*@example\\.com$", "$options": "i"}}

# Range queries
expr = (age >= 18) & (age <= 65)
mongo_filter = expr.accept(visitor)
# Result: {"age": {"$gte": 18, "$lte": 65}} (optimized)

# Complex OR conditions
expr = (status == "active") | (status == "pending") | (status == "approved")
optimized_expr, _ = optimize(expr)  # Converts to IN
mongo_filter = optimized_expr.accept(visitor)
# Result: {"status": {"$in": ["active", "pending", "approved"]}}

# Null checking
expr = name.is_not_null()
mongo_filter = expr.accept(visitor)
# Result: {"name": {"$ne": None}}

# NOT expressions
expr = ~(age < 18)
mongo_filter = expr.accept(visitor)
# Result: {"$nor": [{"age": {"$lt": 18}}]}

# Constants
true_filter = TRUE.accept(visitor)   # Result: {}
false_filter = FALSE.accept(visitor)  # Result: {"$expr": False}

Optimization Options:

# By default, simple AND expressions are optimized by merging fields
visitor = MongoVisitor(optimize_simple_and=True)
expr = (age > 18) & (name == "Alice")
mongo_filter = expr.accept(visitor)
# Result: {"age": {"$gt": 18}, "name": "Alice"}

# Disable optimization to always use $and
visitor = MongoVisitor(optimize_simple_and=False)
mongo_filter = expr.accept(visitor)
# Result: {"$and": [{"age": {"$gt": 18}}, {"name": "Alice"}]}
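
The reason an explicit $and is sometimes required is that merging dicts silently overwrites repeated keys. The sketch below shows a simplified version of that decision on plain dicts. combine_and is a hypothetical helper, and it does not merge same-field range conditions the way the MongoVisitor output shown earlier does.

```python
def combine_and(clauses):
    """Merge single-field clauses into one dict when no field repeats.

    Falls back to an explicit $and when two clauses share a key, because
    a plain dict merge would silently overwrite the earlier clause.
    """
    merged = {}
    for clause in clauses:
        if any(key in merged for key in clause):
            return {"$and": list(clauses)}
        merged.update(clause)
    return merged


merged_filter = combine_and([{"age": {"$gt": 18}}, {"name": "Alice"}])
and_filter = combine_and([{"age": {"$gt": 18}}, {"age": {"$lt": 65}}])
```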

Type Casting:

The MongoVisitor respects field type declarations and automatically casts values:

age = F("age", int)
expr = age.is_in(18, 21, 25)  # Values will be cast to int

visitor = MongoVisitor()
mongo_filter = expr.accept(visitor)
# Result: {"age": {"$in": [18, 21, 25]}}

Expression Serialization

Therismos provides grammar-based serialization to convert expressions to/from compact string representations. This is particularly useful for URL query strings, API parameters, and storing filters as text.

Core Concepts

Serialization Basics

The Serializer class converts expressions to compact strings:

from therismos import F, Serializer, Eq, AllExpr, Gt

serializer = Serializer()

# Serialize simple expressions
expr = Eq(F("age"), 18)
text = serializer.serialize(expr)
# Result: "age==18"

# Compound expressions
expr = AllExpr(Eq(F("age"), 18), Gt(F("score"), 75))
text = serializer.serialize(expr)
# Result: "(age==18;score>75)"

# Deserialize strings back to expressions
expr = serializer.deserialize("age==18")
# Result: Eq(field=Field(name='age', type_=None), value=18)

Grammar Reference

The serializer uses a compact grammar optimized for URL usage:

Python Operator	Grammar Syntax	Example
`&` (AND)	`;`	`age>18;status=="active"`
`|` (OR)	`,`	`status=="active",status=="pending"`
`~` (NOT)	`!`	`!(age<18)`
`==`	`==`	`age==18`
`!=`	`!=`	`status!="inactive"`
`<`	`<`	`age<65`
`<=`	`<=`	`age<=65`
`>`	`>`	`age>18`
`>=`	`>=`	`age>=18`
`.is_in()`	`=in=`	`status=in=("active","pending")`
`.matches()`	`~regex`	`email~regex(".*@example\\.com")`
`.is_null()`	`==null`	`deleted_at==null`
`.is_not_null()`	`!=null`	`created_at!=null`
`TRUE`	`true()`	`true()`
`FALSE`	`false()`	`false()`

Precedence: ! (NOT) > ; (AND) > , (OR)
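
Because AND binds tighter than OR, an unparenthesized string reads as an OR of AND groups. The toy function below makes that grouping visible by splitting on the separators. It deliberately ignores parentheses, quoted strings, and NOT, so it is an illustration of precedence only, not a parser for the grammar.

```python
def group_by_precedence(text):
    """Split on ',' (OR) first, then ';' (AND); no quotes or parentheses."""
    return [alternative.split(";") for alternative in text.split(",")]


groups = group_by_precedence("age>18;status==active,role==admin")
# groups == [["age>18", "status==active"], ["role==admin"]]
# i.e. (age>18 AND status==active) OR (role==admin)
```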

Serialization Features

Basic Usage

For use in URL query strings, enable URL encoding:

# Create a serializer with URL encoding
serializer = Serializer(url_encode=True)

# Serialize with URL encoding
expr = Eq(F("name"), "Alice Smith")
text = serializer.serialize(expr)
# Result: URL-encoded string

# Deserialize automatically decodes
expr = serializer.deserialize(text)
# Result: Original expression
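
To see why encoding matters for filter strings, the standard library's percent-encoding can be applied to the unencoded form. This sketch uses urllib.parse directly; the exact output of Serializer(url_encode=True) may differ.

```python
from urllib.parse import quote, unquote

raw = 'name=="Alice Smith"'

# safe="" also encodes "/" so the value cannot be mistaken for a path
encoded = quote(raw, safe="")
decoded = unquote(encoded)
# encoded == "name%3D%3D%22Alice%20Smith%22"
```

The characters =, " and the space would otherwise collide with the query-string syntax of the URL itself.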

Type Handling

Control type annotation output in serialization:

age = F("age", int)
name = F("name", str)

# Without type annotations (default)
serializer = Serializer()
text = serializer.serialize(Eq(age, 18))
# Result: "age==18"

# With all type annotations
serializer = Serializer(include_all_types=True)
text = serializer.serialize(Eq(age, 18))
# Result: "age{int}==18"

# Custom cast functions can be registered under a name for use in type annotations
def uppercase_transform(x):
    return str(x).upper()

serializer = Serializer()
serializer.register_custom_type(uppercase_transform, "upper")

# Use the custom type
field = F("code", uppercase_transform)
expr = Eq(field, "abc")

# Serialize with type annotation
serializer_typed = Serializer(include_all_types=True)
serializer_typed.register_custom_type(uppercase_transform, "upper")
text = serializer_typed.serialize(expr)
# Result: "code{upper}==\"ABC\"" (value is transformed)

Values are automatically cast during deserialization when type annotations are present:

import uuid
from therismos import Serializer

serializer = Serializer()
serializer.register_custom_type(uuid.UUID, 'uuid.UUID')

# Deserialize with type annotation
expr = serializer.deserialize('user_id{uuid.UUID}=="550e8400-e29b-41d4-a716-446655440000"')

# Value is automatically cast to UUID
assert isinstance(expr.value, uuid.UUID)
assert expr.value == uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

Use the implicit_field_types parameter to define type mappings for field names, avoiding repeating type annotations:

import uuid
from decimal import Decimal
from therismos import Serializer

# Define implicit field type mappings
implicit_field_types = {
    "user_id": uuid.UUID,
    "product_id": uuid.UUID,
    "price": Decimal,
}

serializer = Serializer(implicit_field_types=implicit_field_types)
serializer.register_custom_type(uuid.UUID, 'uuid.UUID')
serializer.register_custom_type(Decimal, 'Decimal')

# No type annotation needed - uses implicit mapping
expr = serializer.deserialize('user_id=="550e8400-e29b-41d4-a716-446655440000"')
assert expr.field.type_ is uuid.UUID
assert isinstance(expr.value, uuid.UUID)

# You can also register field types dynamically
serializer.register_field_type("account_id", uuid.UUID)

# Explicit type annotations always override implicit mappings
expr = serializer.deserialize('price{int}=="100"')
assert expr.field.type_ is int  # Not Decimal
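
The lookup order described above (explicit annotation first, then the implicit per-field mapping) can be sketched with two dictionaries. NAMED_TYPES, IMPLICIT and resolve_caster are hypothetical names; the Serializer's internals are not shown here and may be organized differently.

```python
import uuid
from decimal import Decimal

# Name -> callable, as would be set up by registering custom types
NAMED_TYPES = {"int": int, "uuid.UUID": uuid.UUID, "Decimal": Decimal}
# Field name -> callable, the implicit mapping
IMPLICIT = {"user_id": uuid.UUID, "price": Decimal}


def resolve_caster(field_name, annotation=None):
    """Pick the caster: explicit annotation first, then the field default."""
    if annotation is not None:
        return NAMED_TYPES[annotation]
    return IMPLICIT.get(field_name)   # None means the value stays uncast


implicit = resolve_caster("price")            # Decimal
explicit = resolve_caster("price", "int")     # int, the annotation wins
unknown = resolve_caster("nickname")          # None
user_id = resolve_caster("user_id")("550e8400-e29b-41d4-a716-446655440000")
```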

Advanced Features

Field names with dots for nested references:

expr = Eq(F("user.profile.age"), 25)
text = serializer.serialize(expr)
# Result: "user.profile.age==25"

Complete roundtrip example:

from therismos import F, Serializer, optimize

# Build expression
expr = (F("age") >= 21) & (F("status").is_in("active", "pending"))

# Optimize and serialize for URL
optimized, _ = optimize(expr)
serializer = Serializer(url_encode=True)
query_param = serializer.serialize(optimized)
# Use in URL: /api/users?filter={query_param}

# Later, deserialize from the URL parameter
received_expr = serializer.deserialize(query_param)

Value Reference

The serializer supports various value types:

serializer = Serializer()

# Strings (double-quoted with escapes)
serializer.serialize(Eq(F("name"), "Alice"))
# Result: "name==\"Alice\""

# Numbers (integers and floats)
serializer.serialize(Eq(F("age"), 25))
# Result: "age==25"

# Booleans
serializer.serialize(Eq(F("active"), True))
# Result: "active==true"

# Null
serializer.serialize(Eq(F("value"), None))
# Result: "value==null"

# Identifiers (unquoted - interpreted as strings)
expr = serializer.deserialize("status==active")
# value is the string "active"

Sorting

Therismos provides a sorting system for modeling sort criteria as object structures, similar to how expressions model filters.

Quick Start

from therismos.sorting import SortSpec, SortCriterion, SortOrder

# Create sort criteria using plain strings
spec = SortSpec([
    SortCriterion("age", SortOrder.DESCENDING),
    SortCriterion("name", SortOrder.ASCENDING),
])

# Convert to string
from therismos.sorting.visitors import StringVisitor
visitor = StringVisitor()
print(spec.accept(visitor))
# Output: "age DESC, name ASC"

Sort Orders

Three sort orders are available:

SortOrder.ASCENDING (value: 1): Sort in ascending order
SortOrder.DESCENDING (value: -1): Sort in descending order
SortOrder.NONE (value: 0): No sorting (typically filtered out during optimization)

Creating Sort Specifications

from therismos.sorting import SortSpec, SortCriterion, SortOrder

# Individual criterion
criterion = SortCriterion("age", SortOrder.ASCENDING)

# Full specification
spec = SortSpec([
    SortCriterion("created_at", SortOrder.DESCENDING),
    SortCriterion("priority", SortOrder.ASCENDING),
    SortCriterion("name", SortOrder.ASCENDING),
])

# SortSpec is a list-like collection
spec.append(SortCriterion("id", SortOrder.ASCENDING))
print(len(spec))  # 4

Optimization

The sorting optimizer removes redundant and meaningless criteria:

from therismos.sorting import SortSpec, SortCriterion, SortOrder
from therismos.sorting.optimizer import optimize

spec = SortSpec([
    SortCriterion("age", SortOrder.ASCENDING),
    SortCriterion("name", SortOrder.NONE),      # Will be removed
    SortCriterion("age", SortOrder.DESCENDING), # Overrides first "age"
])

optimized, records = optimize(spec)
# Result: SortSpec([SortCriterion("age", SortOrder.DESCENDING)])
# Only one criterion remains - the last occurrence of "age"

# Check what was optimized
for record in records:
    print(record.reason)

Optimization rules:

Remove NONE orders: Criteria with SortOrder.NONE are removed
Remove redundant criteria: When a field appears multiple times, only the last occurrence is kept
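
The two rules can be sketched on plain (field, order) tuples. This is not the library's optimizer: it is a standalone illustration that assumes NONE criteria are dropped first and that a surviving criterion takes the position of its last occurrence.

```python
ASCENDING, DESCENDING, NONE = 1, -1, 0  # same integer values as SortOrder above


def optimize_sort(criteria):
    """Drop NONE criteria, then keep only the last occurrence of each field."""
    kept = {}
    for field, order in criteria:
        if order == NONE:
            continue
        # Forget an earlier occurrence so the later one takes its place in the order.
        kept.pop(field, None)
        kept[field] = order
    return list(kept.items())


result = optimize_sort([("age", ASCENDING), ("name", NONE), ("age", DESCENDING)])
```

With the input from the example above, only the descending "age" criterion remains.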

Converting to Other Formats

Built-in Visitors

from therismos.sorting import SortSpec, SortCriterion, SortOrder
from therismos.sorting.visitors import StringVisitor, DictVisitor, FieldGathererVisitor

spec = SortSpec([
    SortCriterion("age", SortOrder.DESCENDING),
    SortCriterion("name", SortOrder.ASCENDING),
])

# String representation
string_visitor = StringVisitor()
print(spec.accept(string_visitor))
# Output: "age DESC, name ASC"

# Dictionary representation
dict_visitor = DictVisitor()
result = spec.accept(dict_visitor)
# Result: [{"field": "age", "order": "DESC"}, {"field": "name", "order": "ASC"}]

# Collect field names
field_visitor = FieldGathererVisitor()
spec.accept(field_visitor)
print(field_visitor.field_names)
# Output: {"age", "name"}

MongoDB Sorting

from therismos.sorting import SortSpec, SortCriterion, SortOrder
from therismos.sorting.visitors.mongo import MongoVisitor

spec = SortSpec([
    SortCriterion("created_at", SortOrder.DESCENDING),
    SortCriterion("name", SortOrder.ASCENDING),
])

visitor = MongoVisitor()
mongo_sort = spec.accept(visitor)
# Result: {"created_at": -1, "name": 1}

# Use with PyMongo
# cursor = collection.find().sort(list(mongo_sort.items()))

# Use with Motor (async); to_list() returns the documents, not a cursor
# docs = await collection.find().sort(list(mongo_sort.items())).to_list(length=100)

Serialization

Convert sort specifications to/from compact string format for URLs and APIs:

from therismos.sorting import Serializer, SortCriterion, SortOrder, SortSpec

# Create serializer
serializer = Serializer()

# Serialize to string
spec = SortSpec([
    SortCriterion("age", SortOrder.ASCENDING),
    SortCriterion("created_at", SortOrder.DESCENDING),
    SortCriterion("priority", SortOrder.ASCENDING),
])

text = serializer.serialize(spec)
# Result: "age,-created_at,priority"

# Deserialize from string
restored = serializer.deserialize("name,-score,+priority")
# Result: SortSpec with name ASC, score DESC, priority ASC

# Format rules:
# - Comma-separated list
# - No prefix or + prefix = ascending
# - Minus prefix (-) = descending
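
The format rules are simple enough to sketch in a few lines of plain Python. The functions below are hypothetical stand-ins, not the library's Serializer. They work on (field, order) tuples with 1 for ascending and -1 for descending, and they ignore type annotations and any escaping the real grammar may support.

```python
def parse_sort(text):
    """Parse "name,-score,+priority" into (field, order) pairs."""
    pairs = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        if token.startswith("-"):
            pairs.append((token[1:], -1))  # minus prefix: descending
        else:
            pairs.append((token.removeprefix("+"), 1))  # no prefix or +: ascending
    return pairs


def format_sort(pairs):
    """Inverse of parse_sort; ascending fields are written without a prefix."""
    return ",".join(("-" if order == -1 else "") + field for field, order in pairs)


parsed = parse_sort("name,-score,+priority")
text = format_sort([("age", 1), ("created_at", -1), ("priority", 1)])
```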

Custom Visitors

Create custom visitors to convert sort specifications to any format:

from therismos.sorting import SortCriterion, SortOrder, SortSpec

class SQLVisitor:
    """Convert sort spec to SQL ORDER BY clause."""

    def visit_sort_criterion(self, criterion: SortCriterion) -> str:
        # Field names are interpolated as-is; validate or quote them for untrusted input
        order_str = "ASC" if criterion.order == SortOrder.ASCENDING else "DESC"
        return f"{criterion.field} {order_str}"

    def visit_sort_spec(self, spec: SortSpec) -> str:
        # Skip SortOrder.NONE criteria so they are not rendered as DESC
        parts = [c.accept(self) for c in spec if c.order != SortOrder.NONE]
        if not parts:
            return ""
        return "ORDER BY " + ", ".join(parts)

# Usage
visitor = SQLVisitor()
result = spec.accept(visitor)
# Result: "ORDER BY created_at DESC, name ASC"

Grouping and Aggregation

Therismos provides a grouping and aggregation system for modeling SQL-like GROUP BY operations with aggregation functions as object structures.

Quick Start

from therismos.grouping import GroupSpec, Aggregation, AggregationFunction

# Create a grouping specification
spec = GroupSpec(
    group_by=["category", "region"],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "price"),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)

# Convert to string
from therismos.grouping.visitors import StringVisitor
visitor = StringVisitor()
print(spec.accept(visitor))
# Output: ("category,region", "total:count,min_price:min:price,avg_price:average:price")

Aggregation Functions

Therismos supports the following aggregation functions:

COUNT: Count of items in each group (field is optional and silently ignored if provided; recommended usage omits it)
SUM: Sum of values
MIN: Minimum value
MAX: Maximum value
AVERAGE: Average (mean) value
STDDEV: Standard deviation
MEDIAN: Median value
Q1: First quartile (25th percentile)
Q3: Third quartile (75th percentile)
P01, P05, P10: 1st, 5th, and 10th percentiles
P90, P95, P99: 90th, 95th, and 99th percentiles

All aggregation functions except COUNT require a field to aggregate. For COUNT, any provided field is silently ignored; the recommended form omits it entirely.

Creating Grouping Specifications

from therismos.grouping import GroupSpec, Aggregation, AggregationFunction

# Simple grouping with count
spec = GroupSpec(
    group_by=["status"],
    aggregations=[Aggregation("count", AggregationFunction.COUNT)],
)

# Multiple grouping fields
spec = GroupSpec(
    group_by=["category", "region", "status"],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "price"),
        Aggregation("max_price", AggregationFunction.MAX, "price"),
        Aggregation("avg_revenue", AggregationFunction.AVERAGE, "revenue"),
    ],
)

# Percentile aggregations
spec = GroupSpec(
    group_by=["service"],
    aggregations=[
        Aggregation("p95_latency", AggregationFunction.P95, "latency"),
        Aggregation("p99_latency", AggregationFunction.P99, "latency"),
        Aggregation("median_latency", AggregationFunction.MEDIAN, "latency"),
    ],
)

# Global aggregation (no grouping)
spec = GroupSpec(
    group_by=[],
    aggregations=[
        Aggregation("total_count", AggregationFunction.COUNT),
        Aggregation("overall_avg", AggregationFunction.AVERAGE, "score"),
    ],
)

Optimization

The grouping optimizer removes redundant grouping fields and duplicate aggregation definitions:

from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.grouping.optimizer import optimize

spec = GroupSpec(
    group_by=["category", "region", "category"],  # duplicate field
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "price"),
        Aggregation("total", AggregationFunction.MAX, "quantity"),  # duplicate ID
    ],
)

optimized, records = optimize(spec)
# Result: group_by=["region", "category"]; the last "total" definition (MAX of quantity) replaces the first

# Check what was optimized
for record in records:
    print(record.reason)

Optimization rules:

Remove duplicate grouping fields: When a field appears multiple times in group_by, only the last occurrence is kept
Remove duplicate aggregation IDs: When an aggregation ID appears multiple times, only the last definition is kept
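
Both rules are instances of the same "keep the last occurrence" pass. The sketch below shows it on plain lists and tuples. keep_last is a hypothetical helper rather than part of the library, and the resulting order of the aggregations is an assumption that mirrors the group_by result shown above.

```python
def keep_last(items, key=lambda item: item):
    """Keep only the last occurrence of each key, ordered by those last occurrences."""
    seen = set()
    kept = []
    for item in reversed(items):
        k = key(item)
        if k not in seen:
            seen.add(k)
            kept.append(item)
    kept.reverse()
    return kept


group_by = keep_last(["category", "region", "category"])

# Aggregations as (id, function, field) tuples; duplicates are detected by id.
aggregations = keep_last(
    [("total", "count", None), ("min_price", "min", "price"), ("total", "max", "quantity")],
    key=lambda agg: agg[0],
)
```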

Converting to Other Formats

Built-in Visitors

from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.grouping.visitors import StringVisitor, DictVisitor, FieldGathererVisitor

spec = GroupSpec(
    group_by=["category", "region"],
    aggregations=[
        Aggregation("count", AggregationFunction.COUNT),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)

# String representation
string_visitor = StringVisitor()
print(spec.accept(string_visitor))
# Output: ("category,region", "count:count,avg_price:average:price")

# Dictionary representation
dict_visitor = DictVisitor()
result = spec.accept(dict_visitor)
# Result: {
#     "group_by": ["category", "region"],
#     "aggregations": [
#         {"id": "count", "function": "count", "field": None},
#         {"id": "avg_price", "function": "average", "field": "price"}
#     ]
# }

# Collect field names
field_visitor = FieldGathererVisitor()
spec.accept(field_visitor)
print(field_visitor.field_names)
# Output: {"category", "region", "price"}

MongoDB Aggregation Pipelines

from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.grouping.visitors.mongo import MongoVisitor

spec = GroupSpec(
    group_by=["category", "region"],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "price"),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
        Aggregation("p95_latency", AggregationFunction.P95, "latency"),
    ],
)

visitor = MongoVisitor()
group_stage = spec.accept(visitor)
# Result: {
#     "$group": {
#         "_id": {"category": "$category", "region": "$region"},
#         "total": {"$sum": 1},
#         "min_price": {"$min": "$price"},
#         "avg_price": {"$avg": "$price"},
#         "p95_latency": {"$percentile": {"input": "$latency", "p": [0.95], "method": "approximate"}}
#     }
# }

# Use with PyMongo
# pipeline = [group_stage]
# results = collection.aggregate(pipeline)

# Use with Motor (async)
# pipeline = [group_stage]
# results = await collection.aggregate(pipeline).to_list(length=None)

Single vs. Multiple Grouping Fields:

By default, the MongoDB visitor simplifies single grouping fields:

# Single grouping field
spec = GroupSpec(
    group_by=["status"],
    aggregations=[Aggregation("count", AggregationFunction.COUNT)],
)

visitor = MongoVisitor()
result = spec.accept(visitor)
# Result: {"$group": {"_id": "$status", "count": {"$sum": 1}}}

# Disable simplification for consistency
visitor = MongoVisitor(simplify_single_group=False)
result = spec.accept(visitor)
# Result: {"$group": {"_id": {"status": "$status"}, "count": {"$sum": 1}}}

Global Aggregation:

# No grouping fields (aggregate all documents)
spec = GroupSpec(
    group_by=[],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("avg_age", AggregationFunction.AVERAGE, "age"),
    ],
)

visitor = MongoVisitor()
result = spec.accept(visitor)
# Result: {"$group": {"_id": None, "total": {"$sum": 1}, "avg_age": {"$avg": "$age"}}}
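
The three shapes of "_id" shown above (a bare field reference, a sub-document, and None) can be summarized in one small function. This is a sketch of the mapping only: build_group_stage is a hypothetical helper that takes ready-made accumulator dicts, and only the simplify_single_group name is taken from this README.

```python
def build_group_stage(group_by, accumulators, simplify_single_group=True):
    """Build a MongoDB $group stage from field names and accumulator expressions."""
    if not group_by:
        group_id = None  # global aggregation: all documents in one group
    elif len(group_by) == 1 and simplify_single_group:
        group_id = f"${group_by[0]}"  # bare field reference
    else:
        group_id = {name: f"${name}" for name in group_by}  # sub-document key
    return {"$group": {"_id": group_id, **accumulators}}


count = {"count": {"$sum": 1}}
simplified = build_group_stage(["status"], count)
uniform = build_group_stage(["status"], count, simplify_single_group=False)
global_stage = build_group_stage([], count)
```

The choice matters when reading results: with the simplified form the group key is group["_id"], while the sub-document form requires group["_id"]["status"]. Disabling simplification keeps result-handling code identical for one or many grouping fields.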

Serialization

Convert grouping specifications to/from compact string format for URLs and APIs:

from therismos.grouping import Aggregation, AggregationFunction, GroupSpec, Serializer

# Create serializer
serializer = Serializer()

# Serialize to string
spec = GroupSpec(
    group_by=["category", "region"],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "price"),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)

text = serializer.serialize(spec)
# Result: ("category,region", "total:count,min_price:min:price,avg_price:average:price")

# Deserialize from string
restored = serializer.deserialize('("category,region", "total:count,min_price:min:price")')
# Result: GroupSpec with category+region grouping and two aggregations

# Format rules:
# - Tuple format: ("field1,field2", "agg1:func,agg2:func:field")
# - Grouping fields: comma-separated list
# - Aggregations: comma-separated list of "id:function" or "id:function:field"
# - COUNT aggregation: "id:count" (field omitted; any field is silently ignored)
# - Other aggregations: "id:function:field" (field required)
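
To make the format rules concrete, here is a minimal parser for the tuple format written with the standard library only. It is a sketch, not the library's Serializer: parse_group_text is a hypothetical name, the result uses plain tuples, and it assumes the text is a Python-style tuple literal of two quoted strings as in the examples above.

```python
import ast


def parse_group_text(text):
    """Parse '("f1,f2", "id:func,id:func:field")' into (group_by, aggregations)."""
    fields_part, aggs_part = ast.literal_eval(text)
    group_by = [name for name in fields_part.split(",") if name]
    aggregations = []
    for item in aggs_part.split(","):
        if not item:
            continue
        agg_id, function, *rest = item.split(":")
        field = rest[0] if rest else None
        if function == "count":
            field = None  # COUNT ignores any field that was supplied
        elif field is None:
            raise ValueError(f"aggregation {agg_id!r} ({function}) requires a field")
        aggregations.append((agg_id, function, field))
    return group_by, aggregations


parsed = parse_group_text('("category,region", "total:count,min_price:min:price")')
```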

Roundtrip Serialization:

original = GroupSpec(
    group_by=["category", "region"],
    aggregations=[
        Aggregation("total", AggregationFunction.COUNT),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)

serializer = Serializer()
text = serializer.serialize(original)
restored = serializer.deserialize(text)

# Restored spec is equivalent to original
assert restored.group_by == original.group_by
assert len(restored.aggregations) == len(original.aggregations)

Custom Visitors

Create custom visitors to convert grouping specifications to any format:

from therismos.grouping import GroupSpec

class PandasVisitor:
    """Convert grouping spec to pandas groupby + agg syntax."""

    def visit_group_spec(self, spec: GroupSpec) -> str:
        if not spec.group_by:
            # Global aggregation
            agg_dict = self._build_agg_dict(spec.aggregations.values())
            # Named aggregation takes keyword arguments, hence the ** unpacking
            return f"df.agg(**{agg_dict})"

        # Groupby aggregation
        group_fields = list(spec.group_by)
        agg_dict = self._build_agg_dict(spec.aggregations.values())
        return f"df.groupby({group_fields}).agg(**{agg_dict})"

    def _build_agg_dict(self, aggregations):
        agg_map = {
            "count": "count",
            "min": "min",
            "max": "max",
            "average": "mean",
            "stddev": "std",
            "median": "median",
        }
        result = {}
        for agg in aggregations:
            # Functions missing from agg_map (e.g. percentiles) are skipped
            if agg.function.value in agg_map:
                func = agg_map[agg.function.value]
                # Named aggregation needs a column, so a field-less COUNT is skipped too
                if agg.field:
                    result[agg.id] = (agg.field, func)
        return result

# Usage
visitor = PandasVisitor()
result = spec.accept(visitor)
# Result: "df.groupby(['category', 'region']).agg(**{...})"

Complete Example: Analytics Dashboard

from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.grouping.optimizer import optimize
from therismos.grouping.visitors.mongo import MongoVisitor

# Define grouping specification for sales analytics
sales_analysis = GroupSpec(
    group_by=["product_category", "region", "quarter"],
    aggregations=[
        Aggregation("total_sales", AggregationFunction.COUNT),
        Aggregation("min_price", AggregationFunction.MIN, "sale_price"),
        Aggregation("max_price", AggregationFunction.MAX, "sale_price"),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "sale_price"),
        Aggregation("avg_revenue", AggregationFunction.AVERAGE, "revenue"),
        Aggregation("p50_sale_time", AggregationFunction.MEDIAN, "processing_time"),
        Aggregation("p95_sale_time", AggregationFunction.P95, "processing_time"),
    ],
)

# Optimize the specification
optimized, records = optimize(sales_analysis)

# Convert to MongoDB aggregation pipeline
visitor = MongoVisitor()
group_stage = optimized.accept(visitor)

# Use in MongoDB query
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017/")
# db = client["sales_db"]
# collection = db["transactions"]
#
# pipeline = [
#     {"$match": {"year": 2024}},  # Filter stage
#     group_stage,                  # Our grouping specification
#     {"$sort": {"total_sales": -1}}  # Sort by sales count
# ]
#
# results = collection.aggregate(pipeline)
# for group in results:
#     print(f"Category: {group['_id']['product_category']}")
#     print(f"Region: {group['_id']['region']}")
#     print(f"Quarter: {group['_id']['quarter']}")
#     print(f"Total Sales: {group['total_sales']}")
#     print(f"Avg Price: {group['avg_price']}")
#     print(f"P95 Processing Time: {group['p95_sale_time']}")
#     print("---")

Polars and Pandas Integration

Therismos provides first-class support for Polars and pandas DataFrames via optional backend visitors.

Installation

# Polars backend
pip install therismos[polars]

# Pandas backend
pip install therismos[pandas]

# Both
pip install therismos[polars,pandas]

Polars Integration

import polars as pl
from therismos import F
from therismos.sorting import SortSpec, SortCriterion, SortOrder
from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.expr.visitors.polars import PolarsExprVisitor
from therismos.sorting.visitors.polars import PolarsSortSpecVisitor
from therismos.grouping.visitors.polars import PolarsGroupSpecVisitor

df = pl.DataFrame({
    "age": [20, 15, 30],
    "status": ["active", "inactive", "active"],
    "price": [10.0, 20.0, 15.0],
    "category": ["A", "B", "A"],
})

# Filter with expressions
age = F("age")
status = F("status")
expr = (age > 18) & (status == "active")

pl_expr = expr.accept(PolarsExprVisitor())
df.filter(pl_expr)            # eager DataFrame
df.lazy().filter(pl_expr)     # lazy LazyFrame

# Sort with SortSpec
spec = SortSpec([
    SortCriterion("age", SortOrder.DESCENDING),
    SortCriterion("status", SortOrder.ASCENDING),
])
sort = spec.accept(PolarsSortSpecVisitor())
df.sort(by=list(sort.by), descending=list(sort.descending))

# Group and aggregate with GroupSpec
group_spec = GroupSpec(
    group_by=["category"],
    aggregations=[
        Aggregation("count", AggregationFunction.COUNT),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)
grp = group_spec.accept(PolarsGroupSpecVisitor())
df.group_by(list(grp.group_by)).agg(list(grp.agg))

Pandas Integration

import pandas as pd
from therismos import F
from therismos.sorting import SortSpec, SortCriterion, SortOrder
from therismos.grouping import GroupSpec, Aggregation, AggregationFunction
from therismos.expr.visitors.pandas import PandasExprVisitor
from therismos.sorting.visitors.pandas import PandasSortSpecVisitor
from therismos.grouping.visitors.pandas import PandasGroupSpecVisitor

df = pd.DataFrame({
    "age": [20, 15, 30],
    "status": ["active", "inactive", "active"],
    "price": [10.0, 20.0, 15.0],
    "category": ["A", "B", "A"],
})

# Filter with expressions: the visitor returns a callable PandasFilter
age = F("age")
status = F("status")
expr = (age > 18) & (status == "active")

mask = expr.accept(PandasExprVisitor())
df[mask(df)]

# Sort with SortSpec
spec = SortSpec([
    SortCriterion("age", SortOrder.DESCENDING),
])
sort = spec.accept(PandasSortSpecVisitor())
df.sort_values(by=list(sort.by), ascending=list(sort.ascending))

# Group and aggregate with GroupSpec
group_spec = GroupSpec(
    group_by=["category"],
    aggregations=[
        Aggregation("count", AggregationFunction.COUNT),
        Aggregation("avg_price", AggregationFunction.AVERAGE, "price"),
    ],
)
grp = group_spec.accept(PandasGroupSpecVisitor())
df.groupby(list(grp.group_by)).agg(**grp.agg)

Expression Templates

Expression templates let you define parameterized filter expressions that are fully serializable to JSON. Named placeholders ($start, $end) are computed from a runtime context via a transform pipeline DSL, making templates suitable for persistent storage in a database or config file.

Quick Start

import datetime
from therismos import F, ExprTemplate
from therismos.expr._expr import TemplateParam
from therismos.expr.template import RuleSerializer, TemplateParamSpec

# Build a "last 7 days" template
field = F("created", datetime.date)
expr = (field >= TemplateParam("start", datetime.date)) & (field <= TemplateParam("end", datetime.date))

rule_ser = RuleSerializer()
tmpl = ExprTemplate(
    expr=expr,
    params={
        "start": TemplateParamSpec(description="Range start (inclusive)"),
        "end":   TemplateParamSpec(description="Range end (inclusive)"),
    },
    rules={
        "end":   rule_ser.deserialize("$now | extract_date"),
        "start": rule_ser.deserialize("$now | extract_date | sub_time(7d)"),
    },
)

# Bind: supply a context and get a concrete expression
now = datetime.datetime(2026, 3, 18, 16, 30)
bound = tmpl.bind({"now": now})
# → created{date}>=2026-03-11; created{date}<=2026-03-18

Template Parameters

A TemplateParam is a named placeholder that can appear in any value position of an expression node:

from therismos import F
from therismos.expr._expr import TemplateParam

age = F("age", int)
threshold = TemplateParam("min_age", int)   # optional type_ for automatic casting

expr = age >= threshold                      # created like any other expression

Use collect_params() to inspect placeholders and bind() to substitute them:

from therismos import bind, collect_params

params = collect_params(expr)
# → {"min_age": TemplateParam(name="min_age", type_=<class 'int'>)}

bound = bind(expr, {"min_age": 18})
# → age >= 18  (concrete Ge expression, no TemplateParam)
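
The idea behind placeholder collection and binding can be shown on a toy expression node. The classes and functions below are hypothetical and deliberately named differently from the library's API. They handle a single comparison node and assume that a declared type is applied by calling it on the supplied value.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class Placeholder:
    """Named slot standing in for a value that is supplied later."""
    name: str
    type_: Optional[type] = None


@dataclass(frozen=True)
class GreaterEqual:
    """Toy 'field >= value' node; value may be concrete or a Placeholder."""
    field: str
    value: Any


def find_placeholders(expr):
    """Return the placeholders used by the node, keyed by name."""
    if isinstance(expr.value, Placeholder):
        return {expr.value.name: expr.value}
    return {}


def bind_placeholders(expr, values):
    """Return a new node with the placeholder replaced by a (cast) concrete value."""
    slot = expr.value
    if not isinstance(slot, Placeholder):
        return expr
    raw = values[slot.name]
    return replace(expr, value=slot.type_(raw) if slot.type_ else raw)


template = GreaterEqual("age", Placeholder("min_age", int))
bound = bind_placeholders(template, {"min_age": "18"})
```

Because the nodes are frozen dataclasses, binding returns a new node and leaves the template untouched, so one template can be bound many times with different values.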

Serialization Grammar

TemplateParam nodes serialize as $name or $name{type} in the expression grammar:

from therismos import F, Serializer
from therismos.expr._expr import TemplateParam
import datetime

ser = Serializer(type_registry={datetime.date: "date"})
expr = F("created", datetime.date) >= TemplateParam("start", datetime.date)

print(ser.serialize(expr))           # created{date}>=$start{date}
restored = ser.deserialize("created{date}>=$start{date}")
# → Ge(field=Field("created", date), value=TemplateParam("start", date))

Backend visitors (MongoVisitor, PolarsExprVisitor, PandasExprVisitor) and optimize() raise UnboundTemplateParamError if any TemplateParam remains, so call bind() first.

Transform Pipeline DSL

Rules use a $source | step1 | step2(arg) pipeline syntax. Steps are looked up in a TransformRegistry:

$end   = $now | extract_date
$start = $now | extract_date | sub_time(7d)

Duration arguments support: 7d, 1h, 30m, 90s, 500ms.

Built-in transforms include date/time extraction and rounding, arithmetic (add_time, sub_time), type casting (as_date, as_datetime, as_int, …), string ops, and math. Register custom transforms at runtime:

from therismos import DEFAULT_TRANSFORM_REGISTRY

@DEFAULT_TRANSFORM_REGISTRY.register_decorator("fiscal_year_start")
def fiscal_year_start(dt):
    import datetime
    return datetime.date(dt.year if dt.month >= 4 else dt.year - 1, 4, 1)
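
The pipeline syntax, duration arguments, and decorator-based registration can be tied together in a small standalone interpreter. Everything here is a sketch under assumptions: TRANSFORMS, register, parse_duration, and evaluate are hypothetical names, only the step names extract_date and sub_time come from this README, and their behavior is inferred from the "last 7 days" example rather than taken from the library.

```python
import datetime
import re

TRANSFORMS = {}  # hypothetical stand-in for a transform registry

_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def register(name):
    """Decorator that stores a function in TRANSFORMS under the given step name."""
    def decorator(func):
        TRANSFORMS[name] = func
        return func
    return decorator


def parse_duration(text):
    """Turn '7d', '1h', '30m', '90s' or '500ms' into a timedelta."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", text)
    if match is None:
        raise ValueError(f"bad duration: {text!r}")
    amount, unit = match.groups()
    return datetime.timedelta(**{_UNITS[unit]: int(amount)})


@register("extract_date")
def extract_date(value):
    return value.date()


@register("sub_time")
def sub_time(value, duration):
    return value - parse_duration(duration)


def evaluate(rule, context):
    """Evaluate '$source | step | step(arg)' against a context dict."""
    source, *steps = [part.strip() for part in rule.split("|")]
    value = context[source.removeprefix("$")]
    for step in steps:
        name, _, arg = step.partition("(")
        args = [arg.rstrip(")")] if arg else []
        value = TRANSFORMS[name](value, *args)
    return value


now = datetime.datetime(2026, 3, 18, 16, 30)
start = evaluate("$now | extract_date | sub_time(7d)", {"now": now})
end = evaluate("$now | extract_date", {"now": now})
```

With the context used in the Quick Start, this yields the same 2026-03-11 to 2026-03-18 range as the bound template shown there.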

JSON Persistence

ExprTemplate serializes to a JSON-compatible dict for storage in a database or config file:

import json

d = tmpl.to_dict()
# {
#   "version": "1",
#   "expr": "created>=$start; created<=$end",
#   "params": {"start": {"description": "Range start (inclusive)"}, ...},
#   "rules": {"end": "$now | extract_date", "start": "$now | extract_date | sub_time(7d)"}
# }

json_str = tmpl.to_json()
restored = ExprTemplate.from_json(json_str)
bound = restored.bind({"now": datetime.datetime(2026, 3, 18, 16, 30)})

Module Structure

Therismos is organized into the following modules and submodules:

therismos/
├── __init__.py              # Main package exports
├── expr/                    # Expression module
│   ├── __init__.py          # Expression module exports
│   ├── _expr.py             # Core expression classes (Expr, Field, TemplateParam, etc.)
│   ├── optimizer.py         # Expression optimization and simplification
│   ├── serializer.py        # Grammar-based string serialization/deserialization
│   ├── template.py          # Expression templates (bind, ExprTemplate, RuleSerializer)
│   ├── transforms.py        # Transform registry and built-in transforms
│   └── visitors/            # Visitor implementations package
│       ├── __init__.py      # Core visitor exports
│       ├── _visitors.py     # Built-in visitor implementations
│       ├── mongo.py         # MongoDB query filter converter
│       ├── polars.py        # Polars expression converter
│       └── pandas.py        # Pandas filter callable converter
├── sorting/                 # Sorting module
│   ├── __init__.py          # Sorting module exports
│   ├── _sorting.py          # Core sorting classes (SortOrder, SortCriterion, SortSpec)
│   ├── optimizer.py         # Sort specification optimization
│   ├── serializer.py        # String serialization/deserialization for sort specs
│   └── visitors/            # Visitor implementations package
│       ├── __init__.py      # Core visitor exports
│       ├── _visitors.py     # Built-in visitor implementations
│       ├── mongo.py         # MongoDB sort document converter
│       ├── polars.py        # Polars PolarsSortSpec converter
│       └── pandas.py        # Pandas PandasSortSpec converter
└── grouping/                # Grouping and aggregation module
    ├── __init__.py          # Grouping module exports
    ├── _grouping.py         # Core grouping classes (AggregationFunction, Aggregation, GroupSpec)
    ├── optimizer.py         # Grouping specification optimization
    ├── serializer.py        # String serialization/deserialization for grouping specs
    └── visitors/            # Visitor implementations package
        ├── __init__.py      # Core visitor exports
        ├── _visitors.py     # Built-in visitor implementations
        ├── mongo.py         # MongoDB $group pipeline stage converter
        ├── polars.py        # Polars PolarsGroupSpec converter
        └── pandas.py        # Pandas PandasGroupSpec converter

Core Modules

therismos.expr: Core expression AST implementation
- Expression types: Eq, Ne, Lt, Le, Gt, Ge, Regex, In, IsNull
- Compound expressions: AllExpr, AnyExpr, NotExpr
- Logical constants: TRUE, FALSE
- Field types: Field, F (helper function)
- Visitor protocol: ExprVisitor
- Serialization: Serializer (grammar-based string conversion)
therismos.expr.optimizer: Expression optimization
- optimize(expr, records=None): Optimize an expression tree
- OptimizationRecord: Records of optimization transformations
therismos.expr.serializer: Grammar-based serialization
- Serializer: Converts expressions to/from compact string representations
- URL encoding support for query parameters
- Type annotation control
- Custom type registration
therismos.expr.template: Expression templating
- TemplateParam: Named placeholder for use in value positions of expression nodes
- bind(expr, params): Substitutes template parameters with concrete values
- collect_params(expr): Returns all unbound TemplateParam nodes in an expression
- ExprTemplate: Persistable wrapper combining an expression with parameter specs and computation rules
- RuleSerializer: Serializes/deserializes transform pipeline rules to/from DSL strings
- TemplateParamSpec, ParamRule, TransformStep: Supporting data structures
therismos.expr.transforms: Transform pipeline for computed parameters
- TransformRegistry: Registry of named transform functions
- DEFAULT_TRANSFORM_REGISTRY: Pre-populated registry with 25+ built-in transforms (date/time arithmetic, type coercion, string operations, math)
therismos.expr.visitors: Built-in visitor implementations
- StringVisitor: Converts expressions to human-readable strings
- CountVisitor: Counts nodes in expression trees
- DictVisitor: Converts expressions to dictionary representation
- FieldGathererVisitor: Collects all field names used in an expression
therismos.expr.visitors.mongo: MongoDB backend converter
- MongoVisitor: Converts expressions to MongoDB query filters for PyMongo/Motor
therismos.sorting: Core sorting specification implementation
- Sort orders: SortOrder (NONE, ASCENDING, DESCENDING)
- Sort criterion: SortCriterion (field + order pair)
- Sort specification: SortSpec (list-like collection of criteria)
- Visitor protocols: SortCriterionVisitor, SortSpecVisitor
therismos.sorting.optimizer: Sort specification optimization
- optimize(spec, records=None): Optimize a sort specification
- Removes NONE orders and redundant criteria
- OptimizationRecord: Records of optimization transformations
therismos.sorting.serializer: String serialization
- Serializer: Converts sort specs to/from compact string format
- Format: comma-separated with +/- prefixes ("age,-created_at,+priority")
- Support for field type annotations
- Custom type registration
- Implicit field type mappings
therismos.sorting.visitors: Built-in visitor implementations
- StringVisitor: Converts sort specs to human-readable strings ("age DESC, name ASC")
- DictVisitor: Converts sort specs to dictionary representation
- FieldGathererVisitor: Collects all field names used in a sort spec
therismos.sorting.visitors.mongo: MongoDB backend converter
- MongoVisitor: Converts sort specs to MongoDB sort documents for PyMongo/Motor
therismos.grouping: Core grouping and aggregation specification implementation
- Aggregation functions: AggregationFunction (COUNT, SUM, MIN, MAX, AVERAGE, STDDEV, MEDIAN, Q1, Q3, P01-P99)
- Aggregation: Aggregation (id + function + optional field)
- Grouping specification: GroupSpec (grouping fields + aggregations dict)
- Visitor protocol: GroupSpecVisitor
- Serialization: Serializer (tuple-based string conversion)
therismos.grouping.optimizer: Grouping specification optimization
- optimize(spec, records=None): Optimize a grouping specification
- Removes duplicate grouping fields and aggregation IDs
- OptimizationRecord: Records of optimization transformations
therismos.grouping.serializer: String serialization
- Serializer: Converts grouping specs to/from compact tuple format
- Format: ("field1,field2", "agg1:count,agg2:function:field")
- Validates aggregation function requirements
therismos.grouping.visitors: Built-in visitor implementations
- StringVisitor: Converts grouping specs to tuple-based string format
- DictVisitor: Converts grouping specs to dictionary representation
- FieldGathererVisitor: Collects all field names used in a grouping spec (both grouping and aggregation fields)
therismos.grouping.visitors.mongo: MongoDB backend converter
- MongoVisitor: Converts grouping specs to MongoDB $group aggregation pipeline stages for PyMongo/Motor
- Supports all aggregation functions including percentiles (MongoDB 7.0+)
- Configurable single-field simplification

Development

Requires Python 3.11 or higher.

Setup

# Install dependencies
uv pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check therismos tests

# Run type checking
mypy therismos

# Run all checks with tox
tox

Testing

The test suite uses pytest with heavily parametrized tests:

# Run all tests
pytest

# Run with coverage
pytest --cov=therismos --cov-report=html

# Run specific test file
pytest tests/test_optimizer.py

License

MIT

Release history

1.0.0 (this version): Mar 29, 2026
0.8.0: Mar 27, 2026
0.6.0: Mar 21, 2026
0.5.0: Feb 25, 2026
0.4.1: Jan 27, 2026
0.4.0: Jan 20, 2026
0.3.0: Nov 23, 2025
0.2.0: Nov 23, 2025
0.1.0: Nov 22, 2025

Download files

Source Distribution

therismos-1.0.0.tar.gz (129.1 kB)

Uploaded Mar 29, 2026 Source

Built Distribution

therismos-1.0.0-py3-none-any.whl (82.1 kB)

Uploaded Mar 29, 2026 Python 3

File details

Details for the file therismos-1.0.0.tar.gz.

File metadata

Download URL: therismos-1.0.0.tar.gz
Upload date: Mar 29, 2026
Size: 129.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.26 (publish subcommand, Ubuntu 24.04 noble)

File hashes

Hashes for therismos-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`608607bd1f1a0b6d20502d508e40e47d5846411a25762b4c310b62f1de60166c`
MD5	`a2e05e5d61f5920b171155a1eaa37321`
BLAKE2b-256	`ac25ae1b758fbdd744644441d86535865936c284f1d43371d6697e929bad066e`

File details

Details for the file therismos-1.0.0-py3-none-any.whl.

File metadata

Download URL: therismos-1.0.0-py3-none-any.whl
Upload date: Mar 29, 2026
Size: 82.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.26 (uv publish, on Ubuntu 24.04 noble)

File hashes

Hashes for therismos-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0439837079409dc33536a836222ff0417e81e71b9970d2ca9e6e15e74811cf06`
MD5	`be50934939889ce0644ad8d95f0a81a1`
BLAKE2b-256	`efd513e09dd8f6408622080c4905f52c8abaed9cd9f350e96f9ecb06b95389d8`
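
As a minimal sketch of how the SHA256 digests listed above could be checked after downloading a file by hand, the following uses only the standard library's hashlib. The helper names `sha256_of` and `verify_download` are illustrative assumptions, not part of therismos or of any PyPI tooling.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables on this page.
EXPECTED_SHA256 = {
    "therismos-1.0.0.tar.gz":
        "608607bd1f1a0b6d20502d508e40e47d5846411a25762b4c310b62f1de60166c",
    "therismos-1.0.0-py3-none-any.whl":
        "0439837079409dc33536a836222ff0417e81e71b9970d2ca9e6e15e74811cf06",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's digest matches the published one.

    Raises KeyError when the file name is not one of the two
    distributions listed above.
    """
    path = Path(path)
    expected = EXPECTED_SHA256[path.name]
    return sha256_of(path) == expected
```

Calling `verify_download("therismos-1.0.0.tar.gz")` on a downloaded copy would then return True only if the file's contents hash to the published SHA256 value.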

