# bilateral-truth: Caching Bilateral Factuality Evaluation
A Python package for bilateral factuality evaluation with generalized truth values and persistent caching.
## Overview

This package implements the mathematical function:

```
ζ_c: ℒ_AT → 𝒱³ × 𝒱³
```

where:

- ℒ_AT is the language of assertions
- 𝒱³ = {t, e, f} is the set of 3-valued logic components (true, undefined, false)
- the function returns generalized truth values <u,v> produced by bilateral evaluation
## Key Features
- Bilateral Evaluation: Each assertion receives a generalized truth value <u,v> where u represents verifiability and v represents refutability
- Persistent Caching: The evaluation function maintains a cache to avoid recomputing truth values for previously evaluated assertions
- 3-Valued Logic: Supports true (t), undefined (e), and false (f) truth value components
- Extensible Evaluation: Custom evaluation functions can be provided for domain-specific logic
## Installation

### From PyPI (Recommended)

```bash
# Core package with mock evaluator
pip install bilateral-truth

# With OpenAI support
pip install bilateral-truth[openai]

# With Anthropic (Claude) support
pip install bilateral-truth[anthropic]

# With all LLM providers
pip install bilateral-truth[all]
```
### Development Setup

#### Option 1: Automated Setup (Recommended)

```bash
# Set up the virtual environment and install everything
./setup_venv.sh

# Activate the virtual environment
source venv/bin/activate
```

#### Option 2: Manual Setup

```bash
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the package in development mode with all dependencies
pip install -e .[all,dev]
```
## Quick Start

```python
from bilateral_truth import Assertion, zeta_c, create_llm_evaluator

# Create an LLM evaluator (requires an API key)
evaluator = create_llm_evaluator('openai', model='gpt-4')
# or: evaluator = create_llm_evaluator('anthropic', model='claude-sonnet-4-20250514')
# or: evaluator = create_llm_evaluator('mock')  # for testing

# Create assertions
assertion1 = Assertion("The capital of France is Paris")
assertion2 = Assertion("loves", "alice", "bob")
assertion3 = Assertion("It will rain tomorrow")

# Evaluate using ζ_c with LLM-based bilateral assessment
result1 = zeta_c(assertion1, evaluator.evaluate_bilateral)
result2 = zeta_c(assertion2, evaluator.evaluate_bilateral)
result3 = zeta_c(assertion3, evaluator.evaluate_bilateral)

print(f"zeta_c({assertion1}) = {result1}")
print(f"zeta_c({assertion2}) = {result2}")
print(f"zeta_c({assertion3}) = {result3}")
```
## Core Components

### Generalized Truth Values

```python
from bilateral_truth import (
    EpistemicPolicy,
    GeneralizedTruthValue,
    TruthValueComponent,
)

# Classical values
classical_true = GeneralizedTruthValue(TruthValueComponent.TRUE, TruthValueComponent.FALSE)          # <t,f>
classical_false = GeneralizedTruthValue(TruthValueComponent.FALSE, TruthValueComponent.TRUE)         # <f,t>
undefined_val = GeneralizedTruthValue(TruthValueComponent.UNDEFINED, TruthValueComponent.UNDEFINED)  # <e,e>

# Project to 3-valued logic
projected_true = classical_true.project(EpistemicPolicy.CLASSICAL)        # t
projected_false = classical_false.project(EpistemicPolicy.CLASSICAL)      # f
projected_undefined = undefined_val.project(EpistemicPolicy.CLASSICAL)    # e

# Custom combinations
custom_val = GeneralizedTruthValue(
    TruthValueComponent.TRUE,
    TruthValueComponent.UNDEFINED,
)  # <t,e>
```
### Assertions

```python
from bilateral_truth import Assertion

# Simple statement
statement = Assertion("The sky is blue")

# Predicate with arguments
loves = Assertion("loves", "alice", "bob")

# With named arguments
distance = Assertion("distance",
                     start="NYC",
                     end="LA",
                     value=2500,
                     unit="miles")

# Natural language statements
weather = Assertion("It will rain tomorrow")
fact = Assertion("The capital of France is Paris")
```
### Caching Behavior

The zeta_c function implements the mathematical definition:

```
ζ_c(φ) = c(φ)   if φ ∈ dom(c)
       = ζ(φ)   otherwise, with c := c ∪ {(φ, ζ(φ))}
```

```python
from bilateral_truth import Assertion, zeta_c, get_cache_size, clear_cache

assertion = Assertion("test")

# First evaluation computes and caches
result1 = zeta_c(assertion)
print(f"Cache size: {get_cache_size()}")  # 1

# Second evaluation uses the cache
result2 = zeta_c(assertion)
print(f"Same result: {result1 == result2}")  # True
print(f"Cache size: {get_cache_size()}")  # Still 1
```
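The caching rule can be sketched in plain Python with a dict standing in for the cache c. This is an illustrative sketch, not the package's internal implementation; the stub `zeta` here is a stand-in evaluator that always returns <t,f>:

```python
# Minimal sketch of the zeta_c caching rule, with a dict as the cache c.
cache = {}

def zeta(phi):
    # Stand-in for the underlying bilateral evaluator; always returns <t,f>.
    return ("t", "f")

def zeta_c(phi):
    # If phi ∈ dom(c), return the cached value; otherwise evaluate
    # and extend the cache: c := c ∪ {(phi, zeta(phi))}.
    if phi not in cache:
        cache[phi] = zeta(phi)
    return cache[phi]

print(zeta_c("The capital of France is Paris"))  # ('t', 'f'), computed and cached
print(len(cache))                                # 1
print(zeta_c("The capital of France is Paris"))  # ('t', 'f'), served from the cache
print(len(cache))                                # still 1
```

Because the cache is keyed by the assertion itself, repeated evaluations of the same assertion never re-invoke the evaluator.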
### LLM-Based Bilateral Evaluation

Set the relevant environment variables first:

```bash
export OPENAI_API_KEY='your-key'
export ANTHROPIC_API_KEY='your-key'
```

```python
from bilateral_truth import zeta_c, create_llm_evaluator, Assertion

# Create real LLM evaluators
openai_evaluator = create_llm_evaluator('openai', model='gpt-4')
claude_evaluator = create_llm_evaluator('anthropic')

# Or use the mock evaluator for testing/development
mock_evaluator = create_llm_evaluator('mock')

# The LLM will assess both verifiability and refutability
assertion = Assertion("The Earth is round")
result = zeta_c(assertion, openai_evaluator.evaluate_bilateral)
```

The LLM receives a prompt asking it to evaluate:

1. Can this statement be verified as true? (verifiability)
2. Can this statement be refuted as false? (refutability)

It then returns a structured <u,v> response.
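One plausible way to extract the structured `<u,v>` pair from a model reply is a small regex pass. This is an illustrative sketch under an assumed response format, not the package's actual parser; `parse_bilateral` is a hypothetical helper:

```python
import re

def parse_bilateral(reply: str):
    """Extract a <u,v> pair such as '<t,f>' from an LLM reply.

    Each component must be one of t, e, f. Returns (u, v),
    or None if no well-formed pair is found in the text.
    """
    match = re.search(r"<\s*([tef])\s*,\s*([tef])\s*>", reply)
    if not match:
        return None
    return (match.group(1), match.group(2))

print(parse_bilateral("Assessment: <t,f> (verifiable, not refutable)"))  # ('t', 'f')
print(parse_bilateral("no structured answer"))                           # None
```

Returning None on a malformed reply lets the caller fall back to `<e,e>` (fully undefined) rather than guessing.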
## API Reference

### Functions

- `zeta(assertion, evaluator)`: Base bilateral evaluation function (requires an LLM evaluator)
- `zeta_c(assertion, evaluator, cache=None)`: Cached bilateral evaluation function
- `clear_cache()`: Clear the global cache
- `get_cache_size()`: Get the number of cached entries
- `create_llm_evaluator(provider, **kwargs)`: Factory for creating LLM evaluators

### Classes

- `Assertion(statement, *args, **kwargs)`: Represents natural language assertions or predicates
- `GeneralizedTruthValue(u, v)`: Represents <u,v> truth values
- `TruthValueComponent`: Enum for t, e, f values
- `ZetaCache`: Cache implementation for zeta_c
- `OpenAIEvaluator`: LLM evaluator using OpenAI's API
- `AnthropicEvaluator`: LLM evaluator using Anthropic's API
- `MockLLMEvaluator`: Mock evaluator for testing/development
## Command Line Interface

After installation, use the `bilateral-truth` command:

```bash
# Install the package first
pip install -e .

# Interactive mode with GPT-4 (requires OPENAI_API_KEY)
bilateral-truth --model gpt-4 --interactive

# Single assertion evaluation with Claude (requires ANTHROPIC_API_KEY)
bilateral-truth --model claude "The capital of France is Paris"

# Use OpenRouter with a Llama model (requires OPENROUTER_API_KEY)
bilateral-truth --model llama3-70b "Climate change is real"

# Use the mock model for testing (no API key needed)
bilateral-truth --model mock "The sky is blue"

# Use majority voting with 5 samples for more robust results
bilateral-truth --model gpt-4 --samples 5 "Climate change is real"

# Use pessimistic tiebreaking with an even number of samples
bilateral-truth --model claude --samples 4 --tiebreak pessimistic "The Earth is round"

# List all available models
bilateral-truth --list-models

# Get information about a specific model
bilateral-truth --model-info gpt-4
```
Running without installation:

```bash
# Use the standalone script
python cli.py -m mock "The Earth is round"

# Interactive mode with sampling
python cli.py -m mock -s 3 --tiebreak random -i

# Single evaluation with majority voting
python cli.py -m llama3 -s 5 "The sky is blue"

# Run the demo
python demo_cli.py
```
## Supported Models

The CLI supports models from multiple providers:
- OpenAI: GPT-4, GPT-3.5-turbo, etc.
- Anthropic: Claude-4 (Opus, Sonnet)
- OpenRouter: Llama, Mistral, Gemini, and many more models
- Mock: For testing and development
### API Keys

Set environment variables for the providers you want to use:

```bash
export OPENAI_API_KEY='your-openai-key'
export ANTHROPIC_API_KEY='your-anthropic-key'
export OPENROUTER_API_KEY='your-openrouter-key'
```
## Sampling and Majority Voting

The CLI supports robust evaluation using multiple samples and majority voting, as described in the ArXiv paper:

```bash
# Single evaluation (default)
python cli.py -m gpt4 "The sky is blue"

# Majority voting with 5 samples for more robust results
python cli.py -m gpt4 -s 5 "Climate change is real"

# An even number of samples with tiebreaking strategies
python cli.py -m claude -s 4 --tiebreak pessimistic "The Earth is round"
python cli.py -m llama3 -s 6 --tiebreak optimistic "AI will be beneficial"
python cli.py -m mixtral -s 4 --tiebreak random "Democracy is good"
```
**Tiebreaking Strategies:**

When multiple samples produce tied votes for a component, the tiebreaking strategy determines the outcome:

- `random` (default): Randomly choose among tied components. Unbiased but unpredictable.
  - Example: `[t,t,f,f]` → randomly pick `t` or `f`
- `pessimistic`: Prefer `f` (cannot verify/refute) when in doubt. Bias toward epistemic caution: "Better to admit uncertainty than make false claims." Tends toward `<f,f>` (paracomplete/unknown) outcomes.
  - Example: `[t,t,f,f]` → choose `f`
- `optimistic`: Prefer `t` (verified/refuted) when in doubt. Bias toward strong claims: "Give statements the benefit of the doubt." Tends toward classical `<t,f>` or `<f,t>` outcomes.
  - Example: `[t,t,f,f]` → choose `t`
**Benefits of Sampling:**
- Reduces variance in LLM responses
- More reliable bilateral evaluation results
- Configurable confidence through sample size
- Handles ties systematically with multiple strategies
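The voting-plus-tiebreak behavior described above can be sketched in plain Python. This is an illustrative stand-in for the CLI's logic, not its actual implementation; `majority_vote` is a hypothetical function operating on one component's samples:

```python
import random
from collections import Counter

def majority_vote(samples, tiebreak="random"):
    """Pick the winning component from sampled values in {'t', 'e', 'f'}.

    A clear majority wins outright. On a tie, 'pessimistic' prefers 'f',
    'optimistic' prefers 't', and 'random' chooses uniformly among the
    tied components.
    """
    counts = Counter(samples)
    top = max(counts.values())
    tied = [c for c, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    if tiebreak == "pessimistic":
        return "f" if "f" in tied else "e" if "e" in tied else "t"
    if tiebreak == "optimistic":
        return "t" if "t" in tied else "e" if "e" in tied else "f"
    return random.choice(tied)

print(majority_vote(["t", "t", "f"]))                      # 't' (clear majority)
print(majority_vote(["t", "t", "f", "f"], "pessimistic"))  # 'f'
print(majority_vote(["t", "t", "f", "f"], "optimistic"))   # 't'
```

Running this independently for the u and v components yields the final `<u,v>` value from the sampled evaluations.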
## Examples

Run the included examples:

```bash
python llm_examples.py  # LLM-based bilateral evaluation examples
python examples.py      # Legacy examples (deprecated)
python demo_cli.py      # CLI demonstration
```
## Testing

Run the test suite:

```bash
python -m pytest tests/
```

Or run individual test modules:

```bash
python -m unittest tests.test_truth_values
python -m unittest tests.test_assertions
python -m unittest tests.test_zeta_function
```
## Mathematical Background

This implementation is based on bilateral factuality evaluation as described in the research paper. The key mathematical concepts include:

- Generalized Truth Values: <u,v> pairs whose components are drawn from {t, e, f}, where:
  - First position (u): t = verifiable, f = not verifiable, e = undefined
  - Second position (v): t = refutable, f = not refutable, e = undefined
- Bilateral Evaluation: Separate assessment of verifiability (u) and refutability (v)
- Persistent Caching: Immutable cache updates maintaining consistency across evaluations
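The classical projection shown in the Core Components examples can be sketched as a small lookup over these pairs. This is an illustrative sketch, not the package's `project` implementation; `classical_projection` is a hypothetical helper:

```python
def classical_projection(u, v):
    """Project a generalized truth value <u,v> to a 3-valued verdict.

    <t,f> (verifiable, not refutable) projects to 't';
    <f,t> (not verifiable, refutable) projects to 'f';
    any other combination is undefined, 'e'.
    """
    if (u, v) == ("t", "f"):
        return "t"
    if (u, v) == ("f", "t"):
        return "f"
    return "e"

print(classical_projection("t", "f"))  # 't'
print(classical_projection("f", "t"))  # 'f'
print(classical_projection("e", "e"))  # 'e'
```

Only the two classically coherent pairs survive the projection; gluts like `<t,t>` and gaps like `<f,f>` both collapse to `e` under this policy.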
## Requirements

- Python 3.9+
- No required external dependencies for the core package (standard library only); the LLM providers are optional extras
## License

MIT License

## Citation

If you use this implementation in research, please cite the original paper: ArXiv Paper Link