
bilateral-truth: Caching Bilateral Factuality Evaluation

A Python package for bilateral factuality evaluation with generalized truth values and persistent caching.

Overview

This package implements the mathematical function:

ζ_c: ℒ_AT → 𝒱³ × 𝒱³

Where:

  • ℒ_AT is the language of assertions
  • 𝒱³ represents 3-valued logic components {t, e, f} (true, undefined, false)
  • The function returns generalized truth values <u,v>, obtained by evaluating each assertion bilaterally
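
For example, an assertion that can be verified but not refuted, such as "The capital of France is Paris", receives the value <t,f>.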

Key Features

  • Bilateral Evaluation: Each assertion receives a generalized truth value <u,v> where u represents verifiability and v represents refutability
  • Persistent Caching: The evaluation function maintains a cache to avoid recomputing truth values for previously evaluated assertions
  • 3-Valued Logic: Supports true (t), undefined (e), and false (f) truth value components
  • Extensible Evaluation: Custom evaluation functions can be provided for domain-specific logic (see the sketch under "LLM-Based Bilateral Evaluation" below)

Installation

From PyPI (Recommended)

# Core package with mock evaluator
pip install bilateral-truth

# With OpenAI support
pip install bilateral-truth[openai]

# With Anthropic (Claude) support  
pip install bilateral-truth[anthropic]

# With all LLM providers
pip install bilateral-truth[all]

Development Setup

Option 1: Automated Setup (Recommended)

# Set up virtual environment and install everything
./setup_venv.sh

# Activate the virtual environment
source venv/bin/activate

Option 2: Manual Setup

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the package in development mode with all dependencies
pip install -e .[all,dev]

Quick Start

from bilateral_truth import Assertion, zeta_c, create_llm_evaluator

# Create an LLM evaluator (requires API key)
evaluator = create_llm_evaluator('openai', model='gpt-4')
# or: evaluator = create_llm_evaluator('anthropic', model='claude-sonnet-4-20250514')
# or: evaluator = create_llm_evaluator('mock')  # for testing

# Create assertions
assertion1 = Assertion("The capital of France is Paris")
assertion2 = Assertion("loves", "alice", "bob") 
assertion3 = Assertion("It will rain tomorrow")

# Evaluate using ζ_c with LLM-based bilateral assessment
result1 = zeta_c(assertion1, evaluator.evaluate_bilateral)
result2 = zeta_c(assertion2, evaluator.evaluate_bilateral)
result3 = zeta_c(assertion3, evaluator.evaluate_bilateral)

print(f"zeta_c({assertion1}) = {result1}")
print(f"zeta_c({assertion2}) = {result2}")
print(f"zeta_c({assertion3}) = {result3}")

Core Components

Generalized Truth Values

from bilateral_truth import GeneralizedTruthValue, TruthValueComponent, EpistemicPolicy

# Classical values
classical_true = GeneralizedTruthValue(TruthValueComponent.TRUE, TruthValueComponent.FALSE)          # <t,f>
classical_false = GeneralizedTruthValue(TruthValueComponent.FALSE, TruthValueComponent.TRUE)         # <f,t>
undefined_val = GeneralizedTruthValue(TruthValueComponent.UNDEFINED, TruthValueComponent.UNDEFINED)  # <e,e>

# Project to 3-valued logic
projected_true = classical_true.project(EpistemicPolicy.CLASSICAL)    # t
projected_false = classical_false.project(EpistemicPolicy.CLASSICAL)  # f
projected_undefined = undefined_val.project(EpistemicPolicy.CLASSICAL) # e

# Custom combinations
custom_val = GeneralizedTruthValue(
    TruthValueComponent.TRUE,
    TruthValueComponent.UNDEFINED
)  # <t,e>

Assertions

from bilateral_truth import Assertion

# Simple statement
statement = Assertion("The sky is blue")

# Predicate with arguments  
loves = Assertion("loves", "alice", "bob")

# With named arguments
distance = Assertion(
    "distance",
    start="NYC",
    end="LA",
    value=2500,
    unit="miles",
)

# Natural language statements
weather = Assertion("It will rain tomorrow")
fact = Assertion("The capital of France is Paris")
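
For caching purposes, structurally identical assertions denote the same formula: evaluating two separate Assertion("The sky is blue") instances should produce a single cache entry (an assumption consistent with the set-theoretic cache definition below, where the cache is keyed by the formula φ rather than by object identity).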

Caching Behavior

The zeta_c function implements the mathematical definition:

ζ_c(φ) = {
  c(φ)   if φ ∈ dom(c)
  ζ(φ)   otherwise, updating c := c ∪ {(φ, ζ(φ))}
}
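
In Python terms this is a memoized lookup. A minimal sketch of the idea (not the package's actual implementation, and assuming assertions are hashable):

_cache = {}

def zeta_c_sketch(phi, evaluator):
    """Memoized bilateral evaluation: compute on a cache miss, then reuse."""
    if phi in _cache:        # φ ∈ dom(c): return c(φ)
        return _cache[phi]
    value = evaluator(phi)   # otherwise compute ζ(φ)
    _cache[phi] = value      # and extend the cache: c := c ∪ {(φ, ζ(φ))}
    return value

The packaged zeta_c behaves the same way:
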
from bilateral_truth import Assertion, zeta_c, get_cache_size, clear_cache, create_llm_evaluator

evaluator = create_llm_evaluator('mock')  # any evaluator works; mock avoids API calls
assertion = Assertion("test")

# First evaluation computes and caches
result1 = zeta_c(assertion, evaluator.evaluate_bilateral)
print(f"Cache size: {get_cache_size()}")  # 1

# Second evaluation uses the cache
result2 = zeta_c(assertion, evaluator.evaluate_bilateral)
print(f"Same result: {result1 == result2}")  # True
print(f"Cache size: {get_cache_size()}")  # Still 1

LLM-Based Bilateral Evaluation

# Set up environment variables first:
# export OPENAI_API_KEY='your-key'
# export ANTHROPIC_API_KEY='your-key'

from bilateral_truth import zeta_c, create_llm_evaluator, Assertion

# Create real LLM evaluator  
openai_evaluator = create_llm_evaluator('openai', model='gpt-4')
claude_evaluator = create_llm_evaluator('anthropic')

# Or use mock evaluator for testing/development
mock_evaluator = create_llm_evaluator('mock')

# The LLM will assess both verifiability and refutability
assertion = Assertion("The Earth is round")
result = zeta_c(assertion, openai_evaluator.evaluate_bilateral)

# The LLM receives a prompt asking it to evaluate:
# 1. Can this statement be verified as true? (verifiability)  
# 2. Can this statement be refuted as false? (refutability)
# And returns a structured <u,v> response
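
The evaluator argument to zeta_c is just a callable from an assertion to a GeneralizedTruthValue, so custom domain-specific evaluators can be plugged in. A minimal sketch (the lookup table, the function name, and the use of str(assertion) as the key are illustrative assumptions):

from bilateral_truth import Assertion, GeneralizedTruthValue, TruthValueComponent, zeta_c

def gazetteer_evaluator(assertion):
    """Hypothetical evaluator: check capital-city claims against a small lookup table."""
    known_truths = {"The capital of France is Paris"}
    if str(assertion) in known_truths:
        # Verifiable and not refutable: classical truth <t,f>
        return GeneralizedTruthValue(TruthValueComponent.TRUE, TruthValueComponent.FALSE)
    # Otherwise neither verifiable nor refutable here: <e,e>
    return GeneralizedTruthValue(TruthValueComponent.UNDEFINED, TruthValueComponent.UNDEFINED)

result = zeta_c(Assertion("The capital of France is Paris"), gazetteer_evaluator)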

API Reference

Functions

  • zeta(assertion, evaluator): Base (uncached) bilateral evaluation function; requires an LLM evaluator
  • zeta_c(assertion, evaluator, cache=None): Cached bilateral evaluation function
  • clear_cache(): Clear the global cache
  • get_cache_size(): Get the number of cached entries
  • create_llm_evaluator(provider, **kwargs): Factory for creating LLM evaluators

Classes

  • Assertion(statement, *args, **kwargs): Represents natural language assertions or predicates
  • GeneralizedTruthValue(u, v): Represents <u,v> truth values
  • TruthValueComponent: Enum for t, e, f values
  • ZetaCache: Cache implementation for zeta_c
  • OpenAIEvaluator: LLM evaluator using OpenAI's API
  • AnthropicEvaluator: LLM evaluator using Anthropic's API
  • MockLLMEvaluator: Mock evaluator for testing/development
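
zeta_c also accepts an explicit cache via its cache parameter, isolating evaluations from the global cache. A sketch, assuming ZetaCache takes no constructor arguments:

from bilateral_truth import Assertion, ZetaCache, create_llm_evaluator, zeta_c

local_cache = ZetaCache()  # independent of the global cache
evaluator = create_llm_evaluator('mock')
result = zeta_c(Assertion("test"), evaluator.evaluate_bilateral, cache=local_cache)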

Command Line Interface

After installing the package (see Installation above), use the bilateral-truth command:

# Interactive mode with GPT-4 (requires OPENAI_API_KEY)
bilateral-truth --model gpt-4 --interactive

# Single assertion evaluation with Claude (requires ANTHROPIC_API_KEY)
bilateral-truth --model claude "The capital of France is Paris"

# Use OpenRouter with Llama model (requires OPENROUTER_API_KEY)
bilateral-truth --model llama3-70b "Climate change is real"

# Use mock model for testing (no API key needed)
bilateral-truth --model mock "The sky is blue"

# Use majority voting with 5 samples for more robust results
bilateral-truth --model gpt-4 --samples 5 "Climate change is real"

# Use pessimistic tiebreaking with an even number of samples
bilateral-truth --model claude --samples 4 --tiebreak pessimistic "The Earth is round"

# List all available models
bilateral-truth --list-models

# Get information about a specific model
bilateral-truth --model-info gpt-4

Running without installation:

# Use the standalone script
python cli.py -m mock "The Earth is round"

# Interactive mode with sampling
python cli.py -m mock -s 3 --tiebreak random -i

# Single evaluation with majority voting
python cli.py -m llama3 -s 5 "The sky is blue"

# Run the demo
python demo_cli.py

Supported Models

The CLI supports models from multiple providers:

  • OpenAI: GPT-4, GPT-3.5-turbo, etc.
  • Anthropic: Claude-4 (Opus, Sonnet)
  • OpenRouter: Llama, Mistral, Gemini, and many more models
  • Mock: For testing and development

API Keys

Set environment variables for the providers you want to use:

export OPENAI_API_KEY='your-openai-key'
export ANTHROPIC_API_KEY='your-anthropic-key'
export OPENROUTER_API_KEY='your-openrouter-key'

Sampling and Majority Voting

The CLI supports more robust evaluation using multiple samples and majority voting, as described in the arXiv paper:

# Single evaluation (default)
python cli.py -m gpt4 "The sky is blue"

# Majority voting with 5 samples for more robust results
python cli.py -m gpt4 -s 5 "Climate change is real"

# Even number of samples with tiebreaking strategies
python cli.py -m claude -s 4 --tiebreak pessimistic "The Earth is round"
python cli.py -m llama3 -s 6 --tiebreak optimistic "AI will be beneficial"
python cli.py -m mixtral -s 4 --tiebreak random "Democracy is good"

Tiebreaking Strategies:

When multiple samples produce tied votes for a component, the tiebreaking strategy determines the outcome:

  • random (default): Randomly choose among tied components
    • Unbiased but unpredictable
    • Example: [t,t,f,f] → randomly pick t or f
  • pessimistic: Prefer f (cannot verify/refute) when in doubt
    • Bias toward epistemic caution: "Better to admit uncertainty than make false claims"
    • Tends toward <f,f> (paracomplete/unknown) outcomes
    • Example: [t,t,f,f] → choose f
  • optimistic: Prefer t (verified/refuted) when in doubt
    • Bias toward strong claims: "Give statements the benefit of the doubt"
    • Tends toward classical <t,f> or <f,t> outcomes
    • Example: [t,t,f,f] → choose t
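
Component-wise, majority voting reduces to counting sampled votes and applying the tiebreak on a draw. A rough sketch of the idea (not the package's actual implementation):

import random
from collections import Counter

def vote(components, tiebreak="random"):
    """Pick the winning component ('t', 'e', or 'f') from sampled votes."""
    counts = Counter(components)
    top = max(counts.values())
    tied = [c for c, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    if tiebreak == "pessimistic":   # prefer f: admit uncertainty
        return "f" if "f" in tied else tied[0]
    if tiebreak == "optimistic":    # prefer t: accept the stronger claim
        return "t" if "t" in tied else tied[0]
    return random.choice(tied)      # default: unbiased random choice

print(vote(["t", "t", "f", "f"], tiebreak="pessimistic"))  # f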

Benefits of Sampling:

  • Reduces variance in LLM responses
  • More reliable bilateral evaluation results
  • Configurable confidence through sample size
  • Handles ties systematically with multiple strategies

Examples

Run the included examples:

python llm_examples.py    # LLM-based bilateral evaluation examples
python examples.py        # Legacy examples (deprecated)
python demo_cli.py        # CLI demonstration

Testing

Run the test suite:

python -m pytest tests/

Or run individual test modules:

python -m unittest tests.test_truth_values
python -m unittest tests.test_assertions
python -m unittest tests.test_zeta_function

Mathematical Background

This implementation is based on bilateral factuality evaluation as described in the research paper. The key mathematical concepts include:

  1. Generalized Truth Values: <u,v> pairs whose components are both drawn from {t, e, f}:
    • First position (u): t = verifiable, f = not verifiable, e = undefined
    • Second position (v): t = refutable, f = not refutable, e = undefined
  2. Bilateral Evaluation: Separate assessment of verifiability (u) and refutability (v)
  3. Persistent Caching: Immutable cache updates maintaining consistency across evaluations
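
Concretely, the four corner values have natural readings: <t,f> is classically true (verifiable, not refutable), <f,t> is classically false (refutable, not verifiable), <f,f> marks a paracomplete gap (neither verifiable nor refutable), and <t,t> marks conflicting evidence (both verifiable and refutable).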

Requirements

  • Python 3.9+
  • No required external dependencies for the core package (uses only the Python standard library); LLM provider SDKs are optional extras

License

MIT License

Citation

If you use this implementation in research, please cite the original paper: ArXiv Paper Link
