
True Lies - Separating truth from AI fiction. A powerful library for detecting LLM hallucinations, validating AI responses, and generating professional HTML reports with interactive dashboards.


True Lies Validator 🎭

The easiest library for validating LLM and chatbot responses

Validates whether your LLM or chatbot is telling the truth, remembering context, and maintaining coherence. Perfect for automated conversation testing.

🚀 Quick Installation

# Install the library
pip install true-lies-validator

# Verify installation
python -c "from true_lies import ConversationValidator, HTMLReporter; print('✅ Installed correctly')"

📦 Current version: 0.8.0 - With HTML Reporter, interactive dashboards, and advanced analytics

⚡ Get Started in 2 Minutes

1. Basic Validation (1 minute)

from true_lies import ConversationValidator

# Create validator
conv = ConversationValidator()

# Add conversation with automatic reporting
conv.add_turn_and_report(
    user_input="Hello, I'm John, my email is john@company.com",
    bot_response="Hello John! I'll help you with your inquiry.",
    expected_facts={'name': 'John', 'email': 'john@company.com'},
    title="Turn 1: User identifies themselves"
)

# Validate if the bot remembers the context
final_response = "John, your inquiry about john@company.com is resolved"
retention = conv.validate_and_report(
    response=final_response,
    facts_to_check=['name', 'email'],
    title="Retention Test"
)

# Automatic result: ✅ PASS or ❌ FAIL

2. Complete Multi-turn Validation (2 minutes)

from true_lies import ConversationValidator

def test_chatbot_support():
    """Complete support chatbot test"""
    
    # Create validator
    conv = ConversationValidator()
    
    # Turn 1: User reports problem
    conv.add_turn_and_report(
        user_input="My app doesn't work, I'm user ID 12345",
        bot_response="Hello, I'll help you. What error do you see?",
        expected_facts={'user_id': '12345', 'issue_type': 'app_not_working'},
        title="Turn 1: User reports problem"
    )
    
    # Turn 2: User provides details
    conv.add_turn_and_report(
        user_input="Error 500 on login, email john@company.com",
        bot_response="I understand, error 500 on login. Checking your account.",
        expected_facts={'error_code': '500', 'email': 'john@company.com'},
        title="Turn 2: User provides details"
    )
    
    # Show conversation summary
    conv.print_conversation_summary("Conversation Summary")
    
    # Final test: Does the bot remember everything?
    final_response = "John (ID 12345), your error 500 will be fixed in 2 hours"
    retention = conv.validate_and_report(
        response=final_response,
        facts_to_check=['user_id', 'error_code', 'email'],
        title="Context Retention Test"
    )
    
    # Return result for automated tests
    return retention['retention_score'] >= 0.8

# Run test
if __name__ == "__main__":
    test_chatbot_support()

🎯 Popular Use Cases

E-commerce

# Customer buying product
conv.add_turn_and_report(
    user_input="Hello, I'm Maria, I want to buy a laptop for $1500",
    bot_response="Hello Maria! I'll help you with the laptop. Registered email: maria@store.com",
    expected_facts={'customer_name': 'Maria', 'product': 'laptop', 'budget': '1500'},
    title="Turn 1: Customer identifies themselves"
)

Banking

# Customer requesting loan
conv.add_turn_and_report(
    user_input="I'm Carlos, I work at TechCorp, I earn $95,000, I want a loan",
    bot_response="Hello Carlos! I'll help you with your loan. Email: carlos@bank.com",
    expected_facts={'customer_name': 'Carlos', 'employer': 'TechCorp', 'income': '95000'},
    title="Turn 1: Customer requests loan"
)

Technical Support

# User reports problem
conv.add_turn_and_report(
    user_input="My app doesn't work, I'm user ID 12345",
    bot_response="Hello, I'll help you. What error do you see?",
    expected_facts={'user_id': '12345', 'issue_type': 'app_not_working'},
    title="Turn 1: User reports problem"
)

🔧 Main Methods

add_turn_and_report() - Add turn with automatic reporting

conv.add_turn_and_report(
    user_input="...",
    bot_response="...",
    expected_facts={'key': 'value'},
    title="Turn description"
)

validate_and_report() - Validate retention with automatic reporting

retention = conv.validate_and_report(
    response="Bot response to validate",
    facts_to_check=['fact1', 'fact2'],
    title="Retention Test"
)

print_conversation_summary() - Conversation summary

conv.print_conversation_summary("Conversation Summary")

📊 Supported Fact Types

The library automatically detects these types of information:

  • Names: "John", "Maria Gonzalez"
  • Emails: "john@company.com", "maria@store.com"
  • Phones: "+1-555-123-4567", "(555) 123-4567"
  • IDs: "12345", "USER-001", "POL-2024-001"
  • Amounts: "$1,500", "1500", "USD 1500"
  • Employers: "TechCorp", "Google Inc", "Microsoft"
  • Dates: "2024-12-31", "31/12/2024", "December 31, 2024"
  • Percentages: "15%", "15 percent", "fifteen percent"
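For intuition, a few of these detections can be approximated with plain regular expressions. This is an illustrative sketch only, not the library's actual extractors, which are more robust and handle more formats:

```python
import re

# Hypothetical patterns approximating three of the fact types above.
PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "id": r"\b[A-Z]{3,4}-\d{4}-\d{3}\b",
    "amount": r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?",
}

def detect(text):
    """Return the first match found in text for each fact type."""
    return {
        kind: m.group(0)
        for kind, pattern in PATTERNS.items()
        if (m := re.search(pattern, text))
    }

print(detect("Policy POL-2024-001 for maria@store.com costs $1,500.00"))
```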

🎨 Automatic Reporting

True Lies handles all the reporting for you. A single call replaces dozens of manual print statements:

# Before (30+ lines of manual code)
print(f"📊 Detailed results:")
for fact in facts:
    retained = retention.get(f'{fact}_retained', False)
    # ... 25 more lines of manual prints

# After (one call)
retention = conv.validate_and_report(
    response=final_response,
    facts_to_check=['fact1', 'fact2'],
    title="Retention Test"
)

📊 HTML Reports & Dashboard

Generate professional HTML reports with interactive dashboards in just one line:

🚀 Super Simple HTML Reports

from true_lies import validate_llm_candidates

# Define your test scenario
scenario = {
    "name": "Insurance Policy Test",
    "semantic_reference": "Your auto insurance policy #POL-2024-001 has a premium of $850.00",
    "facts": {
        "policy_number": {"expected": "POL-2024-001", "extractor": "regex", "pattern": r"#?(POL-\d{4}-\d{3})"},
        "premium_amount": {"expected": "850.00", "extractor": "money"},
        "insurance_type": {"expected": "auto insurance", "extractor": "categorical", "patterns": {"auto insurance": ["auto insurance", "car insurance"]}}
    }
}

# Test multiple candidates
candidates = [
    "Your auto insurance policy #POL-2024-001 has a premium of $850.00",
    "Auto insurance policy POL-2024-001 costs $850.00",
    "Policy #POL-2024-001: $850.00 for auto insurance"
]

# Generate HTML report with ONE line! 🎉
result = validate_llm_candidates(
    scenario=scenario,
    candidates=candidates,
    threshold=0.65,
    generate_html_report=True,  # ← This generates the report!
    html_title="Insurance Policy Validation Report"
)

print(f"📊 Report saved to: {result['html_report_path']}")
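The regex pattern used for policy_number above can be sanity-checked on its own with Python's re module before wiring it into a scenario:

```python
import re

# Same pattern as the policy_number fact in the scenario above.
pattern = r"#?(POL-\d{4}-\d{3})"

candidates = [
    "Your auto insurance policy #POL-2024-001 has a premium of $850.00",
    "Auto insurance policy POL-2024-001 costs $850.00",
    "Policy #POL-2024-001: $850.00 for auto insurance",
]

for text in candidates:
    match = re.search(pattern, text)
    # group(1) is the captured policy number, with or without the leading '#'
    print(match.group(1) if match else "no match")
```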

🎨 Interactive Dashboard Features

📈 Real-time Analytics:

  • Success Rate Distribution - Centered chart showing pass/fail distribution
  • Performance Trend - Historical performance with configurable target line
  • Similarity Score Trend - Semantic similarity tracking over time
  • Fact Retention Trend - Percentage of facts retained across tests

🔍 Interactive Table:

  • Sortable columns - Click headers to sort by ID, Score, Status, etc.
  • Expandable details - Click "View Details" to see full test information
  • Card-style details - Professional styling with smooth transitions
  • Real-time filtering - Filter and search through results

📊 Historical Data:

  • Automatic data persistence - Results saved to true_lies_reporting/validation_history.json
  • Temporal analysis - Track performance over days/weeks/months
  • Target control - Set and adjust performance targets dynamically
  • Trend visualization - See improvement patterns over time

🎯 Key Benefits

  • ✅ One-line report generation - No complex setup required
  • ✅ Automatic data persistence - Historical tracking built-in
  • ✅ Interactive dashboards - Professional charts and visualizations
  • ✅ Real-time sorting - Click to sort any column
  • ✅ Expandable details - Toggle detailed test information
  • ✅ Responsive design - Works on desktop and mobile
  • ✅ Professional styling - Ready for stakeholder presentations

💡 Real-World Example: E-commerce Order Processing

from true_lies import validate_llm_candidates

# E-commerce order scenario
scenario = {
    "name": "Order Processing Test",
    "semantic_reference": "Order #ORD-2024-789 for John Smith (john@email.com) - 2x Laptop ($1,200 each) = $2,400 total",
    "facts": {
        "order_id": {"expected": "ORD-2024-789", "extractor": "regex", "pattern": r"#?(ORD-\d{4}-\d{3})"},
        "customer_name": {"expected": "John Smith", "extractor": "regex", "pattern": r"for\s+([A-Za-z\s]+)\s+\("},
        "customer_email": {"expected": "john@email.com", "extractor": "email"},
        "product_quantity": {"expected": "2", "extractor": "regex", "pattern": r"(\d+)x\s+Laptop"},
        "product_name": {"expected": "Laptop", "extractor": "regex", "pattern": r"\d+x\s+([A-Za-z]+)"},
        "unit_price": {"expected": "1200", "extractor": "money"},
        "total_amount": {"expected": "2400", "extractor": "money"}
    }
}

# Test various order processing responses
candidates = [
    "Order #ORD-2024-789 confirmed for John Smith (john@email.com) - 2x Laptop at $1,200 each = $2,400 total",
    "John Smith's order ORD-2024-789: 2 Laptops for $1,200 each, total $2,400. Email: john@email.com",
    "Order ORD-2024-789 processed. Customer: John Smith, 2x Laptop, $1,200 per unit, $2,400 total. Contact: john@email.com"
]

# Generate comprehensive report
result = validate_llm_candidates(
    scenario=scenario,
    candidates=candidates,
    threshold=0.70,
    generate_html_report=True,
    html_title="E-commerce Order Processing Validation"
)

print(f"📊 Order processing report: {result['html_report_path']}")
print(f"✅ Overall accuracy: {result['summary']['overall_accuracy']:.1%}")

🚀 CI/CD Integration

Perfect for automated testing pipelines:

# GitHub Actions example
- name: Run LLM Validation Tests
  run: |
    python -c "
    from true_lies import validate_llm_candidates
    scenario = {'name': 'CI Test', 'semantic_reference': 'Test data', 'facts': {}}
    candidates = ['Test response 1', 'Test response 2']
    result = validate_llm_candidates(scenario, candidates, generate_html_report=True)
    print(f'Report: {result[\"html_report_path\"]}')
    "

- name: Upload Validation Report
  uses: actions/upload-artifact@v4
  with:
    name: llm-validation-report
    path: "validation_report_*.html"

📈 Automatic Metrics

  • Retention Score: 0.0 - 1.0 (how well it remembers)
  • Facts Retained: X/Y facts remembered
  • Evaluation: A, B, C, D, F (automatic grading)
  • Details per Fact: What was found and what wasn't
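Under the hood, the score is simple arithmetic: facts remembered divided by facts checked. A minimal sketch, with hypothetical letter-grade cutoffs (the library's exact cutoffs may differ):

```python
def retention_score(facts_retained, facts_checked):
    """Fraction of checked facts found in the final response."""
    return facts_retained / facts_checked if facts_checked else 0.0

def grade(score):
    """Hypothetical A-F cutoffs for illustration."""
    for cutoff, letter in [(0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D")]:
        if score >= cutoff:
            return letter
    return "F"

score = retention_score(2, 3)  # 2 of 3 facts remembered
print(f"score={score:.2f} grade={grade(score)}")
```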

🚀 Complete Examples

Example 1: Support Chatbot

from true_lies import ConversationValidator

def test_support_chatbot():
    conv = ConversationValidator()
    
    # Turn 1: User reports problem
    conv.add_turn_and_report(
        user_input="My app doesn't work, I'm user ID 12345",
        bot_response="Hello, I'll help you. What error do you see?",
        expected_facts={'user_id': '12345', 'issue_type': 'app_not_working'},
        title="Turn 1: User reports problem"
    )
    
    # Turn 2: User provides details
    conv.add_turn_and_report(
        user_input="Error 500 on login, email john@company.com",
        bot_response="I understand, error 500 on login. Checking your account.",
        expected_facts={'error_code': '500', 'email': 'john@company.com'},
        title="Turn 2: User provides details"
    )
    
    # Final test
    final_response = "John (ID 12345), your error 500 will be fixed in 2 hours"
    retention = conv.validate_and_report(
        response=final_response,
        facts_to_check=['user_id', 'error_code', 'email'],
        title="Context Retention Test"
    )
    
    return retention['retention_score'] >= 0.8

if __name__ == "__main__":
    test_support_chatbot()

Example 2: E-commerce

from true_lies import ConversationValidator

def test_ecommerce_chatbot():
    conv = ConversationValidator()
    
    # Turn 1: Customer identifies themselves
    conv.add_turn_and_report(
        user_input="Hello, I'm Maria Gonzalez, email maria@store.com, I want to buy a laptop",
        bot_response="Hello Maria! I'll help you with the laptop. Registered email: maria@store.com",
        expected_facts={'customer_name': 'Maria Gonzalez', 'email': 'maria@store.com', 'product_interest': 'laptop'},
        title="Turn 1: Customer identifies themselves"
    )
    
    # Turn 2: Customer specifies budget
    conv.add_turn_and_report(
        user_input="My budget is $1500, I need it for programming",
        bot_response="Perfect Maria, we have laptops for programming in that range. I'll send options to maria@store.com",
        expected_facts={'budget': '1500', 'use_case': 'programming'},
        title="Turn 2: Customer specifies budget"
    )
    
    # Final test
    final_response = "Maria, your programming laptop for $1500 is ready. I'll send the invoice to maria@store.com"
    retention = conv.validate_and_report(
        response=final_response,
        facts_to_check=['customer_name', 'email', 'budget', 'use_case'],
        title="E-commerce Retention Test"
    )
    
    return retention['retention_score'] >= 0.8

if __name__ == "__main__":
    test_ecommerce_chatbot()

๐Ÿ” Advanced Validation (Optional)

For more complex cases, you can also use traditional validation:

from true_lies import create_scenario, validate_llm_candidates

# Facts that MUST be in the response
facts = {
    'policy_number': {'extractor': 'categorical', 'expected': 'POL-2024-001'},
    'premium': {'extractor': 'money', 'expected': '850.00'},
    'coverage_type': {'extractor': 'categorical', 'expected': 'auto insurance'}
}

# Reference text for semantic comparison
reference_text = "Your auto insurance policy #POL-2024-001 has a premium of $850.00"

# Create scenario (with automatic fact weighting)
scenario = create_scenario(
    facts=facts,
    semantic_reference=reference_text,
    semantic_mappings={}  # Weights are applied automatically
)

# Validate responses
candidates = [
    "Policy POL-2024-001 covers your automobile with monthly payments of $850.00",
    "Your car insurance policy POL-2024-001 costs $850 monthly"
]

results = validate_llm_candidates(
    scenario=scenario,
    candidates=candidates,
    threshold=0.7
)

🎯 Advanced Features

Automatic Fact Weighting:

  • Values in your expected facts are automatically weighted
  • Significant improvement in similarity scores (+55% in typical cases)
  • No additional configuration needed

Improved Polarity Detection:

  • Correctly detects negative phrases with "not", "does not", "don't", etc.
  • Patterns in English and Spanish
  • Avoids false positives with substrings
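The whole-word matching idea can be sketched in a few lines (illustrative only; the library ships its own English and Spanish patterns):

```python
import re

# Hypothetical negation cues for illustration.
NEGATION_CUES = ["not", "does not", "don't", "no"]

def is_negative(text):
    """Match cues as whole words, so substrings like 'note' don't trigger."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(cue)}\b", lowered) for cue in NEGATION_CUES)

print(is_negative("The policy does not cover floods"))  # True
print(is_negative("Please note your policy number"))    # False
```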

Optimized Semantic Mappings:

  • Use simple and specific mappings
  • Avoid over-mapping that can worsen scores
  • Recommendation: minimal mappings or no mappings
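One way to picture what a mapping does: treat each synonym as an alias to be normalized before similarity is computed. This is a sketch under that assumption; the library's internals may differ:

```python
import re

semantic_mappings = {
    "account": ["cuenta"],
    "balance": ["saldo", "monto"],
}

def normalize(text, mappings):
    """Replace each synonym with its canonical term, whole words only."""
    for canonical, synonyms in mappings.items():
        for synonym in synonyms:
            text = re.sub(rf"\b{re.escape(synonym)}\b", canonical, text, flags=re.IGNORECASE)
    return text

print(normalize("El saldo de su cuenta es $3,000.60", semantic_mappings))
```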

💡 Best Practices

1. Fact Configuration:

# ✅ CORRECT - For specific numbers
'account_number': {'extractor': 'number', 'expected': '2992'}

# ❌ INCORRECT - For specific numbers
'account_number': {'extractor': 'categorical', 'expected': '2992'}

# ✅ CORRECT - For categories
'account_type': {'extractor': 'categorical', 'expected': 'savings'}

2. Semantic Mappings:

# ✅ CORRECT - Simple mappings
semantic_mappings = {
    "account": ["cuenta"],
    "balance": ["saldo", "monto"]
}

# ❌ INCORRECT - Excessive mappings
semantic_mappings = {
    "phrases": ["the balance of your", "your term deposit account", ...]  # Too aggressive
}

3. Thresholds:

  • 0.6-0.7: For strict validation
  • 0.5-0.6: For permissive validation
  • 0.8+: Only for exact cases

🎯 Available Extractors

  • money: Monetary values ($1,234.56, USD 27, 100 dollars) - Improved v0.6.2+
  • number: General numbers (25, 3.14, 1000)
  • categorical: Categorical values with synonyms - Improved v0.6.2+
  • email: Email addresses
  • phone: Phone numbers
  • hours: Time schedules (9:00 AM, 14:30, 3:00 PM)
  • id: Identifiers (USER-001, POL-2024-001)
  • regex: Custom patterns

🔧 Extractor Improvements (v0.6.2+)

Improved money extractor:

  • Prioritizes amounts with currency symbols ($, USD, dollars)
  • Avoids capturing non-monetary numbers
  • Better accuracy in banking scenarios
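The prioritization can be pictured as: only numbers explicitly marked as money ($, USD, "dollars") are candidates, so bare IDs are ignored. A sketch of the idea, not the library's code:

```python
import re

def extract_money(text):
    """Return an amount only when it carries a currency marker."""
    match = re.search(
        r"(?:\$|USD\s*)(\d[\d,]*(?:\.\d{2})?)"  # $850.00, USD 27
        r"|(\d[\d,]*(?:\.\d{2})?)\s*dollars",   # 100 dollars
        text,
    )
    if match:
        return match.group(1) or match.group(2)
    return None  # bare numbers like account or user IDs are not money

print(extract_money("Account 2992 has a balance of $3,000.60"))  # 3,000.60
print(extract_money("Your user ID is 12345"))                    # None
```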

Improved categorical extractor:

  • Whole word matches (avoids false positives)
  • Better detection of specific patterns
  • Compatible with exact expected values

📚 Complete Documentation

🎯 Examples & Demos

HTML Reporter Examples

CI/CD Integration Examples

🛠️ Diagnostic Tools

Diagnostic Tool

To diagnose similarity and extraction issues:

from diagnostic_tool import run_custom_diagnosis

# Your configuration
fact_configs = {
    'account_number': {'extractor': 'number', 'expected': '2992'},
    'balance_amount': {'extractor': 'money', 'expected': '3,000.60'}
}
candidates = ["Your account 2992 has $3,000.60"]

# Diagnose
run_custom_diagnosis(
    text="The balance of your Term Deposit account 2992 is $3,000.60",
    fact_configs=fact_configs,
    candidates=candidates
)

🔄 Changelog

v0.7.0

  • ✅ NEW: HTML Reporter - Professional HTML reports with interactive dashboards
  • ✅ NEW: Interactive Charts - Chart.js integration for visual analytics
  • ✅ NEW: Advanced Filtering - Real-time search and filtering capabilities
  • ✅ NEW: Temporal Analysis - Daily/Weekly/Monthly performance tracking
  • ✅ NEW: PDF Export - High-quality PDF reports with full formatting
  • ✅ NEW: CI/CD Integration - GitHub Actions, Jenkins, GitLab CI support
  • ✅ NEW: Detailed Test Information - User input, bot response, expected response comparison
  • ✅ NEW: Responsive Design - Mobile-friendly professional interface

v0.6.4

  • ✅ Improved polarity detection (detects "not", "does not", etc.)
  • ✅ Complete negative patterns in English and Spanish
  • ✅ Avoids false positives with substrings

v0.6.3

  • ✅ Duplicate function removed
  • ✅ Consistent API
  • ✅ Clean code

v0.6.2

  • ✅ Automatic fact weighting
  • ✅ Improved similarity (+55% in typical cases)
  • ✅ Improved money extractor
  • ✅ English reporting

๐Ÿค Contributing

Contributions are welcome! Please:

  1. Fork the project
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • NLTK for natural language processing capabilities
  • The open source community for inspiration and feedback

True Lies - Where AI meets reality 🎭

Have questions? Open an issue or contact the development team.
