

MERIT: Monitoring, Evaluation, Reporting, Inspection, Testing

Python 3.8+ · License: MIT

A comprehensive framework for evaluating, monitoring, and testing AI systems, particularly those powered by Large Language Models (LLMs). MERIT provides tools for performance monitoring, evaluation metrics, RAG system testing, and reporting.

🚀 Features

📊 Monitoring & Observability

  • Real-time LLM monitoring with customizable metrics
  • Performance tracking (latency, throughput, error rates)
  • Cost monitoring and estimation
  • Usage analytics and token volume tracking
  • Multi-backend storage (SQLite, MongoDB, file-based)
  • Live dashboard with interactive metrics
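Cost estimation is typically done by multiplying token counts by per-token prices. A minimal sketch, assuming an illustrative price table (the numbers below are examples, not current provider pricing, and the function is not part of MERIT's API):

```python
# Rough cost estimator: multiply token counts by per-1K-token prices.
# The price table is illustrative only; check your provider's pricing page.
PRICES_PER_1K = {
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return an estimated USD cost for one request."""
    p = PRICES_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# Example: 1,000 input tokens and 2,000 output tokens
print(round(estimate_cost("gpt-3.5-turbo", 1000, 2000), 4))  # 0.0035
```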

🧪 Evaluation & Testing

  • RAG system evaluation with comprehensive metrics
  • LLM performance testing with custom test sets
  • Automated evaluation using LLM-based evaluators
  • Test set generation for systematic testing
  • Multi-model evaluation support

📈 Metrics & Analytics

  • Correctness, Faithfulness, Relevance for RAG systems
  • Coherence and Fluency metrics
  • Context Precision evaluation
  • Custom metric development framework
  • Performance benchmarking
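Conceptually, a custom metric boils down to a callable that scores an interaction. The sketch below is a framework-agnostic exact-match metric; the registration interface MERIT's custom metric framework expects is not shown here, so treat this as illustrative only:

```python
# A metric as a plain callable scoring (response, reference) into [0, 1].
# How such a callable plugs into MERIT's custom metric framework is not
# shown here; this is a standalone illustration.
def exact_match(response: str, reference: str) -> float:
    """Return 1.0 if the whitespace/case-normalized strings match, else 0.0."""
    norm = lambda s: " ".join(s.lower().split())
    return 1.0 if norm(response) == norm(reference) else 0.0

print(exact_match("Paris", "  paris "))  # 1.0
print(exact_match("Paris", "London"))    # 0.0
```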

🔧 Integration & APIs

  • Simple 3-line integration for existing applications
  • REST API for remote monitoring
  • CLI tools for configuration and execution
  • Multiple AI provider support (OpenAI, Google, custom)

📦 Installation

Basic Installation

pip install merit-ai

Full Installation with All Dependencies

pip install merit-ai[all]

Development Installation

git clone https://github.com/your-username/merit.git
cd merit
pip install -e .[dev]

🚀 Quick Start

1. Simple Integration (3 Lines!)

from merit.monitoring.service import MonitoringService

# Initialize monitoring
monitor = MonitoringService()

# Log an interaction
monitor.log_simple_interaction({
    'user_message': 'Hello, how are you?',
    'llm_response': 'I am doing well, thank you!',
    'latency': 0.5,
    'model': 'gpt-3.5-turbo'
})

2. RAG System Evaluation

from merit.evaluation.evaluators.rag import RAGEvaluator

# Initialize evaluator
evaluator = RAGEvaluator()

# Evaluate RAG response
results = evaluator.evaluate(
    query="What is machine learning?",
    response="Machine learning is a subset of AI...",
    context=["Document 1 content...", "Document 2 content..."]
)

print(f"Relevance: {results['relevance']}")
print(f"Faithfulness: {results['faithfulness']}")

3. CLI Usage

# Start evaluation with config file
merit start --config my_config.py

# Monitor your application
merit monitor --config monitoring_config.py

📚 Examples

Basic Chat Application Integration

from merit.monitoring.service import MonitoringService
from datetime import datetime

class ChatApp:
    def __init__(self):
        # Initialize MERIT monitoring
        self.monitor = MonitoringService()
    
    def process_message(self, user_message: str) -> str:
        start_time = datetime.now()
        
        # Your existing chat logic here
        response = self.llm_client.chat(user_message)
        
        end_time = datetime.now()
        
        # Log interaction with MERIT
        self.monitor.log_simple_interaction({
            'user_message': user_message,
            'llm_response': response,
            'latency': (end_time - start_time).total_seconds(),
            'model': 'gpt-3.5-turbo',
            'timestamp': end_time.isoformat()
        })
        
        return response

Advanced RAG System with MERIT

from merit.evaluation.evaluators.rag import RAGEvaluator
from merit.monitoring.service import MonitoringService

class RAGSystem:
    def __init__(self):
        self.evaluator = RAGEvaluator()
        self.monitor = MonitoringService()
    
    def query(self, user_question: str):
        # Retrieve relevant documents
        documents = self.retriever.search(user_question)
        
        # Generate response
        response = self.llm.generate(user_question, documents)
        
        # Evaluate with MERIT
        evaluation = self.evaluator.evaluate(
            query=user_question,
            response=response,
            context=[doc.content for doc in documents]
        )
        
        # Monitor performance
        self.monitor.log_simple_interaction({
            'query': user_question,
            'response': response,
            'evaluation_scores': evaluation,
            'num_documents': len(documents)
        })
        
        return response, evaluation

๐Ÿ—๏ธ Project Structure

merit/
├── api/                    # API clients (OpenAI, Google, etc.)
├── core/                   # Core models and utilities
├── evaluation/             # Evaluation framework
│   ├── evaluators/         # LLM and RAG evaluators
│   └── templates/          # Evaluation templates
├── knowledge/              # Knowledge base management
├── metrics/                # Metrics framework
│   ├── rag.py              # RAG-specific metrics
│   ├── llm_measured.py     # LLM-based metrics
│   └── monitoring.py       # Monitoring metrics
├── monitoring/             # Monitoring service
│   └── collectors/         # Data collectors
├── storage/                # Storage backends
├── templates/              # Dashboard and report templates
└── testset_generation/     # Test set generation tools

📊 Available Metrics

RAG Metrics

  • Correctness: Accuracy of generated responses
  • Faithfulness: Adherence to source documents
  • Relevance: Response relevance to query
  • Coherence: Logical flow and consistency
  • Fluency: Natural language quality
  • Context Precision: Quality of retrieved context
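For intuition about what these metrics measure, the sketch below approximates two of them with simple token overlap. MERIT's evaluators use LLM-based scoring, so these lexical versions are deliberately simplified stand-ins, not the library's implementation:

```python
# Simplified, lexical stand-ins for two RAG metrics.
def _tokens(text: str) -> set:
    return set(text.lower().split())

def relevance(query: str, response: str) -> float:
    """Fraction of query tokens that reappear in the response."""
    q = _tokens(query)
    return len(q & _tokens(response)) / len(q) if q else 0.0

def faithfulness(response: str, context: list) -> float:
    """Fraction of response tokens found somewhere in the retrieved context."""
    r = _tokens(response)
    c = set().union(*(_tokens(doc) for doc in context)) if context else set()
    return len(r & c) / len(r) if r else 0.0

print(relevance("what is machine learning", "machine learning is a field"))  # 0.75
```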

Monitoring Metrics

  • Latency: Response time tracking
  • Throughput: Requests per second
  • Error Rate: Failure percentage
  • Cost: Token usage and cost estimation
  • Usage: Model and feature usage patterns
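These aggregates can be derived from the same records passed to `log_simple_interaction`. The sketch below is plain Python over a list of such records, not MERIT's internal implementation (the `error` key is an assumption for the example):

```python
# Aggregate latency, throughput, and error rate from logged interactions.
def summarize(interactions: list, window_seconds: float) -> dict:
    latencies = [i["latency"] for i in interactions if "latency" in i]
    errors = sum(1 for i in interactions if i.get("error"))
    n = len(interactions)
    return {
        "avg_latency": sum(latencies) / len(latencies) if latencies else 0.0,
        "throughput_rps": n / window_seconds,
        "error_rate": errors / n if n else 0.0,
    }

logs = [
    {"latency": 0.5},
    {"latency": 1.5},
    {"latency": 1.0, "error": True},
]
print(summarize(logs, window_seconds=60))
```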

🔧 Configuration

Basic Configuration File

# merit_config.py
from merit.config.models import MeritMainConfig

config = MeritMainConfig(
    evaluation={
        "evaluator": "rag",
        "metrics": ["relevance", "faithfulness", "correctness"]
    },
    monitoring={
        "storage_type": "sqlite",
        "collection_interval": 60,
        "retention_days": 30
    }
)

Advanced Configuration

# advanced_config.py
config = MeritMainConfig(
    evaluation={
        "evaluator": "rag",
        "metrics": ["relevance", "faithfulness", "correctness"],
        "test_set": {
            "path": "test_questions.json",
            "size": 100
        }
    },
    monitoring={
        "storage_type": "mongodb",
        "storage_config": {
            "uri": "mongodb://localhost:27017",
            "database": "merit_metrics"
        },
        "metrics": ["latency", "cost", "error_rate"],
        "collection_interval": 30,
        "retention_days": 90
    },
    knowledge_base={
        "type": "vector_store",
        "path": "./knowledge_base"
    }
)

🎯 Use Cases

1. Production LLM Monitoring

Monitor your deployed LLM applications in real-time with performance metrics, cost tracking, and error monitoring.

2. RAG System Development

Evaluate and improve your RAG systems with comprehensive metrics and automated testing.

3. Model Comparison

Compare different models and configurations using standardized evaluation metrics.

4. Quality Assurance

Implement automated testing for LLM applications with custom test sets and evaluation criteria.
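One common QA pattern is a CI quality gate that fails when evaluation scores drop below agreed thresholds. In the sketch below, the `scores` dict stands in for the result a real `RAGEvaluator.evaluate` run would return; the threshold values are example assumptions:

```python
# CI-style quality gate: report metrics that fall below their threshold.
THRESHOLDS = {"relevance": 0.7, "faithfulness": 0.8}

def check_quality(scores: dict) -> list:
    """Return the names of metrics scoring below their threshold."""
    return [m for m, t in THRESHOLDS.items() if scores.get(m, 0.0) < t]

# `scores` stands in for a real evaluation result.
scores = {"relevance": 0.9, "faithfulness": 0.75}
print(check_quality(scores))  # ['faithfulness']
```

In a test suite, `assert not check_quality(scores)` turns a score regression into a failing build.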

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

git clone https://github.com/your-username/merit.git
cd merit
pip install -e .[dev]
pytest tests/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Built with modern Python practices and Pydantic for type safety
  • Inspired by the need for comprehensive AI system evaluation
  • Designed for simplicity and ease of integration

📞 Support


MERIT: Making AI systems more reliable, one evaluation at a time. 🚀


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

merit_ai-0.1.16.tar.gz (152.7 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

merit_ai-0.1.16-py3-none-any.whl (176.8 kB)

Uploaded Python 3

File details

Details for the file merit_ai-0.1.16.tar.gz.

File metadata

  • Download URL: merit_ai-0.1.16.tar.gz
  • Upload date:
  • Size: 152.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for merit_ai-0.1.16.tar.gz:

  • SHA256: 5d62ed059ab7e9d9fa000ef3acca1a4e38a019f044fce9bcaff0d5ba555dd487
  • MD5: 8d7420fc2775c8b3d0dbd4b366ce7b90
  • BLAKE2b-256: dab50d485e0b0f83abf931ff096021ff2ebc9589aed10ea79acc1b67208fd258

See more details on using hashes here.
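To check a downloaded distribution against the digests above, you can compute its SHA256 with the standard library (the filename is the source distribution listed above; uncomment the assertion once the file is present locally):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA256 hex digest of a file, streaming in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "5d62ed059ab7e9d9fa000ef3acca1a4e38a019f044fce9bcaff0d5ba555dd487"
# assert sha256_of("merit_ai-0.1.16.tar.gz") == expected
```

pip can also enforce hashes at install time: add `merit-ai==0.1.16 --hash=sha256:<digest>` to a requirements file and install with `pip install --require-hashes -r requirements.txt`.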

File details

Details for the file merit_ai-0.1.16-py3-none-any.whl.

File metadata

  • Download URL: merit_ai-0.1.16-py3-none-any.whl
  • Upload date:
  • Size: 176.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for merit_ai-0.1.16-py3-none-any.whl:

  • SHA256: bedd592a30220ff97e1085bd70d6c31b300dd228db3bcb495add54b8d704eae5
  • MD5: 67dda8952193a6be54ad5852a54d1de3
  • BLAKE2b-256: 652c8f3456d287ed6e8b3275b405e4eac20b80a804aeaf03e9d129490c3a5496

See more details on using hashes here.
