A library for evaluating LLM-based applications.
GLLM Evaluator SDK
A comprehensive evaluation framework for Generative AI applications, including LLM outputs, AI agent responses, and RAG (Retrieval-Augmented Generation) systems.
Overview
The GLLM Evaluator SDK provides an extensible framework that aims to make AI evaluation simple across the GDP Labs ecosystem. Built with an integration-first philosophy, it lets teams assess the quality of generated content from any AI system and connect the results to experiment tracking and observability platforms.
Philosophy
Easy Evaluation Everywhere: Standardize evaluation practices across all GDP Labs AI applications with minimal setup and maximum flexibility.
Integration-First Design: Built to work seamlessly with your existing experiment tracking, observability, and MLOps infrastructure.
Extensible by Design: Add new evaluators, metrics, and integrations without breaking existing workflows.
Key Features
- 🌐 GDP Labs Ecosystem Ready: Standardized evaluation framework across all internal AI applications
- 🔌 Seamless Integration: Easy integration with experiment tracking and observability platforms
- 🚀 Async-First Design: High-performance async evaluation with parallel processing
- 🔧 Extensible Architecture: Easy to add new evaluators and metrics for any use case
- 🤖 LLM as a Judge: Advanced language models for nuanced, contextual evaluation
- 📐 Traditional Metrics: Support for classical evaluation metrics and custom scoring functions
- 🔗 Popular Evaluator Integration: Integration with popular evaluators such as RAGAS, DeepEval, and LangChain
- ⚡ Zero-Config Start: Get started with sensible defaults, customize as needed
Installation
Prerequisites
Mandatory:
- Python 3.11+ — Install here
- pip — Install here
- uv — Install here
- gcloud CLI (for authentication) — Install here, then log in using:
gcloud auth login
Install from Artifact
Because gllm-evals is a private library hosted in a secure Google Cloud repository, you must provide an access token to install it. The command below handles this authorization inline by using an access token from the gcloud CLI.
uv pip install --extra-index-url https://oauth2accesstoken:$(gcloud auth print-access-token)@glsdk.gdplabs.id/gen-ai-internal/simple/ gllm-evals
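After installation, a quick import check can confirm that the package is visible to the active interpreter. The sketch below uses only the standard library; the import name `gllm_evals` is taken from the Quick Start examples, and the helper name `is_importable` is illustrative rather than part of the SDK.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    name = "gllm_evals"
    status = "available" if is_importable(name) else "not found"
    print(f"{name}: {status}")
```

If the check reports "not found", the install most likely went into a different environment than the one running the script.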
Local Development Setup
Prerequisites
- Python 3.11+ — Install here
- pip — Install here
- uv — Install here
- gcloud CLI — Install here, then log in using:
gcloud auth login
- Git — Install here
- Access to the GDP Labs SDK GitHub repository
1. Clone Repository
git clone git@github.com:GDP-ADMIN/gl-sdk.git
cd gl-sdk/libs/gllm-evals
2. Setup Authentication
Because gllm-evals is a private library, you first need to configure uv to authenticate with our secure Google Cloud repositories.
Set the following environment variables to authenticate with internal package indexes:
export UV_INDEX_GEN_AI_INTERNAL_USERNAME=oauth2accesstoken
export UV_INDEX_GEN_AI_INTERNAL_PASSWORD="$(gcloud auth print-access-token)"
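Access tokens printed by gcloud are short-lived, so the exported password eventually stops working and has to be refreshed. As a rough sketch, the same two variables can be built in Python and passed to a subprocess with a fresh token each time. The function names here are hypothetical helpers, not part of gllm-evals or uv, and `fetch_gcloud_token` assumes the gcloud CLI is on the PATH and already logged in.

```python
import os
import subprocess


def fetch_gcloud_token() -> str:
    """Ask the gcloud CLI for a fresh access token (requires a prior `gcloud auth login`)."""
    completed = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()


def uv_index_env(token: str) -> dict[str, str]:
    """Build the two environment variables shown above for the internal index."""
    return {
        "UV_INDEX_GEN_AI_INTERNAL_USERNAME": "oauth2accesstoken",
        "UV_INDEX_GEN_AI_INTERNAL_PASSWORD": token,
    }


def run_with_fresh_token(command: list[str]) -> int:
    """Run a command (for example ["make", "setup"]) with a freshly fetched token."""
    env = {**os.environ, **uv_index_env(fetch_gcloud_token())}
    return subprocess.run(command, env=env).returncode
```

Calling `run_with_fresh_token(["make", "setup"])` would then be roughly equivalent to exporting both variables and running the command in the same shell.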
3. Quick Setup
Run:
make setup
4. Activate Virtual Environment
source .venv/bin/activate
Local Development Utilities
The following Makefile commands are available for quick operations:
Install uv
make install-uv
Install Pre-Commit
make install-pre-commit
Install Dependencies
make install
Update Dependencies
make update
Run Tests
make test
Adding the Package
Once authorization is configured, you can add gllm-evals to your project:
uv add gllm-evals
Dependencies
The SDK requires:
- gllm-core and gllm-inference for LLM interactions
- pydantic for data validation
Quick Start
Basic Usage
import asyncio
import os

from gllm_evals.evaluator.geval_generation_evaluator import GEvalGenerationEvaluator


async def main():
    # Initialize the evaluator
    evaluator = GEvalGenerationEvaluator(
        model_credentials=os.getenv("GOOGLE_API_KEY")
    )

    # Prepare evaluation data
    data = {
        "query": "What is the capital of France?",
        "expected_response": "Paris is the capital of France.",
        "generated_response": "The capital of France is Paris.",
        "retrieved_context": "Paris is the capital and largest city of France."
    }

    # Evaluate
    result = await evaluator.evaluate(data)
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
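Before sending a record to an LLM-backed evaluator, it can be worth checking that the expected fields are present, since a missing field costs an API call to discover. The sketch below checks the four keys used in the example above. Whether the evaluator strictly requires all four is an assumption, and `missing_fields` is a hypothetical helper, not an SDK function.

```python
# Field names taken from the Basic Usage example; treating all four as
# required is an assumption made for this sketch.
REQUIRED_FIELDS = (
    "query",
    "expected_response",
    "generated_response",
    "retrieved_context",
)


def missing_fields(record: dict) -> list[str]:
    """Return the names of fields that are absent, None, or blank strings."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


if __name__ == "__main__":
    record = {"query": "What is the capital of France?", "generated_response": ""}
    print(missing_fields(record))
```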
Batch Evaluation
import asyncio
import os

from gllm_evals.dataset.dict_dataset import DictDataset
from gllm_evals.evaluator.geval_generation_evaluator import GEvalGenerationEvaluator
from gllm_evals.runner import Runner
from gllm_evals.experiment_tracker.csv_experiment_tracker import CSVExperimentTracker


async def batch_evaluation():
    # Initialize evaluator
    evaluator = GEvalGenerationEvaluator(
        model_credentials=os.getenv("GOOGLE_API_KEY"),
        run_parallel=True  # Enable parallel processing
    )

    # Create dataset
    dataset = DictDataset([
        {
            "query": "What is the capital of France?",
            "expected_response": "Paris",
            "generated_response": "Paris is the capital of France.",
            "retrieved_context": "Paris is the capital of France."
        },
        {
            "query": "What is 1 + 1?",
            "expected_response": "2",
            "generated_response": "The answer is 2.",
            "retrieved_context": "1 + 1 equals 2."
        }
    ])

    # Run evaluation
    runner = Runner(evaluator, batch_size=10)
    results = await runner.evaluate(dataset)

    # Track results
    tracker = CSVExperimentTracker(score_key="generation/score")
    tracker.log_batch(results)
    print(f"Evaluation Results: {tracker.get_results()}")


if __name__ == "__main__":
    asyncio.run(batch_evaluation())
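To make the idea of batched, parallel evaluation concrete, here is a conceptual sketch using only asyncio. It is not the `Runner` implementation: `evaluate_in_batches` and `fake_evaluate` are hypothetical stand-ins, and the fake evaluator uses a substring check instead of an LLM call so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable


async def evaluate_in_batches(
    records: list[dict],
    evaluate_one: Callable[[dict], Awaitable[dict]],
    batch_size: int = 10,
) -> list[dict]:
    """Evaluate records batch by batch, running each batch concurrently."""
    results: list[dict] = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        # gather preserves input order, so results line up with records
        results.extend(await asyncio.gather(*(evaluate_one(r) for r in batch)))
    return results


async def fake_evaluate(record: dict) -> dict:
    """Stand-in evaluator: substring match, no LLM call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    matched = record["expected_response"] in record["generated_response"]
    return {"query": record["query"], "score": 1.0 if matched else 0.0}


if __name__ == "__main__":
    data = [
        {"query": "capital?", "expected_response": "Paris",
         "generated_response": "Paris is the capital of France."},
        {"query": "1 + 1?", "expected_response": "2",
         "generated_response": "The answer is 2."},
    ]
    print(asyncio.run(evaluate_in_batches(data, fake_evaluate, batch_size=2)))
```

Processing one batch at a time bounds the number of in-flight requests, which is one common reason to expose a batch size at all.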
Custom Metrics
Create domain-specific metrics easily:
from gllm_evals.metrics.metric import BaseMetric
from gllm_evals.types import MetricInput, MetricOutput


class DomainSpecificMetric(BaseMetric):
    """Custom metric for domain-specific evaluation."""

    name = "domain_accuracy"

    async def _evaluate(self, data: MetricInput) -> MetricOutput:
        # Your domain-specific evaluation logic.
        # calculate_domain_score is a placeholder for a method you define yourself.
        score = self.calculate_domain_score(data)
        return {"score": score, "explanation": "Domain-specific reasoning"}
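The `calculate_domain_score` call in the example is left for the subclass to define. As one possible body, the sketch below scores a response by the fraction of required domain terms it mentions. The function name `keyword_coverage` and the choice of keyword matching are illustrative assumptions; any deterministic scoring logic could take its place.

```python
def keyword_coverage(response: str, required_terms: list[str]) -> float:
    """Fraction of required terms that appear in the response (case-insensitive)."""
    if not required_terms:
        # Nothing is required, so the response trivially covers everything.
        return 1.0
    text = response.lower()
    hits = sum(1 for term in required_terms if term.lower() in text)
    return hits / len(required_terms)


if __name__ == "__main__":
    answer = "The capital of France is Paris."
    print(keyword_coverage(answer, ["Paris", "France"]))
```

Inside a custom metric, a helper like this would be called with the generated response from the metric input and a term list supplied when the metric is constructed.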
Architecture
Core Components
1. Evaluators
- BaseEvaluator: Abstract base class for all evaluators - extend for any evaluation scenario
- GEvalGenerationEvaluator: Production-ready GEval-backed evaluator for text generation quality with rule-based scoring
2. Metrics
- BaseMetric: Abstract base class for metrics - create custom metrics for any domain
- LMBasedMetric: Generic LM-powered metric evaluation with customizable prompts
3. Datasets
- BaseDataset: Abstract base class for datasets - support any data format
- DictDataset: Simple dictionary-based dataset implementation
4. Runner
- Runner: Runner class for batch evaluation
Metrics
Below is a list of metrics that are currently supported by the SDK.
| Metric | Description | Type | Score Range |
|---|---|---|---|
| LMBasedMetric | An all-purpose metric that can evaluate any criterion expressible as an LM prompt | LM-based | - |
| DeepEvalGEvalMetric | A versatile evaluation metric framework that can be used to create custom evaluation metrics with configurable criteria, evaluation steps, and rubrics | LM-based | - |
| GEvalCompletenessMetric | A metric that can be used to evaluate the completeness of the generated output | DeepEval GEval | 1-3 |
| GEvalRedundancyMetric | A metric that can be used to evaluate the redundancy of the generated output | DeepEval GEval | 1-3 |
| GEvalGroundednessMetric | A metric that can be used to evaluate the groundedness of the generated output | DeepEval GEval | 1-3 |
| GEvalLanguageConsistencyMetric | A metric that can be used to evaluate language consistency between query and generated response | DeepEval GEval | 0-1 |
| GEvalRefusalMetric | A metric that can be used to evaluate refusal behavior from query and expected response | DeepEval GEval | 0-1 |
| GEvalRefusalAlignmentMetric | A metric that can be used to evaluate refusal alignment between expected and generated responses | DeepEval GEval | 0-1 |
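Because the metrics above report on different scales (1-3 and 0-1), comparing or averaging them directly can be misleading. The sketch below rescales a raw score to the 0-1 interval using the ranges from the table. The `normalize` helper is hypothetical and not part of the SDK, which may apply its own aggregation. The table also does not state whether a higher score is better for every metric, so that direction is left to the reader.

```python
# Score ranges copied from the metrics table above.
SCORE_RANGES = {
    "GEvalCompletenessMetric": (1, 3),
    "GEvalRedundancyMetric": (1, 3),
    "GEvalGroundednessMetric": (1, 3),
    "GEvalLanguageConsistencyMetric": (0, 1),
    "GEvalRefusalMetric": (0, 1),
    "GEvalRefusalAlignmentMetric": (0, 1),
}


def normalize(metric_name: str, score: float) -> float:
    """Linearly rescale a raw score to [0, 1] based on the metric's documented range."""
    low, high = SCORE_RANGES[metric_name]
    if not low <= score <= high:
        raise ValueError(f"{metric_name}: score {score} outside [{low}, {high}]")
    return (score - low) / (high - low)


if __name__ == "__main__":
    print(normalize("GEvalCompletenessMetric", 2))
```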
Evaluators
Below is a list of evaluators that are currently supported by the SDK.
| Evaluator | Description | Type |
|---|---|---|
| GEvalGenerationEvaluator | An evaluator that can be used to evaluate the quality of the generated output | LLM-based |
Datasets
Below is a list of datasets that are currently supported by the SDK.
| Dataset | Description |
|---|---|
| DictDataset | A dataset that loads data from a dictionary |
| HuggingFaceDataset | A dataset that loads data from a HuggingFace dataset |
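Evaluation records often live in JSON Lines files rather than in code. As a minimal sketch, the helper below turns JSONL text into the list-of-dicts shape that the Batch Evaluation example passes to `DictDataset`. The name `parse_jsonl` is hypothetical, and the SDK may offer its own loaders.

```python
import json
from typing import Iterable


def parse_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse one JSON object per non-empty line and return them as a list of dicts."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        records.append(record)
    return records


if __name__ == "__main__":
    sample = [
        '{"query": "What is 1 + 1?", "expected_response": "2"}',
        "",
        '{"query": "Capital of France?", "expected_response": "Paris"}',
    ]
    print(parse_jsonl(sample))
```

An open file handle can be passed directly as `lines`, and the returned list can then be handed to `DictDataset` as in the Batch Evaluation example.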