Galtea software development kit
Reason this release was yanked:
This SDK version is no longer supported by the Galtea API. Please migrate to 2.0.0.
Galtea SDK
Comprehensive AI/LLM Testing & Evaluation Framework
Overview
Galtea SDK empowers AI engineers, ML engineers and data scientists to rigorously test and evaluate their AI/LLM models. With a focus on reliability and transparency, Galtea offers:
- Automated Test Dataset Generation - Create comprehensive test datasets tailored to your AI application
- Sophisticated Model Evaluation - Evaluate your locally deployed models across multiple dimensions
Installation
pip install galtea
Quick Start
from galtea import Galtea
import os
# Initialize with your API key
galtea = Galtea(api_key=os.getenv("GALTEA_API_KEY"))
# Create a test
test = galtea.tests.create(
    name="factual-accuracy-test",
    type="QUALITY",
    product_id="your-product-id",
    ground_truth_file_path="path/to/ground-truth.pdf"
)
# Create a model version to evaluate
version = galtea.versions.create(
    name="gpt-4-self-hosted-v1",
    product_id="your-product-id",
    optional_props={
        "description": "Self-hosted GPT-4 equivalent model",
        "endpoint": "http://your-model-endpoint.com/v1/chat"
    }
)
# Set up an evaluation
evaluation = galtea.evaluations.create(
    test_id=test.id,
    version_id=version.id
)
# Run the evaluation with your model's outputs
galtea.evaluation_tasks.create(
    metrics=["factual-accuracy", "coherence", "relevance"],
    evaluation_id=evaluation.id,
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",
    expected_output="Paris is the capital of France.",
    context="From Wikipedia: Paris is the capital and most populous city of France..."
)
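The Quick Start reads the key with `os.getenv`, which returns `None` when `GALTEA_API_KEY` is unset. A minimal sketch of failing early with a clear message instead; `require_api_key` is a hypothetical helper, not part of the SDK:

```python
import os


def require_api_key(env=os.environ, name="GALTEA_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key


# Hypothetical usage with the Quick Start client:
# galtea = Galtea(api_key=require_api_key())
```

Passing the environment as a parameter keeps the helper easy to exercise with a plain dict.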
Core Features
1. Automated Test Dataset Generation
Create comprehensive test datasets to validate your AI/LLM models:
- Quality Tests: Assess response quality, coherence, and factual accuracy
- Adversarial Tests: Stress-test your models against edge cases and potential vulnerabilities
- Ground Truth Integration: Upload ground truth documents to validate factual responses
- Custom Test Types: Define tests tailored to your specific use cases and requirements
# Create a custom test with your own dataset
test = galtea.tests.create(
    name="medical-knowledge-test",
    type="QUALITY",
    product_id="your-product-id",
    ground_truth_file_path="medical_reference.pdf"
)
2. Comprehensive Model Evaluation
Evaluate your locally deployed models with sophisticated metrics:
- Multi-dimensional Analysis: Analyze outputs across various dimensions including accuracy, relevance, and coherence
- Customizable Metrics: Define your own evaluation criteria and rubrics
- Batch Processing: Run evaluations on large datasets efficiently
- Detailed Reports: Get comprehensive insights into your model's performance
# Define custom evaluation metrics
custom_metric = galtea.metrics.create(
    name="medical-accuracy",
    evaluation_steps=[
        "check for medical terminology correctness",
        "verify against medical literature",
        "assess recommendations against standard protocols"
    ]
)
# Run batch evaluation
import pandas as pd
# Load your test data
test_data = pd.read_json("medical_queries.json")
# Evaluate each query with your model
for _, row in test_data.iterrows():
    # Get response from your model (your_model is a placeholder; implementation depends on your setup)
    model_response = your_model.generate_response(row["query"])
    # Evaluate the response
    galtea.evaluation_tasks.create(
        metrics=["medical-accuracy", "coherence", "toxicity"],
        evaluation_id=evaluation.id,
        input=row["query"],
        actual_output=model_response,
        expected_output=row["expected_answer"],
        context=row["medical_context"]
    )
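Before sending rows to `galtea.evaluation_tasks.create`, it can help to check that each row carries the fields the loop reads. The sketch below assumes rows are plain dicts (for example from `test_data.to_dict("records")`) with the column names used above; `build_task_kwargs` and `REQUIRED_FIELDS` are hypothetical names, not SDK functions:

```python
REQUIRED_FIELDS = ("query", "expected_answer", "medical_context")


def build_task_kwargs(row, evaluation_id, metrics):
    """Map one test-data row to keyword arguments for evaluation_tasks.create."""
    # Treat absent or empty values as missing so bad rows fail before any API call.
    missing = [field for field in REQUIRED_FIELDS if not row.get(field)]
    if missing:
        raise ValueError(f"row is missing required fields: {missing}")
    return {
        "metrics": list(metrics),
        "evaluation_id": evaluation_id,
        "input": row["query"],
        "expected_output": row["expected_answer"],
        "context": row["medical_context"],
    }


row = {
    "query": "What is a normal resting heart rate?",
    "expected_answer": "Roughly 60 to 100 beats per minute for adults.",
    "medical_context": "Reference text supplied with the test data.",
}
kwargs = build_task_kwargs(row, "eval-123", ["coherence"])  # "eval-123" is a made-up ID

# actual_output is added once the model has answered (assumed call shape):
# galtea.evaluation_tasks.create(actual_output=model_response, **kwargs)
```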
Managing Your AI Products
Galtea provides a complete ecosystem for managing your AI products:
Products
Represent your AI applications or models:
# List your products
products = galtea.products.list()
# Select a product to work with
product = products[0]
Versions
Track different versions or deployments of your models:
# Create a new version of your model
version = galtea.versions.create(
    name="gpt-4-fine-tuned-v2",
    product_id=product.id,
    optional_props={
        "description": "Fine-tuned GPT-4 for medical domain",
        "foundational_model": "gpt-4",
        "system_prompt": "You are a helpful medical assistant..."
    }
)
# List versions of your product
versions = galtea.versions.list(product_id=product.id)
Tests
Create and manage test datasets:
# Create a test
test = galtea.tests.create(
    name="medical-qa-test",
    type="QUALITY",
    product_id=product.id,
    ground_truth_file_path="medical_data.pdf"
)
# Download a test file
test_file = galtea.tests.download(test, output_dir="tests")
Evaluations
Link tests with model versions for evaluation:
# Create an evaluation
evaluation = galtea.evaluations.create(
    test_id=test.id,
    version_id=version.id
)
# List evaluations for a product
evaluations = galtea.evaluations.list(product_id=product.id)
Advanced Usage
Custom Metrics
Define custom evaluation criteria specific to your needs:
# Create a custom metric
custom_metric_1 = galtea.metrics.create(
    name="patient-safety-score-v1",
    evaluation_steps=[
        "check for dangerous recommendations",
        "assess completeness of safety warnings",
        "verify adherence to medical protocols"
    ]
)
custom_metric_2 = galtea.metrics.create(
    name="patient-safety-score-v2",
    criteria="Evaluate responses for patient safety considerations",
)
# Provide either evaluation_steps or criteria for a metric, not both
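The either/or rule for `evaluation_steps` and `criteria` can be checked locally before calling `galtea.metrics.create`. A minimal sketch: `metric_kwargs` is a hypothetical helper, and requiring exactly one of the two arguments is an assumption, since the text above only rules out passing both:

```python
def metric_kwargs(name, criteria=None, evaluation_steps=None):
    """Build keyword arguments for metrics.create, allowing exactly one definition style."""
    # True when both are given or both are omitted; either case is rejected here.
    if (criteria is None) == (evaluation_steps is None):
        raise ValueError("provide exactly one of criteria or evaluation_steps")
    kwargs = {"name": name}
    if criteria is not None:
        kwargs["criteria"] = criteria
    else:
        kwargs["evaluation_steps"] = list(evaluation_steps)
    return kwargs


steps_kwargs = metric_kwargs(
    "patient-safety-score-v1",
    evaluation_steps=["check for dangerous recommendations"],
)
criteria_kwargs = metric_kwargs(
    "patient-safety-score-v2",
    criteria="Evaluate responses for patient safety considerations",
)

# Assumed call shape, matching the examples above:
# galtea.metrics.create(**steps_kwargs)
```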
Batch Processing
Efficiently evaluate your model on large datasets:
import pandas as pd
import os
# Load your test queries from a JSON file
queries_file = os.path.join(os.path.dirname(__file__), 'test_data.json')
df = pd.read_json(queries_file)
# Process each query
for idx, row in df.iterrows():
    # Get your model's response to the query (call_your_model is a placeholder for your own code)
    model_response = call_your_model(row['query'])
    # Evaluate the response; custom_metric is a metric created earlier with galtea.metrics.create
    galtea.evaluation_tasks.create(
        metrics=["accuracy", "relevance", custom_metric.name],
        evaluation_id=evaluation.id,
        input=row['query'],
        actual_output=model_response,
        expected_output=row['expected_output'],
        context=row['context']
    )
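On large datasets, one failing request would otherwise abort the whole loop. The sketch below keeps going and reports failures at the end. `run_batch` is a hypothetical helper, and `fake_create_task` stands in for `galtea.evaluation_tasks.create` so the example runs without network access; how the real call fails is not documented here, so the broad `except` is an assumption:

```python
def run_batch(rows, create_task):
    """Call create_task once per row; return (succeeded_count, failures)."""
    succeeded = 0
    failures = []
    for index, row in enumerate(rows):
        try:
            create_task(**row)
        except Exception as exc:  # record the failure and continue with the next row
            failures.append((index, repr(exc)))
        else:
            succeeded += 1
    return succeeded, failures


def fake_create_task(**kwargs):
    """Stand-in for the SDK call: rejects rows with an empty actual_output."""
    if not kwargs.get("actual_output"):
        raise ValueError("actual_output is empty")


rows = [
    {"input": "q1", "actual_output": "a1"},
    {"input": "q2", "actual_output": ""},
    {"input": "q3", "actual_output": "a3"},
]
succeeded, failures = run_batch(rows, fake_create_task)
print(f"{succeeded} succeeded, {len(failures)} failed")
```

The failure list keeps the row index, so failed rows can be retried or inspected afterwards.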
API Reference
Main Classes
Galtea: Main client for interacting with the Galtea platform
Product Management
- galtea.products.list(offset=None, limit=None): List available products
- galtea.products.get(product_id): Get a specific product by ID
Test Management
- galtea.tests.create(name, type, product_id, ground_truth_file_path=None, test_file_path=None): Create a new test
- galtea.tests.get(test_id): Retrieve a test by ID
- galtea.tests.list(product_id, offset=None, limit=None): List tests for a product
- galtea.tests.download(test, output_dir): Download test files in the selected directory
Test Cases Management
- galtea.test_cases.create(test_id, input, expected_output, context=None): Create a new test case
- galtea.test_cases.get(test_case_id): Get a test case by ID
- galtea.test_cases.list(test_id, offset=None, limit=None): List test cases for a test
- galtea.test_cases.delete(test_case_id): Delete a test case by ID
Version Management
- galtea.versions.create(product_id, name, optional_props={}): Create a new model version
- galtea.versions.get(version_id): Get a version by ID
- galtea.versions.list(product_id, offset=None, limit=None): List versions for a product
Metric Management
- galtea.metrics.create(name, criteria=None, evaluation_steps=None): Create a custom metric
- galtea.metrics.get(metric_type_id): Get a metric by ID
- galtea.metrics.list(offset=None, limit=None): List available metrics
Evaluation Management
- galtea.evaluations.create(test_id, version_id): Create an evaluation
- galtea.evaluations.get(evaluation_id): Get an evaluation by ID
- galtea.evaluations.list(product_id, offset=None, limit=None): List evaluations for a product
Evaluation Tasks Management
- galtea.evaluation_tasks.list(evaluation_id, offset=None, limit=None): List tasks performed for an evaluation
- galtea.evaluation_tasks.get(evaluation_id, task_id): Get a specific task by ID
- galtea.evaluation_tasks.create(evaluation_id, task_type, input, actual_output, expected_output=None, context=None) or galtea.evaluation_tasks.create(metrics, evaluation_id, input, actual_output, expected_output=None, context=None): Create a new evaluation task which serves to evaluate model outputs
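The list methods in this reference accept `offset` and `limit`. A sketch of walking every page with those two parameters follows. It assumes that `offset` counts items and that each call returns a list, neither of which is stated here; `iter_all` is a hypothetical helper and `fake_list` stands in for a real list method:

```python
def iter_all(list_page, page_size=50):
    """Yield every item by calling list_page(offset=..., limit=...) until a short page."""
    offset = 0
    while True:
        page = list_page(offset=offset, limit=page_size)
        yield from page
        # A page shorter than page_size (including an empty one) is the last page.
        if len(page) < page_size:
            break
        offset += len(page)


ITEMS = [f"product-{i}" for i in range(7)]


def fake_list(offset=None, limit=None):
    """Stand-in for a list method: slices a fixed in-memory list."""
    start = offset or 0
    return ITEMS[start:start + limit]


all_items = list(iter_all(fake_list, page_size=3))

# Assumed usage against the SDK:
# products = list(iter_all(galtea.products.list))
# tests = list(iter_all(lambda offset, limit: galtea.tests.list(product.id, offset=offset, limit=limit)))
```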
Getting Help
- Documentation: https://docs.galtea.ai/
- Support: support@galtea.ai
Authors
This software has been developed by the members of the product team of Galtea Solutions S.L.
License
Apache License 2.0
File details
Details for the file galtea-1.2.3.tar.gz.
File metadata
- Download URL: galtea-1.2.3.tar.gz
- Upload date:
- Size: 22.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.10.12 Linux/6.11.0-1015-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 89f7d678a8a61d984c3991d4b0b8b28648e40b31672279a0fa651abd7520bcef |
| MD5 | 9cae174d1520163f6c407fab42f51ee0 |
| BLAKE2b-256 | cd0c28472c0448faed08ff440edb1c487a0783fc236b6d319a205b80125ecd7b |
File details
Details for the file galtea-1.2.3-py3-none-any.whl.
File metadata
- Download URL: galtea-1.2.3-py3-none-any.whl
- Upload date:
- Size: 29.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.10.12 Linux/6.11.0-1015-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f9e3d04e5fedb9ce35a160491dbb76ba5564dfc4065d50fba06088041314efc |
| MD5 | 675a545430be98468d60eca29692c869 |
| BLAKE2b-256 | 01350d6a0b2a32b85989e8f9167c5f58b6763a10b610a731efd823c7420eaa1b |