Patronus Python SDK
The Patronus Python SDK is a Python library for systematic evaluation of Large Language Models (LLMs). Build, test, and improve your LLM applications with customizable tasks, evaluators, and comprehensive experiment tracking.
Note: This library is currently in beta and is not stable. The APIs may change in future releases.
Documentation
For detailed documentation, including API references and advanced usage, please visit our documentation.
Installation
pip install patronus
Quickstart
Evaluation
For quick testing and exploration, you can use the synchronous evaluate() method:
import os
from patronus import Client

client = Client(
    # This is the default and can be omitted
    api_key=os.environ.get("PATRONUS_API_KEY"),
)
result = client.evaluate(
    evaluator="lynx",
    criteria="patronus:hallucination",
    evaluated_model_input="Who are you?",
    evaluated_model_output="My name is Barry.",
    evaluated_model_retrieved_context="My name is John.",
)
print(f"Pass: {result.pass_}")
print(f"Explanation: {result.explanation}")
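The client reads the API key from the `PATRONUS_API_KEY` environment variable. As a minimal sketch, a fail-fast check can surface a missing key before any evaluation request is made. The helper name `require_api_key` is hypothetical and is not part of the SDK:

```python
import os


def require_api_key(var_name: str = "PATRONUS_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; it is not part of the Patronus SDK.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before creating a Client."
        )
    return api_key
```

The returned value could then be passed as `api_key=require_api_key()` when constructing `Client`.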
The Patronus Python SDK is designed primarily around async/await, which is the recommended way to use the library. Here is a fuller example using async evaluation:
import asyncio
from patronus import Client

client = Client()

no_apologies = client.remote_evaluator(
    "judge",
    "patronus:no-apologies",
    explain_strategy="always",
    max_attempts=3,
)

async def evaluate():
    result = await no_apologies.evaluate(
        evaluated_model_input="How to kill a docker container?",
        evaluated_model_output="""
        I cannot assist with that question as it has been marked as inappropriate.
        I must respectfully decline to provide an answer.
        """,
    )
    print(f"Pass: {result.pass_}")
    print(f"Explanation: {result.explanation}")

asyncio.run(evaluate())
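One benefit of the async style is that several evaluations can run concurrently. The sketch below shows the general pattern with `asyncio.gather`. Both `evaluate_many` and `fake_evaluate` are hypothetical names: `fake_evaluate` is a local stand-in that makes no network request, so the snippet runs without an API key.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def evaluate_many(
    evaluate_fn: Callable[..., Awaitable[Any]],
    samples: list[dict[str, str]],
) -> list[Any]:
    """Run one evaluation per sample concurrently; results keep sample order."""
    return await asyncio.gather(*(evaluate_fn(**sample) for sample in samples))


async def fake_evaluate(evaluated_model_input: str, evaluated_model_output: str) -> bool:
    """Hypothetical stand-in for a remote evaluator call (no network access)."""
    await asyncio.sleep(0)  # yield control to the event loop, as real I/O would
    return "cannot assist" not in evaluated_model_output.lower()


samples = [
    {
        "evaluated_model_input": "How to kill a docker container?",
        "evaluated_model_output": "Run `docker kill <container>`.",
    },
    {
        "evaluated_model_input": "How to kill a docker container?",
        "evaluated_model_output": "I cannot assist with that question.",
    },
]

results = asyncio.run(evaluate_many(fake_evaluate, samples))
print(results)
```

With the SDK, `evaluate_fn` would presumably be `no_apologies.evaluate` from the example above, assuming it accepts the same keyword arguments as the sample keys.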
Experiment
The Patronus Python SDK includes a powerful experimentation framework designed to help you evaluate, compare, and improve your AI models. Whether you're working with pre-trained models, fine-tuning your own, or experimenting with new architectures, this framework provides the tools you need to set up, execute, and analyze experiments efficiently.
import os
from patronus import Client, Row, TaskResult, evaluator, task

client = Client(
    # This is the default and can be omitted
    api_key=os.environ.get("PATRONUS_API_KEY"),
)

@task
def my_task(row: Row):
    return f"{row.evaluated_model_input} World"

@evaluator
def exact_match(row: Row, task_result: TaskResult):
    # exact_match is an evaluator that is defined and run locally
    return task_result.evaluated_model_output == row.evaluated_model_gold_answer

# Reference the remote Patronus Judge evaluator with the is-concise criteria.
# This evaluator runs remotely on Patronus infrastructure.
is_concise = client.remote_evaluator("judge", "patronus:is-concise")

client.experiment(
    "Tutorial Project",
    dataset=[
        {
            "evaluated_model_input": "Hello",
            "evaluated_model_gold_answer": "Hello World",
        },
    ],
    task=my_task,
    evaluators=[exact_match, is_concise],
)
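To see what the local task and evaluator compute for the single dataset row, the logic can be dry-run without the SDK or network access. This is only a sketch: `FakeRow` and `FakeTaskResult` are hypothetical stand-ins carrying just the fields used above, not the real `Row` and `TaskResult` classes.

```python
from dataclasses import dataclass


@dataclass
class FakeRow:
    """Hypothetical stand-in for patronus.Row with only the fields used here."""
    evaluated_model_input: str
    evaluated_model_gold_answer: str


@dataclass
class FakeTaskResult:
    """Hypothetical stand-in for patronus.TaskResult."""
    evaluated_model_output: str


def my_task_logic(row: FakeRow) -> str:
    # Same transformation as my_task above: append " World" to the input.
    return f"{row.evaluated_model_input} World"


def exact_match_logic(row: FakeRow, task_result: FakeTaskResult) -> bool:
    # Same comparison as exact_match above: output must equal the gold answer.
    return task_result.evaluated_model_output == row.evaluated_model_gold_answer


row = FakeRow(evaluated_model_input="Hello", evaluated_model_gold_answer="Hello World")
output = my_task_logic(row)
passed = exact_match_logic(row, FakeTaskResult(evaluated_model_output=output))
print(output, passed)
```

Checking the local logic this way before launching an experiment keeps simple mistakes in the task or evaluator from consuming remote evaluation calls.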