

pharia-studio-sdk

Formerly the intelligence_layer/evaluation package.

Overview

The pharia-studio-sdk provides a set of tools for evaluating and benchmarking LLM-based tasks with Pharia Studio. The documentation is available on Read the Docs.

Installation

The SDK is published on PyPI.

To add the SDK as a dependency to an existing project, run

pip install pharia-studio-sdk

Example Usage

This example demonstrates how to define, run, and evaluate a custom task using the pharia-studio-sdk. It serves as a reference point for creating your own evaluation logic with Pharia Studio. Before running the example, make sure to set up the necessary configuration values: STUDIO_URL, INFERENCE_URL, and AA_TOKEN.
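Rather than hard-coding credentials in the script, you may want to read the configuration values from environment variables. The helper below is a hypothetical sketch (not part of the SDK); the variable names mirror the placeholders used in the example:

```python
import os

# Hypothetical helper (not part of the SDK): read the required settings
# from environment variables, failing fast if any are missing.
def load_config() -> dict[str, str]:
    required = ("STUDIO_URL", "INFERENCE_URL", "AA_TOKEN")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in required}
```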

from statistics import mean
from typing import Iterable

from aleph_alpha_client import Client
from pharia_inference_sdk.core import (
    CompleteInput,
    ControlModel,
    Llama3InstructModel,
    Task,
    TaskSpan,
)
from pydantic import BaseModel

from pharia_studio_sdk import StudioClient
from pharia_studio_sdk.evaluation import (
    AggregationLogic,
    Example,
    SingleOutputEvaluationLogic,
    StudioBenchmarkRepository,
    StudioDatasetRepository,
)

# Studio Configuration
PROJECT_NAME = "My first project"
BENCHMARK_NAME = "My first benchmark"
STUDIO_URL = "<studio_url>"

# Inference Configuration
INFERENCE_URL = "<inference_url>"
AA_TOKEN = "<aa_token>"

# Define a task with the `pharia-inference-sdk` module
class ExtractIngredientsTaskInput(BaseModel):
    recipe: str

class ExtractIngredientsTaskOutput(BaseModel):
    ingredients: list[str]

class ExtractIngredientsTask(Task[ExtractIngredientsTaskInput, ExtractIngredientsTaskOutput]):
    PROMPT_TEMPLATE = """Given the following recipe, extract the list of ingredients. Write one ingredient per line, do not add the quantity, do not add any text besides the list.

Recipe:
{recipe}
"""    
    def __init__(self, model: ControlModel):
        self._model = model


    def do_run(
        self, input: ExtractIngredientsTaskInput, task_span: TaskSpan
    ) -> ExtractIngredientsTaskOutput:
        prompt = self._model.to_instruct_prompt(
            self.PROMPT_TEMPLATE.format(recipe=input.recipe)
        )
        completion_input = CompleteInput(prompt=prompt)
        completion = self._model.complete(completion_input, tracer=task_span)
        # Keep only non-empty lines from the completion
        ingredients = [
            line.strip()
            for line in completion.completions[0].completion.split("\n")
            if line.strip()
        ]
        return ExtractIngredientsTaskOutput(ingredients=ingredients)

# Define a logic for evaluation
class IngredientsEvaluation(BaseModel):
    correct_number_of_ingredients: bool

class IngredientsAggregatedEvaluation(BaseModel):
    avg_result: float

class IngredientsEvaluationLogic(
    SingleOutputEvaluationLogic[
        ExtractIngredientsTaskInput,
        ExtractIngredientsTaskOutput,
        ExtractIngredientsTaskOutput,
        IngredientsEvaluation,
    ]
):
    def do_evaluate_single_output(
        self,
        example: Example[ExtractIngredientsTaskInput, ExtractIngredientsTaskOutput],
        output: ExtractIngredientsTaskOutput,
    ) -> IngredientsEvaluation:
        return IngredientsEvaluation(
            correct_number_of_ingredients=len(output.ingredients) == len(example.expected_output.ingredients),
        )


class IngredientsAggregationLogic(
    AggregationLogic[IngredientsEvaluation, IngredientsAggregatedEvaluation]
):
    def aggregate(
        self, evaluations: Iterable[IngredientsEvaluation]
    ) -> IngredientsAggregatedEvaluation:
        evaluation_list = list(evaluations)
        return IngredientsAggregatedEvaluation(
            avg_result=mean(evaluation.correct_number_of_ingredients for evaluation in evaluation_list)
        )

aa_client = Client(token=AA_TOKEN, host=INFERENCE_URL)
studio_client = StudioClient(
    PROJECT_NAME, studio_url=STUDIO_URL, auth_token=AA_TOKEN, create_project=True
)
studio_benchmark_repository = StudioBenchmarkRepository(studio_client=studio_client)
studio_dataset_repository = StudioDatasetRepository(studio_client=studio_client)

evaluation_logic = IngredientsEvaluationLogic()
aggregation_logic = IngredientsAggregationLogic()

# Create a dataset with example inputs and expected outputs
examples = [
    Example(
        input=ExtractIngredientsTaskInput(
            recipe="""# Pike Burger

- Pike (the bigger the better)
- Breadcrumbs
- Onions
- Garlic 
- Eggs
- Mustard
- Flour
- Spices
- Salt
- Pepper
- Oil

1. Fish pike
2. Filet fish into pieces
3. Grind the fish and onions into a paste
4. Mix paste with all other ingredients beside the flour and oil, add breadcrumbs until the consistency is right for forming patties
5. Form the paste into patties and coat them in flour
6. Let them rest in the fridge for 30min to firm up
7. Shallow fry them in a pan with a lot of oil
"""
        ),
        expected_output=ExtractIngredientsTaskOutput(
            ingredients=[
               "Pike", "Breadcrumbs", "Onions", "Garlic", "Eggs", "Mustard", "Flour", "Spices", "Salt", "Pepper", "Oil"
            ]
        ),
    )]
dataset = studio_dataset_repository.create_dataset(
    examples=examples,
    dataset_name="My first dataset",
    metadata={"description": "dataset_description"},
)

model = Llama3InstructModel(name="llama-3.1-8b-instruct", client=aa_client)
task = ExtractIngredientsTask(model=model)
# Create and run a benchmark
benchmark = studio_benchmark_repository.create_benchmark(
    dataset_id=dataset.id,
    eval_logic=evaluation_logic,
    aggregation_logic=aggregation_logic,
    name=BENCHMARK_NAME,
    metadata={"key": "value"},
)
benchmark.execute(
    task=task,
    name=BENCHMARK_NAME,
)

# View the benchmark results in Pharia Studio
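The evaluation and aggregation logic above reduces to a simple pattern you can sanity-check without a Studio or inference connection: each example yields a boolean (did the predicted ingredient count match the expected one?), and the aggregation averages those booleans into a score between 0 and 1. A minimal standalone sketch, with made-up sample data:

```python
from statistics import mean

# Made-up sample data: two predictions and their expected ingredient lists.
predicted = [["Pike", "Onions", "Salt"], ["Eggs", "Flour"]]
expected = [["Pike", "Onions", "Salt"], ["Eggs"]]

# Per-example evaluation: does the ingredient count match?
evaluations = [len(p) == len(e) for p, e in zip(predicted, expected)]

# Aggregation: mean of booleans (True counts as 1, False as 0).
avg_result = mean(evaluations)
print(avg_result)  # 0.5
```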

Contributing

We welcome contributions! Please see our Contributing Guide for details on how to set up the development environment and submit changes.

