
Opticonomy Prompt Driven Model Evaluation (PDME)


Overview

The method uses a single text-generation AI, referred to as the eval model, to evaluate any other text-generation AI on any topic. The evaluation works as follows:

  1. We write a text prompt describing what questions the eval model should generate, and supply seeds that are picked at random to generate a question.
  2. The question is sent to the AI model under test, which generates a response.
  3. The eval model also generates an answer to the same question.
  4. The eval model then uses a second text prompt we write to compare the two answers and pick the winner.

This method lets us evaluate models on any topic, such as storytelling, programming, finance, and Q&A.
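The four steps above can be sketched as a single round of the evaluation loop. This is a minimal illustration of the control flow, not the library's actual API: the `generate` callable, the seed list, and the judge-prompt wording are all placeholders.

```python
import random

# Illustrative seeds; in practice these come from the bootstrap prompt setup.
SEEDS = ["time travel", "a lost library", "an unlikely friendship"]

def pdme_round(eval_model, test_model, generate):
    """One PDME round: generate a question, collect both answers, judge."""
    # 1. Build a question from a randomly picked seed.
    seed = random.choice(SEEDS)
    question = generate(eval_model, f"Write one storytelling question about: {seed}")
    # 2. The model under test answers the question.
    test_answer = generate(test_model, question)
    # 3. The eval model answers the same question.
    eval_answer = generate(eval_model, question)
    # 4. The eval model compares the two answers and picks a winner.
    verdict = generate(
        eval_model,
        f"Question: {question}\nAnswer 1: {test_answer}\nAnswer 2: {eval_answer}\n"
        "Which answer is better? Reply with 1 or 2.",
    )
    return verdict

def _demo_generate(model, prompt):
    # Stand-in for a real model call, used only to show the control flow.
    return "1" if "Which answer is better" in prompt else f"[{model}] response"

verdict = pdme_round("eval-model", "test-model", _demo_generate)
```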

Installation

Install Package

pip install opticonomy-pdme

Create and Activate the Virtual Environment

  • Set up a Python virtual environment and activate it (Linux):

    python3 -m venv .venv
    source .venv/bin/activate
    
  • Set up a Python virtual environment and activate it (Windows, Git Bash):

    python -m venv venv
    source venv/Scripts/activate
    
  • Install dependencies from the requirements.txt file:

    pip install -r requirements.txt
    
    

Usage - Key Concepts

Load bootstrap templates

# Load the detailed bootstrap prompt template from markdown file
template_file_path = "examples/storytelling_template.md"

# Function to load the markdown template
def load_template(file_path):
    with open(file_path, 'r') as file:
        return file.read()

bootstrap_prompt_template = load_template(template_file_path)
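Once loaded, the bootstrap template is filled with a randomly picked seed to produce a question prompt. The `{seed}` placeholder and the seed list below are assumptions for illustration; the actual placeholder names depend on the template file you use.

```python
import random

def build_question_prompt(template, seeds):
    """Substitute a randomly chosen seed into the bootstrap template."""
    return template.format(seed=random.choice(seeds))

# A toy template standing in for the contents of storytelling_template.md.
template = "Write a short story prompt about {seed}."
prompt = build_question_prompt(template, ["time travel"])
```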

Running Sample Use Cases

PDME Arena

  • Run PDME Arena
python examples/pdme_arena.py \
    --models_file data/pdme_model_list.csv \
    --eval_type generic \
    --num_prompts 3 \
    --battles_output_file data/generic_battles.csv \
    --elo_output_file data/generic_elo.csv \
    --elo_calibration_model claude-3-opus-20240229 \
    --elo_benchmark_file data/llmarena_elo.csv \
    --eval_model gpt-3.5-turbo-instruct \
    --base_model gpt-4o \
    --battle_type base_vs_all
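The arena turns pairwise battle outcomes into Elo ratings (calibrated against the benchmark file). As a rough sketch of the standard Elo update, under the assumption of a conventional K-factor of 32 and the usual 400-point scale; the library's exact update rule and constants may differ:

```python
def update_elo(rating_a, rating_b, score_a, k=32):
    """Standard Elo update. score_a is 1.0 if model A won, 0.0 if it lost, 0.5 for a tie."""
    # Expected score of A from the rating gap on the usual 400-point scale.
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

# Two equally rated models; A wins the battle and gains k/2 points.
a, b = update_elo(1500, 1500, 1.0)
```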

Storytelling

  • Run the following:
python examples/storytelling_example.py
  • Sample output:
INFO:opti_pdme.opticonomy_pdme:Generated text: Model 1's response is well-crafted and provides a fitting continuation to the original story. It successfully maintains the narrative's tone and theme, while also expanding on Amelia's journey and relationship with Faelan. Here's a summary of why Model 1's response stands out:

1. **Character Development**:
  - The response deepens Amelia's character by showing her growth and her impact on the academic world.
  - It continues to explore the bond between Amelia and Faelan, adding emotional depth to their friendship.

2. **Plot Progression**:
  - The storyline progresses naturally, introducing a new layer of responsibility for Amelia as the guardian of the ChronoSphere.
  - Faelan's reappearance provides a satisfying closure to their relationship, while also setting up a new chapter in Amelia's life.

3. **Themes and Motifs**:
  - The response stays true to the original themes of time, knowledge, and interconnectedness.
  - It introduces the idea of guardianship and the responsibility that comes with great knowledge.

4. **Imagery and Descriptive Language**:
  - The use of descriptive language helps to create vivid imagery, making the scenes more immersive.
  - The serene evening in Central Park and the timeless forest are particularly well-described, enhancing the reader's visual experience.

5. **Emotional Resonance**:
  - The reunion between Amelia and Faelan is emotionally satisfying, reinforcing the bond they share.
  - The ending leaves a lasting impression, highlighting the importance of friendship and wisdom across time.

Overall, Model 1 effectively builds on the original story, providing a rich and engaging continuation that honors the spirit of the narrative while adding new dimensions to it.
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:opti_pdme.opticonomy_pdme:Label:  1, LogProb: -0.00043698703, Logit: 7.735388541471373, Prob: 0.999563108434926
INFO:opti_pdme.opticonomy_pdme:Label:  2, LogProb: -1.1279553e-05, Logit: 11.392513300559003, Prob: 0.9999887205106139
INFO:opti_pdme.opticonomy_pdme:Final normalized probabilities: [0.49989357313235727, 0.5001064268676427]
INFO:opti_pdme.opticonomy_pdme:Probability for 'openai/gpt-4o': 0.49989357313235727
INFO:opti_pdme.opticonomy_pdme:Probability for 'openai-community/gpt2': 0.5001064268676427
INFO:opti_pdme.opticonomy_pdme:Result: 'openai-community/gpt2' is better
INFO:__main__:Evaluation result: 'openai-community/gpt2' is better
INFO:__main__:Probabilities: [0.49989357313235727, 0.5001064268676427]
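The `LogProb`, `Logit`, and `Prob` values in the log can be reproduced from each other: the per-label probability is `exp(logprob)`, the logit is `log(p / (1 - p))`, and the final probabilities are the per-label probabilities normalized to sum to 1. This is a reconstruction of the arithmetic visible in the sample log, not the library's internal code:

```python
import math

def normalize_label_probs(logprobs):
    """Convert per-label log-probabilities into normalized win probabilities."""
    probs = [math.exp(lp) for lp in logprobs]           # Prob column
    logits = [math.log(p / (1 - p)) for p in probs]     # Logit column
    total = sum(probs)
    normalized = [p / total for p in probs]             # final probabilities
    return logits, normalized

# The two label log-probabilities from the sample log above.
logits, normalized = normalize_label_probs([-0.00043698703, -1.1279553e-05])
```

Note that both labels have probabilities extremely close to 1, so after normalization the result is nearly a coin flip (0.49989 vs. 0.50011), which is why such close battles can favor either model.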

Coding

  • Run the following:
python examples/coding_example.py
  • Sample output:
...
### Explanation

1. **`validate_tic_tac_toe(board)`**:
  - This function checks each row, column, and diagonal for a winner.
  - If there's a winner, it returns either `'X wins'` or `'O wins'`.
  - If there are empty cells but no winner, it returns `'Ongoing'`.
  - If the board is full and there's no winner, it returns `'Draw'`.

2. **`sort_game_states(game_states)`**:
  - This function uses a custom sorting key that first checks the game state.
  - It then sorts by the count of 'X's and 'O's.
  - The sorting key is a tuple that prioritizes the game state, followed by the count of 'X's, and then the count of 'O's.

### Conclusion

This solution efficiently validates and sorts Tic-Tac-Toe game states. It checks all necessary conditions for the game state and sorts the boards based on the predefined criteria. The code is modular, making it easy to understand and maintain.
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:opti_pdme.opticonomy_pdme:Label:  1, LogProb: -1.9361265e-07, Logit: 15.4574062265043, Prob: 0.9999998063873687
INFO:opti_pdme.opticonomy_pdme:Label:  2, LogProb: -1.8624639e-06, Logit: 13.193609338205482, Prob: 0.9999981375378344
INFO:opti_pdme.opticonomy_pdme:Final normalized probabilities: [0.5000004172128125, 0.4999995827871875]
INFO:opti_pdme.opticonomy_pdme:Probability for 'openai/gpt-4o': 0.5000004172128125
INFO:opti_pdme.opticonomy_pdme:Probability for 'openai-community/gpt2': 0.4999995827871875
INFO:opti_pdme.opticonomy_pdme:Result: 'openai/gpt-4o' is better
INFO:__main__:Evaluation result: 'openai/gpt-4o' is better
INFO:__main__:Probabilities: [0.5000004172128125, 0.4999995827871875]
