Opticonomy Prompt Driven Model Evaluation (PDME)

Step 1: Installation and Environment

Install Package

pip install opticonomy-pdme

Create and Activate the Virtual Environment

  • Set up a Python virtual environment and activate it (Linux):

    python3 -m venv .venv
    source .venv/bin/activate
    
  • Set up a Python virtual environment and activate it (Windows, from a Bash shell such as the one in VS Code):

    python -m venv venv
    source venv/Scripts/activate
    
  • Install dependencies from the requirements.txt file:

    pip install -r requirements.txt
    

Sample Use Cases

Storytelling

python pdme_client.py --eval_model openai/gpt-3.5-turbo-0125 --test_model openai-community/gpt2 --seed_1 "an old Englishman" --seed_2 "finding happiness" --seed_3 "rain" --seed_4 "old cars"
python pdme_client.py --eval_model openai/gpt-3.5-turbo-0125 --test_model distilbert/distilgpt2 --seed_1 "an old Englishman" --seed_2 "finding happiness" --seed_3 "rain" --seed_4 "old cars"
python pdme_client.py --eval_model openai/gpt-4o --test_model distilbert/distilgpt2 --seed_1 "an old Englishman" --seed_2 "finding happiness" --seed_3 "rain" --seed_4 "old cars"
python pdme_client.py --eval_model openai/gpt-4o --test_model openai-community/gpt2 --seed_1 "an old Englishman" --seed_2 "finding happiness" --seed_3 "rain" --seed_4 "old cars"
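
The four commands above cover every pairing of two eval models with two test models, using the same seeds. As a convenience, the same command lines can be generated from Python. This is only a sketch: it uses just the flags shown above (`--eval_model`, `--test_model`, `--seed_1` to `--seed_4`), and the helper name `build_command` is made up for illustration and is not part of the package.

```python
import itertools
import shlex

EVAL_MODELS = ["openai/gpt-3.5-turbo-0125", "openai/gpt-4o"]
TEST_MODELS = ["openai-community/gpt2", "distilbert/distilgpt2"]
SEEDS = ["an old Englishman", "finding happiness", "rain", "old cars"]


def build_command(eval_model, test_model, seeds):
    """Build the argument list for one pdme_client.py run (hypothetical helper)."""
    cmd = ["python", "pdme_client.py",
           "--eval_model", eval_model,
           "--test_model", test_model]
    # Seeds are passed as --seed_1, --seed_2, ... in order.
    for i, seed in enumerate(seeds, start=1):
        cmd += [f"--seed_{i}", seed]
    return cmd


# One command per (eval model, test model) pairing.
commands = [build_command(e, t, SEEDS)
            for e, t in itertools.product(EVAL_MODELS, TEST_MODELS)]

for cmd in commands:
    # shlex.join quotes arguments that contain spaces, so each line can be pasted into a shell.
    print(shlex.join(cmd))
```

To actually run a command instead of printing it, each list can be passed to `subprocess.run(cmd, check=True)` from inside the activated virtual environment.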

Overview

The method uses a single text generation AI, referred to as the eval model, to evaluate any other text generation AI on any topic. The evaluation works like this:

  1. We write a text prompt describing what kind of questions the eval model should generate, and provide seeds that are randomly picked to generate each question.
  2. The question is sent to the AI model being tested, which generates a response.
  3. The eval model also generates an answer to the same question.
  4. Using a second text prompt that we write, the eval model compares the two answers and picks the winner. (The model that does the comparison does not have to be the same as the eval model, but using the same one simplifies inference.)
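
The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not the package's implementation: the function names, the prompt wording, the choice of two seeds per question, and the 'A'/'B' verdict format are all assumptions, and the two stub models stand in for real API or model calls.

```python
import random
from typing import Callable, List

# Assumption: a "model" is any function that maps a prompt string to generated text.
Model = Callable[[str], str]


def build_bootstrap_prompt(seeds: List[str], rng: random.Random, k: int = 2) -> str:
    """Step 1: randomly pick k seeds and ask for a question built around them."""
    picked = rng.sample(seeds, k)
    return "Write one question for a storytelling test involving: " + ", ".join(picked)


def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Step 4: the comparison prompt (hypothetical wording)."""
    return (
        "Compare the two answers to the question and reply with 'A' or 'B'.\n"
        f"Question: {question}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
    )


def evaluate_once(eval_model: Model, test_model: Model,
                  seeds: List[str], rng: random.Random) -> str:
    question = eval_model(build_bootstrap_prompt(seeds, rng))  # step 1
    test_answer = test_model(question)                         # step 2
    eval_answer = eval_model(question)                         # step 3
    verdict = eval_model(build_judge_prompt(question, test_answer, eval_answer))  # step 4
    # Answer A is the test model's answer, answer B is the eval model's.
    return "test" if verdict.strip().upper().startswith("A") else "eval"


# Stub models so the sketch runs without any API keys or downloads.
def stub_eval_model(prompt: str) -> str:
    if prompt.startswith("Compare"):
        return "B"
    if prompt.startswith("Write one question"):
        return "Tell a short story about rain."
    return "An answer from the eval model."


def stub_test_model(prompt: str) -> str:
    return "An answer from the test model."


seeds = ["an old Englishman", "finding happiness", "rain", "old cars"]
winner = evaluate_once(stub_eval_model, stub_test_model, seeds, random.Random(0))
print(winner)  # the stub judge always answers 'B', so this prints "eval"
```

In a real run the stubs would be replaced by calls to the eval and test models, and the loop would be repeated over many generated questions to get a win rate. A fixed A/B ordering like the one here can bias a judging model toward one position, so shuffling the order per question may be worth considering.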

This method allows us to evaluate models on any topic, such as storytelling, programming, finance, and Q&A.

Technical Description

See above for the installation and running instructions.

Example Use Case

Let’s say you want to evaluate a model's ability to write stories. PDME can be used in the following way:

  1. Bootstrap Prompt - First generate a bootstrap prompt using random seeds, e.g.

(continue....)
