Promptim

Promptim is an experimental prompt optimization library: a framework for optimizing prompts through multi-task evaluation and iterative improvement.

Example:

Clone the repo, then set up the environment and create the example dataset:

uv venv
source .venv/bin/activate
uv pip install -e .
python examples/tweet_writer/create_dataset.py

Then run prompt optimization:

promptim --task examples/tweet_writer/config.json --version 1

Create a custom task

Currently, promptim runs over individual tasks. A task defines the dataset (with train/dev/test splits), the initial prompt, the evaluators, and any other information needed to optimize your prompt. Its fields are:

    name: str  # The name of the task
    description: str = ""  # A description of the task (optional)
    evaluator_descriptions: dict = field(default_factory=dict)  # Descriptions of the evaluation metrics
    dataset: str  # The name of the dataset to use for the task
    initial_prompt: PromptConfig  # The initial prompt configuration.
    evaluators: list[Callable[[Run, Example], dict]]  # List of evaluation functions
    system: Optional[SystemType] = None  # Optional custom function with signature (current_prompt: ChatPromptTemplate, inputs: dict) -> outputs

Let's walk through the example "tweet writer" task to see what's expected. First, view the config.json file:

{
  "optimizer": {
    "model": {
      "model": "claude-3-5-sonnet-20241022",
      "max_tokens_to_sample": 8192
    }
  },
  "task": "examples/tweet_writer/task.py:tweet_task"
}

The first part contains configuration for the optimizer process. For now, this is a simple configuration for the default (and only) metaprompt optimizer. You can control which LLM is used via the model configuration.
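
For example, to point the optimizer at a different model, you would change the model name. A minimal sketch, assuming only the fields shown above are supported (the model name and token limit here are illustrative):

{
  "optimizer": {
    "model": {
      "model": "claude-3-opus-20240229",
      "max_tokens_to_sample": 4096
    }
  },
  "task": "examples/tweet_writer/task.py:tweet_task"
}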

The second part is the path to the task file itself. We will review this below.

def multiple_lines(run, example):
    """Evaluate whether the generated tweet spans multiple lines."""
    result = run.outputs.get("tweet", "")
    score = int("\n" in result)  # 1 if the tweet contains a newline, else 0
    comment = "Pass" if score == 1 else "Fail"
    return {
        "key": "multiline",  # metric name; matched in evaluator_descriptions
        "score": score,
        "comment": comment,
    }


tweet_task = dict(
    name="Tweet Generator",
    dataset="tweet-optim",
    initial_prompt={
        "identifier": "tweet-generator-example:c39837bd",
    },
    # See the starting prompt here:
    # https://smith.langchain.com/hub/langchain-ai/tweet-generator-example/c39837bd
    evaluators=[multiple_lines],
    evaluator_descriptions={
        "under_180_chars": "Checks if the tweet is under 180 characters. 1 if true, 0 if false.",
        "no_hashtags": "Checks if the tweet contains no hashtags. 1 if true, 0 if false.",
        "multiline": "Fails if the tweet is not multiple lines. 1 if true, 0 if false. 0 is bad.",
    },
)

We've defined a simple evaluator to check that the output spans multiple lines.
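
The other two metrics listed in evaluator_descriptions, under_180_chars and no_hashtags, would follow the same pattern. A minimal sketch, assuming the model output is stored under the same "tweet" key:

def under_180_chars(run, example):
    """Evaluate whether the tweet is under 180 characters."""
    result = run.outputs.get("tweet", "")
    score = int(len(result) < 180)
    return {
        "key": "under_180_chars",
        "score": score,
        "comment": "Pass" if score == 1 else "Fail",
    }


def no_hashtags(run, example):
    """Evaluate whether the tweet contains no hashtags."""
    result = run.outputs.get("tweet", "")
    score = int("#" not in result)
    return {
        "key": "no_hashtags",
        "score": score,
        "comment": "Pass" if score == 1 else "Fail",
    }

To enable them, add both functions to the task's evaluators list alongside multiple_lines.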

We have also selected an initial prompt to optimize. You can check this out in the hub.
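
If you want to inspect the starting prompt locally rather than in the browser, one option is to pull it with the LangChain hub client (a sketch; assumes the langchainhub package is installed):

from langchain import hub

# Pull the exact revision referenced by initial_prompt above.
prompt = hub.pull("langchain-ai/tweet-generator-example:c39837bd")
print(prompt)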

By modifying the above values, you can configure your own task.
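
If your pipeline involves more than a single prompt-plus-LLM call, you can also supply the optional system field from the schema above. A minimal sketch matching the documented signature, (current_prompt: ChatPromptTemplate, inputs: dict) -> outputs; the model choice and the "tweet" output key here are assumptions for illustration:

from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")


def tweet_system(current_prompt: ChatPromptTemplate, inputs: dict) -> dict:
    """Format the prompt under optimization, call the model, and shape the outputs."""
    messages = current_prompt.invoke(inputs)
    response = llm.invoke(messages)
    # Key the output so evaluators like multiple_lines can read it.
    return {"tweet": response.content}

You would then wire it in via system=tweet_system in the task definition.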

CLI Arguments

The CLI is experimental.

Usage: promptim [OPTIONS]

  Optimize prompts for different tasks.

Options:
  --version [1]                [required]
  --task TEXT                  Task to optimize. You can pick one off the
                               shelf or select a path to a config file.
                               Example: 'examples/tweet_writer/config.json'
  --batch-size INTEGER         Batch size for optimization
  --train-size INTEGER         Training size for optimization
  --epochs INTEGER             Number of epochs for optimization
  --debug                      Enable debug mode
  --use-annotation-queue TEXT  The name of the annotation queue to use. Note:
                               we will delete the queue whenever you resume
                               training (on every batch).
  --no-commit                  Do not commit the optimized prompt to the hub
  --help                       Show this message and exit.

We have created a few off-the-shelf tasks:

  • tweet: write tweets
  • simpleqa: really hard Q&A
  • scone: NLI
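
For example, a short run on the off-the-shelf tweet task that skips committing results back to the hub (the flag values here are illustrative):

promptim --task tweet --version 1 --epochs 2 --batch-size 16 --no-commit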




Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

promptim-0.0.1.tar.gz (13.7 kB)

Built Distribution

promptim-0.0.1-py3-none-any.whl (13.3 kB)

File details

Details for the file promptim-0.0.1.tar.gz.

File metadata

  • Download URL: promptim-0.0.1.tar.gz
  • Size: 13.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.29

File hashes

Hashes for promptim-0.0.1.tar.gz:

  • SHA256: 02db97304de4c6cf7a41bdd5d78da3ee3c386e1da3630f39dd05e83f8e876cec
  • MD5: 5dc580ad4c6c9b1e1aecd6fa26b7bbda
  • BLAKE2b-256: e6d32120a5801cbfc2054faa3e45a37c456b2f8c40b9daf523a341334a3adaec


File details

Details for the file promptim-0.0.1-py3-none-any.whl.


File hashes

Hashes for promptim-0.0.1-py3-none-any.whl:

  • SHA256: 6dfb632aa2555751fa187d246ca1c3f1de9e8162c2e64ffd743bea770dd9430c
  • MD5: f51b8bd9986131265ead1148afc7cecb
  • BLAKE2b-256: fc35847081387ec3f1e42e5cc9444b7892df0fb6e82b42f6f6b1606a8f69ac9e

