
A tool for generating and managing prompts for local LLMs using Ollama


Promptwright - Synthetic Dataset Generation Library


Promptwright is a Python library from Stacklok designed for generating large synthetic datasets using a local LLM. The library offers a flexible and easy-to-use set of interfaces that enable users to generate prompt-led synthetic datasets.

Promptwright was inspired by redotvideo/pluto; in fact, it started as a fork, but ended up largely being a rewrite to allow dataset generation against a local LLM model.

The library interfaces with Ollama, making it easy to just pull a model and run Promptwright.

Features

  • Local LLM Client Integration: Interact with Ollama based models
  • Configurable Instructions and Prompts: Define custom instructions and system prompts
  • Push to Hugging Face: Push the generated dataset to Hugging Face Hub.

Getting Started

Installation

Promptwright requires a running Ollama instance. To install the library and prerequisites, use the following commands:

pip install promptwright
ollama serve
ollama pull {model_name} # whichever model you want to use

Example Usage

There are a few examples in the examples directory that demonstrate how to use the library to generate different topic based datasets.

Running an Example

To run an example:

  1. Ensure you have started Ollama by running ollama serve.
  2. Verify that the required model is downloaded (e.g. llama3.2:latest).
  3. Set the model_name in the chosen example file to the model you have downloaded.
    tree = TopicTree(
      args=TopicTreeArguments(
          root_prompt="Creative Writing Prompts",
          model_system_prompt=system_prompt,
          tree_degree=5, # Increase degree for more prompts
          tree_depth=4, # Increase depth for more prompts
          temperature=0.9, # Higher temperature for more creative variations
          model_name="ollama/llama3" # Set the model name here
      )
    )
    engine = DataEngine(
      args=EngineArguments(
          instructions="Generate creative writing prompts and example responses.",
          system_prompt="You are a creative writing instructor providing writing prompts and example responses.",
          model_name="ollama/llama3",
          temperature=0.9,
          max_retries=2,
      )
    )
  4. Run your chosen example file:
    python examples/creative_writing.py

  5. The generated dataset will be saved as a JSONL file at the path passed to dataset.save().
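Once saved, the dataset can be inspected with a few lines of plain Python. This is a generic sketch, not part of the Promptwright API: it assumes the default JSONL layout (one JSON object per line, as in the output example below).

```python
import json

def load_dataset(path):
    """Read a JSONL file produced by dataset.save(): one JSON object per line."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

For example, `load_dataset("dataset.jsonl")` returns a list of dicts, each with a `messages` key like the sample record shown below.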

Prompt Output Examples

{
  "messages": [
    {
      "role": "system",
      "content": "You are tasked with designing an immersive virtual reality experience that transports users to a fantastical world of wonder."
    },
    {
      "role": "user",
      "content": "Create a descriptive passage about a character discovering their hidden talents."
    },
    {
      "role": "assistant",
      "content": "As she stared at the canvas, Emma's fingers hovered above the paintbrushes, as if hesitant to unleash the colors that had been locked within her. The strokes began with bold abandon, swirling blues and greens merging into a mesmerizing dance of light and shadow. With each passing moment, she felt herself becoming the art – her very essence seeping onto the canvas like watercolors in a spring storm. The world around her melted away, leaving only the vibrant symphony of color and creation."
    }
  ]
}

Model Compatibility

The library should work with most LLM models. It has been tested with the following models so far:

  • Mistral
  • LLaMA3
  • Qwen2.5

Unpredictable Behavior

The library is designed to generate synthetic data based on the prompts and instructions provided. The quality of the generated data is dependent on the quality of the prompts and the model used. The library does not guarantee the quality of the generated data.

Large Language Models can sometimes generate unpredictable or inappropriate content, and the authors of this library are not responsible for the content generated by the models. We recommend reviewing the generated data before using it in any production environment.

Large Language Models may also fail to follow the JSON formatting behavior defined by the prompt and produce invalid JSON. This is a known issue with the underlying models, not the library. We handle these errors by retrying the generation process and filtering out invalid JSON. The failure rate is low but nonzero, and each failure is reported in a final summary.
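The retry-and-filter approach can be illustrated with a short sketch. This is not the library's actual implementation, just a hypothetical generate-and-validate loop: `generate` stands in for a model call that may return malformed JSON.

```python
import json

def generate_valid(generate, max_retries=2):
    """Call generate() until it returns parseable JSON, or give up.

    Returns the parsed object, or None if every attempt produced
    invalid JSON (callers can then filter out and count the failures).
    """
    for _ in range(max_retries + 1):
        raw = generate()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue  # retry on malformed model output
    return None
```

A caller would collect results, drop the `None` entries, and report the number of dropped entries in a final summary.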

Contributing

If something here could be improved, please open an issue or submit a pull request.

License

This project is licensed under the Apache 2 License. See the LICENSE file for more details.
