Package for generating AI prompts and answers

Prompt and Answer Generation Tool

This tool generates structured conversations (prompts and answers) based on specified topics using language models.

Requirements

  • Requires LiteLLM for API access

Features

  • Generate multiple prompts per topic
  • Generate assistant answers for each prompt
  • Save conversations to JSON file
  • Environment variables for configuration
  • Interactive prompts for missing values
  • Batch processing for prompt/answer generation
  • Asynchronous API calls for improved performance

Setup

  1. Clone repository:
git clone <repository-url>
cd <repository-name>
  2. Create virtual environment (optional but recommended):
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
# or
.venv\Scripts\activate     # Windows
  3. Install dependencies:
pip install -r requirements.txt

Configuration

  1. Create .env file from example:
cp .env.example .env
  2. Configure .env file:
# Prompt/Answer Generation (.env)
PROMPTGEN_MODEL=gpt-3.5-turbo
ANSWERGEN_MODEL=gpt-4-turbo
TOPICS=Python,JavaScript,AI
AMOUNTS=5,3,2
MODEL_SPLIT=70,30
TEMPERATURE=0.7
MULTI_PROMPT=y
LOGITS=y
OUTPUT_FILE=conversations.json
BATCH_SIZE=5
ASYNC_GEN=y

# Distillation (distill.py)
MODEL_NAME=meta-llama/Llama-2-7b-hf
DATA_FILE=conversations.json
BATCH_SIZE=4
GRAD_ACC_STEPS=4
LEARNING_RATE=2e-5
ALPHA=0.7
TEMPERATURE=2.0

# QLoRA (qlora.py)
MODEL_NAME=unsloth/Qwen2.5-Coder-1.5B-Instruct
MAX_SEQ_LENGTH=2048
LOAD_IN_4BIT=True
LORA_R=16
TARGET_MODULES=q_proj,k_proj,v_proj
  3. Set API keys (in shell or .env):
export OPENAI_API_KEY=sk-xxx       # For OpenAI
export ANTHROPIC_API_KEY=sk-xxx    # For Claude
# See https://litellm.vercel.app/docs/providers for other providers

Usage Commands

  1. Generate prompts/answers:
python -m mypromptgen.main
  2. Perform migration (convert conversation file formats):
python migrate.py input.json
# Creates input.json.bak and updates input.json
  3. Run distillation training:
python distill.py
# Outputs: distill_output/
  4. Run QLoRA training:
python qlora.py
# Outputs: outputs/
# Saves merged model: unsloth_final_model/
  5. Build and publish to PyPI:
# Clean previous builds
rm -rf build dist *.egg-info

# Install build tools
pip install --upgrade setuptools build twine

# Create distribution
python -m build

# Upload to PyPI
twine upload dist/*

Note: You'll need a PyPI account and a .pypirc file configured with your credentials.

Environment Variable Reference

Variable         Scope          Description
PROMPTGEN_MODEL  main           Prompt generation model (e.g., gpt-3.5-turbo)
ANSWERGEN_MODEL  main           Comma-separated answer generation models
MODEL_SPLIT      main           Percentage split for answer models
TOPICS           main           Comma-separated topics
AMOUNTS          main           Prompt counts per topic
TEMPERATURE      main           Creativity level (0.0-1.0)
MULTI_PROMPT     main           Multi-prompt generation (y/n)
LOGITS           main           Capture log probabilities (y/n)
OUTPUT_FILE      main           JSON output filename
BATCH_SIZE       main,distill   Batch size (generation for main, training for distill)
ASYNC_GEN        main           Parallel API calls (y/n)
MODEL_NAME       distill,qlora  Base model for training
DATA_FILE        distill,qlora  Training data file
GRAD_ACC_STEPS   distill,qlora  Gradient accumulation steps
LEARNING_RATE    distill,qlora  Training learning rate
ALPHA            distill        Distillation loss weighting
TEMPERATURE      distill        Distillation temperature
LOAD_IN_4BIT     qlora          4-bit quantization (True/False)
LORA_R           qlora          LoRA rank
TARGET_MODULES   qlora          Comma-separated target modules
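
ALPHA and the distillation TEMPERATURE usually play the roles shown below in knowledge distillation. This is a generic, standard-library sketch of the common soft-target loss, not a copy of distill.py. The function names, and the assumption that ALPHA weights the soft (teacher) term rather than the hard-label term, are illustrative only.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, target_index,
                      alpha=0.7, temperature=2.0):
    """Blend a soft teacher-matching term with a hard-label term.

    Assumption: alpha weights the soft term; distill.py may do the opposite.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions, scaled by T^2
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft = kl * temperature ** 2
    # Ordinary cross-entropy against the reference token at temperature 1
    hard = -math.log(softmax(student_logits)[target_index])
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss([2.0, 1.0, 0.1], [2.5, 0.5, 0.2], target_index=0)
print(f"loss = {loss:.4f}")
```

A higher temperature flattens both distributions, so the student also learns from the teacher's lower-ranked tokens; the T^2 factor keeps the soft term's gradient scale comparable as the temperature changes.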

Logits Capture

When LOGITS=y:

  • Assistant responses will include token-level probability data from the model
  • This data includes the top 10 token candidates at each position with their log probabilities
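
A log probability converts to an ordinary probability with the exponential function. The sketch below assumes a token entry shaped like the one in the Output Format section; `top_candidates` is a hypothetical helper, not part of the package.

```python
import math

def top_candidates(token_entry):
    """Return (token, probability) pairs, most likely first."""
    pairs = [
        (candidate["token"], math.exp(candidate["logprob"]))
        for candidate in token_entry.get("top_logprobs", [])
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Entry copied from the shape of the documented output (two candidates shown).
entry = {
    "token": "Quantum",
    "logprob": -0.1,
    "top_logprobs": [
        {"token": "Quantum", "logprob": -0.1},
        {"token": "This", "logprob": -1.2},
    ],
}

for token, probability in top_candidates(entry):
    print(f"{token!r}: {probability:.3f}")
```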

Environment Variables

Variable         Description                                                    Default
PROMPTGEN_MODEL  Model for prompt generation                                    Required
ANSWERGEN_MODEL  Model(s) for answer generation (comma-separated)               Required
TEMPERATURE      Creativity level (0.0-1.0)                                     0.7
TOPICS           Comma-separated list of topics                                 Required
AMOUNTS          Number of prompts per topic (single or comma-separated)        Required
MULTI_PROMPT     Use multi-prompt generation? (y/n)                             y
MODEL_SPLIT      Percentage split for answer models (comma-separated, sum=100)  Required for multiple models
LOGITS           Capture log probabilities for answers? (y/n)                   n
OUTPUT_FILE      Output JSON filename                                           conversations.json
BATCH_SIZE       Batch size for prompt generation                               5
ASYNC_GEN        Enable asynchronous generation? (y/n)                          n
VERBOSE_LOGGING  Print request/response bodies                                  n
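
The rules in this table (required values, defaults, and a MODEL_SPLIT that sums to 100) can be checked before any API call is made. The following is a minimal sketch and not the package's own loader: `parse_generation_config` and `parse_flag` are hypothetical names, and applying a single AMOUNTS value to every topic is an assumption about how "single or comma-separated" is interpreted.

```python
def parse_flag(value, default=False):
    """Interpret y/n style values; empty or missing falls back to the default."""
    if value is None or value.strip() == "":
        return default
    return value.strip().lower() in ("y", "yes", "true", "1")

def parse_generation_config(env):
    """Validate generation settings from a mapping such as dict(os.environ)."""
    for name in ("PROMPTGEN_MODEL", "ANSWERGEN_MODEL", "TOPICS", "AMOUNTS"):
        if not env.get(name):
            raise ValueError(f"{name} is required")

    topics = [t.strip() for t in env["TOPICS"].split(",") if t.strip()]
    amounts = [int(a) for a in env["AMOUNTS"].split(",")]
    if len(amounts) == 1:
        amounts = amounts * len(topics)  # assumption: one value applies to all topics
    if len(amounts) != len(topics):
        raise ValueError("AMOUNTS must be a single value or match TOPICS")

    models = [m.strip() for m in env["ANSWERGEN_MODEL"].split(",") if m.strip()]
    if len(models) > 1:
        if not env.get("MODEL_SPLIT"):
            raise ValueError("MODEL_SPLIT is required for multiple models")
        split = [int(s) for s in env["MODEL_SPLIT"].split(",")]
        if len(split) != len(models) or sum(split) != 100:
            raise ValueError("MODEL_SPLIT needs one value per model, summing to 100")
    else:
        split = [100]

    return {
        "promptgen_model": env["PROMPTGEN_MODEL"],
        "answer_models": models,
        "model_split": split,
        "topics": topics,
        "amounts": amounts,
        "temperature": float(env.get("TEMPERATURE", "0.7")),
        "multi_prompt": parse_flag(env.get("MULTI_PROMPT"), default=True),
        "logits": parse_flag(env.get("LOGITS"), default=False),
        "output_file": env.get("OUTPUT_FILE", "conversations.json"),
        "batch_size": int(env.get("BATCH_SIZE", "5")),
        "async_gen": parse_flag(env.get("ASYNC_GEN"), default=False),
    }

config = parse_generation_config({
    "PROMPTGEN_MODEL": "gpt-3.5-turbo",
    "ANSWERGEN_MODEL": "gpt-4",
    "TOPICS": "Python,JavaScript",
    "AMOUNTS": "2",
})
print(config["topics"], config["amounts"], config["batch_size"])
```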

Usage

Run the script:

python main.py

The tool will:

  1. Check for required environment variables
  2. Prompt for missing values
  3. Generate prompts for each topic
  4. Generate answers for each prompt
  5. Save conversations to specified JSON file
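
With ASYNC_GEN=y and a BATCH_SIZE set, steps 3 and 4 can plausibly run as concurrent requests inside each batch. The sketch below illustrates that pattern with the standard library only; `fake_generate` is a hypothetical stand-in for a real model call, and the package's actual control flow may differ.

```python
import asyncio

async def fake_generate(prompt):
    """Hypothetical stand-in for one model API call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"answer to: {prompt}"

def chunk(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

async def answer_all(prompts, batch_size=5, generate=fake_generate):
    """Answer prompts batch by batch; calls within a batch run concurrently."""
    answers = []
    for batch in chunk(prompts, batch_size):
        # gather returns results in the same order as its arguments
        answers.extend(await asyncio.gather(*(generate(p) for p in batch)))
    return answers

prompts = [f"question {i}" for i in range(7)]
answers = asyncio.run(answer_all(prompts, batch_size=5))
print(len(answers), answers[0])
```

Bounding concurrency by batch keeps the number of in-flight requests at or below BATCH_SIZE, which is a simple way to stay within provider rate limits.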

Output Format

Conversations are saved in JSON format:

[
  {
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms",
        "generation_model": "gpt-3.5-turbo"  // Model that generated the prompt
      },
      {
        "role": "assistant",
        "content": "Quantum computing leverages quantum mechanics to process information...",
        "logprobs": {
          "content": [
            {
              "token": "Quantum",
              "logprob": -0.1,
              "top_logprobs": [
                {"token": "Quantum", "logprob": -0.1},
                {"token": "This", "logprob": -1.2},
                ...
              ]
            }
          ]
        },
        "generation_model": "gpt-4"  // Model that generated the answer
      }
    ],
    "model": "gpt-4"  // Model used for answer generation in this conversation
  }
]
  • Each conversation object includes a top-level "model" field indicating the answer generation model
  • Each message includes a "generation_model" field showing which model created it: the prompt model for user messages, the answer model for assistant messages
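
A saved file in this shape can be summarized with a few lines of standard-library Python. This is a sketch under assumptions: `summarize` is a hypothetical helper, the sample is inlined so the snippet runs on its own, and it presumes the file on disk is plain JSON without the `//` annotations shown above.

```python
import json
from collections import Counter

# Inline sample mirroring the documented structure (logprobs omitted).
SAMPLE = """
[
  {
    "messages": [
      {"role": "user",
       "content": "Explain quantum computing in simple terms",
       "generation_model": "gpt-3.5-turbo"},
      {"role": "assistant",
       "content": "Quantum computing leverages quantum mechanics...",
       "generation_model": "gpt-4"}
    ],
    "model": "gpt-4"
  }
]
"""

def summarize(conversations):
    """Count conversations per answer model and extract (prompt, answer) pairs."""
    by_model = Counter(conv.get("model", "unknown") for conv in conversations)
    pairs = []
    for conv in conversations:
        by_role = {m["role"]: m["content"] for m in conv["messages"]}
        pairs.append((by_role.get("user"), by_role.get("assistant")))
    return by_model, pairs

conversations = json.loads(SAMPLE)
by_model, pairs = summarize(conversations)
print(dict(by_model))
print(pairs[0][0])
```

For a real run, replace `json.loads(SAMPLE)` with `json.load` on the file named by OUTPUT_FILE.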

Example

# .env file:
PROMPTGEN_MODEL=gpt-3.5-turbo
ANSWERGEN_MODEL=gpt-4
TOPICS=Python,JavaScript
AMOUNTS=2
BATCH_SIZE=5          # Batch size for prompt generation
ASYNC_GEN=n           # Disable asynchronous generation

# Command:
python main.py

Notes

  • Uses LiteLLM format for API access
  • Check .env.example for configuration reference
