Package for generating AI prompts and answers
Prompt and Answer Generation Tool
This tool generates structured conversations (prompts and answers) based on specified topics using language models.
Requirements
- Requires LiteLLM for API access
Features
- Generate multiple prompts per topic
- Generate assistant answers for each prompt
- Save conversations to JSON file
- Environment variables for configuration
- Interactive prompts for missing values
- Batch processing for prompt/answer generation
- Asynchronous API calls for improved performance
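The batching and asynchronous features listed above can be pictured with a short sketch. This is not the package's actual implementation: `fake_completion` is a hypothetical stand-in for a real model call, and `generate_in_batches` is an illustrative name.

```python
import asyncio


async def fake_completion(prompt: str) -> str:
    """Hypothetical stand-in for a real asynchronous model call."""
    await asyncio.sleep(0)  # yield control, as a network request would
    return f"answer to: {prompt}"


async def generate_in_batches(prompts, batch_size=5):
    """Process prompts batch by batch; calls inside a batch run concurrently."""
    answers = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        # asyncio.gather returns results in the same order as its inputs
        answers.extend(await asyncio.gather(*(fake_completion(p) for p in batch)))
    return answers


if __name__ == "__main__":
    demo_prompts = [f"prompt {i}" for i in range(7)]
    print(asyncio.run(generate_in_batches(demo_prompts, batch_size=5)))
```

With a batch size of 5 and 7 prompts, the sketch issues one batch of 5 concurrent calls followed by one batch of 2.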
Setup
- Clone repository:
git clone <repository-url>
cd <repository-name>
- Create virtual environment (optional but recommended):
python -m venv .venv
source .venv/bin/activate # Linux/Mac
# or
.venv\Scripts\activate # Windows
- Install dependencies:
pip install -r requirements.txt
Configuration
- Create .env file from example:
cp .env.example .env
- Configure .env file:
# Prompt/Answer Generation (.env)
PROMPTGEN_MODEL=gpt-3.5-turbo
ANSWERGEN_MODEL=gpt-4-turbo
TOPICS=Python,JavaScript,AI
AMOUNTS=5,3,2
MODEL_SPLIT=70,30
TEMPERATURE=0.7
MULTI_PROMPT=y
LOGITS=y
OUTPUT_FILE=conversations.json
BATCH_SIZE=5
ASYNC_GEN=y
# Distillation (distill.py)
MODEL_NAME=meta-llama/Llama-2-7b-hf
DATA_FILE=conversations.json
BATCH_SIZE=4
GRAD_ACC_STEPS=4
LEARNING_RATE=2e-5
ALPHA=0.7
TEMPERATURE=2.0
# QLoRA (qlora.py)
MODEL_NAME=unsloth/Qwen2.5-Coder-1.5B-Instruct
MAX_SEQ_LENGTH=2048
LOAD_IN_4BIT=True
LORA_R=16
TARGET_MODULES=q_proj,k_proj,v_proj
- Set API keys (in shell or .env):
export OPENAI_API_KEY=sk-xxx # For OpenAI
export ANTHROPIC_API_KEY=sk-xxx # For Claude
# See https://litellm.vercel.app/docs/providers for other providers
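As a rough illustration of how the TOPICS and AMOUNTS values pair up, the sketch below parses both strings into a topic-to-count mapping. The function name is hypothetical, and applying a single AMOUNTS value to every topic is an assumption about the tool's behavior, not something confirmed by its source.

```python
def parse_topics_and_amounts(topics_raw: str, amounts_raw: str) -> dict:
    """Pair each topic with its prompt count (illustrative helper)."""
    topics = [t.strip() for t in topics_raw.split(",") if t.strip()]
    amounts = [int(a) for a in amounts_raw.split(",") if a.strip()]
    if len(amounts) == 1:
        # Assumption: a single amount applies to every topic
        amounts = amounts * len(topics)
    if len(amounts) != len(topics):
        raise ValueError("AMOUNTS must have one value, or one value per topic")
    return dict(zip(topics, amounts))


if __name__ == "__main__":
    print(parse_topics_and_amounts("Python,JavaScript,AI", "5,3,2"))
```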
Usage Commands
- Generate prompts/answers:
python -m mypromptgen.main
- Perform migration (convert conversation file formats):
python migrate.py input.json
# Creates input.json.bak and updates input.json
- Run distillation training:
python distill.py
# Outputs: distill_output/
- Run QLoRA training:
python qlora.py
# Outputs: outputs/
# Saves merged model: unsloth_final_model/
- Build and publish to PyPI:
# Clean previous builds
rm -rf build dist *.egg-info
# Install build tools
pip install --upgrade setuptools build twine
# Create distribution
python -m build
# Upload to PyPI
twine upload dist/*
Note: You'll need a PyPI account and a .pypirc file configured with your credentials.
Environment Variable Reference
| Variable | Scope | Description |
|---|---|---|
| PROMPTGEN_MODEL | main | Prompt gen model (e.g., gpt-3.5-turbo) |
| ANSWERGEN_MODEL | main | Comma-separated answer gen models |
| MODEL_SPLIT | main | Percentage split for answer models |
| TOPICS | main | Comma-separated topics |
| AMOUNTS | main | Prompt counts per topic |
| TEMPERATURE | main | Creativity level (0.0-1.0) |
| MULTI_PROMPT | main | Multi-prompt generation (y/n) |
| LOGITS | main | Capture log probabilities (y/n) |
| OUTPUT_FILE | main | JSON output filename |
| BATCH_SIZE | main,distill | Generation batch size |
| ASYNC_GEN | main | Parallel API calls (y/n) |
| MODEL_NAME | distill,qlora | Base model for training |
| DATA_FILE | distill,qlora | Training data file |
| GRAD_ACC_STEPS | distill,qlora | Gradient accumulation steps |
| LEARNING_RATE | distill,qlora | Training learning rate |
| ALPHA | distill | Distillation loss weighting |
| TEMPERATURE | distill | Distillation temperature |
| LOAD_IN_4BIT | qlora | 4-bit quantization (True/False) |
| LORA_R | qlora | LoRA rank |
| TARGET_MODULES | qlora | Comma-separated target modules |
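How distill.py combines ALPHA and the distillation TEMPERATURE is not documented here. The sketch below shows one common formulation of a distillation loss for a single token position, purely as an assumption: ALPHA weights a temperature-softened KL term between teacher and student, and (1 - ALPHA) weights the ordinary cross-entropy against the target token. All function names are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, target_index,
                      alpha=0.7, temperature=2.0):
    """Assumed formulation: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy of the unsoftened student distribution against the target token
    ce = -math.log(softmax(student_logits)[target_index])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    print(distillation_loss([1.0, 2.0, 3.0], [0.5, 2.5, 3.0], target_index=2))
```

The factor of T squared is a conventional correction that keeps the soft-target term on a comparable scale as the temperature changes; the actual script may or may not apply it.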
Logits Capture
When LOGITS=y:
- Assistant responses will include token-level probability data from the model
- This data includes the top 10 token candidates at each position with their log probabilities
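Log probabilities are natural logarithms, so exponentiating one gives the ordinary probability. The sketch below does this for a single token entry shaped like the "logprobs" field in this README's output sample; `to_probabilities` is a hypothetical helper name.

```python
import math

# One token entry, shaped like the "logprobs" -> "content" items in the output sample
entry = {
    "token": "Quantum",
    "logprob": -0.1,
    "top_logprobs": [
        {"token": "Quantum", "logprob": -0.1},
        {"token": "This", "logprob": -1.2},
    ],
}


def to_probabilities(token_entry):
    """Map each candidate token to its probability, exp(logprob)."""
    return {c["token"]: math.exp(c["logprob"]) for c in token_entry["top_logprobs"]}


probs = to_probabilities(entry)
for token, prob in probs.items():
    print(f"{token}: {prob:.3f}")
```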
Environment Variables
| Variable | Description | Default |
|---|---|---|
| PROMPTGEN_MODEL | Model for prompt generation | Required |
| ANSWERGEN_MODEL | Model for answer generation | Required |
| TEMPERATURE | Creativity level (0.0-1.0) | 0.7 |
| TOPICS | Comma-separated list of topics | Required |
| AMOUNTS | Number of prompts per topic (single or comma-separated) | Required |
| MULTI_PROMPT | Use multi-prompt generation? (Y/n) | y |
| MODEL_SPLIT | Percentage split for answer models (comma-separated, sum=100) | Required for multiple models |
| LOGITS | Use logits for answer generation? (y/n) | n |
| OUTPUT_FILE | Output JSON filename | conversations.json |
| BATCH_SIZE | Batch size for prompt generation | 5 |
| ASYNC_GEN | Enable asynchronous generation? (y/n) | n |
| VERBOSE_LOGGING | Print request/response bodies | n |
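To make the MODEL_SPLIT percentages concrete, the sketch below turns a split such as 70,30 into a number of prompts per answer model. Whether the tool assigns prompts deterministically like this or samples models at random is an assumption; `split_counts` is a hypothetical name.

```python
def split_counts(total: int, percentages: list) -> list:
    """Divide `total` prompts among models according to integer percentages."""
    if sum(percentages) != 100:
        raise ValueError("MODEL_SPLIT percentages must sum to 100")
    counts = [total * p // 100 for p in percentages]
    remainders = [(total * p) % 100 for p in percentages]
    leftover = total - sum(counts)
    # Give prompts lost to rounding to the models with the largest remainders
    by_remainder = sorted(range(len(percentages)),
                          key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    print(split_counts(10, [70, 30]))
```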
Usage
Run the script:
python main.py
The tool will:
- Check for required environment variables
- Prompt for missing values
- Generate prompts for each topic
- Generate answers for each prompt
- Save conversations to specified JSON file
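The first two steps (check the environment, then prompt for anything missing) might look roughly like the sketch below. The helper name, and the choice to use a documented default before asking the user, are assumptions for illustration rather than the tool's confirmed behavior.

```python
import os


def get_setting(name, default=None, ask=input, env=os.environ):
    """Return a setting from the environment, a default, or an interactive prompt."""
    value = env.get(name, "").strip()
    if value:
        return value
    if default is not None:
        # Assumption: optional settings fall back to their default without prompting
        return default
    return ask(f"{name}: ").strip()


if __name__ == "__main__":
    output_file = get_setting("OUTPUT_FILE", default="conversations.json")
    print(f"Saving to {output_file}")
```

Passing `ask` and `env` as parameters keeps the helper easy to exercise without a terminal or real environment variables.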
Output Format
Conversations are saved in JSON format:
[
{
"messages": [
{
"role": "user",
"content": "Explain quantum computing in simple terms",
"generation_model": "gpt-3.5-turbo" // New field for prompt model
},
{
"role": "assistant",
"content": "Quantum computing leverages quantum mechanics to process information...",
"logprobs": {
"content": [
{
"token": "Quantum",
"logprob": -0.1,
"top_logprobs": [
{"token": "Quantum", "logprob": -0.1},
{"token": "This", "logprob": -1.2},
...
]
}
]
},
"generation_model": "gpt-4" // New field for answer model
}
],
"model": "gpt-4" // Model used for answer generation in this conversation
}
]
- The conversation object now includes a top-level "model" field indicating the answer generation model
- User messages include "generation_model" showing which model created the prompt
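A consumer of the output file could walk the structure as sketched below. The `//` comments and the `...` in the sample above are annotations that a JSON parser would reject, so the snippet uses a trimmed, comment-free copy. It also assumes one user and one assistant message per conversation, as in the sample; `summarize` is a hypothetical helper.

```python
import json

# Trimmed, comment-free copy of the sample (logprobs omitted)
SAMPLE = """
[
  {
    "messages": [
      {"role": "user",
       "content": "Explain quantum computing in simple terms",
       "generation_model": "gpt-3.5-turbo"},
      {"role": "assistant",
       "content": "Quantum computing leverages quantum mechanics to process information...",
       "generation_model": "gpt-4"}
    ],
    "model": "gpt-4"
  }
]
"""


def summarize(conversations):
    """Return (prompt, answer, prompt_model, answer_model) for each conversation."""
    rows = []
    for conv in conversations:
        # Assumption: exactly one message per role in each conversation
        by_role = {m["role"]: m for m in conv["messages"]}
        user, assistant = by_role["user"], by_role["assistant"]
        rows.append((user["content"], assistant["content"],
                     user.get("generation_model"), conv.get("model")))
    return rows


rows = summarize(json.loads(SAMPLE))
print(rows[0][2], "->", rows[0][3])
```

For a real run, the same function can be applied to the result of `json.load` on the file named by OUTPUT_FILE.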
Example
# .env file:
PROMPTGEN_MODEL=gpt-3.5-turbo
ANSWERGEN_MODEL=gpt-4
TOPICS=Python,JavaScript
AMOUNTS=2
BATCH_SIZE=5 # Add this line
ASYNC_GEN=n # Add this line
# Command:
python main.py
Notes
- Uses LiteLLM format for API access
- Check .env.example for configuration reference