A tool that creates multi-prompt datasets from single-prompt datasets using templates
Project description
MultiPromptify
A tool that creates multi-prompt datasets from single-prompt datasets using templates with variation specifications.
Overview
MultiPromptify transforms your single-prompt datasets into rich multi-prompt datasets by applying various types of variations specified in your templates. It supports HuggingFace-compatible datasets and provides both a command-line interface and a modern web UI.
📚 Documentation
- 📖 Complete API Guide - Python API reference and examples
- 🏗️ Developer Documentation - For contributors and developers
- Project Structure - Code organization guide
- Publishing Guide - Package publishing instructions
- Implementation Summaries - Technical implementation details
Installation
From PyPI (Recommended)
pip install multipromptify
From GitHub (Latest)
pip install git+https://github.com/ehabba/MultiPromptifyPipeline.git
From Source
git clone https://github.com/ehabba/MultiPromptifyPipeline.git
cd MultiPromptifyPipeline
pip install -e .
With Web UI Support
# Install with web UI components
pip install -e ".[ui]"
Quick Start
Web UI (Recommended)
Launch the modern web interface for an intuitive experience:
# From project root
python src/ui/run_streamlit.py
# Or use the demo script
python demo_ui.py
The web UI provides:
- 📁 Step 1: Upload data or use sample datasets
- 🔧 Step 2: Build templates with smart suggestions
- ⚡ Step 3: Generate variations with real-time progress
- 🎉 Step 4: Analyze results and export in multiple formats
Command Line Interface
multipromptify --template "{instruction:semantic}: {col1:paraphrase}" \
--data data.csv \
--instruction "Classify the sentiment"
Python API
Using MultiPromptifyAPI (Recommended)
from multipromptify import MultiPromptifyAPI
import pandas as pd
# Initialize
mp = MultiPromptifyAPI()
# Load data
data = [{"question": "What is 2+2?", "answer": "4"}]
mp.load_dataframe(pd.DataFrame(data))
# Configure template
template = {
'instruction_template': 'Q: {question}\nA: {answer}',
'question': ['surface'],
'gold': 'answer'
}
mp.set_template(template)
# Configure and generate
mp.configure(max_rows=1, variations_per_field=3)
variations = mp.generate(verbose=True)
# Export results
mp.export("output.json", format="json")
Using MultiPromptify (Legacy)
from multipromptify import MultiPromptify
import pandas as pd
# Your data
data = pd.DataFrame({
'question': ['What is 2+2?', 'What color is the sky?'],
'options': ['A)3 B)4 C)5', 'A)Red B)Blue C)Green']
})
# Template with variation specifications
template = "{instruction:semantic}: {few_shot}\n Question: {question:paraphrase}\n Options: {options}"
# Initialize and generate variations
mp = MultiPromptify()
variations = mp.generate_variations(
template=template,
data=data,
instruction="Choose the correct answer",
few_shot=["Example: 1+1=2"]
)
print(f"Generated {len(variations)} prompt variations")
Template Format
Templates use Python f-string syntax with custom variation annotations:
"{instruction:semantic}: {few_shot}\n Question: {question:paraphrase}\n Options: {options:non-semantic}"
Supported variation types:
:semantic- Semantic variations (meaning-preserving):paraphrase- Paraphrasing variations:non-semantic- Non-semantic variations (formatting, etc.):lexical- Word choice variations:syntactic- Sentence structure variations:surface- Surface-level formatting variations
Features
Template System
- Python f-string compatibility: Use familiar
{variable}syntax - Variation annotations: Specify variation types with
:typesyntax - Flexible column mapping: Reference any column from your data
- Literal support: Use static strings and numbers
Input Handling
- CSV/DataFrame support: Direct pandas DataFrame or CSV file input
- HuggingFace datasets: Full compatibility with datasets library
- Dictionary inputs: Support for various input types
- Literals (strings/numbers): Applied to entire dataset
- Lists: Applied per sample/row
- Few-shot examples: Flexible list or tuple formats
Web UI Features
- Sample Datasets: Built-in datasets for quick testing
- Template Suggestions: Smart suggestions based on your data
- Real-time Validation: Instant feedback on template syntax
- Live Preview: Test templates before full generation
- Advanced Analytics: Distribution charts, field analysis
- Search & Filter: Find specific variations quickly
- Multiple Export Formats: JSON, CSV, TXT, and custom formats
Few-shot Examples
# Different examples per sample
few_shot = [
["Example 1 for sample 1", "Example 2 for sample 1"],
["Example 1 for sample 2", "Example 2 for sample 2"]
]
# Same examples for all samples
few_shot = ("Example 1", "Example 2")
Command Line Interface
Basic Commands
# Basic usage
multipromptify --template "{instruction:semantic}: {question:paraphrase}" \
--data data.csv \
--instruction "Answer the question"
# With output file
multipromptify --template "{instruction}: {question:paraphrase}" \
--data data.csv \
--instruction "Answer this" \
--output variations.json
# Specify number of variations
multipromptify --template "{instruction:semantic}: {question}" \
--data data.csv \
--instruction "Solve this" \
--max-variations 50
Advanced Options
# With few-shot examples from file
multipromptify --template "{instruction}: {few_shot}\n{question:paraphrase}" \
--data data.csv \
--instruction "Answer the question" \
--few-shot-file examples.txt \
--few-shot-count 3
# Output to HuggingFace dataset format
multipromptify --template "{instruction:semantic}: {question}" \
--data data.csv \
--instruction "Solve this" \
--output-format hf \
--output dataset_variations/
API Reference
MultiPromptify Class
class MultiPromptify:
def __init__(self, max_variations: int = 100):
"""Initialize MultiPromptify generator."""
def generate_variations(
self,
template: str,
data: Union[pd.DataFrame, str, dict],
instruction: str = None,
few_shot: Union[list, tuple] = None,
**kwargs
) -> List[Dict[str, Any]]:
"""Generate prompt variations based on template."""
def parse_template(self, template: str) -> Dict[str, str]:
"""Parse template to extract columns and variation types."""
def save_variations(
self,
variations: List[Dict[str, Any]],
output_path: str,
format: str = "json"
):
"""Save variations to file."""
Examples
Sentiment Analysis
import pandas as pd
from multipromptify import MultiPromptify
data = pd.DataFrame({
'text': ['I love this movie!', 'This book is terrible.'],
'label': ['positive', 'negative']
})
template = "{instruction:semantic}: '{text:paraphrase}'\nSentiment: {label}"
mp = MultiPromptify()
variations = mp.generate_variations(
template=template,
data=data,
instruction="Classify the sentiment of the following text"
)
Question Answering with Few-shot
template = "{instruction:paraphrase}: {few_shot}\n\nQuestion: {question:semantic}\nAnswer:"
few_shot_examples = [
"Q: What is the capital of France? A: Paris",
"Q: What is 2+2? A: 4"
]
variations = mp.generate_variations(
template=template,
data=qa_data,
instruction="Answer the following question",
few_shot=few_shot_examples
)
Multiple Choice
template = "{instruction:semantic}:\n\n{context:paraphrase}\n\nQuestion: {question}\nOptions:\n{options:non-semantic}\n\nAnswer:"
variations = mp.generate_variations(
template=template,
data=mc_data,
instruction="Choose the best answer"
)
Web UI Screenshots
The MultiPromptify 2.0 web interface provides an intuitive workflow:
- Data Upload: Upload CSV/JSON files or select from sample datasets
- Template Builder: Create templates with smart suggestions and real-time validation
- Generation: Configure settings and watch real-time progress
- Results: Analyze, search, filter, and export your variations
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
License
MIT License - see LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file multipromptify-2.0.0.tar.gz.
File metadata
- Download URL: multipromptify-2.0.0.tar.gz
- Upload date:
- Size: 89.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
453e8af48fe20247474aefa177c0411b1bcc0e863696703fb06ecf2ac705fd84
|
|
| MD5 |
674aee97a9cf0fb6992a769ffbcfefb8
|
|
| BLAKE2b-256 |
f46671dc9927f0269dfb45dcabcedf06ac17e0111d3cd53abbdebb55ed52281d
|
Provenance
The following attestation bundles were made for multipromptify-2.0.0.tar.gz:
Publisher:
publish.yml on eliyahabba/MultiPromptifyPipeline
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
multipromptify-2.0.0.tar.gz -
Subject digest:
453e8af48fe20247474aefa177c0411b1bcc0e863696703fb06ecf2ac705fd84 - Sigstore transparency entry: 238730180
- Sigstore integration time:
-
Permalink:
eliyahabba/MultiPromptifyPipeline@ae446ff11824ad0d9bce5a7820dd08fa0f7c7e3f -
Branch / Tag:
refs/tags/v0.1.2 - Owner: https://github.com/eliyahabba
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@ae446ff11824ad0d9bce5a7820dd08fa0f7c7e3f -
Trigger Event:
push
-
Statement type:
File details
Details for the file multipromptify-2.0.0-py3-none-any.whl.
File metadata
- Download URL: multipromptify-2.0.0-py3-none-any.whl
- Upload date:
- Size: 75.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9a0aa3b7a93fea7bcfad5f36938cc6b22298c4c58735409fcf060096454131bb
|
|
| MD5 |
333afff6a27e281aabecfe8fcd7a3c49
|
|
| BLAKE2b-256 |
887c1560fe9b9aa455e9cf45185416623a320814aa845650124d5dc3758f4c72
|
Provenance
The following attestation bundles were made for multipromptify-2.0.0-py3-none-any.whl:
Publisher:
publish.yml on eliyahabba/MultiPromptifyPipeline
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
multipromptify-2.0.0-py3-none-any.whl -
Subject digest:
9a0aa3b7a93fea7bcfad5f36938cc6b22298c4c58735409fcf060096454131bb - Sigstore transparency entry: 238730181
- Sigstore integration time:
-
Permalink:
eliyahabba/MultiPromptifyPipeline@ae446ff11824ad0d9bce5a7820dd08fa0f7c7e3f -
Branch / Tag:
refs/tags/v0.1.2 - Owner: https://github.com/eliyahabba
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@ae446ff11824ad0d9bce5a7820dd08fa0f7c7e3f -
Trigger Event:
push
-
Statement type: