promptfit - Modular toolkit for optimizing LLM prompts: estimate token usage, rank by semantic relevance, and compress with LLMs to fit any token budget. Perfect for RAG, few-shot, and instruction-heavy GenAI workflows.
promptfit
📣 Author
Vedant Laxman Chandore
GitHub
The Core Problem
Modern LLMs (Cohere, OpenAI, Gemini, Anthropic, etc.) are powerful, but their token limits make it hard to fit rich, multi-section prompts—especially for Retrieval-Augmented Generation (RAG), few-shot learning, and instruction-heavy use cases. Developers waste time manually trimming prompts, risking loss of important context, incomplete responses, or costly token overages.
promptfit solves this by automating prompt analysis, compression, and optimization—so you get the most value from every token, every time.
✨ Features
- 🔢 Token Budget Estimator: Analyze and estimate token usage for prompt templates, sections, and variables—before sending to an LLM.
- 🧭 Semantic Relevance Scoring: Split prompts into sections, generate embeddings (Cohere), and rank by cosine similarity to your query or task.
- ✂️ Smart Prompt Pruner: Drop or trim low-salience sections first, keeping only the most relevant content to fit your token budget.
- ✍️ Paraphrasing Module: Use Cohere’s LLM to rewrite and compress over-budget prompts, preserving key instructions and meaning.
- 📦 Modular Design: Each feature is a standalone module—use them independently or together.
- 🧪 Test-Driven: Comprehensive unit tests with mocked or live Cohere API responses.
- 🔐 Secure API Key Handling: Loads your Cohere API key from a `.env` file or environment variable.
- 🖥️ CLI Support: Optimize prompts directly from the command line.
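The pruning step above can be pictured as a greedy loop over relevance-ranked sections. The following is a minimal, hypothetical sketch of the idea (not promptfit's actual implementation), assuming precomputed similarity scores and a caller-supplied token estimator:

```python
# Hypothetical sketch of budget-aware pruning: keep the highest-scoring
# sections, in their original order, until the token budget is exhausted.
def prune_sections(sections, scores, max_tokens, estimate_tokens):
    # Rank section indices from most to least relevant.
    ranked = sorted(range(len(sections)), key=lambda i: scores[i], reverse=True)
    kept, used = set(), 0
    for i in ranked:
        cost = estimate_tokens(sections[i])
        if used + cost <= max_tokens:
            kept.add(i)
            used += cost
    # Reassemble the kept sections in their original order.
    return " ".join(sections[i] for i in sorted(kept))

# Example with a crude one-token-per-word estimator.
sections = ["Summarize the user's issues.", "Mention the weather.", "Propose fixes."]
scores = [0.91, 0.12, 0.85]
pruned = prune_sections(sections, scores, max_tokens=8,
                        estimate_tokens=lambda s: len(s.split()))
```

Preserving the original order after selection matters: instructions usually depend on their position in the prompt, even when relevance alone decides what survives.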
🛠️ Tech Stack
- Language: Python 3.10+
- LLM: Cohere command-r-plus
- Embeddings: embed-english-v3.0
- Tokenizer: Cohere’s estimator (or manual fallback)
- Libraries: `cohere`, `scikit-learn`, `tiktoken`, `python-dotenv`, `nltk`/`spacy`, `rich`, `typer`, `pytest`
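The "manual fallback" for the tokenizer can be illustrated with a small, hypothetical helper (not promptfit's own `estimate_tokens`): prefer `tiktoken` when it is installed, and otherwise fall back to the common rough heuristic of about four characters per token for English text.

```python
# Hypothetical fallback token estimator, assuming tiktoken may be absent.
def estimate_tokens_fallback(text: str) -> int:
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except Exception:
        # ~4 characters per token is a common rule of thumb for English.
        return max(1, len(text) // 4)

n = estimate_tokens_fallback("Summarize the support ticket in two sentences.")
```

Note that different providers tokenize differently, so any local estimate is approximate; it is best used for budgeting, not for exact limit checks.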
📦 Installation
pip install promptfit
🚀 Demo Usage
Python API
import os
import time
# 1. Set your Cohere key (or have it in your env)
os.environ["COHERE_API_KEY"] = "your-api-key-here"  # never hard-code real keys
from promptfit.token_budget import estimate_tokens
from promptfit.utils import split_sentences
from promptfit.embedder import get_embeddings
from promptfit.relevance import compute_cosine_similarities
from promptfit.optimizer import optimize_prompt
# 2. Your actual long prompt
prompt = """
You are a customer-support assistant. A user reports that their device fails
intermittently under cold conditions, the battery drains within two hours, and
previous support tickets went unanswered. They’ve provided logs and screenshots.
Please summarize the issues, note their emotional tone, propose immediate
fixes, and suggest long-term retention strategies.
"""
query = "Summarize issues, emotional tone, action items, and retention strategies."
print("=== PROMPT ===")
print(prompt)
# 3. Split into sentences
sentences = split_sentences(prompt)
print("=== SENTENCES ===")
for i, s in enumerate(sentences, 1):
    print(f"{i}: {s!r}")
print()
# 4. Compute embeddings (first the query, then the sentences)
all_texts = [query] + sentences
embs = get_embeddings(all_texts)
# 5. Compute cosine similarities between query and each sentence
query_emb = embs[0]
sent_embs = embs[1:]
scores = compute_cosine_similarities(query_emb, sent_embs)
# 6. Display relevance scores
print("=== RELEVANCE SCORES (COSINE SIMILARITY) ===")
for sent, score in zip(sentences, scores):
    print(f"{score:.4f} – {sent!r}")
print()
# 7. Show total token count before optimization
orig_tokens = estimate_tokens(prompt)
print(f"Original prompt ≈ {orig_tokens} tokens\n")
# 8. Run optimizer with timing
budget = 40
start_time = time.time()
optimized = optimize_prompt(prompt, query, max_tokens=budget)
end_time = time.time()
opt_tokens = estimate_tokens(optimized)
tokens_saved = orig_tokens - opt_tokens
percent_saved = (tokens_saved / orig_tokens) * 100
# 9. Output final result with efficiency stats
print(f"Optimized prompt ({opt_tokens} tokens ≤ {budget} budget):\n")
print(optimized)
print("\n--- Efficiency Stats ---")
print(f"Tokens saved: {tokens_saved}")
print(f"Reduction: {percent_saved:.1f}%")
print(f"Optimization time: {end_time - start_time:.2f} seconds")
Command Line
python -m promptfit.cli "YOUR_PROMPT" "YOUR_QUERY" --max-tokens 120
Full Demo Script
See demo/demo_usage.py for a comprehensive example covering:
- Token estimation
- Embedding generation
- Relevance ranking
- Pruning and paraphrasing
- End-to-end optimization
🗝️ Environment Setup
- Store your `COHERE_API_KEY` in a `.env` file in your project root: `COHERE_API_KEY=your-real-api-key-here`
- Or set it in your shell: `export COHERE_API_KEY=your-real-api-key-here`
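promptfit's `python-dotenv` integration handles this automatically; purely as an illustration of the same idea without the dependency, a minimal hand-rolled loader might look like this (a hypothetical sketch, not the library's actual code):

```python
import os

def load_api_key(env_path=".env"):
    """Read COHERE_API_KEY from the environment, or from a simple .env file."""
    # Environment variables take precedence over the file.
    key = os.getenv("COHERE_API_KEY")
    if key:
        return key
    if os.path.exists(env_path):
        with open(env_path) as f:
            for line in f:
                line = line.strip()
                if line.startswith("COHERE_API_KEY="):
                    return line.split("=", 1)[1]
    raise RuntimeError("COHERE_API_KEY not found in environment or .env")
```

Either way, keep the `.env` file out of version control (add it to `.gitignore`) so the key is never committed.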
📚 Directory Structure
promptfit/
│
├── __init__.py
├── token_budget.py
├── embedder.py
├── relevance.py
├── optimizer.py
├── paraphraser.py
├── cli.py
├── utils.py
├── config.py
│
├── README.md
├── requirements.txt
│
├── tests/
│ ├── test_token_budget.py
│ ├── test_relevance.py
│ ├── test_optimizer.py
│ └── test_paraphraser.py
│
└── demo/
└── demo_usage.py
💡 Why Use promptfit?
- Save tokens, save money: Only send the most relevant, concise prompts to your LLM.
- Prevent errors: Never exceed token limits or lose critical context.
- Automate prompt engineering: Focus on your app, not manual prompt trimming.
- Works with any LLM: Designed for Cohere, but easily adaptable to OpenAI, Gemini, Anthropic, and more.
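The "adaptable to any provider" point follows from the pipeline's shape: relevance ranking only needs a function that maps texts to vectors. A hypothetical, provider-agnostic sketch (the function and type names here are illustrative, not part of promptfit's API):

```python
# Hypothetical adapter sketch: any embedding callable with this shape works,
# whether it wraps Cohere, OpenAI, or a local model.
from typing import Callable, List

EmbedFn = Callable[[List[str]], List[List[float]]]

def rank_by_relevance(query: str, sections: List[str], embed: EmbedFn):
    """Rank sections by cosine similarity to the query, provider-agnostic."""
    vecs = embed([query] + sections)
    q, rest = vecs[0], vecs[1:]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    scored = [(cosine(q, v), s) for v, s in zip(rest, sections)]
    return sorted(scored, reverse=True)
```

Swapping providers then means swapping only the `embed` callable; the ranking, pruning, and budgeting logic stay untouched.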
📝 License
MIT License
🤝 Contributing
Pull requests, issues, and stars are welcome! For major changes, please open an issue first to discuss what you’d like to change.
Built for the next generation of GenAI and LLM developers. Optimize your prompts, maximize your results!
File details
Details for the file promptfit-0.3.0.tar.gz.
File metadata
- Download URL: promptfit-0.3.0.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c22d472566e311c6ea6e5a1104f0b8debc5f9b9a0326cde3acd1f55a2a739af6` |
| MD5 | `26c7a3e9249281e987c94f7cf7c9e6be` |
| BLAKE2b-256 | `5ad36a95817bfbc264286408afaf3c85fb68d63e163c595b18d4e12da4e1906f` |
File details
Details for the file promptfit-0.3.0-py3-none-any.whl.
File metadata
- Download URL: promptfit-0.3.0-py3-none-any.whl
- Upload date:
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `af8dc03ed944b8981f8cef09c66c121582fa7860a79b17da5c91a03df7ef87ea` |
| MD5 | `a2b2ac42d00131d188d1f26ce540281e` |
| BLAKE2b-256 | `edd30e60b431b7c27138b01f802b84f2a1caa95a1ea77035ccbdd80cb88639e3` |