
promptfit - Modular toolkit for optimizing LLM prompts: estimate token usage, rank by semantic relevance, and compress with LLMs to fit any token budget. Perfect for RAG, few-shot, and instruction-heavy GenAI workflows.

Project description

promptfit




📣 Author

Vedant Laxman Chandore
GitHub

The Core Problem

Modern LLMs (Cohere, OpenAI, Gemini, Anthropic, etc.) are powerful, but their token limits make it hard to fit rich, multi-section prompts—especially for Retrieval-Augmented Generation (RAG), few-shot learning, and instruction-heavy use cases. Developers waste time manually trimming prompts, risking loss of important context, incomplete responses, or costly token overages.

promptfit solves this by automating prompt analysis, compression, and optimization—so you get the most value from every token, every time.


✨ Features

  • 🔢 Token Budget Estimator: Analyze and estimate token usage for prompt templates, sections, and variables—before sending to an LLM.
  • 🧭 Semantic Relevance Scoring: Split prompts into sections, generate embeddings (Cohere), and rank by cosine similarity to your query or task.
  • ✂️ Smart Prompt Pruner: Drop or trim low-salience sections first, keeping only the most relevant content to fit your token budget.
  • ✍️ Paraphrasing Module: Use Cohere’s LLM to rewrite and compress over-budget prompts, preserving key instructions and meaning.
  • 📦 Modular Design: Each feature is a standalone module—use them independently or together.
  • 🧪 Test-Driven: Comprehensive unit tests with mocked or live Cohere API responses.
  • 🔐 Secure API Key Handling: Loads your Cohere API key from a .env file or environment variable.
  • 🖥️ CLI Support: Optimize prompts directly from the command line.
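The pruning idea above can be pictured as a greedy loop: rank sections by relevance, keep the best ones until the budget runs out, then restore the original order. A minimal sketch (the function name, word-count cost model, and toy data are illustrative assumptions, not promptfit's actual implementation):

```python
def prune_to_budget(sections, scores, budget, estimate):
    """Keep the highest-scoring sections that fit within the token budget."""
    kept = set()
    used = 0
    for section, _score in sorted(zip(sections, scores), key=lambda pair: -pair[1]):
        cost = estimate(section)
        if used + cost <= budget:
            kept.add(section)
            used += cost
    # Preserve the original ordering so the pruned prompt still reads naturally.
    return [s for s in sections if s in kept]

sections = ["intro", "long background", "key instruction"]
scores = [0.2, 0.1, 0.9]
# Cost model here is just a word count; a real budget check would use a tokenizer.
print(prune_to_budget(sections, scores, budget=3, estimate=lambda s: len(s.split())))
# → ['intro', 'key instruction']
```

Note that the lowest-scoring section is dropped first, while surviving sections keep their original positions.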

🛠️ Tech Stack

  • Language: Python 3.10+
  • LLM: Cohere command-r-plus
  • Embeddings: embed-english-v3.0
  • Tokenizer: Cohere’s estimator (or manual fallback)
  • Libraries:
    • cohere, scikit-learn, tiktoken, python-dotenv, nltk/spacy, rich, typer, pytest
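The "manual fallback" tokenizer mentioned above can be as simple as a character-count heuristic; a sketch (the helper name and the ~4-characters-per-token ratio are assumptions, not promptfit's actual fallback):

```python
def estimate_tokens_fallback(text: str) -> int:
    """Rough token estimate for when no tokenizer API is reachable.

    English text averages roughly 4 characters per token, so a simple
    character count gives a usable first-pass budget check.
    """
    return max(1, len(text) // 4)

print(estimate_tokens_fallback("Summarize the support ticket below."))  # → 8
```

For tighter estimates you could swap in tiktoken (listed in the libraries above); even against non-OpenAI models, relative counts are usually what matters for budgeting.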

📦 Installation

pip install promptfit

🚀 Demo Usage

Python API

import os
import time
# 1. Set your Cohere key (or have it in your env)
os.environ["COHERE_API_KEY"] = "your-cohere-api-key"  # placeholder — never hard-code or commit a real key

from promptfit.token_budget import estimate_tokens
from promptfit.utils import split_sentences
from promptfit.embedder import get_embeddings
from promptfit.relevance import compute_cosine_similarities
from promptfit.optimizer import optimize_prompt

# 2. Your actual long prompt
prompt = """
You are a customer-support assistant. A user reports that their device fails
intermittently under cold conditions, the battery drains within two hours, and
previous support tickets went unanswered. They’ve provided logs and screenshots.
Please summarize the issues, note their emotional tone, propose immediate
fixes, and suggest long-term retention strategies.
"""

query = "Summarize issues, emotional tone, action items, and retention strategies."

print("=== PROMPT ===")
print(prompt)

# 3. Split into sentences
sentences = split_sentences(prompt)
print("=== SENTENCES ===")
for i, s in enumerate(sentences, 1):
    print(f"{i}: {s!r}")
print()

# 4. Compute embeddings (first the query, then the sentences)
all_texts = [query] + sentences
embs = get_embeddings(all_texts)

# 5. Compute cosine similarities between query and each sentence
query_emb = embs[0]
sent_embs = embs[1:]
scores = compute_cosine_similarities(query_emb, sent_embs)

# 6. Display relevance scores
print("=== COSINE SIMILARITY RELEVANCE SCORES ===")
for sent, score in zip(sentences, scores):
    print(f"{score:.4f} – {sent!r}")
print()

# 7. Show total token count before optimization
orig_tokens = estimate_tokens(prompt)
print(f"Original prompt ≈ {orig_tokens} tokens\n")

# 8. Run optimizer with timing
budget = 40
start_time = time.time()
optimized = optimize_prompt(prompt, query, max_tokens=budget)
end_time = time.time()
opt_tokens = estimate_tokens(optimized)

tokens_saved = orig_tokens - opt_tokens
percent_saved = (tokens_saved / orig_tokens) * 100

# 9. Output final result with efficiency stats
print(f"Optimized prompt ({opt_tokens} tokens ≤ {budget} budget):\n")
print(optimized)
print("\n--- Efficiency Stats ---")
print(f"Tokens saved: {tokens_saved} ({percent_saved:.1f}% reduction)")
print(f"Optimization time: {end_time - start_time:.2f} seconds")
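The relevance scoring in step 5 above is ordinary cosine similarity; here is a dependency-free sketch of the same computation (promptfit itself uses scikit-learn per the tech stack, so this toy function illustrates the math, not the library's internals):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-d embeddings: the first "sentence" points almost the same way as the query.
query = [1.0, 0.0, 0.0]
sentences = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
scores = [cosine_similarity(query, s) for s in sentences]
print([round(s, 3) for s in scores])  # → [0.994, 0.0]
```

Real embeddings from embed-english-v3.0 are 1024-dimensional, but the ranking logic is identical.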

Command Line

python -m promptfit.cli "YOUR_PROMPT" "YOUR_QUERY" --max-tokens 120

Full Demo Script

See demo/demo_usage.py for a comprehensive example covering:

  • Token estimation
  • Embedding generation
  • Relevance ranking
  • Pruning and paraphrasing
  • End-to-end optimization

🗝️ Environment Setup

  • Store your COHERE_API_KEY in a .env file in your project root:
    COHERE_API_KEY=your-real-api-key-here
    
  • Or set it in your shell:
    export COHERE_API_KEY=your-real-api-key-here
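In application code, a few startup lines cover both cases; a minimal sketch using python-dotenv (listed in the tech stack), with a graceful fallback to plain environment variables if it is not installed:

```python
import os

try:
    from dotenv import load_dotenv  # python-dotenv, listed in the tech stack
    load_dotenv()  # reads COHERE_API_KEY from a .env file in the working directory
except ImportError:
    pass  # no python-dotenv installed: rely on the shell environment instead

api_key = os.getenv("COHERE_API_KEY")
print("key loaded" if api_key else "warning: COHERE_API_KEY is not set")
```

Keeping the key out of source files (and adding .env to .gitignore) avoids accidentally publishing credentials.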
    

📚 Directory Structure

promptfit/
│
├── __init__.py
├── token_budget.py
├── embedder.py
├── relevance.py
├── optimizer.py
├── paraphraser.py
├── cli.py
├── utils.py
├── config.py
│
├── README.md
├── requirements.txt
│
├── tests/
│   ├── test_token_budget.py
│   ├── test_relevance.py
│   ├── test_optimizer.py
│   └── test_paraphraser.py
│
└── demo/
    └── demo_usage.py

💡 Why Use promptfit?

  • Save tokens, save money: Only send the most relevant, concise prompts to your LLM.
  • Prevent errors: Never exceed token limits or lose critical context.
  • Automate prompt engineering: Focus on your app, not manual prompt trimming.
  • Works with any LLM: Designed for Cohere, but easily adaptable to OpenAI, Gemini, Anthropic, and more.

📝 License

MIT License


🤝 Contributing

Pull requests, issues, and stars are welcome! For major changes, please open an issue first to discuss what you’d like to change.





Built for the next generation of GenAI and LLM developers. Optimize your prompts, maximize your results!

Project details


Download files

Download the file for your platform.

Source Distribution

promptfit-0.3.0.tar.gz (12.6 kB)

Uploaded Source

Built Distribution


promptfit-0.3.0-py3-none-any.whl (12.9 kB)

Uploaded Python 3

File details

Details for the file promptfit-0.3.0.tar.gz.

File metadata

  • Download URL: promptfit-0.3.0.tar.gz
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for promptfit-0.3.0.tar.gz

  • SHA256: c22d472566e311c6ea6e5a1104f0b8debc5f9b9a0326cde3acd1f55a2a739af6
  • MD5: 26c7a3e9249281e987c94f7cf7c9e6be
  • BLAKE2b-256: 5ad36a95817bfbc264286408afaf3c85fb68d63e163c595b18d4e12da4e1906f


File details

Details for the file promptfit-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: promptfit-0.3.0-py3-none-any.whl
  • Size: 12.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for promptfit-0.3.0-py3-none-any.whl

  • SHA256: af8dc03ed944b8981f8cef09c66c121582fa7860a79b17da5c91a03df7ef87ea
  • MD5: a2b2ac42d00131d188d1f26ce540281e
  • BLAKE2b-256: edd30e60b431b7c27138b01f802b84f2a1caa95a1ea77035ccbdd80cb88639e3

