PromptCue — Prompt Intent Classifier for LLM Pipelines
PromptCue classifies the intent behind a natural-language prompt and returns structured routing cues — telling your LLM pipeline, RAG system, or query router not just what the user asked, but how it should be answered: retrieve, reason, compare, enumerate, check recency, or ask for clarification.
How it works
PromptCue uses a cascade classifier:
- Deterministic pass — scores the query against a YAML registry of query types using trigger-phrase matching and vocabulary overlap. Fast, zero ML dependencies, returns immediately when confidence is high.
- Semantic fallback — when the deterministic result is ambiguous or below threshold, sentence-level embeddings re-score the query against example sentences per type. Activates automatically when `sentence-transformers` is installed, or immediately when you supply your own embed function via `PromptCueConfig(embed_fn=...)` (hosted mode — no model loaded by PromptCue).
The result is a Pydantic model (`PromptCueQueryObject`) carrying the classification, confidence, scope, routing hints, action directives, and any enrichment you have enabled.
Requirements
- Python 3.13+
- Core dependencies: `pydantic`, `PyYAML`, `numpy` (always installed)
- All ML/NLP components are optional — the package installs and runs without them
- Language: English only. Triggers, examples, and pre-classification detectors (continuation, structure, temporal scope) are all English-specific.
Install
Core install — deterministic classifier only, no ML dependencies:
pip install promptcue
With semantic scoring (sentence-transformers):
pip install "promptcue[semantic]"
Hosted mode — if your application already has an embedding model loaded (e.g. for RAG), pass it via `PromptCueConfig(embed_fn=your_model.encode)`. PromptCue will use it directly and you do not need `[semantic]` — no second model is loaded.
With language detection (langdetect):
pip install "promptcue[detection]"
With linguistic enrichment (spaCy):
pip install "promptcue[linguistic]"
python -m spacy download en_core_web_sm
With keyword extraction (KeyBERT):
pip install "promptcue[keywords]"
With everything:
pip install "promptcue[all]"
python -m spacy download en_core_web_sm
Development install (editable, with test and lint tools):
pip install -e ".[dev]"
Production deployment
PromptCue requires semantic scoring to produce production-quality results.
The deterministic-only path (pip install promptcue, no [semantic]) achieves
approximately 40–50% accuracy on naturalistic queries and is not a supported
production configuration — it is suitable for evaluation or development only.
Semantic scoring can be provided in two ways:
- Standalone mode — install `pip install "promptcue[semantic]"` and let PromptCue load its own `all-MiniLM-L6-v2` model.
- Hosted mode — pass an existing embedding function via `PromptCueConfig(embed_fn=...)`. No `[semantic]` install required; PromptCue delegates encoding to the caller's model. See Hosted mode.
For standalone mode, every deployment must:
- Install `pip install "promptcue[semantic]"`.
- Pre-download the model before the service starts — not on first query.
- Call `warm_up()` (or `warm_up_async()`) at startup and gate readiness on it succeeding, as in the sketch below.
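A minimal sketch of that startup gate, assuming a FastAPI service (the lifespan hook, readiness flag, and `/ready` endpoint are illustrative glue, not part of PromptCue):
from contextlib import asynccontextmanager
from fastapi import FastAPI, Response
from promptcue import PromptCueAnalyzer
analyzer = PromptCueAnalyzer()
ready = {'ok': False}
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Raises PromptCueModelLoadError if the model was not pre-downloaded,
    # so a misconfigured deployment fails at startup, not at query time.
    await analyzer.warm_up_async()
    ready['ok'] = True
    yield
app = FastAPI(lifespan=lifespan)
@app.get('/ready')
def readiness() -> Response:
    # Readiness probes pass only after warm_up has succeeded.
    return Response(status_code=200 if ready['ok'] else 503)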
Progress bars from sentence-transformers are disabled by default in standalone mode
(show_progress_bar=False) so server logs stay clean. Set
PromptCueConfig(show_progress_bar=True) only when you explicitly want tqdm batch output.
If the model cannot be loaded, PromptCue raises PromptCueModelLoadError immediately.
It never silently falls back to deterministic-only mode — a misconfigured deployment
fails loudly at startup rather than producing quietly wrong results at query time.
Model cache location
By default the model is stored in HuggingFace's standard cache (~/.cache/huggingface/).
For deployments that cannot rely on the default cache, set the path explicitly:
from pathlib import Path
from promptcue import PromptCueAnalyzer, PromptCueConfig
analyzer = PromptCueAnalyzer(PromptCueConfig(
    model_cache_dir=Path('/opt/models')
))
analyzer.warm_up()  # raises PromptCueModelLoadError if the model is not at that path
Or via environment variable — no code change required:
export PROMPTCUE_MODEL_CACHE=/opt/models
Hosted mode: reusing an existing embedding model
If your application already has an embedding model loaded — for RAG, document indexing, or
any other purpose — pass its encode function to PromptCueConfig(embed_fn=...). PromptCue
will delegate all vector computation to that function and will never load a model of its own.
from promptcue import PromptCueAnalyzer, PromptCueConfig
# my_embedder is already loaded elsewhere in your application
def my_encode(text: str) -> list[float]:
    return my_embedder.encode(text)  # or my_embedder.embed_query(text), etc.
config = PromptCueConfig(embed_fn=my_encode) # no model loaded by promptcue
analyzer = PromptCueAnalyzer(config)
# warm_up() is a no-op — the external model is already loaded by the caller
result = analyzer.analyze('How do I configure VPC peering?')
print(result.primary_query_type) # procedure
The type alias PromptCueEmbedFn = Callable[[str], list[float]] is exported from the
package root and can be used to annotate injected functions:
from promptcue import PromptCueEmbedFn
def build_embed_fn(model) -> PromptCueEmbedFn:
    return lambda text: model.encode(text)
When to use hosted mode:
- Your application loads `nomic-embed-text-v1.5`, `BAAI/bge-large-en-v1.5`, or any other model for retrieval/RAG and wants to classify queries with the same model — zero extra memory.
- You are integrating PromptCue into a service that already manages its own model lifecycle and you want PromptCue to be a pure classifier with no model side-effects.
- You are running in a memory-constrained environment where loading a second model is not acceptable.
Notes:
- `enable_semantic_scoring` is forced to `True` automatically when `embed_fn` is set, even if `sentence-transformers` is not installed.
- The injected function signature is single-text: `(str) -> list[float]`. If your model has a batch API, wrap it as shown below: `lambda text: model.encode([text])[0]`.
- `warm_up()` is a no-op. `is_loaded` returns `True` immediately.
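A minimal adapter for that case (`my_model` stands in for whatever embedding model your application already holds; the helper name is illustrative):
from promptcue import PromptCueAnalyzer, PromptCueConfig, PromptCueEmbedFn
def wrap_batch_encoder(model) -> PromptCueEmbedFn:
    # PromptCue expects (str) -> list[float]; take the first row of the batch output.
    return lambda text: list(model.encode([text])[0])
analyzer = PromptCueAnalyzer(PromptCueConfig(embed_fn=wrap_batch_encoder(my_model)))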
Deployment patterns
| Environment | Model management approach |
|---|---|
| Local dev | Leave model_cache_dir unset — HuggingFace downloads on first warm_up() |
| EC2 / EBS | Pre-download to EBS volume; set HF_HOME=/opt/models or model_cache_dir |
| Lambda (container image) | Bake model into Docker image at build time — required, Lambda /tmp is ephemeral |
| Lambda (EFS mount) | Pre-populate EFS volume; set model_cache_dir=Path('/mnt/models') |
| Docker / CI | Download during image build; volume-mount for local dev |
For Lambda container images, bake the model in at build time:
FROM python:3.13-slim
RUN pip install "promptcue[semantic]"
ENV HF_HOME=/app/models
RUN python -c "from sentence_transformers import SentenceTransformer; \
SentenceTransformer('all-MiniLM-L6-v2')"
Quick start
Basic — no ML dependencies required
from promptcue import PromptCueAnalyzer
analyzer = PromptCueAnalyzer()
result = analyzer.analyze('Compare Aurora and OpenSearch for RAG on AWS')
print(result.primary_query_type) # comparison
print(result.scope) # comparative
print(result.confidence) # 0.9
print(result.routing_hints) # {'needs_retrieval': True, 'needs_reasoning': True, ...}
print(result.action_hints) # {'should_compare': True, ...}
With semantic scoring — requires pip install "promptcue[semantic]"
Semantic scoring is enabled automatically when sentence-transformers is installed.
Call warm_up() at startup to pre-load the model and avoid first-query latency.
from promptcue import PromptCueAnalyzer
analyzer = PromptCueAnalyzer()
analyzer.warm_up() # loads ~90 MB model once; cached after first download
result = analyzer.analyze('Should we use DynamoDB or RDS for a high-read catalog?')
print(result.primary_query_type) # recommendation
print(result.classification_basis) # semantic_similarity
print(result.confidence) # 0.25
With full enrichment
from promptcue import PromptCueAnalyzer, PromptCueConfig
analyzer = PromptCueAnalyzer(PromptCueConfig(
    enable_language_detection=True,     # requires promptcue[detection]
    enable_linguistic_extraction=True,  # requires promptcue[linguistic]
    enable_keyword_extraction=True,     # requires promptcue[keywords]
))
analyzer.warm_up()
result = analyzer.analyze(
    'How do I set up a VPC with private subnets and NAT gateway step by step?'
)
print(result.language) # en
print(result.main_verbs) # ['set']
print(result.noun_phrases) # ['a VPC', 'private subnets', 'NAT gateway']
print(result.keywords) # [PromptCueKeyword(text='vpc private subnets', weight=0.72, ...), ...]
print(result.entities) # [] (no named entities in this query)
In an async application
Both .warm_up_async() and .analyze_async() delegate to asyncio.to_thread(),
so they are safe to await in FastAPI handlers or any other async framework without
blocking the event loop.
import asyncio
from promptcue import PromptCueAnalyzer
async def main() -> None:
    analyzer = PromptCueAnalyzer()
    await analyzer.warm_up_async()
    result = await analyzer.analyze_async('Compare option A and option B')
    print(result.primary_query_type)  # comparison
asyncio.run(main())
With an injected embed function (hosted mode)
Use this when your application already has an embedding model loaded and you want PromptCue
to reuse it rather than loading a second model. No [semantic] extra required.
from promptcue import PromptCueAnalyzer, PromptCueConfig
# Stub — replace with your actual model's encode method
def my_encode(text: str) -> list[float]:
    return my_existing_model.embed_query(text)
analyzer = PromptCueAnalyzer(PromptCueConfig(embed_fn=my_encode))
# warm_up() not needed — model is already loaded externally
result = analyzer.analyze('How do I configure VPC peering step by step?')
print(result.primary_query_type) # procedure
Full JSON output
print(result.model_dump_json(indent=2))
Query types
PromptCue ships with a default registry of 12 query types:
| Label | Scope | Description |
|---|---|---|
| `analysis` | exploratory | Deep evaluation of a system, architecture, or decision |
| `chitchat` | broad | Social or conversational, not a knowledge query |
| `comparison` | comparative | Asks to compare two or more options |
| `coverage` | broad | Broad overview or "tell me everything" request |
| `generation` | focused | Produce entirely new content from scratch with no existing source to condense |
| `lookup` | focused | Factual question with a single direct answer |
| `procedure` | focused | Step-by-step instructions for a task |
| `recommendation` | focused | Asks for a decision or suggestion given constraints |
| `summarization` | focused | Condense existing content — provided, referenced, or in-context — into a shorter form |
| `troubleshooting` | focused | Diagnosing or fixing a problem |
| `update` | focused | Latest news, releases, or changes |
| `validation` | focused | Verify or fact-check a specific stated claim, assumption, or belief |
You can replace or extend the registry by pointing PromptCueConfig.registry_path at your
own YAML file — the schema is documented in src/promptcue/data/query_types_en.yaml.
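For example, a minimal sketch (the path is illustrative, and the custom file must follow the bundled schema):
from pathlib import Path
from promptcue import PromptCueAnalyzer, PromptCueConfig
# PromptCueRegistryError is raised if the file is missing or malformed.
analyzer = PromptCueAnalyzer(PromptCueConfig(
    registry_path=Path('config/my_query_types.yaml'),
))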
Which field should I use?
PromptCueQueryObject surfaces several dimensions. Use the one that matches what your
pipeline actually needs to decide — you rarely need all of them.
| I need to know... | Use this field | Example values |
|---|---|---|
| What the user is asking for | `primary_query_type` | `procedure`, `comparison`, `lookup` |
| How broad or specific the query is | `scope` | `broad`, `focused`, `comparative`, `exploratory` |
| How to structure the LLM response | `action_hints` | `should_enumerate`, `should_compare`, `should_direct_answer` |
| Whether to retrieve / reason / check freshness | `routing_hints` | `needs_retrieval`, `needs_current_info`, `needs_reasoning` |
| Whether the query mentions time | `semantic_hints.mentions_time` | `True` / `False` |
| Whether the query requires cross-period analysis | `semantic_hints.requires_multi_period_analysis` | `True` / `False` |
| Whether the user wants a specific output format | `routing_hints['needs_structure']` | `True` / `False` |
| Whether the query continues a prior conversation | `is_continuation` | `True` / `False` |
| How confident the classifier is | `confidence` + `confidence_band` | `0.74`, `high` |
Common patterns:
- Simple LLM router — branch on `primary_query_type` alone. Done.
- RAG pipeline — use `routing_hints['needs_retrieval']` to decide whether to retrieve, `routing_hints['needs_current_info']` to check freshness, and `scope` to decide how many results to fetch (broad → more, focused → fewer). See the sketch after this list.
- Response generator — act on `action_hints`: `should_enumerate` → numbered list, `should_compare` → side-by-side table, `should_direct_answer` → single concise answer.
- Time-aware pipeline — gate temporal aggregation on `semantic_hints.requires_multi_period_analysis`.
- Structured-output pipeline — detect explicit format requests via `routing_hints['needs_structure']` before passing to the generator.
- Ambiguity guard — check `confidence_band == 'low'` or `ambiguity_score > 0.5` before routing; fall back to clarification when confidence is too low.
The `primary_query_type` labels are intentionally granular (12 types). If you only need coarse routing, `scope` already gives you broad / focused / comparative without looking at the type label at all.
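A minimal sketch combining the RAG-router and ambiguity-guard patterns above (the `retrieve`, `clarify`, and `generate` helpers are placeholders for your own pipeline, and the fetch sizes are illustrative):
from promptcue import PromptCueAnalyzer
analyzer = PromptCueAnalyzer()
def route(query: str) -> str:
    result = analyzer.analyze(query)
    # Ambiguity guard: ask for clarification instead of guessing.
    if result.confidence_band == 'low' or result.ambiguity_score > 0.5:
        return clarify(query)  # placeholder: your clarification flow
    context = []
    if result.routing_hints['needs_retrieval']:
        k = 10 if result.scope == 'broad' else 4  # broad -> more, focused -> fewer
        context = retrieve(query, k=k)  # placeholder: your retriever
    return generate(query, context, hints=result.action_hints)  # placeholder: your generator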
Public API
PromptCueAnalyzer
PromptCueAnalyzer(config: PromptCueConfig | None = None)
| Method | Description |
|---|---|
| `.analyze(text: str) -> PromptCueQueryObject` | Analyze a query and return a structured result |
| `.warm_up() -> None` | Pre-load all enabled models at startup to avoid first-query latency |
| `.analyze_async(text: str) -> PromptCueQueryObject` | Async variant of `.analyze()`; delegates to `asyncio.to_thread()` |
| `.warm_up_async() -> None` | Async variant of `.warm_up()`; delegates to `asyncio.to_thread()` |
PromptCueConfig fields
| Field | Type | Default | Description |
|---|---|---|---|
| `registry_path` | `Path \| None` | `None` | Custom YAML registry path; uses bundled default when `None` |
| `model_cache_dir` | `Path \| None` | env / `None` | Directory where the sentence-transformers model is cached. Falls back to the `PROMPTCUE_MODEL_CACHE` env var, then the HuggingFace default (`~/.cache/huggingface/`) |
| `embed_fn` | `Callable[[str], list[float]] \| None` | `None` | Injectable embed function for hosted mode. When set, PromptCue delegates all vector computation to this function and never loads a model. `enable_semantic_scoring` is forced to `True`. See Hosted mode |
| `show_progress_bar` | `bool` | `False` | Standalone mode only: forwarded to `SentenceTransformer.encode(show_progress_bar=...)`. Keep `False` for clean logs; set `True` for local debugging |
| `similarity_threshold` | `float` | `0.55` | Minimum score for a deterministic match to be accepted |
| `semantic_similarity_threshold` | `float` | `0.20` | Minimum score for a semantic match to be accepted |
| `ambiguity_margin` | `float` | `0.08` | Minimum gap between top-2 scores before clarification is flagged |
| `semantic_fallback_threshold` | `float` | `0.75` | Deterministic score above which the semantic pass is skipped |
| `trigger_fallback_threshold` | `float` | `0.60` | When a trigger phrase matched, the score meets this value, and the margin is clear, the deterministic result is trusted directly and the semantic pass is skipped |
| `enable_semantic_scoring` | `bool` | auto | `True` when sentence-transformers is installed or `embed_fn` is set, else `False` |
| `embedding_model` | `str` | `all-MiniLM-L6-v2` | HuggingFace model name for semantic scoring (ignored when `embed_fn` is set) |
| `enable_language_detection` | `bool` | `False` | Detect BCP-47 language code; requires `promptcue[detection]` |
| `enable_linguistic_extraction` | `bool` | `False` | Extract verbs, noun phrases, and named entities; requires `promptcue[linguistic]` |
| `enable_keyword_extraction` | `bool` | `False` | Extract keyphrases via KeyBERT; requires `promptcue[keywords]` |
| `max_keywords` | `int` | `8` | Maximum number of keyphrases to extract |
| `spacy_model` | `str` | `en_core_web_sm` | spaCy model name for linguistic extraction |
PromptCueQueryObject fields
| Field | Type | Description |
|---|---|---|
| `schema_version` | `str` | Output schema version (`"1.0"`) |
| `input_text` | `str` | Original query as provided by the caller |
| `normalized_text` | `str` | Unicode-normalised, whitespace-collapsed query |
| `language` | `str` | BCP-47 language code (`"en"`) or `"unknown"` when detection is off |
| `is_continuation` | `bool` | `True` when the query continues an ongoing conversation (e.g. "what about X?", "and for Y?") |
| `primary_query_type` | `str` | Top classified query type label, or `"unknown"` |
| `classification_basis` | `str` | How the result was reached: `trigger_match`, `word_overlap`, `semantic_similarity`, `below_threshold` |
| `candidate_query_types` | `list[PromptCueCandidate]` | All types ranked by score |
| `runner_up` | `PromptCueCandidate \| None` | Second-ranked candidate; `None` when fewer than two candidates exist |
| `confidence` | `float` | Score of the top candidate (0.0–1.0) |
| `confidence_band` | `str` | Coarse confidence tier: `high`, `medium`, or `low` |
| `ambiguity_score` | `float` | How close the top-2 candidates are (0.0 = clear, 1.0 = identical) |
| `scope` | `str` | Query scope: `broad`, `focused`, `comparative`, `exploratory`, or `unknown` |
| `main_verbs` | `list[str]` | Root verbs extracted by spaCy (empty when enrichment is off) |
| `noun_phrases` | `list[str]` | Noun chunks extracted by spaCy (empty when enrichment is off) |
| `named_entities` | `list[str]` | Named entity surface texts, plain strings (backward compat) |
| `entities` | `list[PromptCueEntity]` | Named entities with `text` and `entity_type` (spaCy label) |
| `keywords` | `list[PromptCueKeyword]` | Keyphrases with `text`, `weight`, and `kind` from KeyBERT |
| `routing_hints` | `dict[str, bool]` | `needs_retrieval`, `needs_reasoning`, `needs_current_info`, `needs_clarification`, `needs_structure` |
| `semantic_hints` | `PromptCueSemanticHints` | Agnostic semantic cues (`mentions_multiple_items`, `requests_comparison`, `requests_enumeration`, `requests_structure`, `mentions_time`, `requires_multi_period_analysis`) |
| `confidence_meta` | `PromptCueConfidenceMeta` | Confidence diagnostics (`type_confidence_margin`, `scope_confidence`, `scope_confidence_margin`) |
| `explanations` | `PromptCueExplanations` | Debug metadata (`decision_notes`, `evidence_tokens`) |
| `action_hints` | `dict[str, bool]` | Response-generation directives: `should_survey`, `should_enumerate`, `should_compare`, `should_direct_answer`, `should_check_recency`, `should_clarify`, `should_respond_conversationally` |
| `constraints` | `list[str]` | Reserved for future use |
Exceptions
All exceptions inherit from PromptCueError.
| Exception | Raised when |
|---|---|
| `PromptCueError` | Base class — catch this to handle all PromptCue errors |
| `PromptCueModelLoadError` | The sentence-transformers model cannot be loaded at `warm_up()` time |
| `PromptCueRegistryError` | The query type registry YAML is missing, malformed, or contains invalid entries |
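A minimal startup guard, assuming the exception classes are importable from the package root like the other public names:
from promptcue import PromptCueAnalyzer, PromptCueError, PromptCueModelLoadError
analyzer = PromptCueAnalyzer()
try:
    analyzer.warm_up()
except PromptCueModelLoadError:
    # Model missing from the cache: fail the deployment rather than serve.
    raise
except PromptCueError as exc:
    # Any other PromptCue failure, e.g. a malformed custom registry.
    raise SystemExit(f'PromptCue startup failed: {exc}')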
Development
git clone https://github.com/informity/promptcue.git
cd promptcue
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,semantic,linguistic,keywords,detection]"
python -m spacy download en_core_web_sm
pytest
ruff check src/ tests/ examples/
Contributing
See CONTRIBUTING.md.
License
MIT — see LICENSE.