
PromptCue — Prompt Intent Classifier for LLM Pipelines


PromptCue classifies the intent behind a natural-language prompt and returns structured routing cues — telling your LLM pipeline, RAG system, or query router not just what the user asked, but how it should be answered: retrieve, reason, compare, enumerate, check recency, or ask for clarification.


How it works

PromptCue uses a cascade classifier:

  1. Deterministic pass — scores the query against a YAML registry of query types using trigger-phrase matching and vocabulary overlap. Fast, zero ML dependencies, returns immediately when confidence is high.
  2. Semantic fallback — when the deterministic result is ambiguous or below threshold, sentence-level embeddings re-score the query against example sentences per type. Activates automatically when sentence-transformers is installed, or immediately when you supply your own embed function via PromptCueConfig(embed_fn=...) (hosted mode — no model loaded by PromptCue).

The result is a Pydantic model (PromptCueQueryObject) carrying the classification, confidence, scope, routing hints, action directives, and any enrichment you have enabled.
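
For a quick look at the cascade in action, classification_basis on the result reports which pass produced the answer (the query below is illustrative):

from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()
result = analyzer.analyze('What is the default Redis port?')

# 'trigger_match' / 'word_overlap' mean the deterministic pass answered;
# 'semantic_similarity' means the semantic fallback re-scored the query
print(result.primary_query_type, result.classification_basis, result.confidence)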


Requirements

  • Python 3.13+
  • Core dependencies: pydantic, PyYAML, numpy (always installed)
  • All ML/NLP components are optional — the package installs and runs without them
  • Language: English only. Triggers, examples, and pre-classification detectors (continuation, structure, temporal scope) are all English-specific.

Install

Core install — deterministic classifier only, no ML dependencies:

pip install promptcue

With semantic scoring (sentence-transformers):

pip install "promptcue[semantic]"

Hosted mode — if your application already has an embedding model loaded (e.g. for RAG), pass it via PromptCueConfig(embed_fn=your_model.encode). PromptCue will use it directly and you do not need [semantic] — no second model is loaded.

With language detection (langdetect):

pip install "promptcue[detection]"

With linguistic enrichment (spaCy):

pip install "promptcue[linguistic]"
python -m spacy download en_core_web_sm

With keyword extraction (KeyBERT):

pip install "promptcue[keywords]"

With everything:

pip install "promptcue[all]"
python -m spacy download en_core_web_sm

Development install (editable, with test and lint tools):

pip install -e ".[dev]"

Production deployment

PromptCue requires semantic scoring to produce production-quality results. The deterministic-only path (pip install promptcue, no [semantic]) achieves approximately 40–50% accuracy on naturalistic queries and is not a supported production configuration — it is suitable for evaluation or development only.

Semantic scoring can be provided in two ways:

  • Standalone mode — install the [semantic] extra (pip install "promptcue[semantic]") and let PromptCue load its own all-MiniLM-L6-v2 model.
  • Hosted mode — pass an existing embedding function via PromptCueConfig(embed_fn=...). No [semantic] install required; PromptCue delegates encoding to the caller's model. See Hosted mode.

For standalone mode, every deployment must:

  1. Install the semantic extra: pip install "promptcue[semantic]".
  2. Pre-download the model before the service starts — not on first query.
  3. Call warm_up() (or warm_up_async()) at startup and gate readiness on it succeeding (see the sketch below).
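
A minimal sketch of step 3 wired into a FastAPI lifespan hook (the framework choice is illustrative; all PromptCue requires is that warm_up() succeeds before traffic is served):

from contextlib import asynccontextmanager

from fastapi import FastAPI
from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Raises PromptCueModelLoadError here if the model is absent, so the
    # service fails at startup instead of on the first query
    await analyzer.warm_up_async()
    yield

app = FastAPI(lifespan=lifespan)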

Progress bars from sentence-transformers are disabled by default in standalone mode (show_progress_bar=False) so server logs stay clean. Set PromptCueConfig(show_progress_bar=True) only when you explicitly want tqdm batch output.

If the model cannot be loaded, PromptCue raises PromptCueModelLoadError immediately. It never silently falls back to deterministic-only mode — a misconfigured deployment fails loudly at startup rather than producing quietly wrong results at query time.
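
To turn that startup failure into an explicit exit, a sketch (assuming the exceptions are importable from the package root, as listed under Exceptions):

from promptcue import PromptCueAnalyzer, PromptCueModelLoadError

analyzer = PromptCueAnalyzer()
try:
    analyzer.warm_up()
except PromptCueModelLoadError as exc:
    # Check model_cache_dir / PROMPTCUE_MODEL_CACHE before redeploying
    raise SystemExit(f'PromptCue model failed to load: {exc}')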

Model cache location

By default the model is stored in HuggingFace's standard cache (~/.cache/huggingface/). For deployments that cannot rely on the default cache, set the path explicitly:

from pathlib import Path
from promptcue import PromptCueAnalyzer, PromptCueConfig

analyzer = PromptCueAnalyzer(PromptCueConfig(
    model_cache_dir=Path('/opt/models')
))
analyzer.warm_up()   # raises PromptCueModelLoadError if the model is not at that path

Or via environment variable — no code change required:

export PROMPTCUE_MODEL_CACHE=/opt/models

Hosted mode: reusing an existing embedding model

If your application already has an embedding model loaded — for RAG, document indexing, or any other purpose — pass its encode function to PromptCueConfig(embed_fn=...). PromptCue will delegate all vector computation to that function and will never load a model of its own.

from promptcue import PromptCueAnalyzer, PromptCueConfig

# my_embedder is already loaded elsewhere in your application
def my_encode(text: str) -> list[float]:
    return my_embedder.encode(text)        # or my_embedder.embed_query(text), etc.

config   = PromptCueConfig(embed_fn=my_encode)   # no model loaded by promptcue
analyzer = PromptCueAnalyzer(config)

# warm_up() is a no-op — the external model is already loaded by the caller
result = analyzer.analyze('How do I configure VPC peering?')
print(result.primary_query_type)   # procedure

The type alias PromptCueEmbedFn = Callable[[str], list[float]] is exported from the package root and can be used to annotate injected functions:

from promptcue import PromptCueEmbedFn

def build_embed_fn(model) -> PromptCueEmbedFn:
    return lambda text: model.encode(text)

When to use hosted mode:

  • Your application loads nomic-embed-text-v1.5, BAAI/bge-large-en-v1.5, or any other model for retrieval/RAG and wants to classify queries with the same model — zero extra memory.
  • You are integrating PromptCue into a service that already manages its own model lifecycle and you want PromptCue to be a pure classifier with no model side-effects.
  • You are running in a memory-constrained environment where loading a second model is not acceptable.

Notes:

  • enable_semantic_scoring is forced to True automatically when embed_fn is set, even if sentence-transformers is not installed.
  • The injected function signature is single-text: (str) -> list[float]. If your model has a batch API, wrap it: lambda text: model.encode([text])[0] (full sketch below).
  • warm_up() is a no-op. is_loaded returns True immediately.
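
For example, a full wrap of a batch-only encoder, assuming model is a sentence-transformers-style object already loaded elsewhere in your application:

from promptcue import PromptCueAnalyzer, PromptCueConfig

# model.encode([...]) returns one embedding row per input text;
# take the single row and convert it to list[float] to match PromptCueEmbedFn
analyzer = PromptCueAnalyzer(PromptCueConfig(
    embed_fn=lambda text: model.encode([text])[0].tolist()
))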

Deployment patterns

| Environment | Model management approach |
| --- | --- |
| Local dev | Leave model_cache_dir unset — HuggingFace downloads on first warm_up() |
| EC2 / EBS | Pre-download to EBS volume; set HF_HOME=/opt/models or model_cache_dir |
| Lambda (container image) | Bake model into Docker image at build time — required, Lambda /tmp is ephemeral |
| Lambda (EFS mount) | Pre-populate EFS volume; set model_cache_dir=Path('/mnt/models') |
| Docker / CI | Download during image build; volume-mount for local dev |

For Lambda container images, bake the model in at build time:

FROM python:3.13-slim
RUN pip install "promptcue[semantic]"
ENV HF_HOME=/app/models
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('all-MiniLM-L6-v2')"

Quick start

Basic — no ML dependencies required

from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()
result   = analyzer.analyze('Compare Aurora and OpenSearch for RAG on AWS')

print(result.primary_query_type)   # comparison
print(result.scope)                # comparative
print(result.confidence)           # 0.9
print(result.routing_hints)        # {'needs_retrieval': True, 'needs_reasoning': True, ...}
print(result.action_hints)         # {'should_compare': True, ...}

With semantic scoring — requires pip install "promptcue[semantic]"

Semantic scoring is enabled automatically when sentence-transformers is installed. Call warm_up() at startup to pre-load the model and avoid first-query latency.

from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()
analyzer.warm_up()  # loads ~90 MB model once; cached after first download

result = analyzer.analyze('Should we use DynamoDB or RDS for a high-read catalog?')
print(result.primary_query_type)   # recommendation
print(result.classification_basis) # semantic_similarity
print(result.confidence)           # 0.25

With full enrichment

from promptcue import PromptCueAnalyzer, PromptCueConfig

analyzer = PromptCueAnalyzer(PromptCueConfig(
    enable_language_detection    = True,   # requires promptcue[detection]
    enable_linguistic_extraction = True,   # requires promptcue[linguistic]
    enable_keyword_extraction    = True,   # requires promptcue[keywords]
))
analyzer.warm_up()

result = analyzer.analyze(
    'How do I set up a VPC with private subnets and NAT gateway step by step?'
)
print(result.language)       # en
print(result.main_verbs)     # ['set']
print(result.noun_phrases)   # ['a VPC', 'private subnets', 'NAT gateway']
print(result.keywords)       # [PromptCueKeyword(text='vpc private subnets', weight=0.72, ...), ...]
print(result.entities)       # []  (no named entities in this query)

In an async application

Both .warm_up_async() and .analyze_async() delegate to asyncio.to_thread(), so they are safe to await in FastAPI handlers or any other async framework without blocking the event loop.

import asyncio
from promptcue import PromptCueAnalyzer

async def main() -> None:
    analyzer = PromptCueAnalyzer()
    await analyzer.warm_up_async()

    result = await analyzer.analyze_async('Compare option A and option B')
    print(result.primary_query_type)   # comparison

asyncio.run(main())

With an injected embed function (hosted mode)

Use this when your application already has an embedding model loaded and you want PromptCue to reuse it rather than loading a second model. No [semantic] extra required.

from promptcue import PromptCueAnalyzer, PromptCueConfig

# Stub — replace with your actual model's encode method
def my_encode(text: str) -> list[float]:
    return my_existing_model.embed_query(text)

analyzer = PromptCueAnalyzer(PromptCueConfig(embed_fn=my_encode))
# warm_up() not needed — model is already loaded externally

result = analyzer.analyze('How do I configure VPC peering step by step?')
print(result.primary_query_type)   # procedure

Full JSON output

print(result.model_dump_json(indent=2))

Query types

PromptCue ships with a default registry of 12 query types:

| Label | Scope | Description |
| --- | --- | --- |
| analysis | exploratory | Deep evaluation of a system, architecture, or decision |
| chitchat | broad | Social or conversational, not a knowledge query |
| comparison | comparative | Asks to compare two or more options |
| coverage | broad | Broad overview or "tell me everything" request |
| generation | focused | Produce entirely new content from scratch with no existing source to condense |
| lookup | focused | Factual question with a single direct answer |
| procedure | focused | Step-by-step instructions for a task |
| recommendation | focused | Asks for a decision or suggestion given constraints |
| summarization | focused | Condense existing content — provided, referenced, or in-context — into a shorter form |
| troubleshooting | focused | Diagnosing or fixing a problem |
| update | focused | Latest news, releases, or changes |
| validation | focused | Verify or fact-check a specific stated claim, assumption, or belief |

You can replace or extend the registry by pointing PromptCueConfig.registry_path at your own YAML file — the schema is documented in src/promptcue/data/query_types_en.yaml.
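
A sketch of loading a custom registry; the YAML path is hypothetical:

from pathlib import Path
from promptcue import PromptCueAnalyzer, PromptCueConfig

# Hypothetical file following the schema of the bundled query_types_en.yaml
analyzer = PromptCueAnalyzer(PromptCueConfig(
    registry_path=Path('config/my_query_types.yaml')
))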


Which field should I use?

PromptCueQueryObject surfaces several dimensions. Use the one that matches what your pipeline actually needs to decide — you rarely need all of them.

| I need to know... | Use this field | Example values |
| --- | --- | --- |
| What the user is asking for | primary_query_type | procedure, comparison, lookup |
| How broad or specific the query is | scope | broad, focused, comparative, exploratory |
| How to structure the LLM response | action_hints | should_enumerate, should_compare, should_direct_answer |
| Whether to retrieve / reason / check freshness | routing_hints | needs_retrieval, needs_current_info, needs_reasoning |
| Whether the query mentions time | semantic_hints.mentions_time | True / False |
| Whether the query explicitly asks for current/fresh info | semantic_hints.explicit_recency | True / False |
| Whether the query requires cross-period analysis | semantic_hints.requires_multi_period_analysis | True / False |
| Whether the user wants a specific output format | routing_hints['needs_structure'] | True / False |
| Whether the query continues a prior conversation | is_continuation | True / False |
| How confident the classifier is | confidence + confidence_band | 0.74, high |

Common patterns:

  • Simple LLM router — branch on primary_query_type alone. Done (see the combined sketch after this list).
  • RAG pipeline — use routing_hints['needs_retrieval'] to decide whether to retrieve. Use routing_hints['needs_current_info'] (or semantic_hints.explicit_recency) to trigger freshness checks/web search, and scope to decide how many results to fetch (broad → more, focused → fewer).
  • Response generator — act on action_hints: should_enumerate → numbered list, should_compare → side-by-side table, should_direct_answer → single concise answer.
  • Time-aware pipeline — gate temporal aggregation on semantic_hints.requires_multi_period_analysis.
  • Structured-output pipeline — detect explicit format requests via routing_hints['needs_structure'] before passing to the generator.
  • Ambiguity guard — check confidence_band == 'low' or ambiguity_score > 0.5 before routing; fall back to clarification when confidence is too low.
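
A sketch combining the router, RAG, and ambiguity-guard patterns; retrieve(), generate(), and clarify() are hypothetical stand-ins for your own pipeline functions:

from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()

def route(query: str) -> str:
    result = analyzer.analyze(query)

    # Ambiguity guard: bail out before doing any retrieval work
    if result.confidence_band == 'low' or result.ambiguity_score > 0.5:
        return clarify(query)                          # hypothetical helper

    context = None
    if result.routing_hints['needs_retrieval']:
        # Broad queries fetch more candidates than focused ones
        top_k = 12 if result.scope == 'broad' else 4
        context = retrieve(query, top_k=top_k)         # hypothetical helper

    return generate(query, context, result.action_hints)  # hypothetical helper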

The primary_query_type labels are intentionally granular (12 types). If you only need coarse routing, scope already gives you broad / focused / comparative without looking at the type label at all.


Public API

PromptCueAnalyzer

PromptCueAnalyzer(config: PromptCueConfig | None = None)

| Method | Description |
| --- | --- |
| .analyze(text: str) -> PromptCueQueryObject | Analyze a query and return a structured result |
| .warm_up() -> None | Pre-load all enabled models at startup to avoid first-query latency |
| .analyze_async(text: str) -> PromptCueQueryObject | Async variant of .analyze(); delegates to asyncio.to_thread() |
| .warm_up_async() -> None | Async variant of .warm_up(); delegates to asyncio.to_thread() |

PromptCueConfig fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| registry_path | Path \| None | None | Custom YAML registry path; uses bundled default when None |
| model_cache_dir | Path \| None | env / None | Directory where the sentence-transformers model is cached. Falls back to PROMPTCUE_MODEL_CACHE env var, then HuggingFace default (~/.cache/huggingface/) |
| embed_fn | Callable[[str], list[float]] \| None | None | Injectable embed function for hosted mode. When set, PromptCue delegates all vector computation to this function and never loads a model. enable_semantic_scoring is forced to True. See Hosted mode |
| show_progress_bar | bool | False | Standalone mode only: forwarded to SentenceTransformer.encode(show_progress_bar=...). Keep False for clean logs; set True for local debugging |
| similarity_threshold | float | 0.55 | Minimum score for a deterministic match to be accepted |
| semantic_similarity_threshold | float | 0.20 | Minimum score for a semantic match to be accepted |
| ambiguity_margin | float | 0.08 | Min gap between top-2 scores before clarification is flagged |
| semantic_fallback_threshold | float | 0.75 | Deterministic score above which the semantic pass is skipped |
| trigger_fallback_threshold | float | 0.60 | When a trigger phrase matched, the score meets this value, and the margin is clear, the deterministic result is trusted directly and the semantic pass is skipped |
| enable_semantic_scoring | bool | auto | True when sentence-transformers is installed or embed_fn is set, else False |
| embedding_model | str | all-MiniLM-L6-v2 | HuggingFace model name for semantic scoring (ignored when embed_fn is set) |
| enable_language_detection | bool | False | Detect BCP-47 language code; requires promptcue[detection] |
| enable_linguistic_extraction | bool | False | Extract verbs, noun phrases, named entities; requires promptcue[linguistic] |
| enable_keyword_extraction | bool | False | Extract keyphrases via KeyBERT; requires promptcue[keywords] |
| max_keywords | int | 8 | Maximum number of keyphrases to extract |
| spacy_model | str | en_core_web_sm | spaCy model name for linguistic extraction |
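
For example, tightening the deterministic gate (the values here are illustrative, not recommendations):

from promptcue import PromptCueAnalyzer, PromptCueConfig

analyzer = PromptCueAnalyzer(PromptCueConfig(
    similarity_threshold=0.65,   # require stronger deterministic matches
    ambiguity_margin=0.12,       # flag clarification when top-2 scores sit within 0.12
))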

PromptCueQueryObject fields

| Field | Type | Description |
| --- | --- | --- |
| schema_version | str | Output schema version ("1.0") |
| input_text | str | Original query as provided by the caller |
| normalized_text | str | Unicode-normalised, whitespace-collapsed query |
| language | str | BCP-47 language code ("en") or "unknown" when detection is off |
| is_continuation | bool | True when the query continues an ongoing conversation (e.g. "what about X?", "and for Y?") |
| primary_query_type | str | Top classified query type label, or "unknown" |
| classification_basis | str | How the result was reached: trigger_match, word_overlap, semantic_similarity, below_threshold |
| candidate_query_types | list[PromptCueCandidate] | All types ranked by score |
| runner_up | PromptCueCandidate \| None | Second-ranked candidate; None when fewer than two candidates exist |
| confidence | float | Score of the top candidate (0.0–1.0) |
| confidence_band | str | Coarse confidence tier: high, medium, or low |
| ambiguity_score | float | How close the top-2 candidates are (0.0 = clear, 1.0 = identical) |
| scope | str | Query scope: broad, focused, comparative, exploratory, or unknown |
| main_verbs | list[str] | Root verbs extracted by spaCy (empty when enrichment is off) |
| noun_phrases | list[str] | Noun chunks extracted by spaCy (empty when enrichment is off) |
| named_entities | list[str] | Named entity surface texts, plain strings (backward compat) |
| entities | list[PromptCueEntity] | Named entities with text and entity_type (spaCy label) |
| keywords | list[PromptCueKeyword] | Keyphrases with text, weight, and kind from KeyBERT |
| routing_hints | dict[str, bool] | needs_retrieval, needs_reasoning, needs_current_info, needs_clarification, needs_structure |
| semantic_hints | PromptCueSemanticHints | Agnostic semantic cues (mentions_multiple_items, requests_comparison, requests_enumeration, requests_structure, mentions_time, requires_multi_period_analysis) |
| confidence_meta | PromptCueConfidenceMeta | Confidence diagnostics (type_confidence_margin, scope_confidence, scope_confidence_margin) |
| explanations | PromptCueExplanations | Debug metadata (decision_notes, evidence_tokens) |
| action_hints | dict[str, bool] | Response-generation directives: should_survey, should_enumerate, should_compare, should_direct_answer, should_check_recency, should_clarify, should_respond_conversationally |
| constraints | list[str] | Reserved for future use |
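
When a classification looks off, the diagnostic fields explain how it was reached; the query is illustrative and the Pydantic sub-models print directly:

from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()
result = analyzer.analyze('Is DynamoDB strongly consistent?')

for candidate in result.candidate_query_types:   # every type, ranked by score
    print(candidate)
print(result.explanations.decision_notes)        # why the top candidate won
print(result.explanations.evidence_tokens)       # tokens that drove the score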

Exceptions

All exceptions inherit from PromptCueError.

| Exception | Raised when |
| --- | --- |
| PromptCueError | Base class — catch this to handle all PromptCue errors |
| PromptCueModelLoadError | The sentence-transformers model cannot be loaded at warm_up() time |
| PromptCueRegistryError | The query type registry YAML is missing, malformed, or contains invalid entries |

Development

git clone https://github.com/informity/promptcue.git
cd promptcue

python3 -m venv .venv
source .venv/bin/activate

pip install -e ".[dev,semantic,linguistic,keywords,detection]"
python -m spacy download en_core_web_sm

pytest
ruff check src/ tests/ examples/

Contributing

See CONTRIBUTING.md.


License

MIT — see LICENSE.
