
PromptCue — Prompt Intent Classifier for LLM Pipelines


PromptCue classifies the intent behind a natural-language prompt and returns structured routing cues — telling your LLM pipeline, RAG system, or query router not just what the user asked, but how it should be answered: retrieve, reason, compare, enumerate, check recency, or ask for clarification.


How it works

PromptCue uses a cascade classifier:

  1. Deterministic pass — scores the query against a YAML registry of query types using trigger-phrase matching and vocabulary overlap. Fast, zero ML dependencies, returns immediately when confidence is high.
  2. Semantic fallback — when the deterministic result is ambiguous or below threshold, sentence-level embeddings re-score the query against example sentences per type. Activates automatically when sentence-transformers is installed, or immediately when you supply your own embed function via PromptCueConfig(embed_fn=...) (hosted mode — no model loaded by PromptCue).

The result is a Pydantic model (PromptCueQueryObject) carrying the classification, confidence, scope, routing hints, action directives, and any enrichment you have enabled.
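
For a quick look at the cascade in action, classification_basis on the result reports which pass produced the answer (the query below is illustrative):

from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()
result = analyzer.analyze('What is the default Redis port?')

# 'trigger_match' / 'word_overlap' mean the deterministic pass answered;
# 'semantic_similarity' means the semantic fallback re-scored the query
print(result.primary_query_type, result.classification_basis, result.confidence)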


Requirements

  • Python 3.13+
  • Core dependencies: pydantic, PyYAML, numpy (always installed)
  • All ML/NLP components are optional — the package installs and runs without them
  • Language: English only. Triggers, examples, and pre-classification detectors (continuation, structure, temporal scope) are all English-specific.

Install

Core install — deterministic classifier only, no ML dependencies:

pip install promptcue

With semantic scoring (sentence-transformers):

pip install "promptcue[semantic]"

Hosted mode — if your application already has an embedding model loaded (e.g. for RAG), pass it via PromptCueConfig(embed_fn=your_model.encode). PromptCue will use it directly and you do not need [semantic] — no second model is loaded.

With language detection (langdetect):

pip install "promptcue[detection]"

With linguistic enrichment (spaCy):

pip install "promptcue[linguistic]"
python -m spacy download en_core_web_sm

With keyword extraction (KeyBERT):

pip install "promptcue[keywords]"

With everything:

pip install "promptcue[all]"
python -m spacy download en_core_web_sm

Development install (editable, with test and lint tools):

pip install -e ".[dev]"

Production deployment

PromptCue requires semantic scoring to produce production-quality results. The deterministic-only path (pip install promptcue, no [semantic]) achieves approximately 40–50% accuracy on naturalistic queries and is not a supported production configuration — it is suitable for evaluation or development only.

Semantic scoring can be provided in two ways:

  • Standalone mode — install the [semantic] extra (pip install "promptcue[semantic]") and let PromptCue load its own all-MiniLM-L6-v2 model.
  • Hosted mode — pass an existing embedding function via PromptCueConfig(embed_fn=...). No [semantic] install required; PromptCue delegates encoding to the caller's model. See Hosted mode.

For standalone mode, every deployment must:

  1. Install the semantic extra: pip install "promptcue[semantic]".
  2. Pre-download the model before the service starts — not on first query.
  3. Call warm_up() (or warm_up_async()) at startup and gate readiness on it succeeding (see the sketch below).
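
A minimal sketch of step 3 wired into a FastAPI lifespan hook (the framework choice is illustrative; all PromptCue requires is that warm_up() succeeds before traffic is served):

from contextlib import asynccontextmanager

from fastapi import FastAPI
from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Raises PromptCueModelLoadError here if the model is absent, so the
    # service fails at startup instead of on the first query
    await analyzer.warm_up_async()
    yield

app = FastAPI(lifespan=lifespan)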

Progress bars from sentence-transformers are disabled by default in standalone mode (show_progress_bar=False) so server logs stay clean. Set PromptCueConfig(show_progress_bar=True) only when you explicitly want tqdm batch output.

If the model cannot be loaded, PromptCue raises PromptCueModelLoadError immediately. It never silently falls back to deterministic-only mode — a misconfigured deployment fails loudly at startup rather than producing quietly wrong results at query time.
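
To turn that startup failure into an explicit exit, a sketch (assuming the exceptions are importable from the package root, as listed under Exceptions):

from promptcue import PromptCueAnalyzer, PromptCueModelLoadError

analyzer = PromptCueAnalyzer()
try:
    analyzer.warm_up()
except PromptCueModelLoadError as exc:
    # Check model_cache_dir / PROMPTCUE_MODEL_CACHE before redeploying
    raise SystemExit(f'PromptCue model failed to load: {exc}')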

Model cache location

By default the model is stored in HuggingFace's standard cache (~/.cache/huggingface/). For deployments that cannot rely on the default cache, set the path explicitly:

from pathlib import Path
from promptcue import PromptCueAnalyzer, PromptCueConfig

analyzer = PromptCueAnalyzer(PromptCueConfig(
    model_cache_dir=Path('/opt/models')
))
analyzer.warm_up()   # raises PromptCueModelLoadError if the model is not at that path

Or via environment variable — no code change required:

export PROMPTCUE_MODEL_CACHE=/opt/models

Hosted mode: reusing an existing embedding model

If your application already has an embedding model loaded — for RAG, document indexing, or any other purpose — pass its encode function to PromptCueConfig(embed_fn=...). PromptCue will delegate all vector computation to that function and will never load a model of its own.

from promptcue import PromptCueAnalyzer, PromptCueConfig

# my_embedder is already loaded elsewhere in your application
def my_encode(text: str) -> list[float]:
    return my_embedder.encode(text)        # or my_embedder.embed_query(text), etc.

config   = PromptCueConfig(embed_fn=my_encode)   # no model loaded by promptcue
analyzer = PromptCueAnalyzer(config)

# warm_up() is a no-op — the external model is already loaded by the caller
result = analyzer.analyze('How do I configure VPC peering?')
print(result.primary_query_type)   # procedure

The type alias PromptCueEmbedFn = Callable[[str], list[float]] is exported from the package root and can be used to annotate injected functions:

from promptcue import PromptCueEmbedFn

def build_embed_fn(model) -> PromptCueEmbedFn:
    return lambda text: model.encode(text)

When to use hosted mode:

  • Your application loads nomic-embed-text-v1.5, BAAI/bge-large-en-v1.5, or any other model for retrieval/RAG and wants to classify queries with the same model — zero extra memory.
  • You are integrating PromptCue into a service that already manages its own model lifecycle and you want PromptCue to be a pure classifier with no model side-effects.
  • You are running in a memory-constrained environment where loading a second model is not acceptable.

Notes:

  • enable_semantic_scoring is forced to True automatically when embed_fn is set, even if sentence-transformers is not installed.
  • The injected function signature is single-text: (str) -> list[float]. If your model has a batch API, wrap it: lambda text: model.encode([text])[0] (full sketch below).
  • warm_up() is a no-op. is_loaded returns True immediately.
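
For example, a full wrap of a batch-only encoder, assuming model is a sentence-transformers-style object already loaded elsewhere in your application:

from promptcue import PromptCueAnalyzer, PromptCueConfig

# model.encode([...]) returns one embedding row per input text;
# take the single row and convert it to list[float] to match PromptCueEmbedFn
analyzer = PromptCueAnalyzer(PromptCueConfig(
    embed_fn=lambda text: model.encode([text])[0].tolist()
))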

Deployment patterns

| Environment | Model management approach |
| --- | --- |
| Local dev | Leave model_cache_dir unset — HuggingFace downloads on first warm_up() |
| EC2 / EBS | Pre-download to EBS volume; set HF_HOME=/opt/models or model_cache_dir |
| Lambda (container image) | Bake model into Docker image at build time — required, Lambda /tmp is ephemeral |
| Lambda (EFS mount) | Pre-populate EFS volume; set model_cache_dir=Path('/mnt/models') |
| Docker / CI | Download during image build; volume-mount for local dev |

For Lambda container images, bake the model in at build time:

FROM python:3.13-slim
RUN pip install "promptcue[semantic]"
ENV HF_HOME=/app/models
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('all-MiniLM-L6-v2')"

Quick start

Basic — no ML dependencies required

from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()
result   = analyzer.analyze('Compare Aurora and OpenSearch for RAG on AWS')

print(result.primary_query_type)   # comparison
print(result.scope)                # comparative
print(result.confidence)           # 0.9
print(result.routing_hints)        # {'needs_retrieval': True, 'needs_reasoning': True, ...}
print(result.action_hints)         # {'should_compare': True, ...}

With semantic scoring — requires pip install "promptcue[semantic]"

Semantic scoring is enabled automatically when sentence-transformers is installed. Call warm_up() at startup to pre-load the model and avoid first-query latency.

from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()
analyzer.warm_up()  # loads ~90 MB model once; cached after first download

result = analyzer.analyze('Should we use DynamoDB or RDS for a high-read catalog?')
print(result.primary_query_type)   # recommendation
print(result.classification_basis) # semantic_similarity
print(result.confidence)           # 0.25

With full enrichment

from promptcue import PromptCueAnalyzer, PromptCueConfig

analyzer = PromptCueAnalyzer(PromptCueConfig(
    enable_language_detection    = True,   # requires promptcue[detection]
    enable_linguistic_extraction = True,   # requires promptcue[linguistic]
    enable_keyword_extraction    = True,   # requires promptcue[keywords]
))
analyzer.warm_up()

result = analyzer.analyze(
    'How do I set up a VPC with private subnets and NAT gateway step by step?'
)
print(result.language)       # en
print(result.main_verbs)     # ['set']
print(result.noun_phrases)   # ['a VPC', 'private subnets', 'NAT gateway']
print(result.keywords)       # [PromptCueKeyword(text='vpc private subnets', weight=0.72, ...), ...]
print(result.entities)       # []  (no named entities in this query)

In an async application

Both .warm_up_async() and .analyze_async() delegate to asyncio.to_thread(), so they are safe to await in FastAPI handlers or any other async framework without blocking the event loop.

import asyncio
from promptcue import PromptCueAnalyzer

async def main() -> None:
    analyzer = PromptCueAnalyzer()
    await analyzer.warm_up_async()

    result = await analyzer.analyze_async('Compare option A and option B')
    print(result.primary_query_type)   # comparison

asyncio.run(main())

With an injected embed function (hosted mode)

Use this when your application already has an embedding model loaded and you want PromptCue to reuse it rather than loading a second model. No [semantic] extra required.

from promptcue import PromptCueAnalyzer, PromptCueConfig

# Stub — replace with your actual model's encode method
def my_encode(text: str) -> list[float]:
    return my_existing_model.embed_query(text)

analyzer = PromptCueAnalyzer(PromptCueConfig(embed_fn=my_encode))
# warm_up() not needed — model is already loaded externally

result = analyzer.analyze('How do I configure VPC peering step by step?')
print(result.primary_query_type)   # procedure

Full JSON output

print(result.model_dump_json(indent=2))

Query types

PromptCue ships with a default registry of 12 query types:

| Label | Scope | Description |
| --- | --- | --- |
| analysis | exploratory | Deep evaluation of a system, architecture, or decision |
| chitchat | broad | Social or conversational, not a knowledge query |
| comparison | comparative | Asks to compare two or more options |
| coverage | broad | Broad overview or "tell me everything" request |
| generation | focused | Produce entirely new content from scratch with no existing source to condense |
| lookup | focused | Factual question with a single direct answer |
| procedure | focused | Step-by-step instructions for a task |
| recommendation | focused | Asks for a decision or suggestion given constraints |
| summarization | focused | Condense existing content — provided, referenced, or in-context — into a shorter form |
| troubleshooting | focused | Diagnosing or fixing a problem |
| update | focused | Latest news, releases, or changes |
| validation | focused | Verify or fact-check a specific stated claim, assumption, or belief |

You can replace or extend the registry by pointing PromptCueConfig.registry_path at your own YAML file — the schema is documented in src/promptcue/data/query_types_en.yaml.
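
A sketch of loading a custom registry; the YAML path is hypothetical:

from pathlib import Path
from promptcue import PromptCueAnalyzer, PromptCueConfig

# Hypothetical file following the schema of the bundled query_types_en.yaml
analyzer = PromptCueAnalyzer(PromptCueConfig(
    registry_path=Path('config/my_query_types.yaml')
))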


Which field should I use?

PromptCueQueryObject surfaces several dimensions. Use the one that matches what your pipeline actually needs to decide — you rarely need all of them.

| I need to know... | Use this field | Example values |
| --- | --- | --- |
| What the user is asking for | primary_query_type | procedure, comparison, lookup |
| How broad or specific the query is | scope | broad, focused, comparative, exploratory |
| How to structure the LLM response | action_hints | should_enumerate, should_compare, should_direct_answer |
| Whether to retrieve / reason / check freshness | routing_hints | needs_retrieval, needs_current_info, needs_reasoning |
| Whether the query mentions time | semantic_hints.mentions_time | True / False |
| Whether the query explicitly asks for current/fresh info | semantic_hints.explicit_recency | True / False |
| Whether the query requires cross-period analysis | semantic_hints.requires_multi_period_analysis | True / False |
| Whether the user wants a specific output format | routing_hints['needs_structure'] | True / False |
| Whether the query continues a prior conversation | is_continuation | True / False |
| How confident the classifier is | confidence + confidence_band | 0.74, high |

Common patterns:

  • Simple LLM router — branch on primary_query_type alone. Done (see the combined sketch after this list).
  • RAG pipeline — use routing_hints['needs_retrieval'] to decide whether to retrieve. Use routing_hints['needs_current_info'] (or semantic_hints.explicit_recency) to trigger freshness checks/web search, and scope to decide how many results to fetch (broad → more, focused → fewer).
  • Response generator — act on action_hints: should_enumerate → numbered list, should_compare → side-by-side table, should_direct_answer → single concise answer.
  • Time-aware pipeline — gate temporal aggregation on semantic_hints.requires_multi_period_analysis.
  • Structured-output pipeline — detect explicit format requests via routing_hints['needs_structure'] before passing to the generator.
  • Ambiguity guard — check confidence_band == 'low' or ambiguity_score > 0.5 before routing; fall back to clarification when confidence is too low.
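
A sketch combining the router, RAG, and ambiguity-guard patterns; retrieve(), generate(), and clarify() are hypothetical stand-ins for your own pipeline functions:

from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()

def route(query: str) -> str:
    result = analyzer.analyze(query)

    # Ambiguity guard: bail out before doing any retrieval work
    if result.confidence_band == 'low' or result.ambiguity_score > 0.5:
        return clarify(query)                          # hypothetical helper

    context = None
    if result.routing_hints['needs_retrieval']:
        # Broad queries fetch more candidates than focused ones
        top_k = 12 if result.scope == 'broad' else 4
        context = retrieve(query, top_k=top_k)         # hypothetical helper

    return generate(query, context, result.action_hints)  # hypothetical helper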

The primary_query_type labels are intentionally granular (12 types). If you only need coarse routing, scope already gives you broad / focused / comparative without looking at the type label at all.


Public API

PromptCueAnalyzer

PromptCueAnalyzer(config: PromptCueConfig | None = None)

| Method | Description |
| --- | --- |
| .analyze(text: str) -> PromptCueQueryObject | Analyze a query and return a structured result |
| .warm_up() -> None | Pre-load all enabled models at startup to avoid first-query latency |
| .analyze_async(text: str) -> PromptCueQueryObject | Async variant of .analyze(); delegates to asyncio.to_thread() |
| .warm_up_async() -> None | Async variant of .warm_up(); delegates to asyncio.to_thread() |

PromptCueConfig fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| registry_path | Path \| None | None | Custom YAML registry path; uses bundled default when None |
| model_cache_dir | Path \| None | env / None | Directory where the sentence-transformers model is cached. Falls back to PROMPTCUE_MODEL_CACHE env var, then HuggingFace default (~/.cache/huggingface/) |
| embed_fn | Callable[[str], list[float]] \| None | None | Injectable embed function for hosted mode. When set, PromptCue delegates all vector computation to this function and never loads a model. enable_semantic_scoring is forced to True. See Hosted mode |
| show_progress_bar | bool | False | Standalone mode only: forwarded to SentenceTransformer.encode(show_progress_bar=...). Keep False for clean logs; set True for local debugging |
| similarity_threshold | float | 0.55 | Minimum score for a deterministic match to be accepted |
| semantic_similarity_threshold | float | 0.20 | Minimum score for a semantic match to be accepted |
| ambiguity_margin | float | 0.08 | Min gap between top-2 scores before clarification is flagged |
| semantic_fallback_threshold | float | 0.75 | Deterministic score above which the semantic pass is skipped |
| trigger_fallback_threshold | float | 0.60 | When a trigger phrase matched, the score meets this value, and the margin is clear, the deterministic result is trusted directly and the semantic pass is skipped |
| enable_semantic_scoring | bool | auto | True when sentence-transformers is installed or embed_fn is set, else False |
| embedding_model | str | all-MiniLM-L6-v2 | HuggingFace model name for semantic scoring (ignored when embed_fn is set) |
| enable_language_detection | bool | False | Detect BCP-47 language code; requires promptcue[detection] |
| enable_linguistic_extraction | bool | False | Extract verbs, noun phrases, named entities; requires promptcue[linguistic] |
| enable_keyword_extraction | bool | False | Extract keyphrases via KeyBERT; requires promptcue[keywords] |
| max_keywords | int | 8 | Maximum number of keyphrases to extract |
| spacy_model | str | en_core_web_sm | spaCy model name for linguistic extraction |
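
For example, tightening the deterministic gate (the values here are illustrative, not recommendations):

from promptcue import PromptCueAnalyzer, PromptCueConfig

analyzer = PromptCueAnalyzer(PromptCueConfig(
    similarity_threshold=0.65,   # require stronger deterministic matches
    ambiguity_margin=0.12,       # flag clarification when top-2 scores sit within 0.12
))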

PromptCueQueryObject fields

| Field | Type | Description |
| --- | --- | --- |
| schema_version | str | Output schema version ("1.0") |
| input_text | str | Original query as provided by the caller |
| normalized_text | str | Unicode-normalised, whitespace-collapsed query |
| language | str | BCP-47 language code ("en") or "unknown" when detection is off |
| is_continuation | bool | True when the query continues an ongoing conversation (e.g. "what about X?", "and for Y?") |
| primary_query_type | str | Top classified query type label, or "unknown" |
| classification_basis | str | How the result was reached: trigger_match, word_overlap, semantic_similarity, below_threshold |
| candidate_query_types | list[PromptCueCandidate] | All types ranked by score |
| runner_up | PromptCueCandidate \| None | Second-ranked candidate; None when fewer than two candidates exist |
| confidence | float | Score of the top candidate (0.0–1.0) |
| confidence_band | str | Coarse confidence tier: high, medium, or low |
| ambiguity_score | float | How close the top-2 candidates are (0.0 = clear, 1.0 = identical) |
| scope | str | Query scope: broad, focused, comparative, exploratory, or unknown |
| main_verbs | list[str] | Root verbs extracted by spaCy (empty when enrichment is off) |
| noun_phrases | list[str] | Noun chunks extracted by spaCy (empty when enrichment is off) |
| named_entities | list[str] | Named entity surface texts, plain strings (backward compat) |
| entities | list[PromptCueEntity] | Named entities with text and entity_type (spaCy label) |
| keywords | list[PromptCueKeyword] | Keyphrases with text, weight, and kind from KeyBERT |
| routing_hints | dict[str, bool] | needs_retrieval, needs_reasoning, needs_current_info, needs_clarification, needs_structure |
| semantic_hints | PromptCueSemanticHints | Agnostic semantic cues (mentions_multiple_items, requests_comparison, requests_enumeration, requests_structure, mentions_time, requires_multi_period_analysis) |
| confidence_meta | PromptCueConfidenceMeta | Confidence diagnostics (type_confidence_margin, scope_confidence, scope_confidence_margin) |
| explanations | PromptCueExplanations | Debug metadata (decision_notes, evidence_tokens) |
| action_hints | dict[str, bool] | Response-generation directives: should_survey, should_enumerate, should_compare, should_direct_answer, should_check_recency, should_clarify, should_respond_conversationally |
| constraints | list[str] | Reserved for future use |
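
When a classification looks off, the diagnostic fields explain how it was reached; the query is illustrative and the Pydantic sub-models print directly:

from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()
result = analyzer.analyze('Is DynamoDB strongly consistent?')

for candidate in result.candidate_query_types:   # every type, ranked by score
    print(candidate)
print(result.explanations.decision_notes)        # why the top candidate won
print(result.explanations.evidence_tokens)       # tokens that drove the score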

Exceptions

All exceptions inherit from PromptCueError.

| Exception | Raised when |
| --- | --- |
| PromptCueError | Base class — catch this to handle all PromptCue errors |
| PromptCueModelLoadError | The sentence-transformers model cannot be loaded at warm_up() time |
| PromptCueRegistryError | The query type registry YAML is missing, malformed, or contains invalid entries |

Development

git clone https://github.com/informity/promptcue.git
cd promptcue

python3 -m venv .venv
source .venv/bin/activate

pip install -e ".[dev,semantic,linguistic,keywords,detection]"
python -m spacy download en_core_web_sm

pytest
ruff check src/ tests/ examples/

Contributing

See CONTRIBUTING.md.


License

MIT — see LICENSE.
