RAG-Enhanced Intent Classification (REIC): semantic routing + LLMs
```yaml
---
title: Intent Classifier REIC
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
```
Intent Classifier: REIC Implementation
A production-grade, configurable intent classification library implementing ideas from the paper "REIC: RAG-Enhanced Intent Classification at Scale" (Amazon, 2024).
It combines semantic routing (SentenceTransformers), retrieval-augmented generation (RAG) for few-shot evidence, and hierarchical routing (coarse category → fine intent), with full multi-turn conversation support.
Architecture
```
User Query
    │
    ▼
SemanticRouter ──── encode_query() ────► shared embedding
    │                                          │
    ▼                                          ▼
[FLAT] Top-K intents                 ExampleStore.retrieve()
    │  (flat cosine sim)             (context-enriched RAG query)
    │                                          │
[HIERARCHICAL_RAG]                             │
    │                                          │
    ▼                                          ▼
HierarchicalRouter                     Few-shot examples
  Coarse: category embeddings          (filtered by category)
  Fine:   intent embeddings                    │
  Prior category pinning ──────────────────────┘
    │
    ▼
LLM (Ollama / OpenAI-compatible)
  + conversation history (MULTI mode)
  + retrieved examples as evidence
  + optional verification pass
    │
    ▼
(intent_name, confidence, language)
```
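The hierarchical branch of the diagram can be pictured with a small, self-contained sketch. It is not the library's HierarchicalRouter: the toy 2-D vectors, the hypothetical `route()` helper and the `pin_margin` rule used to model prior category pinning are all assumptions, chosen only to illustrate coarse→fine routing (pick a category by cosine similarity, then rank only that category's intents).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(query_vec, category_vecs, intent_vecs, hierarchy, prior_category=None, pin_margin=0.1):
    """Coarse step: choose a category. Fine step: rank the intents inside it.

    Hypothetical pinning rule: keep the previous turn's category unless the
    best-scoring category beats it by more than pin_margin.
    """
    cat_scores = {c: cosine(query_vec, v) for c, v in category_vecs.items()}
    best = max(cat_scores, key=cat_scores.get)
    if prior_category in cat_scores and cat_scores[best] - cat_scores[prior_category] <= pin_margin:
        best = prior_category
    candidates = hierarchy[best]["intents"]
    ranked = sorted(candidates, key=lambda i: cosine(query_vec, intent_vecs[i]), reverse=True)
    return best, ranked


# Toy embeddings: axis 0 ~ "accounts", axis 1 ~ "cards"
CATEGORY_VECS = {"accounts": [1.0, 0.0], "cards": [0.0, 1.0]}
INTENT_VECS = {"check_balance": [1.0, 0.1], "bank_statement": [0.9, 0.3], "block_card": [0.1, 1.0]}
HIERARCHY = {
    "accounts": {"intents": ["check_balance", "bank_statement"]},
    "cards": {"intents": ["block_card"]},
}

print(route([1.0, 0.05], CATEGORY_VECS, INTENT_VECS, HIERARCHY))
# ('accounts', ['check_balance', 'bank_statement'])

# An ambiguous follow-up leans slightly toward "cards" but stays pinned to "accounts"
print(route([0.48, 0.52], CATEGORY_VECS, INTENT_VECS, HIERARCHY, prior_category="accounts"))
# ('accounts', ['bank_statement', 'check_balance'])
```

Restricting the fine step to one category's intents is what keeps unrelated intents from other categories out of the candidate list.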
Features
| Feature | Description |
|---|---|
| Three classification modes | FLAT, FLAT_RAG, HIERARCHICAL_RAG; selected at init |
| Two turn modes | SINGLE (stateless) and MULTI (conversation-aware) |
| RAG retrieval | ExampleStore with cosine similarity; context-enriched query for follow-ups |
| Hierarchical routing | Two-level coarse→fine routing reduces cross-category noise at scale |
| Prior category pinning | Follow-ups stay in the correct domain even when the follow-up itself carries no semantic signal |
| Automated bootstrapping | The LLM generates single-turn and multi-turn examples; no manual labeling needed |
| Shared query embedding | Encoded once, reused for both routing and RAG (no double encoding) |
| Confidence gate | Low retrieval similarity caps the LLM's confidence to flag queries unlike any stored example |
| Lazy heavy imports | torch, transformers, and ollama are imported only when actually used |
| 123 offline unit tests | Full test suite runs without GPU, LLM, or model downloads |
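The confidence gate can be sketched in a few lines. The threshold (0.35), the cap (0.5) and the `gate_confidence()` name are illustrative assumptions, not documented defaults of query_classifier; the idea is only that when the best retrieved example is dissimilar to the query, the LLM's self-reported confidence is clipped.

```python
def gate_confidence(llm_confidence, retrieval_scores, min_similarity=0.35, cap=0.5):
    """Clip the LLM's confidence when no retrieved example is similar enough.

    min_similarity and cap are illustrative values (assumptions).
    An empty score list counts as "nothing similar".
    """
    best = max(retrieval_scores, default=0.0)
    if best < min_similarity:
        return min(llm_confidence, cap)
    return llm_confidence


print(gate_confidence(0.95, [0.82, 0.40]))  # 0.95: strong evidence, untouched
print(gate_confidence(0.95, [0.20, 0.10]))  # 0.5: uncharted territory, capped
```

Using `min()` means the gate can only lower confidence, never raise it, so an already-uncertain answer passes through unchanged.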
Installation
```bash
pip install -r requirements.txt
```
Or install as a package:
```bash
pip install .
```
Quick Start
Simplest setup: FLAT mode (no RAG, no hierarchy)
```python
import asyncio

from query_classifier import IntentClassifier, ClassificationMode, TurnMode

INTENTS = [
    {"name": "check_balance", "description": "User wants to check their account balance."},
    {"name": "transfer_money", "description": "User wants to transfer money to another account."},
    {"name": "block_card", "description": "User wants to block a lost or stolen card."},
]


async def main():
    nlp = IntentClassifier(
        intents=INTENTS,
        mode=ClassificationMode.FLAT,
        turn_mode=TurnMode.SINGLE,
        llm_model_name="llama3",
        llm_base_url="http://localhost:11434",
    )
    # classify() is a coroutine returning (intent_name, confidence, language)
    intent, confidence, language = await nlp.classify("how much money do I have?")
    print(f"Intent: {intent} Confidence: {confidence:.2f}")


asyncio.run(main())
```
FLAT_RAG: add few-shot evidence from labeled examples
```python
# Reuses INTENTS from the Quick Start; run inside an async function such as main() above
from query_classifier import IntentClassifier, ClassificationMode, ExampleStore

store = ExampleStore()
store.add_examples_bulk([
    {"text": "what is my balance", "intent": "check_balance"},
    {"text": "check my account", "intent": "check_balance"},
    {"text": "transfer to savings", "intent": "transfer_money"},
])

nlp = IntentClassifier(
    intents=INTENTS,
    mode=ClassificationMode.FLAT_RAG,
    example_store=store,
)
intent, conf, lang = await nlp.classify("how much is in my account")
```
HIERARCHICAL_RAG: full REIC pipeline with hierarchy
```python
from query_classifier import IntentClassifier, ClassificationMode, TurnMode

# Each category lists the intent names it contains. "bank_statement" is not in the
# Quick Start INTENTS list, so it should be added to INTENTS for this example.
HIERARCHY = {
    "accounts": {
        "description": "Managing bank accounts: balance, statements.",
        "intents": ["check_balance", "bank_statement"],
    },
    "cards": {
        "description": "Card management: block, unblock.",
        "intents": ["block_card"],
    },
}

nlp = IntentClassifier(
    intents=INTENTS,
    mode=ClassificationMode.HIERARCHICAL_RAG,
    turn_mode=TurnMode.MULTI,
    example_store=store,  # the ExampleStore built in the FLAT_RAG example
    intent_hierarchy=HIERARCHY,
)
```
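As written, this HIERARCHY and the Quick Start INTENTS disagree: bank_statement is referenced but never defined, and transfer_money belongs to no category. Whether IntentClassifier rejects such a mismatch is not documented here, so a pre-flight check is cheap insurance. `check_hierarchy()` is a hypothetical helper, not part of query_classifier.

```python
INTENTS = [
    {"name": "check_balance", "description": "User wants to check their account balance."},
    {"name": "transfer_money", "description": "User wants to transfer money to another account."},
    {"name": "block_card", "description": "User wants to block a lost or stolen card."},
]
HIERARCHY = {
    "accounts": {"description": "Managing bank accounts: balance, statements.",
                 "intents": ["check_balance", "bank_statement"]},
    "cards": {"description": "Card management: block, unblock.", "intents": ["block_card"]},
}


def check_hierarchy(intents, hierarchy):
    """Return (unknown, uncategorized) intent names, each sorted.

    unknown:       named in the hierarchy but missing from the intent list
    uncategorized: defined in the intent list but assigned to no category
    """
    defined = {intent["name"] for intent in intents}
    referenced = {name for category in hierarchy.values() for name in category["intents"]}
    return sorted(referenced - defined), sorted(defined - referenced)


unknown, uncategorized = check_hierarchy(INTENTS, HIERARCHY)
print("unknown:", unknown)              # unknown: ['bank_statement']
print("uncategorized:", uncategorized)  # uncategorized: ['transfer_money']
```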
Multi-turn conversation
```python
# Run inside an async function, using the HIERARCHICAL_RAG classifier above
history = []

# Turn 1
intent, conf, _ = await nlp.classify("I need my bank statement", conversation_history=history)
history += [
    {"role": "user", "content": "I need my bank statement"},
    {"role": "assistant", "content": "Sure.", "intent_classified": intent},
]

# Turn 2: bare follow-up; the RAG query is enriched with history automatically
intent, conf, _ = await nlp.classify("for last 6 months", conversation_history=history)
# → bank_statement (not lost, even though the bare phrase carries no semantic signal)
```
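The automatic enrichment can be approximated by prefixing the bare follow-up with recent user turns, which matches the multi-turn ExampleStore convention described later (a follow-up example is embedded together with its history context). The `build_rag_query()` name, the space-joined format and the default window of 2 user turns are assumptions, not the library's documented behaviour.

```python
def build_rag_query(text, history, window=2):
    """Prefix `text` with the content of the last `window` user turns.

    Assistant turns are skipped; window=0 returns the raw query (as in SINGLE mode).
    """
    user_turns = [m["content"] for m in history if m.get("role") == "user"]
    recent = user_turns[-window:] if window > 0 else []  # guard: lst[-0:] would return everything
    return " ".join(recent + [text])


history = [
    {"role": "user", "content": "I need my bank statement"},
    {"role": "assistant", "content": "Sure.", "intent_classified": "bank_statement"},
]
print(build_rag_query("for last 6 months", history))
# I need my bank statement for last 6 months
```

A sliding window keeps the enriched query short enough that stale topics from early in a long conversation do not dominate retrieval.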
Classification Modes
| Mode | Pipeline | When to use |
|---|---|---|
| FLAT | SemanticRouter → LLM | No labeled examples; fastest setup |
| FLAT_RAG | SemanticRouter → ExampleStore → LLM | Labeled examples available; small or flat intent set |
| HIERARCHICAL_RAG | HierarchicalRouter → ExampleStore (category-filtered) → LLM | Labeled examples AND a hierarchy available; best accuracy at scale |
```python
# All three modes share the same classify() signature
intent, confidence, language = await nlp.classify(
    text,
    conversation_history=history,  # ignored in SINGLE mode
    verify=False,                  # optional second LLM verification pass
)
```
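The README does not show the prompt the LLM receives. As a hedged illustration of how the routed candidates, the retrieved examples and (in MULTI mode) the history might be combined into one request, here is a hypothetical `build_prompt()`; its wording and the JSON answer format are assumptions.

```python
def build_prompt(text, candidates, examples, history=None):
    """Combine routed candidate intents, RAG evidence and optional history into one prompt."""
    lines = ["Classify the user's message into exactly one of the candidate intents.", "", "Candidate intents:"]
    lines += [f"- {c['name']}: {c['description']}" for c in candidates]
    if examples:  # retrieved few-shot evidence (FLAT_RAG / HIERARCHICAL_RAG)
        lines += ["", "Labeled examples:"]
        lines += [f'- "{e["text"]}" -> {e["intent"]}' for e in examples]
    if history:  # only passed in MULTI mode
        lines += ["", "Conversation so far:"]
        lines += [f"{m['role']}: {m['content']}" for m in history]
    lines += ["", f"Message: {text}",
              'Reply with JSON: {"intent": "<name>", "confidence": <0-1>, "language": "<code>"}']
    return "\n".join(lines)


prompt = build_prompt(
    "how much is in my account",
    candidates=[{"name": "check_balance", "description": "User wants to check their account balance."}],
    examples=[{"text": "what is my balance", "intent": "check_balance"}],
)
print(prompt)
```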
Turn Modes
| Mode | Behaviour |
|---|---|
| SINGLE | Each call is independent. History is ignored. RAG uses the raw query only. |
| MULTI | History enriches routing (prior category pinning) and RAG retrieval (sliding-window context). |
```python
# Library-level configuration
nlp = IntentClassifier(intents=INTENTS, turn_mode=TurnMode.MULTI)
```
Or via environment variable:
```bash
export TURN_MODE=multi
```
ExampleStore
Stores labeled utterances with optional history_context for multi-turn examples.
```python
from query_classifier import ExampleStore

store = ExampleStore()

# Single-turn
store.add_example("what is my balance", "check_balance")

# Multi-turn: embedding = encode("I need my bank statement for last 6 months")
store.add_example(
    text="for last 6 months",
    intent_name="bank_statement",
    history_context="I need my bank statement",
)

# Bulk load from a list of dicts, e.g. [{"text": ..., "intent": ...}, ...]
store.add_examples_bulk(examples)

# Retrieve top-k by similarity
results = store.retrieve("account balance", k=5)
# [{"text": ..., "intent": ..., "score": ...}, ...]

# Persistence
store.save("my_store")  # writes my_store.json + my_store.npy
store.load("my_store")  # loads both; recomputes embeddings if the .npy is missing

# Coverage stats
store.intent_coverage()      # {"check_balance": 4, "block_card": 3, ...}
store.multi_turn_coverage()  # intents that have follow-up examples
```
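To make the retrieval semantics concrete, here is a dependency-free analogue of ExampleStore. It is not the library's implementation: a bag-of-words `Counter` stands in for SentenceTransformer embeddings, and `TinyExampleStore` is a hypothetical name. It does follow the two documented conventions: multi-turn examples are embedded from history_context plus text, and retrieve() returns dicts with text, intent and score, sorted by cosine similarity.

```python
import math
from collections import Counter


def encode(text):
    """Toy bag-of-words 'embedding' standing in for a SentenceTransformer."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TinyExampleStore:
    """Hypothetical minimal analogue of ExampleStore (not the library's code)."""

    def __init__(self):
        self.items = []  # (text, intent, embedding) triples

    def add_example(self, text, intent_name, history_context=None):
        # Multi-turn examples are embedded together with their history context
        full_text = f"{history_context} {text}" if history_context else text
        self.items.append((text, intent_name, encode(full_text)))

    def retrieve(self, query, k=5):
        q = encode(query)
        scored = [{"text": t, "intent": i, "score": cosine(q, e)} for t, i, e in self.items]
        return sorted(scored, key=lambda r: r["score"], reverse=True)[:k]


store = TinyExampleStore()
store.add_example("what is my balance", "check_balance")
store.add_example("block my card", "block_card")
store.add_example("for last 6 months", "bank_statement", history_context="I need my bank statement")

top = store.retrieve("I need my bank statement for last 6 months", k=1)[0]
print(top["text"], top["intent"])  # for last 6 months bank_statement
```

Because the follow-up was stored with its context, an enriched multi-turn query matches it exactly, while the bare phrase alone would share almost no tokens with it.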
Automated Bootstrapping
Generate a fully labeled ExampleStore using the LLM itself; no manual labeling is needed.
```python
import asyncio

from query_classifier import ExampleStore, ExampleStoreBootstrapper

store = ExampleStore()
bootstrapper = ExampleStoreBootstrapper(
    llm_model_name="llama3",
    llm_base_url="http://localhost:11434",
    concurrency=3,  # parallel LLM calls
    max_retries=2,
)

# run_full_bootstrap() is a coroutine: await it inside an async function
summary = await bootstrapper.run_full_bootstrap(
    store=store,
    intents=INTENTS,
    save_path="my_store",  # saved to my_store.json + my_store.npy
    n_single=8,            # utterances per intent (single-turn)
    n_multi=6,             # follow-up examples per intent (multi-turn)
)
# Illustrative summary; these counts correspond to 10 intents (10 × 8 = 80, 10 × 6 = 60):
# {"total": 140, "single_turn_generated": 80, "multi_turn_generated": 60, ...}
```
Bootstrapping only needs to run once. On subsequent startups, load the saved store from disk:
```python
store = ExampleStore()
store.load("my_store")
```
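The `concurrency` and `max_retries` settings suggest the usual asyncio pattern: a semaphore bounds in-flight LLM calls, and an unparseable reply triggers a retry. The sketch below shows that pattern against a fake LLM. `bootstrap()`, `generate_with_retries()` and the "retry when the reply is not valid JSON" condition are assumptions, not a copy of ExampleStoreBootstrapper.

```python
import asyncio
import json


async def generate_with_retries(sem, call_llm, prompt, max_retries=2):
    """Call the LLM while holding a semaphore slot; retry if the reply is not valid JSON.

    Returns the parsed JSON, or None once 1 + max_retries attempts have failed.
    """
    async with sem:
        for _ in range(max_retries + 1):
            reply = await call_llm(prompt)
            try:
                return json.loads(reply)
            except json.JSONDecodeError:
                continue
    return None


async def bootstrap(intent_names, call_llm, concurrency=3, max_retries=2):
    """Generate one batch per intent with at most `concurrency` calls in flight."""
    sem = asyncio.Semaphore(concurrency)  # created inside the running event loop
    jobs = [
        generate_with_retries(sem, call_llm, f"Write example utterances for {name}", max_retries)
        for name in intent_names
    ]
    return await asyncio.gather(*jobs)  # results keep the order of intent_names


async def demo():
    """Run bootstrap() against a fake LLM and record the peak number of concurrent calls."""
    active = peak = 0

    async def fake_llm(prompt):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # simulate network latency
        active -= 1
        return json.dumps({"prompt": prompt})

    results = await bootstrap(["a", "b", "c", "d", "e"], fake_llm, concurrency=3)
    return results, peak


results, peak = asyncio.run(demo())
print(len(results), peak)  # 5 3
```

Holding the semaphore across retries means a failing intent never lets more than `concurrency` requests hit the LLM server at once.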
Configuration
All settings can be overridden via environment variables:
| Variable | Default | Description |
|---|---|---|
| LLM_PROVIDER | ollama | LLM backend |
| LLM_API_BASE | http://localhost:11434 | LLM API base URL |
| LLM_MODEL_NAME | llama3 | Model name |
| LLM_API_KEY | (empty) | Bearer token for authenticated endpoints |
| ROUTER_EMBEDDING_MODEL | all-MiniLM-L6-v2 | SentenceTransformer model |
| RAG_TOP_K_EXAMPLES | 6 | Examples retrieved per query |
| EXAMPLE_STORE_PATH | (empty) | Auto-load the store from this path on init |
| TURN_MODE | single | Default turn mode (single / multi) |
| BOOTSTRAP_N_SINGLE | 8 | Single-turn examples per intent during bootstrap |
| BOOTSTRAP_N_MULTI | 6 | Multi-turn examples per intent during bootstrap |
| BOOTSTRAP_CONCURRENCY | 3 | Max parallel LLM calls during bootstrap |
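The internals of config.py are not shown. One common way to build env-var-backed settings like these is a frozen dataclass whose field names map to the upper-cased variable names. `Settings` and `from_env()` are hypothetical; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    """Hypothetical env-var-backed settings; names and defaults from the table."""

    llm_provider: str = "ollama"
    llm_api_base: str = "http://localhost:11434"
    llm_model_name: str = "llama3"
    llm_api_key: str = ""
    router_embedding_model: str = "all-MiniLM-L6-v2"
    rag_top_k_examples: int = 6
    example_store_path: str = ""
    turn_mode: str = "single"
    bootstrap_n_single: int = 8
    bootstrap_n_multi: int = 6
    bootstrap_concurrency: int = 3

    @classmethod
    def from_env(cls, env=None):
        """Build settings from `env` (defaults to os.environ); unset or empty vars keep defaults."""
        env = os.environ if env is None else env
        overrides = {}
        for f in fields(cls):
            raw = env.get(f.name.upper())  # e.g. rag_top_k_examples -> RAG_TOP_K_EXAMPLES
            if raw:
                overrides[f.name] = type(f.default)(raw)  # int("10") or str("multi")
        settings = cls(**overrides)
        if settings.turn_mode not in ("single", "multi"):
            raise ValueError(f"TURN_MODE must be 'single' or 'multi', got {settings.turn_mode!r}")
        return settings


print(Settings.from_env({"RAG_TOP_K_EXAMPLES": "10", "TURN_MODE": "multi"}))
```

Passing a plain dict instead of `os.environ` keeps the loader easy to unit-test without touching the real process environment.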
Project Structure
```
query_classifier/
├── nlp_engine.py        # IntentClassifier: main entry point
├── example_store.py     # ExampleStore: labeled utterance store with RAG retrieval
├── semantic_router.py   # SemanticRouter: embedding-based intent routing
├── hierarchy.py         # HierarchicalRouter: two-level coarse→fine routing
├── bootstrapper.py      # ExampleStoreBootstrapper: automated example generation
├── config.py            # All configurable settings (env-var backed)
└── __init__.py          # Public API exports

examples/
├── basic_single_turn.py          # FLAT vs FLAT_RAG side-by-side
├── multi_turn_conversation.py    # Three full multi-turn conversations
├── custom_intents.py             # Plug in your own domain (e-commerce)
├── bootstrap_store.py            # Full bootstrap → save → load → classify
├── reic_demo.py                  # Complete REIC demo with all three modes
└── banking_intents.py            # Banking intent + hierarchy definitions

tests/
├── conftest.py                # MockEncoder (offline, deterministic), fixtures
├── test_example_store.py      # Population, retrieval, persistence, coverage
├── test_semantic_router.py    # encode_query, find_top_k, fallback paths
├── test_hierarchy.py          # Init, routing, prior category pinning
├── test_nlp_engine.py         # Enums, validation, classify() in all modes, mocked LLM
└── test_bootstrapper.py       # parse_json, generate, bootstrap, full run
```
Running Tests
```bash
pytest tests/ -v
```
All 123 tests run fully offline: no Ollama, no GPU, and no model downloads required.
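conftest.py provides a MockEncoder that is offline and deterministic, but its code is not shown. A typical approach, sketched here with a hypothetical `HashEncoder`, hashes tokens into a fixed number of buckets. It uses hashlib rather than the built-in `hash()`, because str hashing is randomised per process unless PYTHONHASHSEED is set, which would make test vectors differ between runs.

```python
import hashlib
import math


class HashEncoder:
    """Deterministic, offline embedding stand-in for unit tests (hypothetical).

    Each lower-cased token is hashed into one of `dim` buckets and the
    resulting count vector is L2-normalised.
    """

    def __init__(self, dim=64):
        self.dim = dim

    def encode(self, texts):
        """Encode a list of strings into a list of `dim`-length float vectors."""
        return [self._embed(text) for text in texts]

    def _embed(self, text):
        vec = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:8], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec  # empty text stays all zeros


enc = HashEncoder()
a, b = enc.encode(["check my balance", "check my balance"])
print(a == b)  # True: same text, same vector, on every run
```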
Running Examples
```bash
# Requires a running Ollama instance (ollama serve)
python examples/basic_single_turn.py
python examples/multi_turn_conversation.py
python examples/custom_intents.py
python examples/bootstrap_store.py
python examples/reic_demo.py
```
License
MIT
File details
Details for the file query_classifier-0.2.0.tar.gz.
File metadata
- Download URL: query_classifier-0.2.0.tar.gz
- Size: 35.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 797b868090299b5e3dafd4f1d5efc3c6f07066ed9c49d4613f956339d9643256 |
| MD5 | 724679dbca079e0703e419bb1a5d0c64 |
| BLAKE2b-256 | 08ab5bce2986a26b591eb7b6f80ef4ea38bbf26f2a63e5b02d60f7c3ebd64cae |
File details
Details for the file query_classifier-0.2.0-py3-none-any.whl.
File metadata
- Download URL: query_classifier-0.2.0-py3-none-any.whl
- Size: 37.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 153f3018ee38a15ed869ceffee80da7b1bfc4dd83b307871ba81cd75c4e60335 |
| MD5 | e3444736e1f413fe9612064b40787085 |
| BLAKE2b-256 | 79b30ae941fa7d9066f9cfb86e8771b61b2709ce544f349c83a7d7f49aad7483 |