
RAG-Enhanced Intent Classification (REIC): semantic routing + LLMs




Intent Classifier: REIC Implementation

A production-grade, configurable intent classification library that applies lessons from the paper "REIC: RAG-Enhanced Intent Classification at Scale" (Amazon, 2024).

Combines Semantic Routing (SentenceTransformers), Retrieval-Augmented Generation (RAG) for few-shot evidence, and Hierarchical Routing (coarse category → fine intent) with full multi-turn conversation support.


Architecture

User Query
    │
    ▼
SemanticRouter ──── encode_query() ────► shared embedding
    │                                           │
    ▼                                           ▼
[FLAT]  Top-K intents              ExampleStore.retrieve()
    │   (flat cosine sim)          (context-enriched RAG query)
    │                                           │
[HIERARCHICAL_RAG]                              │
    │                                           │
    ▼                                           ▼
HierarchicalRouter                     Few-shot examples
  Coarse: category embeddings          (filtered by category)
  Fine:   intent embeddings                     │
  Prior category pinning  ◄─────────────────────┘
    │
    ▼
LLM (Ollama / OpenAI-compatible)
  + conversation history (MULTI mode)
  + retrieved examples as evidence
  + optional verification pass
    │
    ▼
(intent_name, confidence, language)
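The coarse-to-fine path in the diagram can be sketched in a few lines. This is a minimal illustration, not the library's HierarchicalRouter: the `route` function, the `pin_threshold` value, the pinning rule, and the toy 3-dimensional vectors are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route(query_vec, category_vecs, intent_vecs, hierarchy,
          prior_category=None, pin_threshold=0.3):
    """Pick a category (coarse), then rank only that category's intents (fine).

    Hypothetical pinning rule: if no category scores at least pin_threshold
    and a prior category is known, stay in the prior category.
    """
    scores = {name: cosine(query_vec, vec) for name, vec in category_vecs.items()}
    best = max(scores, key=scores.get)
    if prior_category is not None and scores[best] < pin_threshold:
        best = prior_category
    ranked = sorted(hierarchy[best],
                    key=lambda name: cosine(query_vec, intent_vecs[name]),
                    reverse=True)
    return best, ranked

# Toy embeddings (made up for illustration; real ones come from the encoder)
CATEGORY_VECS = {"accounts": [1.0, 0.0, 0.0], "cards": [0.0, 1.0, 0.0]}
INTENT_VECS = {
    "check_balance": [0.9, 0.1, 0.0],
    "bank_statement": [0.6, 0.4, 0.2],
    "block_card": [0.1, 0.9, 0.0],
}
TOY_HIERARCHY = {
    "accounts": ["check_balance", "bank_statement"],
    "cards": ["block_card"],
}

clear_query = [1.0, 0.2, 0.0]    # strong "accounts" signal
vague_query = [0.1, 0.05, 1.0]   # almost no signal for either category

print(route(clear_query, CATEGORY_VECS, INTENT_VECS, TOY_HIERARCHY))
print(route(vague_query, CATEGORY_VECS, INTENT_VECS, TOY_HIERARCHY,
            prior_category="cards"))
```

Restricting the fine ranking to one category is what keeps unrelated intents out of the candidate list, and the prior category gives a bare follow-up somewhere sensible to land.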

Features

| Feature | Description |
| --- | --- |
| Three classification modes | FLAT, FLAT_RAG, HIERARCHICAL_RAG; configurable at init |
| Two turn modes | SINGLE (stateless) and MULTI (conversation-aware) |
| RAG retrieval | ExampleStore with cosine similarity; context-enriched query for follow-ups |
| Hierarchical routing | Two-level coarse→fine routing reduces cross-category noise at scale |
| Prior category pinning | Follow-ups stay in the correct domain even with zero semantic signal |
| Automated bootstrapping | LLM generates single-turn and multi-turn examples; no manual labeling needed |
| Shared query embedding | Encoded once, reused for both routing and RAG (no double encoding) |
| Confidence gate | Low retrieval similarity caps LLM confidence to signal uncharted territory |
| Lazy heavy imports | torch, transformers, ollama only imported when actually used |
| 123 offline unit tests | Full test suite runs without GPU, LLM, or model downloads |
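To make the confidence gate concrete, here is a minimal sketch of the idea. The function name `gate_confidence` and the values of `min_similarity` and `cap` are assumptions chosen for illustration; the library's actual thresholds are not stated in this README.

```python
def gate_confidence(llm_confidence, top_similarity, min_similarity=0.35, cap=0.5):
    """Cap the LLM's confidence when retrieval found nothing similar.

    llm_confidence: confidence reported by the LLM (0.0 to 1.0)
    top_similarity: best cosine similarity among retrieved examples
    min_similarity, cap: hypothetical tuning values
    """
    if top_similarity < min_similarity:
        # The query is far from every stored example, so the LLM's
        # self-reported confidence is not trusted beyond the cap.
        return min(llm_confidence, cap)
    return llm_confidence

print(gate_confidence(0.95, top_similarity=0.20))  # capped
print(gate_confidence(0.95, top_similarity=0.80))  # unchanged
```

A caller can then treat a capped score as a hint that the query falls outside the labeled examples, for instance by asking a clarifying question.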

Installation

pip install -r requirements.txt

Or install as a package:

pip install .

Quick Start

Simplest setup: FLAT mode (no RAG, no hierarchy)

import asyncio
from query_classifier import IntentClassifier, ClassificationMode, TurnMode

INTENTS = [
    {"name": "check_balance",  "description": "User wants to check their account balance."},
    {"name": "transfer_money", "description": "User wants to transfer money to another account."},
    {"name": "block_card",     "description": "User wants to block a lost or stolen card."},
]

async def main():
    nlp = IntentClassifier(
        intents=INTENTS,
        mode=ClassificationMode.FLAT,
        turn_mode=TurnMode.SINGLE,
        llm_model_name="llama3",
        llm_base_url="http://localhost:11434",
    )
    intent, confidence, language = await nlp.classify("how much money do I have?")
    print(f"Intent: {intent}  Confidence: {confidence:.2f}")

asyncio.run(main())

FLAT_RAG: add few-shot evidence from labeled examples

import asyncio
from query_classifier import IntentClassifier, ClassificationMode, ExampleStore

# Labeled utterances that serve as few-shot evidence for the LLM
store = ExampleStore()
store.add_examples_bulk([
    {"text": "what is my balance",   "intent": "check_balance"},
    {"text": "check my account",     "intent": "check_balance"},
    {"text": "transfer to savings",  "intent": "transfer_money"},
])

async def main():
    nlp = IntentClassifier(
        intents=INTENTS,  # the INTENTS list from the FLAT example
        mode=ClassificationMode.FLAT_RAG,
        example_store=store,
    )
    intent, conf, lang = await nlp.classify("how much is in my account")
    print(f"Intent: {intent}  Confidence: {conf:.2f}")

asyncio.run(main())

HIERARCHICAL_RAG: full REIC pipeline with hierarchy

from query_classifier import IntentClassifier, ClassificationMode, TurnMode

HIERARCHY = {
    "accounts": {
        "description": "Managing bank accounts: balance, statements.",
        "intents": ["check_balance", "bank_statement"],
    },
    "cards": {
        "description": "Card management: block, unblock.",
        "intents": ["block_card"],
    },
}

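nlp = IntentClassifier(
    intents=INTENTS,          # assumed to also define "bank_statement", which HIERARCHY references
    mode=ClassificationMode.HIERARCHICAL_RAG,
    turn_mode=TurnMode.MULTI,
    example_store=store,      # the ExampleStore built in the FLAT_RAG example
    intent_hierarchy=HIERARCHY,
)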

Multi-turn conversation

# Run inside an async function (see main() in the FLAT example)
history = []

# Turn 1
intent, conf, _ = await nlp.classify("I need my bank statement", conversation_history=history)
history += [
    {"role": "user",      "content": "I need my bank statement"},
    {"role": "assistant", "content": "Sure.", "intent_classified": intent},
]

# Turn 2: bare follow-up; the RAG query is enriched with history automatically
intent, conf, _ = await nlp.classify("for last 6 months", conversation_history=history)
# → bank_statement  (not lost despite zero semantic signal in the bare phrase)
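Appending two dicts by hand after every turn is easy to get wrong, so a small helper can keep the history in the shape shown above. `record_turn` and `last_intent` are hypothetical names, not part of the package; only the dict keys (`role`, `content`, `intent_classified`) come from the example.

```python
def record_turn(history, user_text, assistant_text, intent):
    """Append one user/assistant exchange in the format used above."""
    history.append({"role": "user", "content": user_text})
    history.append({
        "role": "assistant",
        "content": assistant_text,
        "intent_classified": intent,
    })
    return history

def last_intent(history):
    """Return the most recently classified intent, or None if there is none."""
    for turn in reversed(history):
        if turn.get("intent_classified"):
            return turn["intent_classified"]
    return None

history = []
record_turn(history, "I need my bank statement", "Sure.", "bank_statement")
print(last_intent(history))
```

The resulting list can be passed as `conversation_history` on the next `classify()` call.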

Classification Modes

| Mode | Pipeline | When to use |
| --- | --- | --- |
| FLAT | SemanticRouter → LLM | No labeled examples; fastest setup |
| FLAT_RAG | SemanticRouter → ExampleStore → LLM | Have examples; small/flat intent set |
| HIERARCHICAL_RAG | HierarchicalRouter → ExampleStore (category-filtered) → LLM | Have examples AND a hierarchy; best accuracy at scale |

# All three modes use the same classify() signature
intent, confidence, language = await nlp.classify(
    text,
    conversation_history=history,  # ignored in SINGLE mode
    verify=False,                   # optional second LLM verification pass
)
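The "When to use" guidance reduces to a small decision rule. The sketch below is one way to encode it; `suggest_mode` is a hypothetical helper that returns the mode name as a plain string, which you would then map to the matching `ClassificationMode` member.

```python
def suggest_mode(has_examples: bool, has_hierarchy: bool) -> str:
    """Map available assets to a mode name, following the guidance above."""
    if has_examples and has_hierarchy:
        return "HIERARCHICAL_RAG"
    if has_examples:
        return "FLAT_RAG"
    # A hierarchy without labeled examples still falls back to FLAT here,
    # since both RAG modes rely on an ExampleStore.
    return "FLAT"

print(suggest_mode(has_examples=True, has_hierarchy=False))
```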

Turn Modes

| Mode | Behaviour |
| --- | --- |
| SINGLE | Each call is independent. History is ignored. RAG uses raw query only. |
| MULTI | History enriches routing (prior category pinning) and RAG retrieval (sliding window context). |

# Library-level configuration
nlp = IntentClassifier(intents=..., turn_mode=TurnMode.MULTI)

# Or via environment variable
# TURN_MODE=multi
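The difference between the two turn modes shows up most clearly in how the retrieval query is built. The following is a rough sketch of a sliding-window enrichment, assuming the last few user turns are simply prepended to the current text; `enrich_query`, the `window` size, and the concatenation rule are illustrative assumptions, not the library's implementation.

```python
def enrich_query(text, history, window=2, multi=True):
    """Build the retrieval query for the current turn.

    In SINGLE mode (multi=False) the raw text is used as-is.
    In MULTI mode the last `window` user turns are prepended.
    """
    if not multi or window <= 0:
        return text
    recent_user_turns = [
        turn["content"] for turn in history if turn.get("role") == "user"
    ][-window:]
    return " ".join(recent_user_turns + [text])

history = [
    {"role": "user", "content": "I need my bank statement"},
    {"role": "assistant", "content": "Sure.", "intent_classified": "bank_statement"},
]
print(enrich_query("for last 6 months", history))
print(enrich_query("for last 6 months", history, multi=False))
```

A short window keeps the query focused on the current topic; a long one risks dragging in context from an earlier, unrelated request.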

ExampleStore

Stores labeled utterances with optional history_context for multi-turn examples.

from query_classifier import ExampleStore

store = ExampleStore()

# Single-turn
store.add_example("what is my balance", "check_balance")

# Multi-turn: embedding = encode("I need my bank statement for last 6 months")
store.add_example(
    text="for last 6 months",
    intent_name="bank_statement",
    history_context="I need my bank statement",
)

# Bulk load from list of dicts
store.add_examples_bulk(examples)

# Retrieve top-k by similarity
results = store.retrieve("account balance", k=5)
# [{"text": ..., "intent": ..., "score": ...}, ...]

# Persistence
store.save("my_store")    # writes my_store.json + my_store.npy
store.load("my_store")    # loads both; recomputes embeddings if .npy missing

# Coverage stats
store.intent_coverage()       # {"check_balance": 4, "block_card": 3, ...}
store.multi_turn_coverage()   # intents that have follow-up examples
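For intuition about what `retrieve()` and `history_context` do, here is a toy in-memory store. It is a stand-in only: `ToyExampleStore` and the bag-of-words `embed` function are assumptions for illustration, whereas the real ExampleStore uses SentenceTransformer embeddings.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (stand-in for a sentence embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class ToyExampleStore:
    def __init__(self):
        self._items = []  # list of (text, intent, vector)

    def add_example(self, text, intent_name, history_context=None):
        # A multi-turn example is embedded together with its context,
        # so a bare follow-up is matched through the enriched text.
        embed_text = f"{history_context} {text}" if history_context else text
        self._items.append((text, intent_name, embed(embed_text)))

    def retrieve(self, query, k=5):
        query_vec = embed(query)
        scored = [
            {"text": text, "intent": intent, "score": cosine(query_vec, vec)}
            for text, intent, vec in self._items
        ]
        return sorted(scored, key=lambda r: r["score"], reverse=True)[:k]

toy = ToyExampleStore()
toy.add_example("what is my balance", "check_balance")
toy.add_example("for last 6 months", "bank_statement",
                history_context="I need my bank statement")
print(toy.retrieve("I need my bank statement for last 6 months", k=1))
```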

Automated Bootstrapping

Generate a fully labeled ExampleStore using the LLM itself; no manual labeling needed.

import asyncio
from query_classifier import ExampleStore, ExampleStoreBootstrapper

store = ExampleStore()

bootstrapper = ExampleStoreBootstrapper(
    llm_model_name="llama3",
    llm_base_url="http://localhost:11434",
    concurrency=3,   # parallel LLM calls
    max_retries=2,
)

async def main():
    summary = await bootstrapper.run_full_bootstrap(
        store=store,
        intents=INTENTS,
        save_path="my_store",   # saved to my_store.json + my_store.npy
        n_single=8,             # utterances per intent (single-turn)
        n_multi=6,              # follow-up examples per intent (multi-turn)
    )
    print(summary)
    # Sample summary, consistent with a 10-intent set (10 x 8 single-turn + 10 x 6 multi-turn):
    # {"total": 140, "single_turn_generated": 80, "multi_turn_generated": 60, ...}

asyncio.run(main())

Bootstrap runs once. On subsequent startups, load from disk:

store = ExampleStore()
store.load("my_store")
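The `concurrency` and `max_retries` settings passed to the bootstrapper suggest a bounded-parallelism pattern. The sketch below shows one common way to get that behaviour with `asyncio.Semaphore`; `run_bounded` and `call_with_retries` are hypothetical helpers, and treating `max_retries` as "extra attempts after the first failure" is an assumption.

```python
import asyncio

async def call_with_retries(make_call, max_retries=2):
    """Await make_call(); on failure, retry up to max_retries more times."""
    for attempt in range(max_retries + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error

async def run_bounded(jobs, concurrency=3, max_retries=2):
    """Run zero-argument coroutine functions, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(job):
        async with semaphore:
            return await call_with_retries(job, max_retries)

    # gather() preserves the order of `jobs` in its result list
    return await asyncio.gather(*(worker(job) for job in jobs))

async def fake_llm_call():
    """Stand-in for one LLM generation request."""
    await asyncio.sleep(0.01)
    return ["generated utterance"]

if __name__ == "__main__":
    results = asyncio.run(run_bounded([fake_llm_call] * 5, concurrency=3))
    print(len(results))
```

Bounding concurrency matters most with a local Ollama server, where too many parallel requests tend to queue up rather than run faster.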

Configuration

All settings can be overridden via environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| LLM_PROVIDER | ollama | LLM backend |
| LLM_API_BASE | http://localhost:11434 | LLM API base URL |
| LLM_MODEL_NAME | llama3 | Model name |
| LLM_API_KEY | (empty) | Bearer token for authenticated endpoints |
| ROUTER_EMBEDDING_MODEL | all-MiniLM-L6-v2 | SentenceTransformer model |
| RAG_TOP_K_EXAMPLES | 6 | Examples retrieved per query |
| EXAMPLE_STORE_PATH | (empty) | Auto-load store from this path on init |
| TURN_MODE | single | Default turn mode (single / multi) |
| BOOTSTRAP_N_SINGLE | 8 | Single-turn examples per intent during bootstrap |
| BOOTSTRAP_N_MULTI | 6 | Multi-turn examples per intent during bootstrap |
| BOOTSTRAP_CONCURRENCY | 3 | Max parallel LLM calls during bootstrap |
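As an illustration of env-var-backed settings, the sketch below reads the variables from the table with the listed defaults. `load_settings` is a hypothetical function, not the contents of config.py; the integer parsing and the TURN_MODE validation are assumptions about reasonable behaviour.

```python
import os

# Defaults copied from the table above (all stored as strings, like env vars)
DEFAULTS = {
    "LLM_PROVIDER": "ollama",
    "LLM_API_BASE": "http://localhost:11434",
    "LLM_MODEL_NAME": "llama3",
    "LLM_API_KEY": "",
    "ROUTER_EMBEDDING_MODEL": "all-MiniLM-L6-v2",
    "RAG_TOP_K_EXAMPLES": "6",
    "EXAMPLE_STORE_PATH": "",
    "TURN_MODE": "single",
    "BOOTSTRAP_N_SINGLE": "8",
    "BOOTSTRAP_N_MULTI": "6",
    "BOOTSTRAP_CONCURRENCY": "3",
}
INT_KEYS = {
    "RAG_TOP_K_EXAMPLES",
    "BOOTSTRAP_N_SINGLE",
    "BOOTSTRAP_N_MULTI",
    "BOOTSTRAP_CONCURRENCY",
}

def load_settings(env=None):
    """Return settings from `env` (default: os.environ), falling back to DEFAULTS."""
    env = os.environ if env is None else env
    settings = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key, default)
        settings[key] = int(raw) if key in INT_KEYS else raw
    turn_mode = settings["TURN_MODE"].lower()
    if turn_mode not in ("single", "multi"):
        raise ValueError(f"TURN_MODE must be 'single' or 'multi', got {turn_mode!r}")
    settings["TURN_MODE"] = turn_mode
    return settings

print(load_settings({"TURN_MODE": "multi", "RAG_TOP_K_EXAMPLES": "4"}))
```

Passing a plain dict as `env` makes the function easy to exercise in tests without touching the real environment.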

Project Structure

query_classifier/
├── nlp_engine.py        # IntentClassifier: main entry point
├── example_store.py     # ExampleStore: labeled utterance store with RAG retrieval
├── semantic_router.py   # SemanticRouter: embedding-based intent routing
├── hierarchy.py         # HierarchicalRouter: two-level coarse→fine routing
├── bootstrapper.py      # ExampleStoreBootstrapper: automated example generation
├── config.py            # All configurable settings (env var backed)
└── __init__.py          # Public API exports

examples/
├── basic_single_turn.py       # FLAT vs FLAT_RAG side-by-side
├── multi_turn_conversation.py # Three full multi-turn conversations
├── custom_intents.py          # Plug-in your own domain (e-commerce)
├── bootstrap_store.py         # Full bootstrap → save → load → classify
├── reic_demo.py               # Complete REIC demo with all three modes
└── banking_intents.py         # Banking intent + hierarchy definitions

tests/
├── conftest.py                # MockEncoder (offline, deterministic), fixtures
├── test_example_store.py      # Population, retrieval, persistence, coverage
├── test_semantic_router.py    # encode_query, find_top_k, fallback paths
├── test_hierarchy.py          # Init, routing, prior category pinning
├── test_nlp_engine.py         # Enums, validation, classify() all modes, mocked LLM
└── test_bootstrapper.py       # parse_json, generate, bootstrap, full run

Running Tests

pytest tests/ -v

All 123 tests run fully offline: no Ollama, no GPU, no model downloads required.
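Offline tests of this kind usually depend on a deterministic stand-in for the embedding model. The sketch below shows one way such an encoder could work, using token hashing; `HashingMockEncoder` is a hypothetical illustration and is not the MockEncoder defined in tests/conftest.py.

```python
import hashlib
import math

class HashingMockEncoder:
    """Deterministic, dependency-free text encoder for tests (illustrative)."""

    def __init__(self, dim=16):
        self.dim = dim

    def encode(self, text):
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # sha256 is stable across runs and machines, unlike built-in hash()
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[digest[0] % self.dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        # Unit-normalise so dot products behave like cosine similarity
        return [x / norm for x in vec] if norm else vec

encoder = HashingMockEncoder()
print(encoder.encode("check my balance") == encoder.encode("check my balance"))
```

Because the vectors depend only on the input text, assertions about routing and retrieval stay stable without downloading a model.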


Running Examples

# Requires a running Ollama instance (ollama serve)

python examples/basic_single_turn.py
python examples/multi_turn_conversation.py
python examples/custom_intents.py
python examples/bootstrap_store.py
python examples/reic_demo.py

License

MIT
