RAG-Enhanced Intent Classification (REIC) — semantic routing + LLMs

Intent Classifier — REIC Implementation

A production-grade, configurable intent classification library that applies techniques from the paper "REIC: RAG-Enhanced Intent Classification at Scale" (Amazon, 2024).

Combines Semantic Routing (SentenceTransformers), Retrieval-Augmented Generation (RAG) for few-shot evidence, and Hierarchical Routing (coarse category → fine intent) with full multi-turn conversation support.

Architecture

User Query
    │
    ▼
SemanticRouter ──── encode_query() ────► shared embedding
    │                                           │
    ▼                                           ▼
[FLAT]  Top-K intents              ExampleStore.retrieve()
    │   (flat cosine sim)          (context-enriched RAG query)
    │                                           │
[HIERARCHICAL_RAG]                             │
    │                                           │
    ▼                                           ▼
HierarchicalRouter                     Few-shot examples
  Coarse: category embeddings          (filtered by category)
  Fine:   intent embeddings                     │
  Prior category pinning  ◄────────────────────┘
    │
    ▼
LLM (Ollama / OpenAI-compatible)
  + conversation history (MULTI mode)
  + retrieved examples as evidence
  + optional verification pass
    │
    ▼
(intent_name, confidence, language)

Features

Feature	Description
Three classification modes	`FLAT`, `FLAT_RAG`, `HIERARCHICAL_RAG` — configurable at init
Two turn modes	`SINGLE` (stateless) and `MULTI` (conversation-aware)
RAG retrieval	ExampleStore with cosine similarity; context-enriched query for follow-ups
Hierarchical routing	Two-level coarse→fine routing reduces cross-category noise at scale
Prior category pinning	Follow-ups stay in the correct domain even with zero semantic signal
Automated bootstrapping	LLM generates single-turn and multi-turn examples — no manual labeling needed
Shared query embedding	Encoded once, reused for both routing and RAG (no double encoding)
Confidence gate	Low retrieval similarity caps the LLM's confidence, flagging queries that resemble no stored example
Lazy heavy imports	`torch`, `transformers`, `ollama` only imported when actually used
123 offline unit tests	Full test suite runs without GPU, LLM, or model downloads
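
The confidence gate from the table can be pictured as a simple cap. This is a minimal sketch, not the library's code: the function name `gate_confidence`, the 0.45 similarity threshold, and the 0.5 cap are all assumptions chosen for illustration.

```python
def gate_confidence(llm_confidence: float,
                    top_retrieval_score: float,
                    min_similarity: float = 0.45,  # assumed threshold
                    cap: float = 0.5) -> float:    # assumed cap value
    """Cap the LLM's confidence when the best retrieved example is a weak match."""
    if top_retrieval_score < min_similarity:
        return min(llm_confidence, cap)
    return llm_confidence


# Strong retrieval evidence: the LLM's confidence passes through unchanged.
print(gate_confidence(0.92, top_retrieval_score=0.81))
# Weak retrieval evidence: the confidence is capped.
print(gate_confidence(0.92, top_retrieval_score=0.20))
```

A caller can then treat any capped result as a hint to ask a clarifying question or fall back to a default intent.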

Installation

pip install -r requirements.txt

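Or install as a package from a source checkout:

pip install .

The published distribution on PyPI is named query-classifier:

pip install query-classifier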

Quick Start

Simplest setup — FLAT mode (no RAG, no hierarchy)

import asyncio
from query_classifier import IntentClassifier, ClassificationMode, TurnMode

INTENTS = [
    {"name": "check_balance",  "description": "User wants to check their account balance."},
    {"name": "transfer_money", "description": "User wants to transfer money to another account."},
    {"name": "block_card",     "description": "User wants to block a lost or stolen card."},
]

async def main():
    nlp = IntentClassifier(
        intents=INTENTS,
        mode=ClassificationMode.FLAT,
        turn_mode=TurnMode.SINGLE,
        llm_model_name="llama3",
        llm_base_url="http://localhost:11434",
    )
    intent, confidence, language = await nlp.classify("how much money do I have?")
    print(f"Intent: {intent}  Confidence: {confidence:.2f}")

asyncio.run(main())
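
For intuition, the flat routing step that precedes the LLM call can be sketched as top-K cosine similarity over intent embeddings. The 3-dimensional vectors and the `top_k_intents` helper below are hypothetical stand-ins; the real router presumably compares SentenceTransformer embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_intents(query_vec, intent_vecs, k=2):
    """Rank every intent by similarity to the query and keep the best k."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in intent_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real sentence embeddings (assumption).
INTENT_VECS = {
    "check_balance":  [0.9, 0.1, 0.0],
    "transfer_money": [0.2, 0.9, 0.1],
    "block_card":     [0.0, 0.1, 0.9],
}

candidates = top_k_intents([0.8, 0.3, 0.0], INTENT_VECS, k=2)
print([name for name, _ in candidates])  # → ['check_balance', 'transfer_money']
```

Only the surviving candidates would then be shown to the LLM, which keeps the prompt short as the intent list grows.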

FLAT_RAG — add few-shot evidence from labeled examples

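import asyncio
from query_classifier import IntentClassifier, ClassificationMode, ExampleStore

# Labeled utterances that serve as few-shot evidence for the LLM
store = ExampleStore()
store.add_examples_bulk([
    {"text": "what is my balance",   "intent": "check_balance"},
    {"text": "check my account",     "intent": "check_balance"},
    {"text": "transfer to savings",  "intent": "transfer_money"},
])

nlp = IntentClassifier(
    intents=INTENTS,  # defined in the FLAT example above
    mode=ClassificationMode.FLAT_RAG,
    example_store=store,
)

async def main():
    # classify() is a coroutine, so it must be awaited inside an async function
    intent, conf, lang = await nlp.classify("how much is in my account")
    print(intent, conf, lang)

asyncio.run(main())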
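
How retrieved examples might reach the LLM as evidence can be sketched as plain prompt assembly. The prompt wording and the `build_prompt` helper are assumptions for illustration; the library's actual prompt is not shown in this README. The example dicts follow the `{"text", "intent", "score"}` shape that `ExampleStore.retrieve()` is documented to return.

```python
def build_prompt(query, candidates, examples):
    """Assemble a classification prompt from candidate intents and retrieved examples."""
    lines = ["Classify the user query into exactly one of these intents:"]
    for intent in candidates:
        lines.append(f"- {intent['name']}: {intent['description']}")
    if examples:
        lines.append("")
        lines.append("Labeled examples similar to the query:")
        for ex in examples:
            lines.append(f'- "{ex["text"]}" -> {ex["intent"]}')
    lines.append("")
    lines.append(f'Query: "{query}"')
    return "\n".join(lines)


prompt = build_prompt(
    "how much is in my account",
    candidates=[
        {"name": "check_balance", "description": "User wants to check their account balance."},
        {"name": "transfer_money", "description": "User wants to transfer money to another account."},
    ],
    examples=[
        {"text": "what is my balance", "intent": "check_balance", "score": 0.83},
    ],
)
print(prompt)
```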

HIERARCHICAL_RAG — full REIC pipeline with hierarchy

from query_classifier import IntentClassifier, ClassificationMode, TurnMode

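# Every intent named below should also be defined in INTENTS. The Quick Start
# INTENTS list does not define bank_statement, so add it before running this.
HIERARCHY = {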
    "accounts": {
        "description": "Managing bank accounts: balance, statements.",
        "intents": ["check_balance", "bank_statement"],
    },
    "cards": {
        "description": "Card management: block, unblock.",
        "intents": ["block_card"],
    },
}

nlp = IntentClassifier(
    intents=INTENTS,
    mode=ClassificationMode.HIERARCHICAL_RAG,
    turn_mode=TurnMode.MULTI,
    example_store=store,
    intent_hierarchy=HIERARCHY,
)
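
Because the intent list and the hierarchy are declared separately, they can drift apart. The helper below is a hypothetical pre-flight check, not part of the library's API; whether `IntentClassifier` itself rejects such mismatches is not stated here.

```python
def check_hierarchy(intents, hierarchy):
    """Return (undefined, unassigned) intent names as two sorted lists."""
    defined = {intent["name"] for intent in intents}
    assigned = {name for category in hierarchy.values() for name in category["intents"]}
    undefined = sorted(assigned - defined)    # in the hierarchy but missing from intents
    unassigned = sorted(defined - assigned)   # in intents but placed in no category
    return undefined, unassigned


# Names taken from the Quick Start snippets (descriptions omitted for brevity).
INTENTS = [{"name": "check_balance"}, {"name": "transfer_money"}, {"name": "block_card"}]
HIERARCHY = {
    "accounts": {"intents": ["check_balance", "bank_statement"]},
    "cards": {"intents": ["block_card"]},
}

print(check_hierarchy(INTENTS, HIERARCHY))  # → (['bank_statement'], ['transfer_money'])
```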

Multi-turn conversation

import asyncio

async def conversation():
    history = []

    # Turn 1
    intent, conf, _ = await nlp.classify("I need my bank statement", conversation_history=history)
    history += [
        {"role": "user",      "content": "I need my bank statement"},
        {"role": "assistant", "content": "Sure.", "intent_classified": intent},
    ]

    # Turn 2: bare follow-up; the RAG query is enriched with history automatically
    intent, conf, _ = await nlp.classify("for last 6 months", conversation_history=history)
    # → bank_statement  (resolved from history; the bare phrase carries no intent signal)
    return intent

asyncio.run(conversation())

Classification Modes

Mode	Pipeline	When to use
`FLAT`	SemanticRouter → LLM	No labeled examples; fastest setup
`FLAT_RAG`	SemanticRouter → ExampleStore → LLM	Have examples; small/flat intent set
`HIERARCHICAL_RAG`	HierarchicalRouter → ExampleStore (category-filtered) → LLM	Have examples AND a hierarchy; best accuracy at scale

# All three modes use the same classify() signature (await it inside an async function)
intent, confidence, language = await nlp.classify(
    text,
    conversation_history=history,  # ignored in SINGLE mode
    verify=False,                   # optional second LLM verification pass
)
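
The coarse-to-fine idea behind `HIERARCHICAL_RAG`, including prior category pinning, can be sketched with precomputed similarity scores. The `route` function, the 0.3 threshold, and the score values are assumptions for illustration only.

```python
HIERARCHY = {
    "accounts": ["check_balance", "bank_statement"],
    "cards": ["block_card"],
}


def route(category_scores, intent_scores, prior_category=None, min_category_score=0.3):
    """Pick a category first, then the best intent inside that category only."""
    category = max(category_scores, key=category_scores.get)
    # Prior category pinning: a weak coarse match falls back to the previous turn's category.
    if prior_category is not None and category_scores[category] < min_category_score:
        category = prior_category
    allowed = HIERARCHY[category]
    intent = max(allowed, key=lambda name: intent_scores.get(name, 0.0))
    return category, intent


# A bare follow-up such as "for last 6 months" scores weakly everywhere.
weak_categories = {"accounts": 0.10, "cards": 0.12}
weak_intents = {"check_balance": 0.05, "bank_statement": 0.08, "block_card": 0.11}

print(route(weak_categories, weak_intents))                             # → ('cards', 'block_card')
print(route(weak_categories, weak_intents, prior_category="accounts"))  # → ('accounts', 'bank_statement')
```

Without pinning, noise decides the category; with pinning, the follow-up stays in the domain of the previous turn.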

Turn Modes

Mode	Behaviour
`SINGLE`	Each call is independent. History is ignored. RAG uses raw query only.
`MULTI`	History enriches routing (prior category pinning) and RAG retrieval (sliding window context).

# Library-level configuration
nlp = IntentClassifier(intents=..., turn_mode=TurnMode.MULTI)

# Or via environment variable
# TURN_MODE=multi
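
The sliding-window context used for RAG retrieval in `MULTI` mode can be pictured as prefixing the query with recent user turns. This is a sketch under assumptions: the `enrich_query` name, the window size of 2, and the choice to use only user turns are not taken from the library.

```python
def enrich_query(text, history, window=2):
    """Prefix the query with the last `window` user turns from the history."""
    user_turns = [msg["content"] for msg in history if msg["role"] == "user"]
    recent = user_turns[-window:] if window > 0 else []
    return " ".join(recent + [text])


history = [
    {"role": "user", "content": "I need my bank statement"},
    {"role": "assistant", "content": "Sure.", "intent_classified": "bank_statement"},
]

print(enrich_query("for last 6 months", history))
# → I need my bank statement for last 6 months
```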

ExampleStore

Stores labeled utterances with optional history_context for multi-turn examples.

from query_classifier import ExampleStore

store = ExampleStore()

# Single-turn
store.add_example("what is my balance", "check_balance")

# Multi-turn: embedding = encode("I need my bank statement for last 6 months")
store.add_example(
    text="for last 6 months",
    intent_name="bank_statement",
    history_context="I need my bank statement",
)

# Bulk load from list of dicts
store.add_examples_bulk(examples)

# Retrieve top-k by similarity
results = store.retrieve("account balance", k=5)
# [{"text": ..., "intent": ..., "score": ...}, ...]

# Persistence
store.save("my_store")    # writes my_store.json + my_store.npy
store.load("my_store")    # loads both; recomputes embeddings if .npy missing

# Coverage stats
store.intent_coverage()       # {"check_balance": 4, "block_card": 3, ...}
store.multi_turn_coverage()   # intents that have follow-up examples
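
To make the role of `history_context` concrete, here is a toy in-memory store. It is a sketch, not the library's implementation: `TinyStore` and the bag-of-words `embed` function are hypothetical, and a real store would use a sentence encoder instead of word counts.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyStore:
    def __init__(self):
        self._items = []  # (text, intent, vector)

    def add_example(self, text, intent_name, history_context=None):
        # Multi-turn examples are embedded together with their preceding context.
        full = f"{history_context} {text}" if history_context else text
        self._items.append((text, intent_name, embed(full)))

    def retrieve(self, query, k=5):
        q = embed(query)
        scored = [{"text": t, "intent": i, "score": cosine(q, v)} for t, i, v in self._items]
        return sorted(scored, key=lambda r: r["score"], reverse=True)[:k]


tiny = TinyStore()
tiny.add_example("what is my balance", "check_balance")
tiny.add_example("for last 6 months", "bank_statement", history_context="I need my bank statement")

results = tiny.retrieve("I need my bank statement for last 6 months", k=1)
print(results[0]["intent"])  # → bank_statement
```

The stored text stays the bare follow-up, while the vector reflects the full context, so a context-enriched query can find it.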

Automated Bootstrapping

Generate a fully labeled ExampleStore using the LLM itself — no manual labeling needed.

import asyncio
from query_classifier import ExampleStore, ExampleStoreBootstrapper

async def main():
    store = ExampleStore()

    bootstrapper = ExampleStoreBootstrapper(
        llm_model_name="llama3",
        llm_base_url="http://localhost:11434",
        concurrency=3,   # parallel LLM calls
        max_retries=2,
    )

    summary = await bootstrapper.run_full_bootstrap(
        store=store,
        intents=INTENTS,
        save_path="my_store",   # saved to my_store.json + my_store.npy
        n_single=8,             # utterances per intent (single-turn)
        n_multi=6,              # follow-up examples per intent (multi-turn)
    )
    print(summary)
    # Example summary for a 10-intent set:
    # {"total": 140, "single_turn_generated": 80, "multi_turn_generated": 60, ...}

asyncio.run(main())

Bootstrap runs once. On subsequent startups, load from disk:

store = ExampleStore()
store.load("my_store")

Configuration

All settings can be overridden via environment variables:

Variable	Default	Description
`LLM_PROVIDER`	`ollama`	LLM backend
`LLM_API_BASE`	`http://localhost:11434`	LLM API base URL
`LLM_MODEL_NAME`	`llama3`	Model name
`LLM_API_KEY`	(empty)	Bearer token for authenticated endpoints
`ROUTER_EMBEDDING_MODEL`	`all-MiniLM-L6-v2`	SentenceTransformer model
`RAG_TOP_K_EXAMPLES`	`6`	Examples retrieved per query
`EXAMPLE_STORE_PATH`	(empty)	Auto-load store from this path on init
`TURN_MODE`	`single`	Default turn mode (`single` / `multi`)
`BOOTSTRAP_N_SINGLE`	`8`	Single-turn examples per intent during bootstrap
`BOOTSTRAP_N_MULTI`	`6`	Multi-turn examples per intent during bootstrap
`BOOTSTRAP_CONCURRENCY`	`3`	Max parallel LLM calls during bootstrap
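
Environment-backed settings with defaults can be read as in the sketch below. The variable names and defaults come from the table; the `Settings` dataclass, the `load_settings` function, and the case-insensitive handling of `TURN_MODE` are assumptions, not the contents of `config.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_api_base: str
    llm_model_name: str
    rag_top_k: int
    turn_mode: str


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ), applying the documented defaults."""
    env = os.environ if env is None else env
    turn_mode = env.get("TURN_MODE", "single").lower()
    if turn_mode not in ("single", "multi"):
        raise ValueError(f"TURN_MODE must be 'single' or 'multi', got {turn_mode!r}")
    return Settings(
        llm_api_base=env.get("LLM_API_BASE", "http://localhost:11434"),
        llm_model_name=env.get("LLM_MODEL_NAME", "llama3"),
        rag_top_k=int(env.get("RAG_TOP_K_EXAMPLES", "6")),
        turn_mode=turn_mode,
    )


# Passing an explicit mapping keeps the example independent of the real environment.
print(load_settings({}))
print(load_settings({"TURN_MODE": "multi", "RAG_TOP_K_EXAMPLES": "4"}))
```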

Project Structure

query_classifier/
├── nlp_engine.py        # IntentClassifier — main entry point
├── example_store.py     # ExampleStore — labeled utterance store with RAG retrieval
├── semantic_router.py   # SemanticRouter — embedding-based intent routing
├── hierarchy.py         # HierarchicalRouter — two-level coarse→fine routing
├── bootstrapper.py      # ExampleStoreBootstrapper — automated example generation
├── config.py            # All configurable settings (env var backed)
└── __init__.py          # Public API exports

examples/
├── basic_single_turn.py       # FLAT vs FLAT_RAG side-by-side
├── multi_turn_conversation.py # Three full multi-turn conversations
├── custom_intents.py          # Plug-in your own domain (e-commerce)
├── bootstrap_store.py         # Full bootstrap → save → load → classify
├── reic_demo.py               # Complete REIC demo with all three modes
└── banking_intents.py         # Banking intent + hierarchy definitions

tests/
├── conftest.py                # MockEncoder (offline, deterministic), fixtures
├── test_example_store.py      # Population, retrieval, persistence, coverage
├── test_semantic_router.py    # encode_query, find_top_k, fallback paths
├── test_hierarchy.py          # Init, routing, prior category pinning
├── test_nlp_engine.py         # Enums, validation, classify() all modes, mocked LLM
└── test_bootstrapper.py       # parse_json, generate, bootstrap, full run

Running Tests

pytest tests/ -v

All 123 tests run fully offline — no Ollama, no GPU, no model downloads required.
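
An offline, deterministic encoder is what makes tests like these possible without model downloads. The `HashEncoder` below is a hypothetical illustration of the idea and is not the `MockEncoder` from `tests/conftest.py`, whose implementation is not shown here.

```python
import hashlib
import math


class HashEncoder:
    """Deterministic stand-in for a sentence encoder: no downloads, no GPU."""

    def __init__(self, dim=8):
        self.dim = dim

    def encode(self, text):
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # sha256 is stable across processes, unlike Python's built-in hash().
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[digest[0] % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec


enc = HashEncoder()
# Same text (ignoring case) always maps to the same unit-length vector.
print(enc.encode("check balance") == enc.encode("Check Balance"))  # → True
```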

Running Examples

# Requires a running Ollama instance (ollama serve)

python examples/basic_single_turn.py
python examples/multi_turn_conversation.py
python examples/custom_intents.py
python examples/bootstrap_store.py
python examples/reic_demo.py

License

MIT

Release history

Version	Release date
0.2.0 (this version)	Apr 19, 2026
0.1.2	Feb 1, 2026
0.1.1	Feb 1, 2026
0.1.0	Jan 31, 2026

Download files

File	Type	Size	Uploaded
query_classifier-0.2.0.tar.gz	Source distribution	35.8 kB	Apr 19, 2026
query_classifier-0.2.0-py3-none-any.whl	Built distribution (Python 3)	37.6 kB	Apr 19, 2026

File details

Details for the file query_classifier-0.2.0.tar.gz.

File metadata

Download URL: query_classifier-0.2.0.tar.gz
Upload date: Apr 19, 2026
Size: 35.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for query_classifier-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`797b868090299b5e3dafd4f1d5efc3c6f07066ed9c49d4613f956339d9643256`
MD5	`724679dbca079e0703e419bb1a5d0c64`
BLAKE2b-256	`08ab5bce2986a26b591eb7b6f80ef4ea38bbf26f2a63e5b02d60f7c3ebd64cae`

File details

Details for the file query_classifier-0.2.0-py3-none-any.whl.

File metadata

Download URL: query_classifier-0.2.0-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 37.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for query_classifier-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`153f3018ee38a15ed869ceffee80da7b1bfc4dd83b307871ba81cd75c4e60335`
MD5	`e3444736e1f413fe9612064b40787085`
BLAKE2b-256	`79b30ae941fa7d9066f9cfb86e8771b61b2709ce544f349c83a7d7f49aad7483`
