RAG query parsing plugin — parse natural language queries into semantic terms and structured filters using LLMs
LangCore RAG — Query Parsing for Hybrid Retrieval
A plugin for LangExtract that parses natural-language queries into semantic terms (for vector search) and structured metadata filters (for database / index filtering), enabling hybrid RAG retrieval pipelines. Inspired by LangStruct's .query() method.
Note: This is a third-party plugin for LangExtract. For the main LangExtract library, visit google/langextract.
Installation
Install from source:
git clone <repo-url>
cd langcore-rag
pip install -e .
Or with uv:
uv pip install -e .
Features at a Glance
| Feature | langcore-rag | LangStruct |
|---|---|---|
| Query → semantic terms + filters | ✅ `QueryParser.parse()` | ✅ `.query()` |
| Async support | ✅ `async_parse()` | ✅ |
| Pydantic schema introspection | ✅ Auto-discovers filterable fields | ✅ |
| MongoDB-style operators | ✅ `$eq`, `$gte`, `$lte`, `$in`, `$nin`, etc. | ✅ |
| Confidence score | ✅ 0.0 – 1.0 | ❌ |
| Explanation / rationale | ✅ Human-readable | ❌ |
| Any LLM backend | ✅ Via LiteLLM (100+ providers) | ✅ |
| Robust JSON parsing | ✅ Raw JSON + Markdown fences + graceful fallback | ⚠️ |
Quick Start
1. Define a Schema
from pydantic import BaseModel, Field


class Invoice(BaseModel):
    amount: float = Field(description="Total invoice amount in USD")
    due_date: str = Field(description="Due date in ISO-8601 format")
    vendor: str = Field(description="Vendor / supplier name")
    paid: bool = Field(description="Whether the invoice is paid")
2. Parse a Query
from langcore_rag import QueryParser
parser = QueryParser(schema=Invoice, model_id="gemini/gemini-2.5-flash")
parsed = parser.parse("invoices over $5000 due in March 2024")
print(parsed.semantic_terms)
# → ["invoices"]
print(parsed.structured_filters)
# → {"amount": {"$gte": 5000}, "due_date": {"$gte": "2024-03-01", "$lte": "2024-03-31"}}
print(parsed.confidence)
# → 0.92
print(parsed.explanation)
# → "Extracted amount ≥ 5000 and date range for March 2024."
3. Async Usage
import asyncio

from langcore_rag import QueryParser


async def main():
    # Invoice is the schema defined in step 1.
    parser = QueryParser(schema=Invoice, model_id="gpt-4o")
    parsed = await parser.async_parse("unpaid invoices from Acme Corp")
    print(parsed.structured_filters)
    # → {"paid": {"$eq": False}, "vendor": {"$eq": "Acme Corp"}}


asyncio.run(main())
API Reference
QueryParser
QueryParser(
    schema: type[BaseModel],
    model_id: str,
    *,
    temperature: float = 0.0,
    max_tokens: int = 1024,
    **litellm_kwargs,
)
| Parameter | Type | Description |
|---|---|---|
| `schema` | `type[BaseModel]` | Pydantic model whose fields define filterable metadata |
| `model_id` | `str` | Any LiteLLM-compatible model ID (e.g. `"gpt-4o"`, `"gemini/gemini-2.5-flash"`, `"anthropic/claude-3-opus"`) |
| `temperature` | `float` | Sampling temperature (default `0.0` for deterministic output) |
| `max_tokens` | `int` | Maximum tokens to generate (default `1024`) |
| `**litellm_kwargs` | | Extra kwargs forwarded to `litellm.completion()` (e.g. `api_key`, `api_base`, `timeout`) |
Methods
| Method | Signature | Description |
|---|---|---|
| `parse` | `(query_text: str) -> ParsedQuery` | Synchronous query parsing |
| `async_parse` | `(query_text: str) -> ParsedQuery` | Asynchronous query parsing |
Properties
| Property | Type | Description |
|---|---|---|
| `schema` | `type[BaseModel]` | The Pydantic schema used for field discovery |
| `model_id` | `str` | The LiteLLM model identifier |
| `system_prompt` | `str` | The generated system prompt (useful for debugging) |
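To give a feel for what the `system_prompt` property might contain, here is a minimal sketch of field discovery followed by prompt assembly. It is not the plugin's actual code: `discover_filterable_fields`, `build_system_prompt`, and the prompt wording are hypothetical, and a plain annotated class stands in for the Pydantic model so the example runs with only the standard library.

```python
import datetime
from typing import get_type_hints

# Scalar types this README lists as filterable.
SCALAR_TYPES = (int, float, str, bool, datetime.date, datetime.datetime)


class Invoice:
    """Stand-in for the Pydantic model; only the annotations are read."""

    amount: float
    due_date: str
    vendor: str
    paid: bool
    tags: list[str]  # complex type, expected to be skipped


def discover_filterable_fields(schema: type) -> dict[str, str]:
    """Map each scalar-typed field name to its type name (hypothetical helper)."""
    fields = {}
    for name, annotation in get_type_hints(schema).items():
        if annotation in SCALAR_TYPES:
            fields[name] = annotation.__name__
    return fields


def build_system_prompt(fields: dict[str, str]) -> str:
    """Assemble an illustrative prompt; the real wording is not documented here."""
    lines = [
        "Split the user query into semantic terms and metadata filters.",
        "Filterable fields:",
    ]
    lines += [f"- {name} ({type_name})" for name, type_name in fields.items()]
    lines.append(
        "Respond with a JSON object with keys: semantic_terms, "
        "structured_filters, confidence, explanation."
    )
    return "\n".join(lines)


fields = discover_filterable_fields(Invoice)
print(build_system_prompt(fields))
```

When a parse looks wrong, printing the real `parser.system_prompt` and checking which fields it lists is a quick first diagnostic.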
ParsedQuery
An immutable (frozen) dataclass returned by parse() / async_parse().
| Field | Type | Description |
|---|---|---|
| `semantic_terms` | `list[str]` | Free-text terms for vector / similarity search |
| `structured_filters` | `dict[str, Any]` | Metadata filters with MongoDB-style operators |
| `confidence` | `float` | 0.0 – 1.0 confidence in the parse quality |
| `explanation` | `str` | Human-readable rationale for the decomposition |
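The practical effect of "frozen" can be illustrated with a stand-in dataclass that has the same four fields. `ParsedQuerySketch` is a hypothetical name used only for this example; the real `ParsedQuery` may differ in defaults and details.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any


@dataclass(frozen=True)
class ParsedQuerySketch:
    """Illustrative stand-in mirroring the documented ParsedQuery fields."""

    semantic_terms: list[str] = field(default_factory=list)
    structured_filters: dict[str, Any] = field(default_factory=dict)
    confidence: float = 0.0
    explanation: str = ""


parsed = ParsedQuerySketch(
    semantic_terms=["invoices"],
    structured_filters={"amount": {"$gte": 5000}},
    confidence=0.92,
    explanation="Extracted amount >= 5000.",
)

try:
    parsed.confidence = 1.0  # rebinding a field on a frozen dataclass raises
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated)
```

Note that freezing a dataclass only blocks attribute reassignment. The list and dict held in the fields are still mutable objects, so copy them before modifying if the result is shared.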
How It Works
1. **Schema introspection:** `QueryParser` inspects the Pydantic model's fields to identify which ones are scalar and therefore filterable (`int`, `float`, `str`, `bool`, `date`, `datetime`). Complex types like `list[str]` are excluded.
2. **System prompt generation:** A system prompt is built listing the filterable fields with their types and descriptions, instructing the LLM to output a JSON object with `semantic_terms`, `structured_filters`, `confidence`, and `explanation`.
3. **LLM call:** The query text is sent as a user message alongside the system prompt via `litellm.completion()` (sync) or `litellm.acompletion()` (async).
4. **Response parsing:** The LLM's text response is parsed as JSON (handling both raw JSON and Markdown code fences). Values are type-coerced and clamped to produce a valid `ParsedQuery`.
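The response-parsing step can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's implementation: `parse_llm_response` is a hypothetical helper, and the fallback shown here (empty terms and filters with confidence 0.0 when the text is not valid JSON) is a guess at what "graceful fallback" could mean.

```python
import json
import re
from typing import Any

# Build the fence marker programmatically so this example can itself sit in a fenced block.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_llm_response(text: str) -> dict[str, Any]:
    """Extract a JSON object from raw or fenced LLM output and normalise its fields."""
    match = FENCE_RE.search(text)
    payload = match.group(1) if match else text
    try:
        data = json.loads(payload.strip())
    except json.JSONDecodeError:
        data = {}  # assumed fallback: treat unparseable output as an empty parse
    if not isinstance(data, dict):
        data = {}

    terms = data.get("semantic_terms", [])
    if not isinstance(terms, list):
        terms = []
    filters = data.get("structured_filters", {})
    if not isinstance(filters, dict):
        filters = {}

    # Coerce confidence to float, then clamp it into [0.0, 1.0].
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    confidence = min(1.0, max(0.0, confidence))

    return {
        "semantic_terms": [str(term) for term in terms],
        "structured_filters": filters,
        "confidence": confidence,
        "explanation": str(data.get("explanation", "")),
    }


fenced_reply = (
    FENCE
    + 'json\n{"semantic_terms": ["invoices"], "confidence": 1.7}\n'
    + FENCE
)
print(parse_llm_response(fenced_reply))
```

Clamping rather than rejecting an out-of-range confidence keeps a slightly malformed reply usable, at the cost of hiding that the model misbehaved; logging the raw text alongside the result is one way to keep that visible.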
Supported Filter Operators
The parser instructs the LLM to use MongoDB-style operators:
| Operator | Meaning | Example |
|---|---|---|
| `$eq` | Equals | `{"vendor": {"$eq": "Acme"}}` |
| `$ne` | Not equals | `{"paid": {"$ne": true}}` |
| `$gt` | Greater than | `{"amount": {"$gt": 1000}}` |
| `$gte` | Greater than or equal | `{"amount": {"$gte": 5000}}` |
| `$lt` | Less than | `{"amount": {"$lt": 100}}` |
| `$lte` | Less than or equal | `{"due_date": {"$lte": "2024-12-31"}}` |
| `$in` | In list | `{"vendor": {"$in": ["Acme", "Globex"]}}` |
| `$nin` | Not in list | `{"vendor": {"$nin": ["Initech"]}}` |
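If your store does not understand these operators natively, the filters can be evaluated in application code. The following is a minimal in-memory sketch, not part of langcore-rag: `OPERATORS` and `matches` are hypothetical names, and a record missing a filtered field is simply treated as a non-match, which is stricter than MongoDB's own semantics for `$ne` and `$nin`.

```python
import operator
from typing import Any, Callable

# One comparison function per operator from the table above.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(record: dict[str, Any], filters: dict[str, dict[str, Any]]) -> bool:
    """Return True when the record satisfies every condition on every field."""
    for field_name, conditions in filters.items():
        if field_name not in record:
            return False  # simplification: missing fields never match
        value = record[field_name]
        for op, expected in conditions.items():
            if not OPERATORS[op](value, expected):
                return False
    return True


records = [
    {"vendor": "Acme", "amount": 7500.0, "due_date": "2024-03-15", "paid": False},
    {"vendor": "Globex", "amount": 1200.0, "due_date": "2024-03-20", "paid": True},
    {"vendor": "Acme", "amount": 9000.0, "due_date": "2024-04-02", "paid": False},
]
filters = {
    "amount": {"$gte": 5000},
    "due_date": {"$gte": "2024-03-01", "$lte": "2024-03-31"},
}
hits = [record for record in records if matches(record, filters)]
print(hits)
```

The date comparison works here only because ISO-8601 date strings sort lexicographically in chronological order. In a hybrid pipeline, a pre-filter like this would typically narrow the candidate set before the `semantic_terms` are used for vector search.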
Development
# Install dev dependencies
uv sync
# Run tests
uv run pytest tests/ -v
# Lint
uv run ruff check langcore_rag/ tests/
# Format
uv run ruff format langcore_rag/ tests/
License