RAG query parsing plugin — parse natural language queries into semantic terms and structured filters using LLMs

Project description

LangCore RAG

Plugin for LangCore — parse natural-language queries into semantic search terms and structured metadata filters for hybrid RAG pipelines.

Overview

langcore-rag is a plugin for LangCore that decomposes natural-language queries into semantic terms (for vector/similarity search) and structured metadata filters (for database or index filtering). It introspects your Pydantic schema to auto-discover filterable fields, calls an LLM to parse the query, and returns MongoDB-style filter operators ready for your retrieval backend.

Features

Query decomposition — splits free-form queries into semantic search terms and structured filter conditions
Pydantic schema introspection — automatically discovers filterable fields (int, float, str, bool, date, datetime) from your schema
MongoDB-style operators — $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin for precise filter generation
Confidence scoring — 0.0–1.0 confidence score indicating parse quality
Human-readable explanation — rationale for how the query was decomposed
Sync and async — both parse() and async_parse() methods
Robust JSON parsing — handles raw JSON, Markdown fences, and graceful fallback
Any LLM backend — uses LiteLLM for access to 100+ model providers
Zero manual prompt engineering — system prompt is auto-generated from your schema

Installation

pip install langcore-rag

Quick Start

1. Define a Schema

Define a Pydantic model whose fields represent the filterable metadata in your document store:

from pydantic import BaseModel, Field

class Invoice(BaseModel):
    amount: float = Field(description="Total invoice amount in USD")
    due_date: str = Field(description="Due date in ISO-8601 format")
    vendor: str = Field(description="Vendor / supplier name")
    paid: bool = Field(description="Whether the invoice is paid")
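
As a rough illustration of which fields would count as filterable, the sketch below applies the scalar-type rule from the Features list to a plain annotated class. The helper name `filterable_fields` and the `InvoiceLike` class are hypothetical and use only the standard library; for a real Pydantic v2 model the same information is available through `Model.model_fields`. This is not the plugin's own implementation.

```python
from datetime import date, datetime
from typing import get_type_hints

# Scalar types treated as filterable, mirroring the list in the Features section.
FILTERABLE_TYPES = (int, float, str, bool, date, datetime)


def filterable_fields(schema: type) -> dict[str, type]:
    """Return {field_name: type} for fields annotated with a filterable scalar type."""
    hints = get_type_hints(schema)
    return {name: tp for name, tp in hints.items() if tp in FILTERABLE_TYPES}


class InvoiceLike:
    # Hypothetical plain class standing in for the Pydantic model above.
    amount: float
    due_date: str
    vendor: str
    paid: bool
    tags: list[str]  # complex type, so the helper skips it


fields = filterable_fields(InvoiceLike)
print(sorted(fields))
# → ['amount', 'due_date', 'paid', 'vendor']
```

Optional or union annotations would need extra unwrapping, which this sketch leaves out.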

2. Parse a Query

from langcore_rag import QueryParser

parser = QueryParser(schema=Invoice, model_id="gemini/gemini-2.5-flash")
parsed = parser.parse("invoices over $5000 due in March 2024")

print(parsed.semantic_terms)
# → ["invoices"]

print(parsed.structured_filters)
# → {"amount": {"$gt": 5000}, "due_date": {"$gte": "2024-03-01", "$lte": "2024-03-31"}}

print(parsed.confidence)
# → 0.92

print(parsed.explanation)
# → "Extracted amount > 5000 and date range for March 2024."

3. Use in a RAG Pipeline

Feed the parsed output into your vector store and metadata filter layer:

from langcore_rag import QueryParser

# Invoice is the schema from step 1. vector_store and apply_filters are not
# defined in this snippet: they stand in for your own vector store client
# and metadata filter helper.
parser = QueryParser(schema=Invoice, model_id="gpt-4o")
parsed = parser.parse("unpaid invoices from Acme Corp over $10,000")

# Semantic search with your vector store
vector_results = vector_store.similarity_search(
    query=" ".join(parsed.semantic_terms),
    k=20,
)

# Apply structured filters to narrow results
filtered = [
    doc for doc in vector_results
    if apply_filters(doc.metadata, parsed.structured_filters)
]
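
The snippet above calls an `apply_filters` helper without defining it. One possible minimal version is sketched below, assuming metadata is a flat dict and that a field missing from the metadata never matches (both are assumptions, and the second differs from MongoDB's own semantics for `$ne` and `$nin`). It is an illustration, not a function shipped by langcore-rag.

```python
import operator
from typing import Any

# Map each MongoDB-style operator to a two-argument predicate.
_OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def apply_filters(metadata: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Return True when metadata satisfies every condition in filters."""
    for field, conditions in filters.items():
        if field not in metadata:
            return False  # assumption: a missing field never matches
        value = metadata[field]
        if not isinstance(conditions, dict):
            conditions = {"$eq": conditions}  # treat a bare value as equality
        for op, operand in conditions.items():
            if op not in _OPS:
                raise ValueError(f"unsupported operator: {op}")
            if not _OPS[op](value, operand):
                return False
    return True


doc = {"amount": 12500.0, "vendor": "Acme Corp", "paid": False}
filters = {
    "amount": {"$gt": 10000},
    "vendor": {"$eq": "Acme Corp"},
    "paid": {"$eq": False},
}
print(apply_filters(doc, filters))
# → True
```

ISO-8601 date strings such as "2024-03-15" compare correctly with these operators because their lexicographic order matches chronological order.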

4. Async Usage

import asyncio
from langcore_rag import QueryParser

async def main():
    parser = QueryParser(schema=Invoice, model_id="gpt-4o")
    parsed = await parser.async_parse("unpaid invoices from Acme Corp")
    print(parsed.structured_filters)
    # → {"paid": {"$eq": False}, "vendor": {"$eq": "Acme Corp"}}

asyncio.run(main())

5. Query Caching

Enable an LRU cache to skip LLM calls for repeated queries:

from langcore_rag import QueryParser

parser = QueryParser(schema=Invoice, model_id="gpt-4o", cache_maxsize=128)

parsed1 = parser.parse("invoices over $5000")   # LLM call
parsed2 = parser.parse("invoices over $5000")   # Cache hit — no LLM call

print(parser.cache_info)   # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
parser.clear_cache()        # Manually clear when needed

6. Sync Bridge for Jupyter / Running Event Loops

Use parse_sync_from_async when you need synchronous parsing inside an environment that already has a running event loop (e.g. Jupyter notebooks):

from langcore_rag import QueryParser

parser = QueryParser(schema=Invoice, model_id="gpt-4o")

# Works inside Jupyter cells where asyncio.run() would fail
parsed = parser.parse_sync_from_async("invoices from March 2024")
print(parsed.semantic_terms)
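
For context, one common way to run a coroutine from synchronous code while an event loop is already running is to execute it on a fresh loop in a worker thread. The sketch below shows that pattern with hypothetical names (`run_coroutine_sync`, `fake_async_parse`); it is not necessarily how `parse_sync_from_async` is implemented.

```python
import asyncio
from collections.abc import Coroutine
from concurrent.futures import ThreadPoolExecutor
from typing import Any


def run_coroutine_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run a coroutine to completion from sync code, even inside a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(coro)
    # A loop is already running here: run the coroutine on a fresh loop
    # in a worker thread and block until it finishes.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fake_async_parse(query_text: str) -> list[str]:
    """Hypothetical stand-in for an async LLM call."""
    await asyncio.sleep(0)
    return query_text.split()


async def notebook_cell() -> list[str]:
    # Simulates a Jupyter cell: this code runs while a loop is already active.
    return run_coroutine_sync(fake_async_parse("invoices from March 2024"))


print(asyncio.run(notebook_cell()))
# → ['invoices', 'from', 'March', '2024']
```

The calling loop is blocked while the worker thread runs, so in code that is already async, awaiting `async_parse` directly is usually the better choice.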

Integration with LangCore

langcore-rag uses LangCore's LLM ecosystem (via LiteLLM) for query parsing. It works with any model supported by LiteLLM:

from langcore_rag import QueryParser

# Use any LiteLLM-compatible model
parser = QueryParser(
    schema=Invoice,
    model_id="gpt-4o",          # or "gemini/gemini-2.5-flash", "anthropic/claude-3-opus", etc.
    temperature=0.0,             # Deterministic output
    max_tokens=1024,
    api_key="sk-...",            # Optional — override env var
)

When deployed via langcore-api, the RAG parser is available as a REST endpoint (POST /api/v1/rag/parse) with full configuration via environment variables.

API Reference

QueryParser

QueryParser(
    schema: type[BaseModel],
    model_id: str,
    *,
    temperature: float = 0.0,
    max_tokens: int = 1024,
    max_retries: int = 2,
    cache_maxsize: int | None = None,
    **litellm_kwargs,
)

| Parameter | Type | Description |
| --- | --- | --- |
| `schema` | `type[BaseModel]` | Pydantic model whose fields define filterable metadata |
| `model_id` | `str` | Any LiteLLM-compatible model ID |
| `temperature` | `float` | Sampling temperature (default `0.0` for deterministic output) |
| `max_tokens` | `int` | Maximum tokens to generate (default `1024`) |
| `max_retries` | `int` | Number of retry attempts on malformed LLM responses (default `2`, meaning 3 total attempts) |
| `cache_maxsize` | `int \| None` | When set to a positive integer, enables an LRU cache on `parse()` so identical queries skip the LLM call (default `None`, meaning no caching) |
| `**litellm_kwargs` | | Extra kwargs forwarded to `litellm.completion()` (e.g., `api_key`, `api_base`, `timeout`) |

Methods

| Method | Signature | Description |
| --- | --- | --- |
| `parse` | `(query_text: str) -> ParsedQuery` | Synchronous query parsing (uses cache when enabled) |
| `async_parse` | `(query_text: str) -> ParsedQuery` | Asynchronous query parsing |
| `parse_sync_from_async` | `(query_text: str) -> ParsedQuery` | Run `async_parse` from sync code; works inside running event loops (Jupyter, Quart) |
| `clear_cache` | `() -> None` | Clear the LRU cache (no-op when caching is disabled) |

Properties

| Property | Type | Description |
| --- | --- | --- |
| `schema` | `type[BaseModel]` | The Pydantic schema used for field discovery |
| `model_id` | `str` | The LiteLLM model identifier |
| `system_prompt` | `str` | The auto-generated system prompt (useful for debugging) |
| `cache_info` | `CacheInfo \| None` | LRU cache statistics, or `None` when caching is disabled |

ParsedQuery

An immutable (frozen) dataclass returned by parse() / async_parse():

| Field | Type | Description |
| --- | --- | --- |
| `semantic_terms` | `list[str]` | Free-text terms for vector / similarity search |
| `structured_filters` | `dict[str, Any]` | Metadata filters with MongoDB-style operators |
| `confidence` | `float` | 0.0–1.0 confidence in the parse quality |
| `explanation` | `str` | Human-readable rationale for the decomposition |

Supported Filter Operators

| Operator | Meaning | Example |
| --- | --- | --- |
| `$eq` | Equals | `{"vendor": {"$eq": "Acme"}}` |
| `$ne` | Not equals | `{"paid": {"$ne": true}}` |
| `$gt` | Greater than | `{"amount": {"$gt": 1000}}` |
| `$gte` | Greater than or equal | `{"amount": {"$gte": 5000}}` |
| `$lt` | Less than | `{"amount": {"$lt": 100}}` |
| `$lte` | Less than or equal | `{"due_date": {"$lte": "2024-12-31"}}` |
| `$in` | In list | `{"vendor": {"$in": ["Acme", "Globex"]}}` |
| `$nin` | Not in list | `{"vendor": {"$nin": ["Initech"]}}` |
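
If your metadata lives in a SQL database rather than a store that accepts these operators natively, the filters need translating. The sketch below shows one way to build a parameterized WHERE clause from them. The helper name `to_sql_where`, the `allowed_fields` argument, and the `?` placeholder style are assumptions for illustration and are not part of langcore-rag.

```python
from typing import Any

# SQL spelling of each comparison operator.
_SQL_OPS = {"$eq": "=", "$ne": "<>", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}


def to_sql_where(filters: dict[str, Any], allowed_fields: set[str]) -> tuple[str, list[Any]]:
    """Build a parameterized WHERE clause (qmark style) from MongoDB-style filters."""
    clauses: list[str] = []
    params: list[Any] = []
    for field, conditions in filters.items():
        if field not in allowed_fields:
            # Column names cannot be bound as parameters, so whitelist them.
            raise ValueError(f"unknown field: {field}")
        for op, operand in conditions.items():
            if op in _SQL_OPS:
                clauses.append(f"{field} {_SQL_OPS[op]} ?")
                params.append(operand)
            elif op in ("$in", "$nin"):
                values = list(operand)
                if not values:
                    # Empty list: $in matches nothing, $nin matches everything.
                    clauses.append("1 = 0" if op == "$in" else "1 = 1")
                    continue
                marks = ", ".join("?" for _ in values)
                keyword = "IN" if op == "$in" else "NOT IN"
                clauses.append(f"{field} {keyword} ({marks})")
                params.extend(values)
            else:
                raise ValueError(f"unsupported operator: {op}")
    return " AND ".join(clauses) or "1 = 1", params


where, params = to_sql_where(
    {"amount": {"$gte": 5000}, "vendor": {"$in": ["Acme", "Globex"]}},
    allowed_fields={"amount", "due_date", "vendor", "paid"},
)
print(where)
# → amount >= ? AND vendor IN (?, ?)
print(params)
# → [5000, 'Acme', 'Globex']
```

Values are always passed as bound parameters, and only field names from the whitelist are ever interpolated into the SQL text.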

How It Works

1. Schema introspection: inspects the Pydantic model's fields to identify filterable types (int, float, str, bool, date, datetime). Complex types like list[str] are excluded.
2. System prompt generation: builds a prompt listing filterable fields with types and descriptions, instructing the LLM to output structured JSON.
3. LLM call: sends the query as a user message with the system prompt via litellm.completion() or litellm.acompletion().
4. Response parsing: parses the response as JSON (handling fences and edge cases), type-coerces values, and clamps confidence to produce a valid ParsedQuery.
5. Retry on failure: if the LLM returns malformed JSON, the parser retries up to max_retries times (default 2, so 3 total attempts). Each retry is logged.
6. Graceful fallback: if all retries are exhausted, returns a ParsedQuery(semantic_terms=[query_text], structured_filters={}, confidence=0.0) so callers always receive a usable result.
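
The response-parsing, retry, and fallback steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the plugin's code: `ParsedQuerySketch`, `parse_response`, `parse_with_retries`, and the `call_llm` callable are hypothetical names, and the JSON keys are assumed to match the ParsedQuery field names.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class ParsedQuerySketch:
    """Illustrative stand-in for ParsedQuery, with the same four fields."""
    semantic_terms: list[str]
    structured_filters: dict[str, Any] = field(default_factory=dict)
    confidence: float = 0.0
    explanation: str = ""


FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the source readable
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_response(raw: str) -> ParsedQuerySketch:
    """Parse raw or fenced JSON and clamp confidence to [0.0, 1.0]."""
    match = _FENCED.search(raw)
    payload = json.loads(match.group(1) if match else raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    confidence = min(1.0, max(0.0, float(payload.get("confidence", 0.0))))
    return ParsedQuerySketch(
        semantic_terms=[str(t) for t in payload.get("semantic_terms", [])],
        structured_filters=dict(payload.get("structured_filters", {})),
        confidence=confidence,
        explanation=str(payload.get("explanation", "")),
    )


def parse_with_retries(
    query_text: str, call_llm: Callable[[str], str], max_retries: int = 2
) -> ParsedQuerySketch:
    """Try up to max_retries + 1 times, then fall back to the raw query."""
    for _ in range(max_retries + 1):
        try:
            return parse_response(call_llm(query_text))
        except (ValueError, TypeError):  # json.JSONDecodeError subclasses ValueError
            continue
    return ParsedQuerySketch(semantic_terms=[query_text])


# First reply is malformed; the second is fenced JSON with an out-of-range confidence.
replies = iter([
    "not json",
    FENCE + 'json\n{"semantic_terms": ["invoices"], "confidence": 1.7}\n' + FENCE,
])
result = parse_with_retries("invoices over $5000", lambda _query: next(replies))
print(result.semantic_terms, result.confidence)
# → ['invoices'] 1.0
```

Because the fallback carries a confidence of 0.0 and no filters, callers can check `confidence` to decide whether to run a plain semantic search instead of a filtered one.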

Composing with Other Plugins

langcore-rag complements the extraction plugins. Use it to find relevant documents, then extract structured data:

import langcore as lx
from langcore_rag import QueryParser

# Step 1: Parse the user's query
parser = QueryParser(schema=Invoice, model_id="gpt-4o")
parsed = parser.parse("invoices from Acme over $5000")

# Step 2: Retrieve relevant documents from your store
# (document_store is a placeholder for your own retrieval client)
docs = document_store.search(
    query=parsed.semantic_terms,
    filters=parsed.structured_filters,
)

# Step 3: Extract structured entities from retrieved documents
for doc in docs:
    result = lx.extract(
        text_or_documents=doc.text,
        model_id="gemini-2.5-flash",
        prompt_description="Extract invoice details.",
        examples=[...],
    )
    print(result)

Development

uv sync                                    # Install dependencies
uv run pytest tests/ -v                    # Run tests
uv run ruff check langcore_rag/ tests/     # Lint
uv run ruff format langcore_rag/ tests/    # Format

Requirements

Python ≥ 3.12
langcore
litellm ≥ 1.81.13
pydantic ≥ 2.12.0

License

Apache License 2.0 — see LICENSE for details.

Project details

Release history

This version

1.2.0

Feb 24, 2026

1.1.0

Feb 24, 2026

1.0.4

Feb 23, 2026

1.0.3

Feb 23, 2026

1.0.2

Feb 23, 2026

1.0.1

Feb 23, 2026

1.0.0

Feb 23, 2026

Download files

Source Distribution

langcore_rag-1.2.0.tar.gz (23.3 kB view details)

Uploaded Feb 24, 2026 Source

Built Distribution

langcore_rag-1.2.0-py3-none-any.whl (15.9 kB view details)

Uploaded Feb 24, 2026 Python 3

File details

Details for the file langcore_rag-1.2.0.tar.gz.

File metadata

Download URL: langcore_rag-1.2.0.tar.gz
Upload date: Feb 24, 2026
Size: 23.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for langcore_rag-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`19f548cdf5e86b30db9581eecc26f57c3c72ce25e5a1a9b880a4b41bc317e4c6`
MD5	`c34b1f6787509f9ea162e7c2fa49c951`
BLAKE2b-256	`060e620fa5088efc59da999a97a8c6de5d22812e2319e0c61c34c0febe978f31`

Provenance

The following attestation bundles were made for langcore_rag-1.2.0.tar.gz:

Publisher: release.yml on IgnatG/langcore-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: langcore_rag-1.2.0.tar.gz
- Subject digest: 19f548cdf5e86b30db9581eecc26f57c3c72ce25e5a1a9b880a4b41bc317e4c6
- Sigstore transparency entry: 985293428
- Sigstore integration time: Feb 24, 2026
Source repository:
- Permalink: IgnatG/langcore-rag@7fbc7cc92b2e3504190d4fb87606cc745d661ea8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/IgnatG
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7fbc7cc92b2e3504190d4fb87606cc745d661ea8
- Trigger Event: push

File details

Details for the file langcore_rag-1.2.0-py3-none-any.whl.

File metadata

Download URL: langcore_rag-1.2.0-py3-none-any.whl
Upload date: Feb 24, 2026
Size: 15.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for langcore_rag-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2835626f5951db4e4cccb931e1eda02ef865d4057618163cde4d82906342da53`
MD5	`7866d7bf22604ddddc94b64648fbfe20`
BLAKE2b-256	`08687678acef53c5aea8a20d1cd600d63fdbeda3a0578f441e6465a2768f16c5`

Provenance

The following attestation bundles were made for langcore_rag-1.2.0-py3-none-any.whl:

Publisher: release.yml on IgnatG/langcore-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: langcore_rag-1.2.0-py3-none-any.whl
- Subject digest: 2835626f5951db4e4cccb931e1eda02ef865d4057618163cde4d82906342da53
- Sigstore transparency entry: 985293465
- Sigstore integration time: Feb 24, 2026
Source repository:
- Permalink: IgnatG/langcore-rag@7fbc7cc92b2e3504190d4fb87606cc745d661ea8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/IgnatG
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7fbc7cc92b2e3504190d4fb87606cc745d661ea8
- Trigger Event: push

langcore-rag 1.2.0
