rag007: multi-backend retrieval-augmented generation with LangGraph

rag007 🕵️🍸🚗🎯: Licensed to Retrieve

Not just hybrid search. A true autonomous retrieval agent.
Shaken, not stirred: plug in any vector store, any LLM, any reranker.
The mission: find the right documents, neutralise irrelevant noise, and deliver the answer. Every time.

from rag007 import init_agent

rag = init_agent("documents", model="openai:gpt-5.4", backend="qdrant")
state = rag.chat("What is the status of operation overlord?")
# Your answer. Shaken, not stirred.

🕵️ The Agent

"We have a problem. Millions of documents. One question. And the clock is ticking."

Most retrieval systems send a junior analyst: one query, one pass, done. Fast, cheap, and dangerously incomplete.

So they sent rag007. Licensed to retrieve. Never satisfied with good enough.

Before every mission, rag007 visits Q's lab: 8 backends to operate from, any LLM as the intelligence source, precision rerankers to separate signal from noise, and a tool-calling agent that inspects schemas, builds filters on the fly, and adapts to whatever the index throws at it.

In the field, it plans, infiltrates, and interrogates: running parallel searches across BM25 and vector space, fusing the evidence, and cross-examining every result through an LLM quality gate. When the trail goes cold, it rewrites the query and tries again. It doesn't stop until the mission is complete.

Only once the evidence is airtight does it surface the answer. Cited. Grounded. Delivered.

๐Ÿธ "Shaken, not stirred โ€” and always on target." ๐ŸŽฏ

Not in the name of any crown or government. In the name of whoever is seeking the truth in their data.


🕵️ How It Works

Most RAG libraries are pipelines: query in, documents out, done. rag007 is an agent.

Like a field operative, it doesn't execute a single search and report back. It thinks, adapts, and keeps going until the mission is complete:

  1. 🧠 Understands the intent: rewrites your query into precise search keywords, detects whether it's a keyword lookup or a semantic question, and adjusts the hybrid search ratio accordingly
  2. 🔍 Searches intelligently: runs multiple query variants simultaneously across BM25 and vector search, fuses the results, and re-ranks with a dedicated reranker
  3. 🧐 Judges the results: an LLM quality gate evaluates whether the retrieved documents actually answer the question
  4. 🔄 Adapts autonomously: if results are off-target, rewrites the query and tries again; if a single approach fails, fans out into a swarm of parallel search strategies
  5. ✍️ Delivers the answer: only once it's confident the evidence is solid does it generate a cited, grounded response

This is the difference between a search box and a field agent.
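
The retrieve, judge, rewrite loop described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not rag007's internals: `search`, `judge`, and `rewrite` are hypothetical callables standing in for hybrid search, the LLM quality gate, and the query rewriter.

```python
from typing import Callable


def agentic_retrieve(
    question: str,
    search: Callable[[str], list[str]],
    judge: Callable[[str, list[str]], bool],
    rewrite: Callable[[str, int], str],
    max_iter: int = 3,
) -> tuple[str, list[str], int]:
    """Search, let the judge decide, and rewrite the query until accepted
    or until max_iter rounds have been used."""
    query = question
    docs: list[str] = []
    for iteration in range(1, max_iter + 1):
        docs = search(query)
        if judge(question, docs):
            return query, docs, iteration
        if iteration < max_iter:
            # Only rewrite when another search round will follow.
            query = rewrite(query, iteration)
    return query, docs, max_iter


# Toy stand-ins (hypothetical): a dict lookup, a non-empty check, a fixed rewrite.
corpus = {"hybrid search": ["Hybrid search blends BM25 and vector scores."]}

final_query, docs, rounds = agentic_retrieve(
    "how do I mix keyword and vector?",
    search=lambda q: corpus.get(q, []),
    judge=lambda question, found: len(found) > 0,
    rewrite=lambda q, i: "hybrid search",
)
print(final_query, rounds)
```

In this toy run the first search finds nothing, the query is rewritten once, and the second round passes the gate.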


✨ Features

  • 🚗 Fast as an Aston Martin: fully async pipeline, parallel HyDE + preprocessing, zero blocking calls
  • 🎯 On target, every time: LLM quality gate rejects weak results and rewrites the query until the evidence is airtight
  • 🔬 Deep research, not shallow search: multi-query swarm fans out across BM25 and vector space simultaneously, fusing intelligence from every angle
  • 🃏 Always has an ace up its sleeve: when one approach fails, swarm retrieval deploys parallel strategies as backup
  • 🕵️ True agentic loop: retrieve → judge → rewrite → retry, fully autonomous, up to max_iter rounds
  • 🔍 Hybrid search: BM25 + vector, fused with RRF or DBSF
  • 🧠 HyDE: hypothetical document embeddings for better recall on vague queries
  • 🛠️ Tool-calling agent: get_index_settings, get_filter_values, search_hybrid, search_bm25, rerank_results; the LLM picks tools dynamically
  • 🏆 Multi-reranker: Cohere, HuggingFace, Jina, ColBERT, RankGPT, or custom
  • 🗄️ 8 backends: Meilisearch, Azure AI Search, ChromaDB, LanceDB, Qdrant, pgvector, DuckDB, InMemory
  • 🤖 Any LLM: OpenAI, Azure, Anthropic, Ollama, Vertex AI, or any LangChain model
  • ⚡ One-line init: init_agent("docs", model="openai:gpt-5.4", backend="qdrant"), no provider imports needed
  • 💬 Multi-turn chat: conversation history with citation-aware answers
  • 🎯 Auto-strategy: LLM samples your collection and tunes itself automatically
  • 🔄 Async-native: every operation has a sync and an async variant

📦 Install

# Recommended: Meilisearch + Cohere reranker + interactive CLI
pip install rag007[recommended]

# Base only: in-memory backend, BM25 keyword search
pip install rag007

| Extra | What you get | Command |
| --- | --- | --- |
| recommended | Meilisearch + Cohere reranker + Rich CLI | pip install rag007[recommended] |
| cli | Interactive CLI with guided setup wizard | pip install rag007[cli] |
| all | Every backend + reranker + CLI | pip install rag007[all] |

๐Ÿธ Bond Edition extras โ€” because every mission needs a code name
Extra Code name Stack
goldeneye GoldenEye Meilisearch + Cohere + CLI โ€” the classic recommended loadout
skyfall Skyfall Everything. All backends, all rerankers, all CLI โ€” nothing left behind
thunderball Thunderball Qdrant + Cohere + CLI โ€” vector power meets precision reranking
moonraker Moonraker ChromaDB + HuggingFace โ€” fully local, no API keys, off the grid
goldfinger Goldfinger Azure AI Search + Azure OpenAI + Cohere โ€” all gold, all cloud
spectre Spectre pgvector + HuggingFace โ€” open-source shadow ops, no paid APIs
casino-royale Casino Royale ChromaDB + Jina โ€” lightweight first mission
pip install rag007[goldeneye]      # ๐Ÿธ The classic
pip install rag007[skyfall]        # ๐Ÿ’ฅ Everything falls into place
pip install rag007[thunderball]    # โšก Vector power + precision
pip install rag007[moonraker]     # ๐ŸŒ™ Fully local, no API keys
pip install rag007[goldfinger]     # โ˜๏ธ  All Azure, all gold
pip install rag007[spectre]        # ๐Ÿ‘ป Open-source, no paid APIs
pip install rag007[casino-royale]  # ๐ŸŽฐ Lightweight first mission
Individual backends & rerankers

pip install rag007[meilisearch]     # 🔎 Meilisearch
pip install rag007[azure]           # ☁️ Azure AI Search
pip install rag007[chromadb]        # 🟣 ChromaDB
pip install rag007[lancedb]         # 🏹 LanceDB
pip install rag007[pgvector]        # 🐘 PostgreSQL + pgvector
pip install rag007[qdrant]          # 🟡 Qdrant
pip install rag007[duckdb]          # 🦆 DuckDB
pip install rag007[cohere]          # 🏅 Cohere reranker
pip install rag007[huggingface]     # 🤗 HuggingFace cross-encoder (local)
pip install rag007[jina]            # 🌊 Jina reranker
pip install rag007[rerankers]       # 🎯 rerankers (ColBERT, Flashrank, RankGPT, ...)

Mix and match: pip install rag007[qdrant,cohere,cli]


🚀 Quick Start

One-liner with init_agent

The fastest way to get started, with no provider imports and string aliases for everything:

from rag007 import init_agent

# Minimal: in-memory backend, LLM from env vars
rag = init_agent("docs")

# OpenAI + Qdrant + Cohere reranker
rag = init_agent(
    "my-collection",
    model="openai:gpt-5.4",
    backend="qdrant",
    backend_url="http://localhost:6333",
    reranker="cohere",
)

# Anthropic + Azure AI Search (native vectorisation, no client-side embeddings)
rag = init_agent(
    "my-index",
    model="anthropic:claude-sonnet-4-6",
    gen_model="anthropic:claude-opus-4-6",
    backend="azure",
    backend_url="https://my-search.search.windows.net",
    reranker="huggingface",
    auto_strategy=True,
)

# Fully local: Ollama + ChromaDB + HuggingFace cross-encoder
rag = init_agent(
    "docs",
    model="ollama:llama3",
    backend="chroma",
    reranker="huggingface",
    reranker_model="cross-encoder/ms-marco-MiniLM-L-6-v2",
)

Multi-collection routing

Pass several collections and let the agent decide which to search. The LLM picks the relevant subset before retrieval, using either the collection names alone or optional natural-language descriptions.

from rag007 import init_agent

# List form: LLM routes by name only
rag = init_agent(
    collections=["products", "faq", "policies"],
    backend="qdrant",
    backend_url="http://localhost:6333",
    model="openai:gpt-5.4",
)

# Dict form: LLM routes using descriptions (better precision)
rag = init_agent(
    collections={
        "products": "Product catalog: SKUs, prices, specs, availability",
        "faq":      "Customer-facing FAQ, troubleshooting, return policy",
        "policies": "Internal HR/legal/compliance policy documents",
    },
    backend="qdrant",
    backend_url="http://localhost:6333",
    model="openai:gpt-5.4",
)

rag.invoke("What's our return policy?")       # → routes to faq / policies
rag.invoke("Price of SKU 12345?")              # → routes to products

Each retrieved document carries its origin in metadata["_collection"] so you can merge, filter, or attribute citations downstream. One backend instance is built per collection; they share the same backend type and URL.
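
As a rough sketch of the downstream attribution mentioned above, retrieved documents can be grouped by `metadata["_collection"]`. The `Doc` dataclass here is a hypothetical stand-in that mirrors only the `page_content` and `metadata` attributes; it is not a rag007 class.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def group_by_collection(docs: list[Doc]) -> dict[str, list[Doc]]:
    """Bucket documents by the collection they were retrieved from."""
    grouped: dict[str, list[Doc]] = defaultdict(list)
    for doc in docs:
        # Fall back to "unknown" if a document lacks the origin key.
        grouped[doc.metadata.get("_collection", "unknown")].append(doc)
    return dict(grouped)


docs = [
    Doc("Returns are accepted within 30 days.", {"_collection": "faq"}),
    Doc("Refund approvals need manager sign-off.", {"_collection": "policies"}),
    Doc("How to start a return online.", {"_collection": "faq"}),
]
grouped = group_by_collection(docs)
for name, items in grouped.items():
    print(f"{name}: {len(items)} document(s)")
```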

Backend aliases

| Alias | Class | Extra |
| --- | --- | --- |
| "memory" / "in_memory" | InMemoryBackend | (none) |
| "chroma" / "chromadb" | ChromaDBBackend | rag007[chromadb] |
| "qdrant" | QdrantBackend | rag007[qdrant] |
| "lancedb" / "lance" | LanceDBBackend | rag007[lancedb] |
| "duckdb" | DuckDBBackend | rag007[duckdb] |
| "pgvector" / "pg" | PgvectorBackend | rag007[pgvector] |
| "meilisearch" | MeilisearchBackend | rag007[meilisearch] |
| "azure" | AzureAISearchBackend | rag007[azure] |

Reranker aliases

| Alias | Class | reranker_model | Extra |
| --- | --- | --- | --- |
| "cohere" | CohereReranker | Cohere model name (default: rerank-v3.5) | rag007[cohere] |
| "huggingface" / "hf" | HuggingFaceReranker | HF model name (default: cross-encoder/ms-marco-MiniLM-L-6-v2) | rag007[huggingface] |
| "jina" | JinaReranker | Jina model name (default: jina-reranker-v2-base-multilingual) | rag007[jina] |
| "llm" | LLMReranker | (uses the agent's LLM) | (none) |
| "rerankers" | RerankersReranker | Any model from the rerankers library | rag007[rerankers] |

# Cohere (default model)
rag = init_agent("docs", model="openai:gpt-5.4", reranker="cohere")

# HuggingFace: multilingual model
rag = init_agent("docs", model="openai:gpt-5.4", reranker="huggingface",
                 reranker_model="cross-encoder/mmarco-mMiniLMv2-L12-H384-v1")

# Jina
rag = init_agent("docs", model="openai:gpt-5.4", reranker="jina")  # uses JINA_API_KEY

# ColBERT via rerankers library
rag = init_agent("docs", model="openai:gpt-5.4", reranker="rerankers",
                 reranker_model="colbert-ir/colbertv2.0",
                 reranker_kwargs={"model_type": "colbert"})

# Pass a pre-built reranker instance directly
from rag007 import CohereReranker
rag = init_agent("docs", reranker=CohereReranker(model="rerank-v3.5", api_key="..."))

Model strings: any "provider:model-name" from LangChain's init_chat_model (openai, anthropic, azure_openai, google_vertexai, ollama, groq, mistralai, and more).

Manual setup

from rag007 import Agent, InMemoryBackend

# my_embed_fn is your own embedding function: (str) -> list[float]
backend = InMemoryBackend(embed_fn=my_embed_fn)
backend.add_documents([
    {"content": "RAG combines retrieval with generation", "source": "wiki"},
    {"content": "Vector search finds similar embeddings", "source": "docs"},
])

rag = Agent(index="demo", backend=backend)

# Single query → full answer
state = rag.invoke("What is retrieval-augmented generation?")
print(state.answer)

# Retrieve only: documents without an LLM answer
query, docs = rag.retrieve_documents("What is retrieval-augmented generation?")
for doc in docs:
    print(doc.page_content)

# Override top-K at call time
query, docs = rag.retrieve_documents("hybrid search", top_k=3)

Agent.from_model: model string with explicit backend

from rag007 import Agent, QdrantBackend

rag = Agent.from_model(
    "openai:gpt-5.4-mini",          # fast model for routing & rewriting
    index="docs",
    gen_model="openai:gpt-5.4",     # powerful model for the final answer
    backend=QdrantBackend("docs", url="http://localhost:6333"),
)

💬 Multi-turn Chat

from rag007 import Agent, ConversationTurn

rag = Agent(index="articles")
history: list[ConversationTurn] = []

state = rag.chat("What is hybrid search?", history)
history.append(ConversationTurn(question="What is hybrid search?", answer=state.answer))

state = rag.chat("How does it compare to pure vector search?", history)
print(state.answer)
print(f"Sources: {len(state.documents)}")

Async variant:

state = await rag.achat("What is hybrid search?", history)
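
The append-after-every-turn bookkeeping can be wrapped in a small helper. The sketch below is hedged: `Turn` is a hypothetical stand-in for `ConversationTurn` (same `question` and `answer` fields as used above), and `_EchoAgent` is a toy object that only mimics the `chat(query, history)` call shape so the example runs without a backend.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical stand-in for ConversationTurn (question + answer)."""
    question: str
    answer: str


def ask(rag, question: str, history: list, turn_cls=Turn):
    """Send one question, then record the exchange so the next call sees it."""
    state = rag.chat(question, history)
    history.append(turn_cls(question=question, answer=state.answer))
    return state


class _EchoState:
    def __init__(self, answer: str) -> None:
        self.answer = answer


class _EchoAgent:
    """Toy agent: reports how many earlier turns it was given."""
    def chat(self, question: str, history: list) -> _EchoState:
        return _EchoState(f"{question} (prior turns: {len(history)})")


history: list[Turn] = []
agent = _EchoAgent()
ask(agent, "What is hybrid search?", history)
state = ask(agent, "How does it compare to pure vector search?", history)
print(state.answer)
```

With a real agent, passing `turn_cls=ConversationTurn` would keep the history in the library's own type.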

๐Ÿ—๏ธ Architecture

rag007 has two operating modes โ€” both fully autonomous:

Graph mode (rag.chat / rag.invoke)

The default. A LangGraph state machine that runs the full agentic pipeline:

Query
  │
  ├─[HyDE]───────────────────────────────────────────┐
  │  Hypothetical document embedding (parallel)      │
  │                                                  ▼
  ▼                                         [Embed HyDE text]
[Preprocess]                                         │
  Extract keywords + variants                        │
  Detect semantic_ratio + fusion strategy            │
  │                                                  │
  └──────────────────────────────────────────────────┘
                        │
                        ▼
              [Hybrid Search × N queries]
               BM25 + Vector, multi-arm
                        │
                        ▼
               [RRF / DBSF Fusion]
                        │
                        ▼
                    [Rerank]
               Cohere / HF / Jina / LLM
                        │
                        ▼
               [Quality Gate]
               LLM judges relevance
                   │         │
                (good)     (bad)
                   │         │
                   ▼         ▼
              [Generate]  [Rewrite] ──► loop (max_iter)
                   │
                   ▼
        Answer + [n] inline citations
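
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion (RRF): each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The constant k = 60 is the value commonly used in the RRF literature; whether rag007 uses the same constant, and the `rrf_fuse` name itself, are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["doc_a", "doc_b", "doc_c"]    # keyword arm, best first
vector_hits = ["doc_b", "doc_d", "doc_a"]  # vector arm, best first

fused = rrf_fuse([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents found by both arms (doc_a, doc_b) rise above those found by only one, which is the property that makes rank-based fusion useful when BM25 and vector scores are not on a comparable scale.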

Tool-calling agent mode (rag.invoke_agent)

The agent receives a set of tools and reasons step-by-step, calling them in whatever order makes sense for the question. No fixed pipeline, just field improvisation:

Query
  │
  ▼
[LLM Agent]  ◄───────────────────────────────────────┐
  Thinks: "What do I need to answer this?"           │
  │                                                  │
  ├── get_index_settings()                           │
  │   Discover filterable / sortable / boost fields  │
  │                                                  │
  ├── get_filter_values(field)                       │
  │   Sample real stored values for a field          │
  │   → build precise filter expressions             │
  │                                                  │
  ├── search_hybrid(query, filter, sort_fields)      │
  │   BM25 + vector, optional filter + sort boost    │
  │                                                  │
  ├── search_bm25(query, filter)                     │
  │   Fallback pure keyword search                   │
  │                                                  │
  ├── rerank_results(query, hits)                    │
  │   Re-rank with configured reranker               │
  │                                                  │
  └── [needs more info?] ──────────────────────────► │

  [done]
  │
  ▼
Answer  (tool calls explained inline)

Use invoke_agent when questions involve dynamic filtering: the agent inspects the index schema, samples real field values, builds filters on the fly, and decides whether to sort by business signals like popularity or recency.


🗄️ Backends

☁️ Azure AI Search

Native hybrid search, with no client-side embeddings needed when the index has an integrated vectorizer:

from rag007 import Agent, AzureAISearchBackend

# Native vectorization: the service embeds the query server-side
rag = Agent(
    index="my-index",
    backend=AzureAISearchBackend(
        "my-index",
        endpoint="https://my-search.search.windows.net",
        api_key="...",
    ),
)

# Client-side vectorization
rag = Agent(
    index="my-index",
    backend=AzureAISearchBackend(
        "my-index",
        endpoint="https://my-search.search.windows.net",
        api_key="...",
        embed_fn=my_embed_fn,
    ),
)

# With Azure semantic reranking
rag = Agent(
    index="my-index",
    backend=AzureAISearchBackend(
        "my-index",
        endpoint="https://my-search.search.windows.net",
        api_key="...",
        semantic_config="my-semantic-config",
    ),
)

🟡 Qdrant

from rag007 import Agent, QdrantBackend

rag = Agent(
    index="my_collection",
    backend=QdrantBackend("my_collection", url="http://localhost:6333", embed_fn=my_embed_fn),
)

🟣 ChromaDB

from rag007 import Agent, ChromaDBBackend

rag = Agent(
    index="my_collection",
    backend=ChromaDBBackend("my_collection", path="./chroma_db", embed_fn=my_embed_fn),
)

๐Ÿน LanceDB

from rag007 import Agent, LanceDBBackend

rag = Agent(
    index="docs",
    backend=LanceDBBackend("docs", db_uri="./lancedb", embed_fn=my_embed_fn),
)

๐Ÿ˜ PostgreSQL + pgvector

from rag007 import Agent, PgvectorBackend

rag = Agent(
    index="documents",
    backend=PgvectorBackend(
        "documents",
        dsn="postgresql://user:pass@localhost:5432/mydb",
        embed_fn=my_embed_fn,
    ),
)

🦆 DuckDB

from rag007 import Agent, DuckDBBackend

rag = Agent(
    index="vectors",
    backend=DuckDBBackend("vectors", db_path="./my.duckdb", embed_fn=my_embed_fn),
)

🔎 Meilisearch

from rag007 import Agent, MeilisearchBackend

rag = Agent(
    index="articles",
    backend=MeilisearchBackend("articles", url="http://localhost:7700", api_key="masterKey"),
)

📦 InMemory (default, zero dependencies)

from rag007 import Agent, InMemoryBackend

backend = InMemoryBackend(embed_fn=my_embed_fn)
backend.add_documents([
    {"content": "RAG combines retrieval with generation", "source": "wiki"},
    {"content": "Vector search finds similar embeddings", "source": "docs"},
])

rag = Agent(index="demo", backend=backend)

🤖 LLM Configuration

Pass a pre-built LangChain model or use init_agent / Agent.from_model for string-based init.
When using Agent directly, configure via env vars or pass an explicit model instance.

OpenAI

from langchain_openai import ChatOpenAI
from rag007 import Agent

rag = Agent(
    index="articles",
    llm=ChatOpenAI(model="gpt-5.4", api_key="sk-..."),
    gen_llm=ChatOpenAI(model="gpt-5.4", api_key="sk-..."),
)

Azure OpenAI (explicit keys)

from langchain_openai import AzureChatOpenAI
from rag007 import Agent

llm = AzureChatOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    azure_deployment="gpt-5.4",
    api_key="...",
    api_version="2024-12-01-preview",
)
rag = Agent(index="articles", llm=llm, gen_llm=llm)

Azure OpenAI (env vars)

# Set: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT
from rag007 import Agent

rag = Agent(index="articles")  # auto-detected

Azure OpenAI with Managed Identity (no API key)

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_openai import AzureChatOpenAI
from rag007 import Agent

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
llm = AzureChatOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    azure_deployment="gpt-5.4",
    azure_ad_token_provider=token_provider,
    api_version="2024-12-01-preview",
)
rag = Agent(index="articles", llm=llm, gen_llm=llm)

Anthropic Claude

pip install langchain-anthropic

from langchain_anthropic import ChatAnthropic
from rag007 import Agent

llm = ChatAnthropic(model="claude-sonnet-4-6", api_key="sk-ant-...")
rag = Agent(index="articles", llm=llm, gen_llm=llm)

Ollama (local, no API key)

pip install langchain-ollama

from langchain_ollama import ChatOllama
from rag007 import Agent

rag = Agent(
    index="articles",
    llm=ChatOllama(model="llama3.2", base_url="http://localhost:11434"),
    gen_llm=ChatOllama(model="llama3.2", base_url="http://localhost:11434"),
)

Google Vertex AI

pip install langchain-google-vertexai

from langchain_google_vertexai import ChatVertexAI
from rag007 import Agent

llm = ChatVertexAI(model="gemini-2.0-flash", project="my-gcp-project", location="us-central1")
rag = Agent(index="articles", llm=llm, gen_llm=llm)

Separate fast and generation models

Use a cheap/fast model for query rewriting and routing, a powerful model for the final answer:

from langchain_openai import AzureChatOpenAI
from rag007 import Agent

fast_llm = AzureChatOpenAI(azure_deployment="gpt-5.4-mini", api_key="...", api_version="2024-12-01-preview")
gen_llm  = AzureChatOpenAI(azure_deployment="gpt-5.4",      api_key="...", api_version="2024-12-01-preview")

rag = Agent(index="articles", llm=fast_llm, gen_llm=gen_llm)

๐Ÿ† Rerankers

๐Ÿ… Cohere

from rag007 import Agent, CohereReranker

rag = Agent(index="articles", reranker=CohereReranker(model="rerank-v3.5", api_key="..."))

🤗 HuggingFace cross-encoder (local, no API key)

pip install rag007[huggingface]

from rag007 import Agent, HuggingFaceReranker

rag = Agent(index="articles", reranker=HuggingFaceReranker())

# Multilingual
rag = Agent(index="articles", reranker=HuggingFaceReranker(model="cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"))

🌊 Jina (multilingual API)

pip install rag007[jina]

from rag007 import Agent, JinaReranker

rag = Agent(index="articles", reranker=JinaReranker(api_key="..."))  # or JINA_API_KEY env var

🎯 rerankers: ColBERT / Flashrank / RankGPT / any cross-encoder

Unified bridge to the rerankers library by answer.ai:

pip install rag007[rerankers]

from rag007 import Agent, RerankersReranker

rag = Agent(index="articles", reranker=RerankersReranker("cross-encoder/ms-marco-MiniLM-L-6-v2", model_type="cross-encoder"))
rag = Agent(index="articles", reranker=RerankersReranker("colbert-ir/colbertv2.0", model_type="colbert"))
rag = Agent(index="articles", reranker=RerankersReranker("flashrank", model_type="flashrank"))
rag = Agent(index="articles", reranker=RerankersReranker("gpt-5.4-mini", model_type="rankgpt", api_key="..."))

🔧 Custom reranker

from rag007 import Agent, RerankResult

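class MyReranker:
    def rerank(self, query: str, documents: list[str], top_n: int) -> list[RerankResult]:
        # Toy scoring that keeps the incoming order; cap at len(documents) so every index is valid
        return [RerankResult(index=i, relevance_score=1.0 / (i + 1)) for i in range(min(top_n, len(documents)))]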

rag = Agent(index="articles", reranker=MyReranker())
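
For a custom reranker that actually looks at the text, a keyword-overlap scorer is about the simplest option. The sketch below follows the same `rerank(query, documents, top_n)` shape as above; `Result` is a hypothetical stand-in for `RerankResult` (same `index` and `relevance_score` fields) so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for RerankResult."""
    index: int
    relevance_score: float


class OverlapReranker:
    """Scores each document by the fraction of query terms it contains."""

    def rerank(self, query: str, documents: list[str], top_n: int) -> list[Result]:
        query_terms = set(query.lower().split())
        scored = []
        for i, text in enumerate(documents):
            doc_terms = set(text.lower().split())
            overlap = len(query_terms & doc_terms) / max(len(query_terms), 1)
            scored.append(Result(index=i, relevance_score=overlap))
        # Best score first; slicing also handles top_n > len(documents).
        scored.sort(key=lambda r: r.relevance_score, reverse=True)
        return scored[:top_n]


docs = [
    "vector search finds similar embeddings",
    "hybrid search blends keyword and vector search",
    "postgres stores relational data",
]
top = OverlapReranker().rerank("hybrid vector search", docs, top_n=2)
for r in top:
    print(r.index, round(r.relevance_score, 2), docs[r.index])
```

Each result's `index` points back into the original `documents` list, which is what lets the caller reorder its own hits.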

🛠️ Tools

When using invoke_agent, the LLM has access to a set of tools it can call in any order. No fixed pipeline: the agent decides what it needs.

| Tool | Description |
| --- | --- |
| get_index_settings() | Discover filterable, searchable, sortable, and boost fields from the index schema |
| get_filter_values(field) | Sample real stored values for a field, used to build precise filter expressions |
| search_hybrid(query, filter_expr, semantic_ratio, sort_fields) | BM25 + vector hybrid search with optional filter and sort boost |
| search_bm25(query, filter_expr) | Pure keyword search, the fallback when hybrid returns poor results |
| rerank_results(query, hits) | Re-rank a list of hits with the configured reranker |

The agent follows this reasoning pattern:

  1. Call get_index_settings() to learn the schema
  2. If the question names a specific entity, call get_filter_values(field) to find the exact stored value
  3. Call search_hybrid() with a filter and/or sort if relevant, otherwise broad hybrid search
  4. Fall back to search_bm25() if results are thin
  5. Call rerank_results() to surface the most relevant hits
  6. Summarise, explaining which filters and signals influenced the answer

from rag007 import Agent

rag = Agent(index="products")

# Agent inspects schema, detects brand field, samples values,
# builds filter, sorts by popularity signal, all autonomously
result = rag.invoke_agent("Show me the most popular Bosch power tools")
print(result)

⚙️ Constructor Reference

Agent(
    index="my_index",           # collection / index name
    backend=...,                # SearchBackend (default: InMemoryBackend)
    llm=...,                    # fast LLM: routing, rewrite, filter
    gen_llm=...,                # generation LLM: final answer
    reranker=...,               # Cohere / HuggingFace / Jina / custom
    top_k=10,                   # final result count            [RAG_TOP_K]
    rerank_top_n=5,             # reranker top-n                [RAG_RERANK_TOP_N]
    retrieval_factor=4,         # over-retrieval multiplier     [RAG_RETRIEVAL_FACTOR]
    max_iter=20,                # max retrieve-rewrite cycles   [RAG_MAX_ITER]
    semantic_ratio=0.5,         # hybrid semantic weight        [RAG_SEMANTIC_RATIO]
    fusion="rrf",               # "rrf" or "dbsf"               [RAG_FUSION]
    instructions="",            # extra system prompt for generation
    embed_fn=None,              # (str) -> list[float]
    boost_fn=None,              # (doc_dict) -> float score boost
    base_filter=None,           # always-on filter expression
    hyde_min_words=8,           # min words to trigger HyDE     [RAG_HYDE_MIN_WORDS]
    hyde_style_hint="",         # style hint for HyDE prompt
    auto_strategy=False,        # auto-tune from index samples
)
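
Two of these knobs are easier to reason about with numbers. The sketch below is an interpretation, not confirmed library behaviour: it assumes HyDE runs only when the query has at least `hyde_min_words` words, and that the candidate pool fetched before reranking is `top_k * retrieval_factor`. Both helper names are hypothetical.

```python
def should_run_hyde(query: str, hyde_min_words: int = 8) -> bool:
    """Assumed rule: trigger HyDE only for queries with enough words."""
    return len(query.split()) >= hyde_min_words


def candidate_pool_size(top_k: int = 10, retrieval_factor: int = 4) -> int:
    """Assumed rule: over-retrieve top_k * retrieval_factor candidates."""
    return top_k * retrieval_factor


short_query = "bosch drill"
long_query = "which cordless drills are best for drilling into reinforced concrete walls"

print(should_run_hyde(short_query))  # 2 words, below the default threshold
print(should_run_hyde(long_query))   # 11 words, at or above the threshold
print(candidate_pool_size())         # defaults from the constructor reference
```

Under these assumptions the defaults would fetch 40 candidates and hand them to the reranker, which keeps `rerank_top_n` of them.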

📡 API Reference

| Method | Returns | Description |
| --- | --- | --- |
| rag.invoke(query) | RAGState | Full RAG pipeline (sync) |
| rag.ainvoke(query) | RAGState | Full RAG pipeline (async) |
| rag.chat(query, history) | RAGState | Multi-turn chat (sync) |
| rag.achat(query, history) | RAGState | Multi-turn chat (async) |
| rag.retrieve_documents(query, top_k) | (str, list[Document]) | Retrieve only, no answer |
| rag.query(query) | str | Answer string directly |
| rag.invoke_agent(query) | str | Tool-calling agent mode (sync) |
| rag.ainvoke_agent(query) | str | Tool-calling agent mode (async) |

RAGState fields: answer · documents · query · question · history · iterations
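
The async variants make it natural to answer several questions concurrently with `asyncio.gather`. This is a sketch with a stand-in: `FakeAgent` only mimics the `ainvoke(query)` call shape and the `answer` field from the table above, and whether one real Agent instance can safely serve concurrent calls is an assumption to verify.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeState:
    """Stand-in carrying only the `answer` field of RAGState."""
    answer: str


class FakeAgent:
    """Hypothetical agent exposing the same ainvoke(query) shape."""

    async def ainvoke(self, query: str) -> FakeState:
        await asyncio.sleep(0.01)  # simulate network / LLM latency
        return FakeState(answer=f"answer to: {query}")


async def answer_all(rag, questions: list[str]) -> list[str]:
    """Run all questions concurrently; gather preserves input order."""
    states = await asyncio.gather(*(rag.ainvoke(q) for q in questions))
    return [state.answer for state in states]


questions = ["What is RRF?", "What is HyDE?"]
answers = asyncio.run(answer_all(FakeAgent(), questions))
print(answers)
```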


🌍 Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| AZURE_OPENAI_ENDPOINT | Azure OpenAI endpoint | (none) |
| AZURE_OPENAI_API_KEY | Azure OpenAI API key | (none) |
| AZURE_OPENAI_DEPLOYMENT | Default deployment | (none) |
| AZURE_OPENAI_FAST_DEPLOYMENT | Fast model deployment | → AZURE_OPENAI_DEPLOYMENT |
| AZURE_OPENAI_GENERATION_DEPLOYMENT | Generation deployment | → AZURE_OPENAI_DEPLOYMENT |
| AZURE_OPENAI_API_VERSION | API version | 2024-12-01-preview |
| OPENAI_API_KEY | OpenAI API key (fallback) | (none) |
| OPENAI_MODEL | OpenAI model name | gpt-5.4 |
| AZURE_COHERE_ENDPOINT | Azure Cohere endpoint | (none) |
| AZURE_COHERE_API_KEY | Azure Cohere API key | (none) |
| COHERE_API_KEY | Cohere API key (fallback) | (none) |
| JINA_API_KEY | Jina reranker API key | (none) |
| MEILI_URL | Meilisearch URL | http://localhost:7700 |
| MEILI_KEY | Meilisearch API key | masterKey |
| RAG_TOP_K | Final result count | 10 |
| RAG_RERANK_TOP_N | Reranker top-n | 5 |
| RAG_RETRIEVAL_FACTOR | Over-retrieval multiplier | 4 |
| RAG_SEMANTIC_RATIO | Hybrid semantic weight | 0.5 |
| RAG_FUSION | Fusion strategy | rrf |
| RAG_HYDE_MIN_WORDS | Min words to trigger HyDE | 8 |
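
To show how the RAG_* variables and their defaults fit together, here is a small sketch that reads them into a typed settings object. The variable names and defaults come from the table above; the `RetrievalSettings` class and `load_settings` function are hypothetical and say nothing about how rag007 parses its environment internally.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RetrievalSettings:
    top_k: int
    rerank_top_n: int
    retrieval_factor: int
    semantic_ratio: float
    fusion: str
    hyde_min_words: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> RetrievalSettings:
    """Read RAG_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return RetrievalSettings(
        top_k=int(env.get("RAG_TOP_K", "10")),
        rerank_top_n=int(env.get("RAG_RERANK_TOP_N", "5")),
        retrieval_factor=int(env.get("RAG_RETRIEVAL_FACTOR", "4")),
        semantic_ratio=float(env.get("RAG_SEMANTIC_RATIO", "0.5")),
        fusion=env.get("RAG_FUSION", "rrf"),
        hyde_min_words=int(env.get("RAG_HYDE_MIN_WORDS", "8")),
    )


# Pass a plain dict instead of os.environ to keep the example deterministic.
settings = load_settings({"RAG_TOP_K": "20", "RAG_FUSION": "dbsf"})
print(settings)
```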

🖥️ CLI

"The gadgets are ready."

pip install rag007[recommended]

# 🧙 Guided setup wizard: choose LLM, embedder, backend, reranker
rag007

# 💬 Chat mode: full agentic pipeline
rag007 --chat -c my_index

# 🔍 Retriever mode: documents only, no LLM
rag007 --retriever -c my_index

# ⚡ Skip wizard, use env vars
rag007 --skip-wizard -c my_index

The wizard guides you through:

  1. LLM provider: OpenAI, Anthropic, Ollama, or env default
  2. Embedding model: OpenAI, Azure OpenAI, Ollama, or none (BM25 only)
  3. Vector store: InMemory, Meilisearch, ChromaDB, Qdrant, pgvector, DuckDB, LanceDB, Azure AI Search
  4. Reranker: Cohere, Jina, HuggingFace, LLM-based, or none
  5. Mode: Chat (with answers) or Retriever (documents only)

📄 License

MIT. Licence to code.
