rag007 – multi-backend retrieval-augmented generation with LangGraph
Project description
rag007 – Licensed to Retrieve
"The name's RAG. rag007."
Not just hybrid search. A true autonomous retrieval agent.
Shaken, not stirred – plug in any vector store, any LLM, any reranker.
The mission: find the right documents, neutralise irrelevant noise, and deliver the answer. Every time.
from rag007 import init_agent
rag = init_agent("documents", model="openai:gpt-5.4", backend="qdrant")
state = rag.chat("What is the status of operation overlord?")
# Your answer. Shaken, not stirred.
The Story
Somewhere in a dimly lit briefing room, the director leans forward.
"We have a problem. Millions of documents. One question. And the clock is ticking."
Most retrieval systems would send a junior analyst – one search query, one pass, done. Fast, cheap, and dangerously incomplete. Easily fooled by keyword tricks. Blind to nuance. Useless when the answer hides three layers deep.
So they sent rag007.
Licensed to retrieve. Cool under pressure. Never satisfied with good enough.
Before every mission, rag007 visits Q's lab – a fully equipped arsenal of intelligence tools:
- Backends – eight field offices to choose from: Azure AI Search, Qdrant, ChromaDB, LanceDB, pgvector, DuckDB, Meilisearch, or a lightweight in-memory outpost for quick ops. Each office speaks the same language; swapping them requires nothing more than a new introduction.
- LLMs – the agency doesn't play favourites. OpenAI, Anthropic, Azure, Ollama, Vertex AI – any intelligence source will do, as long as it delivers.
- Rerankers – after the initial sweep, a specialist steps in. Cohere, HuggingFace cross-encoders, Jina, ColBERT, RankGPT – precision instruments for separating signal from noise.
- Tools – in dynamic situations, rag007 goes beyond fixed procedures. It inspects the index schema, samples real field values, constructs precise filter expressions on the fly, and boosts results by business signals. No mission is the same; no playbook is hardcoded.
Fully equipped, rag007 takes the field.
It doesn't just fire a single vector query and file a report. It plans – decomposing the objective, assessing intent, deciding whether this calls for semantic finesse or raw keyword firepower. It infiltrates – running multiple search variants simultaneously across BM25 and vector space, fusing the intelligence with Reciprocal Rank Fusion or Distribution-Based Score Fusion. It interrogates the results – an LLM quality gate cross-examines every document, ruthlessly discarding anything that doesn't hold up under scrutiny. And when the trail goes cold, it doesn't retreat. It rewrites the query, recruits a swarm of parallel strategies, and keeps going until the mission is complete.
Only once the evidence is airtight does it surface the answer. Cited. Grounded. Delivered.
"Shaken, not stirred – and always on target."
rag007 operates autonomously in the shadows of your vector store. No hardcoded pipelines. No fixed playbooks. Plug in any LLM, any backend, any reranker – and it adapts to the terrain.
Not in the name of any crown or government. In the name of whoever is seeking the truth in their data.
The intelligence service for your documents. Ready for deployment.
How It Works
Most RAG libraries are pipelines – query in, documents out, done. rag007 is an agent.
Like a field operative, it doesn't execute a single search and report back. It thinks, adapts, and keeps going until the mission is complete:
- Understands the intent – rewrites your query into precise search keywords, detects whether it's a keyword lookup or a semantic question, and adjusts the hybrid search ratio accordingly
- Searches intelligently – runs multiple query variants simultaneously across BM25 and vector search, fuses the results, and re-ranks with a dedicated reranker
- Judges the results – an LLM quality gate evaluates whether the retrieved documents actually answer the question
- Adapts autonomously – if results are off-target, rewrites the query and tries again; if a single approach fails, fans out into a swarm of parallel search strategies
- Delivers the answer – only once it's confident the evidence is solid does it generate a cited, grounded response
This is the difference between a search box and a field agent.
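The loop above can be sketched in a few lines of plain Python. This is an illustrative reduction, not rag007's actual implementation; `search`, `judge`, `rewrite`, and `generate` are hypothetical stand-ins for the backend query, the LLM quality gate, the query rewriter, and the answer generator:

```python
# Hedged sketch of the retrieve -> judge -> rewrite loop described above.
# All callables are injected stubs; rag007's real node names may differ.
def agentic_answer(question, search, judge, rewrite, generate, max_iter=3):
    query = question
    docs = []
    for _ in range(max_iter):
        docs = search(query)                  # hybrid retrieval
        if judge(question, docs):             # LLM quality gate
            return generate(question, docs)   # confident, cited answer
        query = rewrite(question, query)      # sharpen the query and retry
    return generate(question, docs)           # best effort after max_iter

# Toy stubs: the "gate" only passes once the query mentions "hybrid",
# so the loop rewrites once before answering.
answer = agentic_answer(
    "compare fusion strategies",
    search=lambda q: [q.upper()],
    judge=lambda q, docs: any("HYBRID" in d for d in docs),
    rewrite=lambda q, prev: prev + " hybrid",
    generate=lambda q, docs: f"answer from {len(docs)} docs",
)
print(answer)  # "answer from 1 docs"
```

The point of the shape: retrieval quality is checked *before* generation, and a failed check feeds back into the query rather than into the answer.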
Features
- True agentic loop – retrieve → judge → rewrite → retry, fully autonomous
- Hybrid search – BM25 + vector, fused with RRF or DBSF
- HyDE – hypothetical document embeddings for better recall on vague queries
- Swarm retrieval – fans out to parallel strategies when a single search fails
- Tool-calling agent – `get_index_settings`, `get_filter_values`, `search_hybrid`, `search_bm25`, `rerank_results`; the LLM picks tools dynamically
- Multi-reranker – Cohere, HuggingFace, Jina, ColBERT, RankGPT, or custom
- 8 backends – Meilisearch, Azure AI Search, ChromaDB, LanceDB, Qdrant, pgvector, DuckDB, InMemory
- Any LLM – OpenAI, Azure, Anthropic, Ollama, Vertex AI, or any LangChain model
- One-line init – `init_agent("docs", model="openai:gpt-5.4", backend="qdrant")`, no imports needed
- Multi-turn chat – conversation history with citation-aware answers
- Auto-strategy – LLM samples your collection and tunes itself automatically
- Async-native – every operation has a sync and async variant
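As an aside, the RRF half of the fusion step is simple enough to show directly. This is the textbook Reciprocal Rank Fusion formula (score = Σ 1/(k + rank) across rankings) sketched in plain Python – it illustrates the idea, not rag007's internal code:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over all lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]    # keyword ranking
vector_hits = ["d3", "d1", "d4"]  # semantic ranking
print(rrf_fuse([bm25_hits, vector_hits]))  # ['d1', 'd3', 'd2', 'd4']
```

Documents that appear near the top of *both* rankings (here `d1` and `d3`) dominate, without ever comparing the incompatible raw BM25 and cosine scores – which is exactly why RRF is a robust default.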
Install
pip install rag007
Optional backends & extras
pip install rag007[meilisearch]   # Meilisearch
pip install rag007[azure]         # Azure AI Search
pip install rag007[chromadb]      # ChromaDB
pip install rag007[lancedb]       # LanceDB
pip install rag007[pgvector]      # PostgreSQL + pgvector
pip install rag007[qdrant]        # Qdrant
pip install rag007[duckdb]        # DuckDB
pip install rag007[cohere]        # Cohere reranker
pip install rag007[huggingface]   # HuggingFace cross-encoder (local)
pip install rag007[jina]          # Jina reranker
pip install rag007[rerankers]     # rerankers (ColBERT, Flashrank, RankGPT, …)
pip install rag007[cli]           # Interactive CLI
pip install rag007[all]           # Everything, shaken not stirred
Quick Start
One-liner with init_agent
The fastest way to get started – no provider imports, string aliases for everything:
from rag007 import init_agent
# Minimal – in-memory backend, LLM from env vars
rag = init_agent("docs")
# OpenAI + Qdrant + Cohere reranker
rag = init_agent(
"my-collection",
model="openai:gpt-5.4",
backend="qdrant",
backend_url="http://localhost:6333",
reranker="cohere",
)
# Anthropic + Azure AI Search (native vectorisation, no client-side embeddings)
rag = init_agent(
"my-index",
model="anthropic:claude-sonnet-4-6",
gen_model="anthropic:claude-opus-4-6",
backend="azure",
backend_url="https://my-search.search.windows.net",
reranker="huggingface",
auto_strategy=True,
)
# Fully local – Ollama + ChromaDB + HuggingFace cross-encoder
rag = init_agent(
"docs",
model="ollama:llama3",
backend="chroma",
reranker="huggingface",
reranker_model="cross-encoder/ms-marco-MiniLM-L-6-v2",
)
Backend aliases
| Alias | Class | Extra |
|---|---|---|
| `"memory"` / `"in_memory"` | `InMemoryBackend` | (none) |
| `"chroma"` / `"chromadb"` | `ChromaDBBackend` | `rag007[chromadb]` |
| `"qdrant"` | `QdrantBackend` | `rag007[qdrant]` |
| `"lancedb"` / `"lance"` | `LanceDBBackend` | `rag007[lancedb]` |
| `"duckdb"` | `DuckDBBackend` | `rag007[duckdb]` |
| `"pgvector"` / `"pg"` | `PgvectorBackend` | `rag007[pgvector]` |
| `"meilisearch"` | `MeilisearchBackend` | `rag007[meilisearch]` |
| `"azure"` | `AzureAISearchBackend` | `rag007[azure]` |
Reranker aliases
| Alias | Class | `reranker_model` | Extra |
|---|---|---|---|
| `"cohere"` | `CohereReranker` | Cohere model name (default: `rerank-v3.5`) | `rag007[cohere]` |
| `"huggingface"` / `"hf"` | `HuggingFaceReranker` | HF model name (default: `cross-encoder/ms-marco-MiniLM-L-6-v2`) | `rag007[huggingface]` |
| `"jina"` | `JinaReranker` | Jina model name (default: `jina-reranker-v2-base-multilingual`) | `rag007[jina]` |
| `"llm"` | `LLMReranker` | (uses the agent's LLM) | (none) |
| `"rerankers"` | `RerankersReranker` | Any model from the `rerankers` library | `rag007[rerankers]` |
# Cohere (default model)
rag = init_agent("docs", model="openai:gpt-5.4", reranker="cohere")
# HuggingFace – multilingual model
rag = init_agent("docs", model="openai:gpt-5.4", reranker="huggingface",
reranker_model="cross-encoder/mmarco-mMiniLMv2-L12-H384-v1")
# Jina
rag = init_agent("docs", model="openai:gpt-5.4", reranker="jina") # uses JINA_API_KEY
# ColBERT via rerankers library
rag = init_agent("docs", model="openai:gpt-5.4", reranker="rerankers",
reranker_model="colbert-ir/colbertv2.0",
reranker_kwargs={"model_type": "colbert"})
# Pass a pre-built reranker instance directly
from rag007 import CohereReranker
rag = init_agent("docs", reranker=CohereReranker(model="rerank-v3.5", api_key="..."))
Model strings: any "provider:model-name" from LangChain's init_chat_model – openai, anthropic, azure_openai, google_vertexai, ollama, groq, mistralai, and more
Manual setup
from rag007 import Agent, InMemoryBackend
backend = InMemoryBackend(embed_fn=my_embed_fn)
backend.add_documents([
{"content": "RAG combines retrieval with generation", "source": "wiki"},
{"content": "Vector search finds similar embeddings", "source": "docs"},
])
rag = Agent(index="demo", backend=backend)
# Single query – full answer
state = rag.invoke("What is retrieval-augmented generation?")
print(state.answer)
# Retrieve only – documents without LLM answer
query, docs = rag.retrieve_documents("What is retrieval-augmented generation?")
for doc in docs:
print(doc.page_content)
# Override top-K at call time
query, docs = rag.retrieve_documents("hybrid search", top_k=3)
Agent.from_model – model string with explicit backend
from rag007 import Agent, QdrantBackend
rag = Agent.from_model(
"openai:gpt-5.4-mini", # fast model for routing & rewriting
index="docs",
gen_model="openai:gpt-5.4", # powerful model for the final answer
backend=QdrantBackend("docs", url="http://localhost:6333"),
)
Multi-turn Chat
from rag007 import Agent, ConversationTurn
rag = Agent(index="articles")
history: list[ConversationTurn] = []
state = rag.chat("What is hybrid search?", history)
history.append(ConversationTurn(question="What is hybrid search?", answer=state.answer))
state = rag.chat("How does it compare to pure vector search?", history)
print(state.answer)
print(f"Sources: {len(state.documents)}")
Async variant:
state = await rag.achat("What is hybrid search?", history)
Architecture
rag007 has two operating modes – both fully autonomous:
Graph mode (rag.chat / rag.invoke)
The default. A LangGraph state machine that runs the full agentic pipeline:
Query
  │
  ├─[HyDE]─────────────────────────────────────────┐
  │   Hypothetical document embedding (parallel)   │
  │                                                ▼
  ▼                                      [Embed HyDE text]
[Preprocess]                                      │
  Extract keywords + variants                     │
  Detect semantic_ratio + fusion strategy         │
  │                                               │
  └──────────────────────┬───────────────────────┘
                         ▼
           [Hybrid Search × N queries]
             BM25 + Vector, multi-arm
                         │
                         ▼
                [RRF / DBSF Fusion]
                         │
                         ▼
                     [Rerank]
             Cohere / HF / Jina / LLM
                         │
                         ▼
                  [Quality Gate]
                LLM judges relevance
                 │             │
              (good)         (bad)
                 │             │
                 ▼             ▼
            [Generate]     [Rewrite] ──▶ loop (max_iter)
                 │
                 ▼
        Answer + [n] inline citations
Tool-calling agent mode (rag.invoke_agent)
The agent receives a set of tools and reasons step-by-step, calling them in whatever order makes sense for the question. No fixed pipeline – pure field improvisation:
Query
  │
  ▼
[LLM Agent] ◀────────────────────────────────────────┐
  Thinks: "What do I need to answer this?"           │
  │                                                  │
  ├── get_index_settings()                           │
  │     Discover filterable / sortable / boost fields│
  │                                                  │
  ├── get_filter_values(field)                       │
  │     Sample real stored values for a field        │
  │     → build precise filter expressions           │
  │                                                  │
  ├── search_hybrid(query, filter, sort_fields)      │
  │     BM25 + vector, optional filter + sort boost  │
  │                                                  │
  ├── search_bm25(query, filter)                     │
  │     Fallback pure keyword search                 │
  │                                                  │
  ├── rerank_results(query, hits)                    │
  │     Re-rank with configured reranker             │
  │                                                  │
  └── [needs more info?] ─────────────────────────▶─┘
  [done]
  │
  ▼
Answer (tool calls explained inline)
Use invoke_agent when questions involve dynamic filtering – the agent inspects the index schema, samples real field values, builds filters on the fly, and decides whether to sort by business signals like popularity or recency.
Backends
Azure AI Search
Native hybrid search – no client-side embeddings needed when the index has an integrated vectorizer:
from rag007 import Agent, AzureAISearchBackend
# Native vectorization – service embeds the query server-side
rag = Agent(
index="my-index",
backend=AzureAISearchBackend(
"my-index",
endpoint="https://my-search.search.windows.net",
api_key="...",
),
)
# Client-side vectorization
rag = Agent(
index="my-index",
backend=AzureAISearchBackend(
"my-index",
endpoint="https://my-search.search.windows.net",
api_key="...",
embed_fn=my_embed_fn,
),
)
# With Azure semantic reranking
rag = Agent(
index="my-index",
backend=AzureAISearchBackend(
"my-index",
endpoint="https://my-search.search.windows.net",
api_key="...",
semantic_config="my-semantic-config",
),
)
Qdrant
from rag007 import Agent, QdrantBackend
rag = Agent(
index="my_collection",
backend=QdrantBackend("my_collection", url="http://localhost:6333", embed_fn=my_embed_fn),
)
ChromaDB
from rag007 import Agent, ChromaDBBackend
rag = Agent(
index="my_collection",
backend=ChromaDBBackend("my_collection", path="./chroma_db", embed_fn=my_embed_fn),
)
LanceDB
from rag007 import Agent, LanceDBBackend
rag = Agent(
index="docs",
backend=LanceDBBackend("docs", db_uri="./lancedb", embed_fn=my_embed_fn),
)
PostgreSQL + pgvector
from rag007 import Agent, PgvectorBackend
rag = Agent(
index="documents",
backend=PgvectorBackend(
"documents",
dsn="postgresql://user:pass@localhost:5432/mydb",
embed_fn=my_embed_fn,
),
)
DuckDB
from rag007 import Agent, DuckDBBackend
rag = Agent(
index="vectors",
backend=DuckDBBackend("vectors", db_path="./my.duckdb", embed_fn=my_embed_fn),
)
Meilisearch
from rag007 import Agent, MeilisearchBackend
rag = Agent(
index="articles",
backend=MeilisearchBackend("articles", url="http://localhost:7700", api_key="masterKey"),
)
InMemory (default, zero dependencies)
from rag007 import Agent, InMemoryBackend
backend = InMemoryBackend(embed_fn=my_embed_fn)
backend.add_documents([
{"content": "RAG combines retrieval with generation", "source": "wiki"},
{"content": "Vector search finds similar embeddings", "source": "docs"},
])
rag = Agent(index="demo", backend=backend)
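Every backend example above passes an `embed_fn`, which is simply a callable mapping a string to a vector of floats. In practice you would wrap a real embedding model (OpenAI, sentence-transformers, etc.); purely for self-contained demos, a deterministic toy version – not suitable for production retrieval – might look like:

```python
import hashlib
import math

def my_embed_fn(text: str, dim: int = 64) -> list[float]:
    """Toy hashing embedding: bag-of-words hashed into dim buckets, L2-normalised.

    Deterministic and dependency-free, so demos are reproducible. A real
    embed_fn should call an actual embedding model instead.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Texts sharing tokens land in shared buckets, so cosine similarity is nonzero
v1 = my_embed_fn("vector search finds embeddings")
v2 = my_embed_fn("hybrid vector search")
similarity = sum(a * b for a, b in zip(v1, v2))
```

Because the output is L2-normalised, a plain dot product doubles as cosine similarity, which is what most vector backends compute.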
LLM Configuration
Pass a pre-built LangChain model or use init_agent / Agent.from_model for string-based init.
When using Agent directly, configure via env vars or pass an explicit model instance.
OpenAI
from langchain_openai import ChatOpenAI
from rag007 import Agent
rag = Agent(
index="articles",
llm=ChatOpenAI(model="gpt-5.4", api_key="sk-..."),
gen_llm=ChatOpenAI(model="gpt-5.4", api_key="sk-..."),
)
Azure OpenAI (explicit keys)
from langchain_openai import AzureChatOpenAI
from rag007 import Agent
llm = AzureChatOpenAI(
azure_endpoint="https://my-resource.openai.azure.com",
azure_deployment="gpt-5.4",
api_key="...",
api_version="2024-12-01-preview",
)
rag = Agent(index="articles", llm=llm, gen_llm=llm)
Azure OpenAI (env vars)
# Set: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT
from rag007 import Agent
rag = Agent(index="articles") # auto-detected
Azure OpenAI with Managed Identity (no API key)
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_openai import AzureChatOpenAI
from rag007 import Agent
token_provider = get_bearer_token_provider(
DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
llm = AzureChatOpenAI(
azure_endpoint="https://my-resource.openai.azure.com",
azure_deployment="gpt-5.4",
azure_ad_token_provider=token_provider,
api_version="2024-12-01-preview",
)
rag = Agent(index="articles", llm=llm, gen_llm=llm)
Anthropic Claude
pip install langchain-anthropic
from langchain_anthropic import ChatAnthropic
from rag007 import Agent
llm = ChatAnthropic(model="claude-sonnet-4-6", api_key="sk-ant-...")
rag = Agent(index="articles", llm=llm, gen_llm=llm)
Ollama (local, no API key)
pip install langchain-ollama
from langchain_ollama import ChatOllama
from rag007 import Agent
rag = Agent(
index="articles",
llm=ChatOllama(model="llama3.2", base_url="http://localhost:11434"),
gen_llm=ChatOllama(model="llama3.2", base_url="http://localhost:11434"),
)
Google Vertex AI
pip install langchain-google-vertexai
from langchain_google_vertexai import ChatVertexAI
from rag007 import Agent
llm = ChatVertexAI(model="gemini-2.0-flash", project="my-gcp-project", location="us-central1")
rag = Agent(index="articles", llm=llm, gen_llm=llm)
Separate fast and generation models
Use a cheap/fast model for query rewriting and routing, a powerful model for the final answer:
from langchain_openai import AzureChatOpenAI
from rag007 import Agent
fast_llm = AzureChatOpenAI(azure_deployment="gpt-5.4-mini", api_key="...", api_version="2024-12-01-preview")
gen_llm = AzureChatOpenAI(azure_deployment="gpt-5.4", api_key="...", api_version="2024-12-01-preview")
rag = Agent(index="articles", llm=fast_llm, gen_llm=gen_llm)
Rerankers
Cohere
from rag007 import Agent, CohereReranker
rag = Agent(index="articles", reranker=CohereReranker(model="rerank-v3.5", api_key="..."))
HuggingFace cross-encoder (local, no API key)
pip install rag007[huggingface]
from rag007 import Agent, HuggingFaceReranker
rag = Agent(index="articles", reranker=HuggingFaceReranker())
# Multilingual
rag = Agent(index="articles", reranker=HuggingFaceReranker(model="cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"))
Jina (multilingual API)
pip install rag007[jina]
from rag007 import Agent, JinaReranker
rag = Agent(index="articles", reranker=JinaReranker(api_key="...")) # or JINA_API_KEY env var
rerankers – ColBERT / Flashrank / RankGPT / any cross-encoder
Unified bridge to the rerankers library by answer.ai:
pip install rag007[rerankers]
from rag007 import Agent, RerankersReranker
rag = Agent(index="articles", reranker=RerankersReranker("cross-encoder/ms-marco-MiniLM-L-6-v2", model_type="cross-encoder"))
rag = Agent(index="articles", reranker=RerankersReranker("colbert-ir/colbertv2.0", model_type="colbert"))
rag = Agent(index="articles", reranker=RerankersReranker("flashrank", model_type="flashrank"))
rag = Agent(index="articles", reranker=RerankersReranker("gpt-5.4-mini", model_type="rankgpt", api_key="..."))
Custom reranker
from rag007 import Agent, RerankResult
class MyReranker:
    def rerank(self, query: str, documents: list[str], top_n: int) -> list[RerankResult]:
        return [RerankResult(index=i, relevance_score=1.0 / (i + 1)) for i in range(top_n)]
rag = Agent(index="articles", reranker=MyReranker())
Tools
When using invoke_agent, the LLM has access to a set of tools it can call in any order. No fixed pipeline – the agent decides what it needs.
| Tool | Description |
|---|---|
| `get_index_settings()` | Discover filterable, searchable, sortable, and boost fields from the index schema |
| `get_filter_values(field)` | Sample real stored values for a field – used to build precise filter expressions |
| `search_hybrid(query, filter_expr, semantic_ratio, sort_fields)` | BM25 + vector hybrid search with optional filter and sort boost |
| `search_bm25(query, filter_expr)` | Pure keyword search – fallback when hybrid returns poor results |
| `rerank_results(query, hits)` | Re-rank a list of hits with the configured reranker |
The agent follows this reasoning pattern:
- Call `get_index_settings()` to learn the schema
- If the question names a specific entity, call `get_filter_values(field)` to find the exact stored value
- Call `search_hybrid()` with a filter and/or sort if relevant, otherwise broad hybrid search
- Fall back to `search_bm25()` if results are thin
- Call `rerank_results()` to surface the most relevant hits
- Summarise, explaining which filters and signals influenced the answer
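The reasoning pattern above amounts to a dispatch loop over named tools. A minimal sketch, with stub tools and a fixed plan standing in for the LLM's step-by-step decisions (in rag007 the model itself chooses the next tool; every name below is a stand-in):

```python
# Sketch of a tool-dispatch loop: `plan` yields (tool_name, kwargs) steps,
# the loop dispatches each call and records a transcript. In a real agent,
# an LLM produces the next step based on the transcript so far.
def run_tool_agent(question, tools, plan):
    transcript = []
    for tool_name, kwargs in plan(question):
        result = tools[tool_name](**kwargs)
        transcript.append((tool_name, result))
    return transcript

# Stub tools mirroring the table above (toy return values)
tools = {
    "get_index_settings": lambda: {"filterable": ["brand"], "sortable": ["popularity"]},
    "search_hybrid": lambda query, filter_expr=None: [f"hit for {query!r}"],
}

def plan(question):
    yield "get_index_settings", {}  # 1. learn the schema
    yield "search_hybrid", {"query": question, "filter_expr": "brand = 'Bosch'"}

steps = run_tool_agent("popular Bosch tools", tools, plan)
```

The transcript structure is what lets the final answer "explain tool calls inline": each step records which tool ran and what it returned.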
from rag007 import Agent
rag = Agent(index="products")
# Agent inspects schema, detects brand field, samples values,
# builds filter, sorts by popularity signal โ all autonomously
result = rag.invoke_agent("Show me the most popular Bosch power tools")
print(result)
Constructor Reference
Agent(
index="my_index", # collection / index name
backend=..., # SearchBackend (default: InMemoryBackend)
llm=..., # fast LLM – routing, rewrite, filter
gen_llm=..., # generation LLM – final answer
reranker=..., # Cohere / HuggingFace / Jina / custom
top_k=10, # final result count [RAG_TOP_K]
rerank_top_n=5, # reranker top-n [RAG_RERANK_TOP_N]
retrieval_factor=4, # over-retrieval multiplier [RAG_RETRIEVAL_FACTOR]
max_iter=20, # max retrieve-rewrite cycles [RAG_MAX_ITER]
semantic_ratio=0.5, # hybrid semantic weight [RAG_SEMANTIC_RATIO]
fusion="rrf", # "rrf" or "dbsf" [RAG_FUSION]
instructions="", # extra system prompt for generation
embed_fn=None, # (str) -> list[float]
boost_fn=None, # (doc_dict) -> float score boost
base_filter=None, # always-on filter expression
hyde_min_words=8, # min words to trigger HyDE [RAG_HYDE_MIN_WORDS]
hyde_style_hint="", # style hint for HyDE prompt
auto_strategy=False, # auto-tune from index samples
)
API Reference
| Method | Returns | Description |
|---|---|---|
| `rag.invoke(query)` | `RAGState` | Full RAG pipeline (sync) |
| `rag.ainvoke(query)` | `RAGState` | Full RAG pipeline (async) |
| `rag.chat(query, history)` | `RAGState` | Multi-turn chat (sync) |
| `rag.achat(query, history)` | `RAGState` | Multi-turn chat (async) |
| `rag.retrieve_documents(query, top_k)` | `(str, list[Document])` | Retrieve only, no answer |
| `rag.query(query)` | `str` | Answer string directly |
| `rag.invoke_agent(query)` | `str` | Tool-calling agent mode (sync) |
| `rag.ainvoke_agent(query)` | `str` | Tool-calling agent mode (async) |
RAGState fields: answer · documents · query · question · history · iterations
Environment Variables
| Variable | Description | Default |
|---|---|---|
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint | (none) |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key | (none) |
| `AZURE_OPENAI_DEPLOYMENT` | Default deployment | (none) |
| `AZURE_OPENAI_FAST_DEPLOYMENT` | Fast model deployment | → `AZURE_OPENAI_DEPLOYMENT` |
| `AZURE_OPENAI_GENERATION_DEPLOYMENT` | Generation deployment | → `AZURE_OPENAI_DEPLOYMENT` |
| `AZURE_OPENAI_API_VERSION` | API version | `2024-12-01-preview` |
| `OPENAI_API_KEY` | OpenAI API key (fallback) | (none) |
| `OPENAI_MODEL` | OpenAI model name | `gpt-5.4` |
| `AZURE_COHERE_ENDPOINT` | Azure Cohere endpoint | (none) |
| `AZURE_COHERE_API_KEY` | Azure Cohere API key | (none) |
| `COHERE_API_KEY` | Cohere API key (fallback) | (none) |
| `JINA_API_KEY` | Jina reranker API key | (none) |
| `MEILI_URL` | Meilisearch URL | `http://localhost:7700` |
| `MEILI_KEY` | Meilisearch API key | `masterKey` |
| `RAG_TOP_K` | Final result count | `10` |
| `RAG_RERANK_TOP_N` | Reranker top-n | `5` |
| `RAG_RETRIEVAL_FACTOR` | Over-retrieval multiplier | `4` |
| `RAG_SEMANTIC_RATIO` | Hybrid semantic weight | `0.5` |
| `RAG_FUSION` | Fusion strategy | `rrf` |
| `RAG_HYDE_MIN_WORDS` | Min words to trigger HyDE | `8` |
CLI
"The gadgets are ready."
pip install rag007[cli]
# Chat mode – full agentic pipeline
rag007 --chat --collection my_index
# Retriever mode – documents only, no LLM
rag007 --retriever --collection my_index
# Retrieve top-K only
rag007 --retriever --collection my_index --top-k 5
rag007 --retriever -c my_index -k 5
License
MIT – Licence to code.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file rag007-0.1.0.tar.gz.
File metadata
- Download URL: rag007-0.1.0.tar.gz
- Upload date:
- Size: 388.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a8c2ef29fd31cfcf81fe4b564f9e0b3f31055fc21ae6dbeec193956a262ebeb6` |
| MD5 | `1263e818f83d706e451214faf47ef1dd` |
| BLAKE2b-256 | `c6c5d9777cd480deded31620d39d388b50173f730c2b066b7382be4b8ddf06f6` |
Provenance
The following attestation bundles were made for rag007-0.1.0.tar.gz:
Publisher: workflow.yml on bmsuisse/rag007

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rag007-0.1.0.tar.gz
- Subject digest: a8c2ef29fd31cfcf81fe4b564f9e0b3f31055fc21ae6dbeec193956a262ebeb6
- Sigstore transparency entry: 1287605547
- Sigstore integration time:
- Permalink: bmsuisse/rag007@32703551a9a9fe7f5bad536bfbe937b0f5ed6374
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/bmsuisse
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@32703551a9a9fe7f5bad536bfbe937b0f5ed6374
- Trigger Event: push
File details
Details for the file rag007-0.1.0-py3-none-any.whl.
File metadata
- Download URL: rag007-0.1.0-py3-none-any.whl
- Upload date:
- Size: 45.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `8141b647181ee554b08559439ac15af71d8ed2fb671134d8e309552f707a53a1` |
| MD5 | `0ca749dbc8d3f881362ad111085c4ac8` |
| BLAKE2b-256 | `7c3183ea8c5dddf7f2bc7c0e34f1ca9cba104babda885d72c3d53c1dbff52e63` |
Provenance
The following attestation bundles were made for rag007-0.1.0-py3-none-any.whl:
Publisher: workflow.yml on bmsuisse/rag007

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rag007-0.1.0-py3-none-any.whl
- Subject digest: 8141b647181ee554b08559439ac15af71d8ed2fb671134d8e309552f707a53a1
- Sigstore transparency entry: 1287605574
- Sigstore integration time:
- Permalink: bmsuisse/rag007@32703551a9a9fe7f5bad536bfbe937b0f5ed6374
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/bmsuisse
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@32703551a9a9fe7f5bad536bfbe937b0f5ed6374
- Trigger Event: push