
Production-grade RAG in 4 lines — hybrid search, streaming, and agent tools on by default.


ragwise

CI PyPI Downloads Python 3.11+ License: MIT Docs Code style: ruff

The retrieval layer your agents need — hybrid BM25+dense search, retrieval observability, agent tools, and temporal filtering on by default. pip install. No Docker.

Docs · Changelog · PyPI · Discussions

ragwise demo


Install

pip install ragwise

Quickstart

from ragwise import RAG, QueryConfig

async with RAG(llm="openai/gpt-4o-mini", reranker="flashrank") as rag:
    result = await rag.ingest("./docs/")
    print(result)  # IngestResult(chunks_created=42, skipped=0, failed_files=[])

    answer = await rag.query("What is the refund policy?")
    print(answer.text)
    print(answer.citations[0].text)    # passage text
    print(answer.citations[0].source)  # "docs/refund-policy.md"
    print(answer.trace.retrieval_ms)   # 34
    print(answer.trace.cost_usd)       # 0.00021

Hybrid search — BM25 + dense retrieval fused with RRF, answer with citations


How it Works

A two-phase pipeline — ingest once, query with hybrid search every time. BM25 and dense retrieval run in parallel and are fused with RRF, scoring 18% higher NDCG than dense-only.

How ragwise works — ingest pipeline and hybrid query pipeline

              Dense-only   BM25-only   Hybrid (ragwise)
NDCG score    0.72         0.65        0.85
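
The fusion step is simple enough to sketch. Below is a minimal, illustrative Reciprocal Rank Fusion implementation (not ragwise's internal code; k=60 is the conventional constant):

```python
# RRF: each retriever contributes 1 / (k + rank) per document; summing
# across retrievers rewards documents that rank well in *both* the
# BM25 and dense lists. Illustrative sketch only.
def rrf_fuse(rankings, k=60):
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]   # ranked list from BM25
dense_hits = ["d3", "d9", "d1"]  # ranked list from dense retrieval
print(rrf_fuse([bm25_hits, dense_hits]))  # ['d3', 'd1', 'd9', 'd7']
```

Note how "d3" wins by appearing at the top of both lists, while "d1" beats the single-list hits "d9" and "d7" by appearing in both.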

Why ragwise?

Feature                        ragwise   LangChain   LlamaIndex   RAGFlow
Lines to get started           4         40+         20+          Docker setup
Hybrid search by default       ✅        opt-in                   ✅ (Docker)
pip install, no server         ✅
Async-first                    ✅        partial     partial
Streaming                      ✅        partial     partial
Retrieval trace (always-on)    ✅
Passage-level citations        ✅        partial
Temporal filtering (as_of)     ✅
Agent tool built-in            ✅
Multi-tenant isolation         ✅
Built-in eval                  ✅        partial

Observability

Every query populates answer.trace — no setup, no extra code. Debug bad retrieval in seconds.

answer = await rag.query("What is the refund policy?")

# Timing and cost
print(answer.trace.retrieval_ms)   # 34
print(answer.trace.generation_ms)  # 812
print(answer.trace.cost_usd)       # 0.00021

# Per-chunk scores
for chunk in answer.trace.retrieved_chunks:
    print(chunk.source, chunk.bm25_score, chunk.dense_score, chunk.rrf_score)

# Cache hit?
print(answer.trace.cache_hit)       # True / False
print(answer.trace.query_variants)  # ["What is...", "Explain the refund..."]

Passage-Level Citations

Citations include the actual passage text, page number, and confidence score — not just filenames.

for c in answer.citations:
    print(c.source)    # "docs/refund-policy.md"
    print(c.text)      # "Refunds are processed within 5 business days..."
    print(c.score)     # 0.91
    print(c.page)      # 3
    c.explain()        # prints human-readable ranking explanation

Confidence Gating

Stop hallucinations before they happen. When retrieval is too weak, ragwise returns a structured "no answer" instead of calling the LLM.

async with RAG(llm="openai/gpt-4o-mini", confidence_threshold=0.7) as rag:
    answer = await rag.query("...")
    if not answer.has_sufficient_context:
        print("Not enough evidence — answer withheld")
    else:
        print(answer.text)
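
Conceptually, the gate compares the best retrieval score against the threshold before any LLM call is spent. A hedged sketch of that idea (assumed logic, not ragwise internals):

```python
# If even the strongest retrieved chunk scores below the threshold,
# skip generation entirely and return a structured "no answer".
def has_sufficient_context(chunk_scores, threshold=0.7):
    return bool(chunk_scores) and max(chunk_scores) >= threshold

print(has_sufficient_context([0.42, 0.55]))  # False: withhold answer
print(has_sufficient_context([0.91, 0.78]))  # True: proceed to the LLM
```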

Document Management

Full index lifecycle — delete, list, and update documents across all backends. Required for GDPR right-to-erasure.

# Remove a document and all its chunks
await rag.delete(source="docs/old-policy.md")

# List all indexed sources
sources = await rag.list_sources()
# [SourceInfo(source="docs/policy.md", chunk_count=12, last_updated=...)]

# Re-ingest a changed file (stale chunks auto-deleted before upsert)
await rag.update(source="docs/policy.md", path="./docs/policy.md")

Temporal Filtering

Filter your index by document validity date, a capability none of the frameworks compared above ship. Useful for policies, regulations, and versioned docs.

# Ingest with validity window
await rag.ingest(
    "./docs/",
    metadata={"valid_from": "2024-01-01", "valid_until": "2024-12-31"},
)

# Query as of a specific date — expired chunks are automatically excluded
answer = await rag.query(
    "What is the refund policy?",
    config=QueryConfig(as_of="2024-06-15"),
)

# Find stale documents
stale = await rag.list_stale(older_than_days=90)
for doc in stale:
    print(doc.source, doc.last_updated)

Semantic Cache

Reduce LLM API cost by 50–80%. Similar queries hit the cache even if the wording differs — smarter than SHA-256 exact match.

async with RAG(
    llm="openai/gpt-4o-mini",
    cache=True,
    cache_threshold=0.92,   # cosine similarity threshold
) as rag:
    answer1 = await rag.query("What is the refund policy?")
    answer2 = await rag.query("How do refunds work?")  # cache hit
    print(answer2.trace.cache_hit)  # True — returned in <10ms

Set RAGWISE_CACHE_REDIS_URL for a Redis-backed cache shared across processes.
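
Under the hood the idea is straightforward: embed the incoming query and compare it to cached query embeddings by cosine similarity. A minimal sketch of that matching logic (illustrative only, not ragwise internals):

```python
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy in-memory semantic cache keyed by query embedding."""

    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query_emb):
        best = max(self.entries, key=lambda e: cosine(e[0], query_emb), default=None)
        if best and cosine(best[0], query_emb) >= self.threshold:
            return best[1]
        return None  # cache miss

    def put(self, query_emb, answer):
        self.entries.append((query_emb, answer))

cache = SemanticCache(threshold=0.92)
cache.put([1.0, 0.0], "Refunds are processed within 5 business days.")
print(cache.get([0.99, 0.05]) is not None)  # True: similar query, cache hit
```

An exact-match (SHA-256) cache would miss the second query because the strings differ; similarity matching catches it.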


Query Expansion (RAG-Fusion)

Generate N query variants automatically, retrieve for each, and fuse with RRF. Higher recall, especially for ambiguous questions.

answer = await rag.query(
    "How are refunds processed?",
    config=QueryConfig(n_queries=3),
)
print(answer.trace.query_variants)
# ["How are refunds processed?", "What is the refund timeline?", "Explain the returns policy"]

Agent Tools

Wire your entire document index into any Claude or OpenAI agent — stateful across calls, with loop detection and context budget tracking.

from ragwise.agent import as_claude_tool, as_claude_tool_suite, AgentSession

# Single-turn tool
tool = as_claude_tool(rag)

# Multi-turn stateful session (deduplicates chunks, detects loops)
session = AgentSession(rag)
tools = as_claude_tool_suite(rag, max_iterations=5)
# Returns: search_documents, get_document_context, check_context_budget

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-6",
    tools=tools,
    messages=[{"role": "user", "content": question}],
)

Agent tools — ready-made Claude and OpenAI tool schemas


Streaming

Tokens stream as they're generated. Works with OpenAI, Anthropic, and Ollama — same two lines regardless of provider.

async for token in rag.stream_query("What changed in v3.2?"):
    print(token, end="", flush=True)

Streaming — tokens arrive as they're generated


FastAPI Integration

Production-ready HTTP pattern — lifespan management and dependency injection built in.

from ragwise.fastapi import RAGLifespan, get_rag, stream_response
from fastapi import FastAPI, Depends

app = FastAPI(lifespan=RAGLifespan(llm="openai/gpt-4o-mini"))

@app.get("/query")
async def query(q: str, rag=Depends(get_rag)):
    answer = await rag.query(q)
    return {"text": answer.text, "citations": [c.source for c in answer.citations]}

@app.get("/stream")
async def stream(q: str, rag=Depends(get_rag)):
    return stream_response(rag.stream_query(q))

Testing

Deterministic, network-free tests with VCR cassettes and a fake embedder. No API calls in CI.

from ragwise.testing import cassette, FakeEmbedder, assert_retrieval

# Record once, replay in CI — zero API calls
with cassette("tests/cassettes/refund.yaml"):
    answer = await rag.query("What is the refund policy?")
    assert_retrieval(answer, must_include_source="docs/refund-policy.md")

# Fully deterministic embedder for unit tests
rag = RAG(embedder=FakeEmbedder(dim=384), llm="openai/gpt-4o-mini")

Install with pip install ragwise[testing] to auto-register the pytest plugin (fake_rag and recorded_rag fixtures).

Multi-Tenant Isolation

Tag documents at ingest, filter at query time. No store schema changes needed — works with all three backends.

await rag.ingest("./org_a_docs/", tenant_id="org_a")
await rag.ingest("./org_b_docs/", tenant_id="org_b")

answer = await rag.query(
    "What is our data retention policy?",
    config=QueryConfig(tenant_id="org_a"),
)

Multi-tenant isolation — scoped retrieval per tenant


Store Options

Same API from local dev to production. Change one string — nothing else.

RAG(store="memory")                      # dev — zero setup, volatile
RAG(store="lance://./ragwise-index")     # dev — persistent, no server
RAG(store="postgresql://user:pw@db/x")  # production — pgvector
memory  →  lance://  →  postgresql://
  ↑            ↑               ↑
 tests      staging        production

Configuration

Full typed config with Pydantic — typos caught at construction, not at first query.

from ragwise import RAG, RAGConfig, LLMConfig, QueryConfig

config = RAGConfig.from_env()   # reads RAGWISE_LLM_MODEL, RAGWISE_STORE_BACKEND

async with RAG(
    embedder="openai/text-embedding-3-small",
    store="lance://./my-index",
    llm="openai/gpt-4o-mini",
    reranker="flashrank",          # local, no GPU — or "cohere/rerank-4"
    chunk_size=512,
    chunk_overlap=64,
    cache=True,
    cache_threshold=0.92,
    confidence_threshold=0.7,
) as rag:
    result = await rag.ingest("./docs/", glob="**/*.md")
    answer = await rag.query(
        "What changed in v3.2?",
        config=QueryConfig(top_k=5, n_queries=3, as_of="2024-06-15"),
    )

CLI

ragwise init           # generate ragwise_config.py with defaults
ragwise serve          # start HTTP API on localhost:8000
ragwise serve --port 9000
ragwise doctor         # health check: credentials, store, hybrid search, latency

ragwise doctor runs in under 10 seconds and prints a checkmark for each component — useful after first install or a dependency upgrade.


Optional Extras

pip install ragwise[lance]       # LanceDB persistent store
pip install ragwise[postgres]    # PostgreSQL + pgvector
pip install ragwise[local-emb]   # sentence-transformers embedder + reranker
pip install ragwise[testing]     # VCR cassettes, FakeEmbedder, pytest plugin
pip install ragwise[eval]        # RAGAS + Langfuse eval loop
pip install ragwise[serve]       # ragwise serve HTTP API

Who It's For

✓ Python developers who want production-ready RAG as a library, not a platform.
✓ AI engineers building agents — wire your doc index into Claude or GPT in one line.
✓ Teams already on PostgreSQL — zero new infrastructure with store="postgresql://...".
✓ Anyone who values typed, async-first, minimal-dependency code.

✗ Not for you if you need a no-code UI, knowledge graphs, or agent orchestration — use RAGFlow or LangGraph instead.


Roadmap

v0.2.0 ships all of the above — typed config, document management, retrieval observability, passage citations, confidence gating, reranking, agent sessions, VCR-based testing, FastAPI integration, temporal filtering, semantic cache, query expansion, and document TTL.

What's next is driven by real usage — follow GitHub Discussions to vote.

Community

License

MIT — see LICENSE

Project details


Download files

Download the file for your platform.

Source Distribution

ragwise-0.2.0.tar.gz (7.5 MB)

Uploaded Source

Built Distribution


ragwise-0.2.0-py3-none-any.whl (63.2 kB)

Uploaded Python 3

File details

Details for the file ragwise-0.2.0.tar.gz.

File metadata

  • Download URL: ragwise-0.2.0.tar.gz
  • Upload date:
  • Size: 7.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ragwise-0.2.0.tar.gz
Algorithm Hash digest
SHA256 abb324d29e3bf3a9e2a5c3e786447eb495a5be1560c1c4333bcb41c37f988231
MD5 7a873e41d3f7d156e5fdbbb4b9f61bae
BLAKE2b-256 f04562444265cb07e68e8f087ade5ba8ee498f8c15b0523da67bf386e5ba824f


Provenance

The following attestation bundles were made for ragwise-0.2.0.tar.gz:

Publisher: release.yml on laxmikanta415/ragwise

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ragwise-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: ragwise-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 63.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ragwise-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0f716bb3e5fe748731e53f910eaf64335583e47f0d68fb0aa2d00062e470e104
MD5 caa8f7b5dd80be2f1cad91de02c17ed8
BLAKE2b-256 a9fbf4b2b06dd770fbc68d89b1b42fd5626ef186b6abc8057ef1eceff8e119a1


Provenance

The following attestation bundles were made for ragwise-0.2.0-py3-none-any.whl:

Publisher: release.yml on laxmikanta415/ragwise

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
