
Production-grade RAG in 4 lines — hybrid search, streaming, and agent tools on by default.


ragwise

CI PyPI Downloads Python 3.11+ License: MIT Docs Code style: ruff

The retrieval layer your agents need — hybrid BM25+dense search, retrieval observability, agent tools, and temporal filtering on by default. pip install. No Docker.

Docs · Changelog · PyPI · Discussions

ragwise demo


Install

pip install ragwise

Quickstart

from ragwise import RAG, QueryConfig

async with RAG(llm="openai/gpt-4o-mini", reranker="flashrank") as rag:
    result = await rag.ingest("./docs/")
    print(result)  # IngestResult(chunks_created=42, skipped=0, failed_files=[])

    answer = await rag.query("What is the refund policy?")
    print(answer.text)
    print(answer.citations[0].text)    # passage text
    print(answer.citations[0].source)  # "docs/refund-policy.md"
    print(answer.trace.retrieval_ms)   # 34
    print(answer.trace.cost_usd)       # 0.00021

Hybrid search — BM25 + dense retrieval fused with RRF, answer with citations


How it Works

A two-phase pipeline — ingest once, query with hybrid search every time. BM25 and dense retrieval run in parallel and are fused with RRF, scoring 18% higher NDCG than dense-only.

How ragwise works — ingest pipeline and hybrid query pipeline

              Dense-only   BM25-only   Hybrid (ragwise)
NDCG score    0.72         0.65        0.85
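
The fusion step is simple enough to sketch. Below is a minimal, illustrative Reciprocal Rank Fusion implementation (not ragwise's internal code; k=60 is the conventional constant):

```python
# RRF: each retriever contributes 1 / (k + rank) per document; summing
# across retrievers rewards documents that rank well in *both* the
# BM25 and dense lists. Illustrative sketch only.
def rrf_fuse(rankings, k=60):
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]   # ranked list from BM25
dense_hits = ["d3", "d9", "d1"]  # ranked list from dense retrieval
print(rrf_fuse([bm25_hits, dense_hits]))  # ['d3', 'd1', 'd9', 'd7']
```

Note how "d3" wins by appearing at the top of both lists, while "d1" beats the single-list hits "d9" and "d7" by appearing in both.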

Why ragwise?

Feature                        ragwise   LangChain   LlamaIndex   RAGFlow
Lines to get started           4         40+         20+          Docker setup
Hybrid search by default       ✅        opt-in                   ✅ (Docker)
pip install, no server         ✅
Async-first                    ✅        partial     partial
Streaming                      ✅        partial     partial
Retrieval trace (always-on)    ✅
Passage-level citations        ✅        partial
Temporal filtering (as_of)     ✅
Agent tool built-in            ✅
Multi-tenant isolation         ✅
Built-in eval                  ✅        partial

Observability

Every query populates answer.trace — no setup, no extra code. Debug bad retrieval in seconds.

answer = await rag.query("What is the refund policy?")

# Timing and cost
print(answer.trace.retrieval_ms)   # 34
print(answer.trace.generation_ms)  # 812
print(answer.trace.cost_usd)       # 0.00021

# Per-chunk scores
for chunk in answer.trace.retrieved_chunks:
    print(chunk.source, chunk.bm25_score, chunk.dense_score, chunk.rrf_score)

# Cache hit?
print(answer.trace.cache_hit)       # True / False
print(answer.trace.query_variants)  # ["What is...", "Explain the refund..."]

Passage-Level Citations

Citations include the actual passage text, page number, and confidence score — not just filenames.

for c in answer.citations:
    print(c.source)    # "docs/refund-policy.md"
    print(c.text)      # "Refunds are processed within 5 business days..."
    print(c.score)     # 0.91
    print(c.page)      # 3
    c.explain()        # prints human-readable ranking explanation

Confidence Gating

Stop hallucinations before they happen. When retrieval is too weak, ragwise returns a structured "no answer" instead of calling the LLM.

async with RAG(llm="openai/gpt-4o-mini", confidence_threshold=0.7) as rag:
    answer = await rag.query("...")
    if not answer.has_sufficient_context:
        print("Not enough evidence — answer withheld")
    else:
        print(answer.text)
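
Conceptually, the gate compares the best retrieval score against the threshold before any LLM call is spent. A hedged sketch of that idea (assumed logic, not ragwise internals):

```python
# If even the strongest retrieved chunk scores below the threshold,
# skip generation entirely and return a structured "no answer".
def has_sufficient_context(chunk_scores, threshold=0.7):
    return bool(chunk_scores) and max(chunk_scores) >= threshold

print(has_sufficient_context([0.42, 0.55]))  # False: withhold answer
print(has_sufficient_context([0.91, 0.78]))  # True: proceed to the LLM
```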

Document Management

Full index lifecycle — delete, list, and update documents across all backends. Required for GDPR right-to-erasure.

# Remove a document and all its chunks
await rag.delete(source="docs/old-policy.md")

# List all indexed sources
sources = await rag.list_sources()
# [SourceInfo(source="docs/policy.md", chunk_count=12, last_updated=...)]

# Re-ingest a changed file (stale chunks auto-deleted before upsert)
await rag.update(source="docs/policy.md", path="./docs/policy.md")

Temporal Filtering

Filter your index by document validity date, a capability none of the frameworks compared above ship. Useful for policies, regulations, and versioned docs.

# Ingest with validity window
await rag.ingest(
    "./docs/",
    metadata={"valid_from": "2024-01-01", "valid_until": "2024-12-31"},
)

# Query as of a specific date — expired chunks are automatically excluded
answer = await rag.query(
    "What is the refund policy?",
    config=QueryConfig(as_of="2024-06-15"),
)

# Find stale documents
stale = await rag.list_stale(older_than_days=90)
for doc in stale:
    print(doc.source, doc.last_updated)

Semantic Cache

Reduce LLM API cost by 50–80%. Similar queries hit the cache even if the wording differs — smarter than SHA-256 exact match.

async with RAG(
    llm="openai/gpt-4o-mini",
    cache=True,
    cache_threshold=0.92,   # cosine similarity threshold
) as rag:
    answer1 = await rag.query("What is the refund policy?")
    answer2 = await rag.query("How do refunds work?")  # cache hit
    print(answer2.trace.cache_hit)  # True — returned in <10ms

Set RAGWISE_CACHE_REDIS_URL for a Redis-backed cache shared across processes.
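
Under the hood the idea is straightforward: embed the incoming query and compare it to cached query embeddings by cosine similarity. A minimal sketch of that matching logic (illustrative only, not ragwise internals):

```python
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy in-memory semantic cache keyed by query embedding."""

    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query_emb):
        best = max(self.entries, key=lambda e: cosine(e[0], query_emb), default=None)
        if best and cosine(best[0], query_emb) >= self.threshold:
            return best[1]
        return None  # cache miss

    def put(self, query_emb, answer):
        self.entries.append((query_emb, answer))

cache = SemanticCache(threshold=0.92)
cache.put([1.0, 0.0], "Refunds are processed within 5 business days.")
print(cache.get([0.99, 0.05]) is not None)  # True: similar query, cache hit
```

An exact-match (SHA-256) cache would miss the second query because the strings differ; similarity matching catches it.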


Query Expansion (RAG-Fusion)

Generate N query variants automatically, retrieve for each, and fuse with RRF. Higher recall, especially for ambiguous questions.

answer = await rag.query(
    "How are refunds processed?",
    config=QueryConfig(n_queries=3),
)
print(answer.trace.query_variants)
# ["How are refunds processed?", "What is the refund timeline?", "Explain the returns policy"]

Agent Tools

Wire your entire document index into any Claude or OpenAI agent — stateful across calls, with loop detection and context budget tracking.

from ragwise.agent import as_claude_tool, as_claude_tool_suite, AgentSession

# Single-turn tool
tool = as_claude_tool(rag)

# Multi-turn stateful session (deduplicates chunks, detects loops)
session = AgentSession(rag)
tools = as_claude_tool_suite(rag, max_iterations=5)
# Returns: search_documents, get_document_context, check_context_budget

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-6",
    tools=tools,
    messages=[{"role": "user", "content": question}],
)

Agent tools — ready-made Claude and OpenAI tool schemas


Streaming

Tokens stream as they're generated. Works with OpenAI, Anthropic, and Ollama — same two lines regardless of provider.

async for token in rag.stream_query("What changed in v3.2?"):
    print(token, end="", flush=True)

Streaming — tokens arrive as they're generated


FastAPI Integration

Production-ready HTTP pattern — lifespan management and dependency injection built in.

from ragwise.fastapi import RAGLifespan, get_rag, stream_response
from fastapi import FastAPI, Depends

app = FastAPI(lifespan=RAGLifespan(llm="openai/gpt-4o-mini"))

@app.get("/query")
async def query(q: str, rag=Depends(get_rag)):
    answer = await rag.query(q)
    return {"text": answer.text, "citations": [c.source for c in answer.citations]}

@app.get("/stream")
async def stream(q: str, rag=Depends(get_rag)):
    return stream_response(rag.stream_query(q))

Testing

Deterministic, network-free tests with VCR cassettes and a fake embedder. No API calls in CI.

from ragwise.testing import cassette, FakeEmbedder, assert_retrieval

# Record once, replay in CI — zero API calls
with cassette("tests/cassettes/refund.yaml"):
    answer = await rag.query("What is the refund policy?")
    assert_retrieval(answer, must_include_source="docs/refund-policy.md")

# Fully deterministic embedder for unit tests
rag = RAG(embedder=FakeEmbedder(dim=384), llm="openai/gpt-4o-mini")

Install with pip install ragwise[testing] to auto-register the pytest plugin (fake_rag and recorded_rag fixtures).

Multi-Tenant Isolation

Tag documents at ingest, filter at query time. No store schema changes needed — works with all three backends.

await rag.ingest("./org_a_docs/", tenant_id="org_a")
await rag.ingest("./org_b_docs/", tenant_id="org_b")

answer = await rag.query(
    "What is our data retention policy?",
    config=QueryConfig(tenant_id="org_a"),
)

Multi-tenant isolation — scoped retrieval per tenant


Store Options

Same API from local dev to production. Change one string — nothing else.

RAG(store="memory")                      # dev — zero setup, volatile
RAG(store="lance://./ragwise-index")     # dev — persistent, no server
RAG(store="postgresql://user:pw@db/x")  # production — pgvector
memory  →  lance://  →  postgresql://
  ↑            ↑               ↑
 tests      staging        production

Configuration

Full typed config with Pydantic — typos caught at construction, not at first query.

from ragwise import RAG, RAGConfig, LLMConfig, QueryConfig

config = RAGConfig.from_env()   # reads RAGWISE_LLM_MODEL, RAGWISE_STORE_BACKEND

async with RAG(
    embedder="openai/text-embedding-3-small",
    store="lance://./my-index",
    llm="openai/gpt-4o-mini",
    reranker="flashrank",          # local, no GPU — or "cohere/rerank-4"
    chunk_size=512,
    chunk_overlap=64,
    cache=True,
    cache_threshold=0.92,
    confidence_threshold=0.7,
) as rag:
    result = await rag.ingest("./docs/", glob="**/*.md")
    answer = await rag.query(
        "What changed in v3.2?",
        config=QueryConfig(top_k=5, n_queries=3, as_of="2024-06-15"),
    )

CLI

ragwise init           # generate ragwise_config.py with defaults
ragwise serve          # start HTTP API on localhost:8000
ragwise serve --port 9000
ragwise doctor         # health check: credentials, store, hybrid search, latency

ragwise doctor runs in under 10 seconds and prints a checkmark for each component — useful after first install or a dependency upgrade.


Optional Extras

pip install ragwise[lance]       # LanceDB persistent store
pip install ragwise[postgres]    # PostgreSQL + pgvector
pip install ragwise[local-emb]   # sentence-transformers embedder + reranker
pip install ragwise[testing]     # VCR cassettes, FakeEmbedder, pytest plugin
pip install ragwise[eval]        # RAGAS + Langfuse eval loop
pip install ragwise[serve]       # ragwise serve HTTP API

Who It's For

✓ Python developers who want production-ready RAG as a library, not a platform.
✓ AI engineers building agents — wire your doc index into Claude or GPT in one line.
✓ Teams already on PostgreSQL — zero new infrastructure with store="postgresql://...".
✓ Anyone who values typed, async-first, minimal-dependency code.

✗ Not for you if you need a no-code UI, knowledge graphs, or agent orchestration — use RAGFlow or LangGraph instead.


Roadmap

v0.2.0 ships all of the above — typed config, document management, retrieval observability, passage citations, confidence gating, reranking, agent sessions, VCR-based testing, FastAPI integration, temporal filtering, semantic cache, query expansion, and document TTL.

What's next is driven by real usage — follow GitHub Discussions to vote.

Community

License

MIT — see LICENSE

Project details


Download files

Download the file for your platform.

Source Distribution

ragwise-0.2.0.tar.gz (7.5 MB)

Uploaded Source

Built Distribution


ragwise-0.2.0-py3-none-any.whl (63.2 kB)

Uploaded Python 3

File details

Details for the file ragwise-0.2.0.tar.gz.

File metadata

  • Download URL: ragwise-0.2.0.tar.gz
  • Upload date:
  • Size: 7.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ragwise-0.2.0.tar.gz
Algorithm Hash digest
SHA256 abb324d29e3bf3a9e2a5c3e786447eb495a5be1560c1c4333bcb41c37f988231
MD5 7a873e41d3f7d156e5fdbbb4b9f61bae
BLAKE2b-256 f04562444265cb07e68e8f087ade5ba8ee498f8c15b0523da67bf386e5ba824f


Provenance

The following attestation bundles were made for ragwise-0.2.0.tar.gz:

Publisher: release.yml on laxmikanta415/ragwise

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ragwise-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: ragwise-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 63.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ragwise-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0f716bb3e5fe748731e53f910eaf64335583e47f0d68fb0aa2d00062e470e104
MD5 caa8f7b5dd80be2f1cad91de02c17ed8
BLAKE2b-256 a9fbf4b2b06dd770fbc68d89b1b42fd5626ef186b6abc8057ef1eceff8e119a1


Provenance

The following attestation bundles were made for ragwise-0.2.0-py3-none-any.whl:

Publisher: release.yml on laxmikanta415/ragwise

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
