RAGWire — Production-grade RAG toolkit for document ingestion and retrieval with hybrid search support

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

KGPTalkie

These details have not been verified by PyPI

Project links

YouTube

Project description


RAGWire

Production-grade RAG toolkit for document ingestion and retrieval

Features

Document Loading — PDF, DOCX, XLSX, PPTX and more via MarkItDown
LLM Metadata Extraction — extracts company, doc type, fiscal period using your LLM; fully customisable via YAML
Smart Text Splitting — markdown-aware and recursive chunking strategies
Multiple Embedding Providers — Ollama, OpenAI, HuggingFace, Google, FastEmbed
Qdrant Vector Store — dense, sparse, and hybrid search
Advanced Retrieval — similarity, MMR, and hybrid search with metadata filtering
SHA256 Deduplication — at both file and chunk level
Directory Ingestion — ingest an entire folder with one call, with optional recursive scan
Env Var Substitution — use ${VAR} in config.yaml for secrets
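File-level and chunk-level SHA256 deduplication can be pictured with the standard-library sketch below. It is not RAGWire's internal implementation. The `DedupIndex` class and its method names are hypothetical, and the in-memory sets stand in for whatever persistent store the library actually uses.

```python
import hashlib


def sha256_bytes(data: bytes) -> str:
    """Return the hex SHA256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


class DedupIndex:
    """Hypothetical in-memory index of already-ingested file and chunk hashes."""

    def __init__(self) -> None:
        self.file_hashes: set[str] = set()
        self.chunk_hashes: set[str] = set()

    def is_new_file(self, content: bytes) -> bool:
        """Record the file hash; return False if these exact bytes were seen before."""
        digest = sha256_bytes(content)
        if digest in self.file_hashes:
            return False
        self.file_hashes.add(digest)
        return True

    def new_chunks(self, chunks: list[str]) -> list[str]:
        """Return only chunks whose text has not been stored yet, recording them."""
        fresh = []
        for chunk in chunks:
            digest = sha256_bytes(chunk.encode("utf-8"))
            if digest not in self.chunk_hashes:
                self.chunk_hashes.add(digest)
                fresh.append(chunk)
        return fresh


index = DedupIndex()
doc = b"Apple 10-K 2025 ..."
print(index.is_new_file(doc))  # True: first time this file is seen
print(index.is_new_file(doc))  # False: identical bytes, skipped on re-run
print(index.new_chunks(["revenue", "net income", "revenue"]))  # ['revenue', 'net income']
```

In this sketch the file hash makes re-running ingestion over the same folder cheap, while the chunk hash also catches identical passages that appear in different files.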

Architecture

RAGWire Architecture

Installation

```bash
pip install ragwire

# With Ollama support (local, no API key)
pip install "ragwire[ollama]"

# With all providers
pip install "ragwire[all]"
```

Quick Start

```python
from ragwire import RAGWire

rag = RAGWire("config.yaml")

# Ingest files (SHA256 deduplication makes re-runs safe)
stats = rag.ingest_documents(["data/Apple_10k_2025.pdf", "data/Microsoft_10k_2025.pdf"])
print(f"Processed: {stats['processed']}, Skipped: {stats['skipped']}, Chunks: {stats['chunks_created']}")

# Or ingest an entire directory, including subfolders
stats = rag.ingest_directory("data/", recursive=True)

# Basic retrieval: returns a list of LangChain Document objects
results = rag.retrieve("What is the total revenue?", top_k=5)
for doc in results:
    print(doc.page_content[:300])
    print(doc.metadata["company_name"])   # str, lowercased, e.g. "apple"
    print(doc.metadata["fiscal_year"])    # list[int], e.g. [2025] (NOT a plain int)
    print(doc.metadata["file_name"])      # str, e.g. "Apple_10k_2025.pdf"

# Retrieval with explicit metadata filters
results = rag.retrieve(
    "What is the net income?",
    filters={"company_name": "apple", "fiscal_year": 2025},  # pass the year as an int
)

# OR logic within a field: matches any of the listed values
results = rag.retrieve("Compare revenue trends", filters={"fiscal_year": [2023, 2024, 2025]})

# Agent-controlled filtering (recommended for AI agents)
filters = rag.extract_filters("Apple's revenue in 2025")
# → {"company_name": "apple", "fiscal_year": 2025} or None
results = rag.retrieve("Apple's revenue in 2025", filters=filters)
```
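The filter comments above imply two rules: listed values are OR-ed within a field, and (by assumption here) separate fields are AND-ed. Because `fiscal_year` is stored as `list[int]`, this sketch also assumes an int filter value matches whenever the stored list contains it. The pure-Python model below illustrates those assumed semantics on plain metadata dicts; `matches` is a hypothetical helper, not RAGWire's code, and the real filtering happens inside the vector store.

```python
from __future__ import annotations

from typing import Any


def _as_list(value: Any) -> list:
    """Wrap scalars so scalar and list values can be compared uniformly."""
    return value if isinstance(value, list) else [value]


def matches(metadata: dict[str, Any], filters: dict[str, Any] | None) -> bool:
    """Assumed semantics: AND across fields, OR within a field's value list.

    A field matches when any filter value equals any stored value, so the
    int 2025 matches a stored fiscal_year of [2025].
    """
    if not filters:
        return True  # None or {} means no filtering
    for field, wanted in filters.items():
        stored = _as_list(metadata.get(field))
        if not any(w in stored for w in _as_list(wanted)):
            return False
    return True


doc_meta = {"company_name": "apple", "fiscal_year": [2025], "file_name": "Apple_10k_2025.pdf"}
print(matches(doc_meta, {"company_name": "apple", "fiscal_year": 2025}))      # True
print(matches(doc_meta, {"fiscal_year": [2023, 2024, 2025]}))                # True
print(matches(doc_meta, {"company_name": "microsoft", "fiscal_year": 2025}))  # False
```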

Configuration

Copy `config.example.yaml` to `config.yaml` and edit it. Secrets can be injected from environment variables with `${VAR}` syntax:

```yaml
vectorstore:
  url: "https://your-cluster.qdrant.io"
  api_key: "${QDRANT_API_KEY}"

llm:
  provider: "openai"
  model: "gpt-5.4-nano"
  api_key: "${OPENAI_API_KEY}"
```
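The `${VAR}` placeholders are resolved from the environment when the config is loaded. Below is a minimal sketch of that kind of substitution using `re` and `os.environ`, applied to an already-parsed config dict. It is an assumption about the general approach, not RAGWire's loader. `expand_env` is a hypothetical name, and raising on an unset variable is a design choice of this sketch; the library may instead leave the placeholder or substitute an empty string.

```python
import os
import re
from typing import Any

# Matches ${VAR_NAME}; the variable name is captured in group 1.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any) -> Any:
    """Recursively replace ${VAR} inside strings in dicts/lists with os.environ values."""
    if isinstance(value, str):
        def _lookup(match: re.Match) -> str:
            name = match.group(1)
            if name not in os.environ:
                raise KeyError(f"environment variable {name!r} is not set")
            return os.environ[name]
        return _ENV_PATTERN.sub(_lookup, value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value  # numbers, booleans and None pass through unchanged


# The config as it might look after parsing the YAML above.
os.environ["QDRANT_API_KEY"] = "demo-key"  # demo value; set this in your shell in real use
config = {"vectorstore": {"url": "https://your-cluster.qdrant.io", "api_key": "${QDRANT_API_KEY}"}}
print(expand_env(config)["vectorstore"]["api_key"])  # demo-key
```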

Full example (a fully local setup using Ollama and a local Qdrant instance):

```yaml
embeddings:
  provider: "ollama"
  model: "qwen3-embedding:0.6b"
  base_url: "http://localhost:11434"

llm:
  provider: "ollama"
  model: "qwen3.5:9b"
  num_ctx: 16384

vectorstore:
  url: "http://localhost:6333"
  collection_name: "my_docs"
  use_sparse: true

retriever:
  search_type: "hybrid"
  top_k: 5
  auto_filter: false   # set true to enable LLM-based filter extraction from every query
```
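With `search_type: "hybrid"` and `use_sparse: true`, results come from both a dense (embedding) ranking and a sparse (keyword-style) ranking, which must be fused into one list. Reciprocal Rank Fusion (RRF) is one common way to do that, and Qdrant offers it as a fusion option in its Query API. Which fusion method RAGWire actually uses is not stated here, so treat the sketch below as an illustration of the idea. `rrf_fuse` and the chunk IDs are hypothetical; `k=60` is the constant commonly used with RRF.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_k: int = 5) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank), ranks start at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=scores.__getitem__, reverse=True)[:top_k]


dense = ["c1", "c2", "c3"]   # ranking from the embedding (dense) search
sparse = ["c3", "c1", "c4"]  # ranking from the sparse search
print(rrf_fuse([dense, sparse]))  # ['c1', 'c3', 'c2', 'c4']
```

RRF only looks at ranks, not raw scores, which is why it can combine dense cosine scores and sparse term-weight scores that live on different scales.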

Embedding Providers

```yaml
# Ollama (local)
embeddings:
  provider: "ollama"
  model: "qwen3-embedding:0.6b"

# OpenAI
embeddings:
  provider: "openai"
  model: "text-embedding-3-small"

# HuggingFace (local)
embeddings:
  provider: "huggingface"
  model_name: "sentence-transformers/all-MiniLM-L6-v2"

# Google
embeddings:
  provider: "google"
  model: "models/embedding-001"
```

Component Usage

```python
from ragwire import (
    MarkItDownLoader,
    get_splitter,
    get_markdown_splitter,
    get_embedding,
    QdrantStore,
    MetadataExtractor,
    hybrid_search,
    mmr_search,
)

# Load a document
loader = MarkItDownLoader()
result = loader.load("document.pdf")

# Split text
splitter = get_markdown_splitter(chunk_size=10000, chunk_overlap=2000)
chunks = splitter.split_text(result["text_content"])

# Embeddings
embedding = get_embedding({"provider": "ollama", "model": "qwen3-embedding:0.6b"})

# Vector store
store = QdrantStore(config={"url": "http://localhost:6333"}, embedding=embedding)
store.set_collection("my_collection")
vectorstore = store.get_store()
```
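`chunk_size` and `chunk_overlap` set how long each chunk is and how much consecutive chunks share. The sketch below shows the idea with a plain fixed-size character window, assuming sizes are measured in characters. RAGWire's splitters are described as markdown-aware and recursive, meaning they try to break at structural boundaries rather than at fixed offsets, so their chunk boundaries will differ. `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# With 10000 / 2000, each chunk starts 8000 characters after the previous one
# and repeats the previous chunk's last 2000 characters.
text = "x" * 25000
chunks = sliding_chunks(text, chunk_size=10000, chunk_overlap=2000)
print([len(c) for c in chunks])  # [10000, 10000, 9000]
```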

Project Structure

```text
ragwire/
├── core/          # Config loader + RAGWire orchestrator
├── loaders/       # MarkItDown document converter
├── processing/    # Text splitters + SHA256 hashing
├── metadata/      # Pydantic schema + LLM extractor
├── embeddings/    # Multi-provider embedding factory
├── vectorstores/  # Qdrant wrapper with hybrid search
├── retriever/     # Similarity, MMR, hybrid retrieval
└── utils/         # Logging
```
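The `retriever/` package lists MMR alongside similarity and hybrid retrieval. Maximal Marginal Relevance picks results one at a time, trading similarity to the query against similarity to results already picked, which keeps near-duplicate chunks out of the top-k. The sketch below implements the standard greedy MMR over precomputed vectors using only the standard library. `mmr_select`, the toy vectors, and `lambda_mult` (named after the equivalent LangChain parameter) are assumptions, not the signature of RAGWire's `mmr_search`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query: list[float], docs: list[list[float]], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Greedy MMR over document vectors; returns indices into docs in pick order.

    Each step picks the candidate maximising
        lambda_mult * sim(query, d) - (1 - lambda_mult) * max sim(d, s) over selected s.
    """
    relevance = [cosine(query, d) for d in docs]
    candidates = list(range(len(docs)))
    selected: list[int] = []

    def mmr_score(i: int) -> float:
        redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)  # ties go to the earlier index
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.11], [0.8, -0.6]]  # docs 0 and 1 are near-duplicates
print(mmr_select(query, docs, k=2))                   # [0, 2]
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [0, 1] (pure similarity ranking)
```

Lower `lambda_mult` values favour diversity; `1.0` reduces to plain similarity ranking, as the second call shows.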

Troubleshooting

| Error | Fix |
|---|---|
| Qdrant connection refused | `docker run -p 6333:6333 qdrant/qdrant` |
| `markitdown[pdf]` missing | `pip install "markitdown[pdf]"` |
| Ollama model not found | `ollama pull <model-name>` |
| `fastembed` missing | `pip install fastembed` (needed for hybrid search) |
| Embedding dimension mismatch | Set `force_recreate: true` in the config for one run, then set it back to `false` |

License


Release history

- 1.2.9 (this version): Apr 19, 2026
- 1.2.8: Mar 27, 2026
- 1.2.7: Mar 25, 2026
- 1.2.1: Mar 24, 2026
- 1.2.0: Mar 24, 2026
- 1.1.9: Mar 24, 2026
- 1.1.8: Mar 24, 2026
- 1.1.7: Mar 23, 2026
- 1.1.5: Mar 23, 2026
- 1.1.4: Mar 23, 2026
- 1.1.3: Mar 23, 2026
- 1.1.1: Mar 22, 2026
- 1.0.1: Mar 22, 2026
- 1.0.0: Mar 22, 2026

Download files


Source Distribution

ragwire-1.2.9.tar.gz (3.6 MB)

Uploaded Apr 19, 2026 Source

Built Distribution


ragwire-1.2.9-py3-none-any.whl (3.6 MB)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file ragwire-1.2.9.tar.gz.

File metadata

Download URL: ragwire-1.2.9.tar.gz
Upload date: Apr 19, 2026
Size: 3.6 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ragwire-1.2.9.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `31d1bccbcfb616244a5f5740e0fc75e60789b6d7f88a2081b7eb845c698125b8` |
| MD5 | `6818a916ef6ffd02634c0c558bcf9fad` |
| BLAKE2b-256 | `f14605eff4c911bf1e15bf3ff196349cbd53aca9e6fa3dd5383313da928aa6d8` |
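To check a downloaded archive against the SHA256 digest listed above, hash the file in blocks and compare. The sketch below uses only `hashlib` and `pathlib`; the local file path is an assumption (wherever you saved the download). `pip` can also enforce digests itself in hash-checking mode (`pip install --require-hashes -r requirements.txt` with a line such as `ragwire==1.2.9 --hash=sha256:<digest>`, listing both the sdist and wheel digests), though that mode requires every dependency to be pinned with a hash.

```python
import hashlib
from pathlib import Path

# Published SHA256 for ragwire-1.2.9.tar.gz (from the table above).
EXPECTED_SHA256 = "31d1bccbcfb616244a5f5740e0fc75e60789b6d7f88a2081b7eb845c698125b8"


def sha256_of_file(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("ragwire-1.2.9.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH: do not install")
    else:
        print(f"{archive} not found")
```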


Provenance

The following attestation bundles were made for ragwire-1.2.9.tar.gz:

Publisher: publish.yml on laxmimerit/RAGWire

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ragwire-1.2.9.tar.gz
- Subject digest: 31d1bccbcfb616244a5f5740e0fc75e60789b6d7f88a2081b7eb845c698125b8
- Sigstore transparency entry: 1340706489
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: laxmimerit/RAGWire@f0c915a4bfe6e2c04c033bea7926a72f156f2789
- Branch / Tag: refs/tags/v1.2.9
- Owner: https://github.com/laxmimerit
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f0c915a4bfe6e2c04c033bea7926a72f156f2789
- Trigger Event: release

File details

Details for the file ragwire-1.2.9-py3-none-any.whl.

File metadata

Download URL: ragwire-1.2.9-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 3.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ragwire-1.2.9-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c26bd857dcb5c084c2c4088855e1eb5b13b34f6d345b3ba0b298e41bec3052df` |
| MD5 | `e1380aca4f3777cfce4e56dc4a6e0180` |
| BLAKE2b-256 | `530ac0a6d92176f57bce2a85834488fc7cfbde7f3ddc67a042e199956d8f78c4` |


Provenance

The following attestation bundles were made for ragwire-1.2.9-py3-none-any.whl:

Publisher: publish.yml on laxmimerit/RAGWire

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ragwire-1.2.9-py3-none-any.whl
- Subject digest: c26bd857dcb5c084c2c4088855e1eb5b13b34f6d345b3ba0b298e41bec3052df
- Sigstore transparency entry: 1340706491
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: laxmimerit/RAGWire@f0c915a4bfe6e2c04c033bea7926a72f156f2789
- Branch / Tag: refs/tags/v1.2.9
- Owner: https://github.com/laxmimerit
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f0c915a4bfe6e2c04c033bea7926a72f156f2789
- Trigger Event: release
