
RAGWire — Production-grade RAG toolkit for document ingestion and retrieval with hybrid search support



Features

  • Document Loading — PDF, DOCX, XLSX, PPTX and more via MarkItDown
  • LLM Metadata Extraction — extracts company, doc type, fiscal period using your LLM
  • Smart Text Splitting — markdown-aware and recursive chunking strategies
  • Multiple Embedding Providers — Ollama, OpenAI, HuggingFace, Google, FastEmbed
  • Qdrant Vector Store — dense, sparse, and hybrid search
  • Advanced Retrieval — similarity, MMR, and hybrid search
  • SHA256 Deduplication — at both file and chunk level
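
To make the file-level and chunk-level deduplication idea concrete, here is a minimal sketch built only on Python's standard `hashlib`. It is not RAGWire's internal code; the helper names `sha256_of_file`, `sha256_of_text`, and `dedupe_chunks` are hypothetical and chosen for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, block_size: int = 65536) -> str:
    """Hash a file in fixed-size blocks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def sha256_of_text(text: str) -> str:
    """Hash a chunk of text using its UTF-8 encoding."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Keep the first occurrence of each chunk, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        chunk_hash = sha256_of_text(chunk)
        if chunk_hash not in seen:
            seen.add(chunk_hash)
            unique.append(chunk)
    return unique
```

A file hash can be checked before ingestion to skip documents that were already processed, while chunk hashes catch repeated passages inside or across documents.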

Installation

pip install ragwire

# With Ollama support (local, no API key)
pip install "ragwire[ollama]"

# With all providers
pip install "ragwire[all]"

Quick Start

from ragwire import RAGPipeline

pipeline = RAGPipeline("config.yaml")

# Ingest documents
stats = pipeline.ingest_documents(["data/Apple_10k_2025.pdf"])
print(f"Chunks created: {stats['chunks_created']}")

# Retrieve
results = pipeline.retrieve("What is Apple's total revenue?", top_k=5)
for doc in results:
    print(doc.metadata.get("company_name"), doc.page_content[:200])
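
Since `ingest_documents` takes a list of paths, a small helper can gather every supported file under a folder first. The sketch below is an assumption-laden convenience, not part of RAGWire: `collect_documents` and `SUPPORTED_SUFFIXES` are hypothetical names, and the suffix set only covers the formats named in the Features list.

```python
from pathlib import Path

# Formats named explicitly in the Features list; extend as needed.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".xlsx", ".pptx"}


def collect_documents(root: str) -> list[str]:
    """Return sorted paths of supported files under root, searched recursively."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


# Usage with the pipeline created in the Quick Start (not executed here):
# paths = collect_documents("data")
# stats = pipeline.ingest_documents(paths)
```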

Configuration

Copy config.example.yaml to config.yaml and edit:

embeddings:
  provider: "ollama"
  model: "qwen3-embedding:0.6b"
  base_url: "http://localhost:11434"

llm:
  provider: "ollama"
  model: "qwen3.5:9b"
  temperature: 0.0
  num_ctx: 16384

vectorstore:
  url: "http://localhost:6333"
  collection_name: "my_docs"
  use_sparse: true

retriever:
  search_type: "hybrid"
  top_k: 5
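
With `search_type: "hybrid"`, results from dense and sparse search have to be merged into one ranking. One widely used way to do that is reciprocal rank fusion (RRF), sketched below for intuition only. This README does not state which fusion method RAGWire or Qdrant applies, so treat the function name `reciprocal_rank_fusion` and the constant `k=60` as assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) per list it appears in, with rank
    starting at 1; documents ranked well in several lists rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # ids ordered by dense similarity
sparse_hits = ["b", "d", "a"]  # ids ordered by sparse (keyword) score
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

RRF only needs ranks, not raw scores, which is why it is a common choice when dense and sparse scores live on different scales.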

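Embedding Providers

Use exactly one of the following embeddings blocks in config.yaml; they are alternatives, not parts of a single file.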

# Ollama (local)
embeddings:
  provider: "ollama"
  model: "qwen3-embedding:0.6b"

# OpenAI
embeddings:
  provider: "openai"
  model: "text-embedding-3-small"

# HuggingFace (local)
embeddings:
  provider: "huggingface"
  model_name: "sentence-transformers/all-MiniLM-L6-v2"

# Google
embeddings:
  provider: "google"
  model: "models/embedding-001"

Component Usage

from ragwire import (
    MarkItDownLoader,
    get_splitter,
    get_markdown_splitter,
    get_embedding,
    QdrantStore,
    MetadataExtractor,
    hybrid_search,
    mmr_search,
)

# Load a document
loader = MarkItDownLoader()
result = loader.load("document.pdf")

# Split text
splitter = get_markdown_splitter(chunk_size=10000, chunk_overlap=2000)
chunks = splitter.split_text(result["text_content"])

# Embeddings
embedding = get_embedding({"provider": "ollama", "model": "qwen3-embedding:0.6b"})

# Vector store
store = QdrantStore(config={"url": "http://localhost:6333"}, embedding=embedding)
store.set_collection("my_collection")
vectorstore = store.get_store()
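
For readers curious what markdown-aware splitting with overlap means in practice, here is a minimal standalone sketch. It is not the implementation behind `get_markdown_splitter`; the function `split_markdown` is hypothetical, and it assumes sizes are measured in characters.

```python
import re


def split_markdown(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split at markdown headings, then window oversized sections with overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Split just before each heading line so a heading stays with its section.
    sections = [s.strip() for s in re.split(r"(?m)^(?=#{1,6} )", text) if s.strip()]
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for section in sections:
        if len(section) <= chunk_size:
            chunks.append(section)
            continue
        # Slide a fixed-size window; consecutive windows share chunk_overlap characters.
        for start in range(0, len(section), step):
            chunks.append(section[start:start + chunk_size])
            if start + chunk_size >= len(section):
                break
    return chunks
```

Splitting on headings first keeps each chunk topically coherent, and the overlap reduces the chance that a sentence needed for an answer is cut in half at a chunk boundary.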

Architecture

ragwire/
├── core/          # Config loader + RAGPipeline orchestrator
├── loaders/       # MarkItDown document converter
├── processing/    # Text splitters + SHA256 hashing
├── metadata/      # Pydantic schema + LLM extractor
├── embeddings/    # Multi-provider embedding factory
├── vectorstores/  # Qdrant wrapper with hybrid search
├── retriever/     # Similarity, MMR, hybrid retrieval
└── utils/         # Logging
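
The retriever layer lists MMR (maximal marginal relevance) next to plain similarity search. The sketch below shows the general MMR algorithm in pure Python so the trade-off is visible; it is not the code in `retriever/`, and the names `cosine`, `mmr`, and `lambda_mult` are assumptions made for this example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, doc_vecs, top_k: int = 5, lambda_mult: float = 0.5) -> list[int]:
    """Return indices of documents chosen greedily by maximal marginal relevance.

    lambda_mult = 1.0 ranks by relevance alone; lower values penalise
    documents that are similar to ones already selected.
    """
    candidates = list(range(len(doc_vecs)))
    selected: list[int] = []
    while candidates and len(selected) < top_k:
        def score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


docs = [[1.0, 0.1], [1.0, 0.12], [0.7, -0.7]]  # the first two are near-duplicates
diverse = mmr([1.0, 0.0], docs, top_k=2, lambda_mult=0.5)
relevance_only = mmr([1.0, 0.0], docs, top_k=2, lambda_mult=1.0)
```

In this toy data the balanced setting skips the near-duplicate second document in favour of the less similar third one, while the relevance-only setting returns the two near-duplicates.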

Troubleshooting

| Error | Fix |
| --- | --- |
| Qdrant connection refused | docker run -p 6333:6333 qdrant/qdrant |
| markitdown[pdf] missing | pip install "markitdown[pdf]" |
| Ollama model not found | ollama pull <model-name> |
| fastembed missing | pip install fastembed (needed for hybrid search) |
| Embedding dimension mismatch | Set force_recreate: true in config once, then back to false |
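
Before digging into a connection error, it can help to confirm that the service port is reachable at all. The following standard-library sketch does only that; `is_port_open` is a hypothetical helper, and the ports in the usage comment are taken from the URLs in the Configuration section.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Usage (not executed here):
# is_port_open("localhost", 6333)   # Qdrant
# is_port_open("localhost", 11434)  # Ollama
```

A successful TCP connection only shows that something is listening; it does not verify that the service is healthy or that the collection exists.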

License

MIT © 2026 KGP Talkie Private Limited



