The best RAG interface yet - Built-in vector store, embeddings, citations, and auto-updates
piragi
The best RAG interface yet.
from piragi import Ragi
kb = Ragi(["./docs", "s3://bucket/data/**/*.pdf", "https://api.example.com/docs"])
answer = kb.ask("How do I deploy this?")
Built-in vector store, embeddings, citations, and auto-updates. Free & local by default.
Installation
pip install piragi
# Optional: Install Ollama for local LLM
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2
# Optional extras
pip install piragi[s3] # S3 support
pip install piragi[gcs] # Google Cloud Storage
pip install piragi[azure] # Azure Blob Storage
pip install piragi[crawler] # Recursive web crawling
pip install piragi[graph] # Knowledge graph
pip install piragi[postgres] # PostgreSQL/pgvector
pip install piragi[pinecone] # Pinecone
pip install piragi[supabase] # Supabase
pip install piragi[all] # Everything
Features
- Zero Config - Works with free local models out of the box
- All Formats - PDF, Word, Excel, Markdown, Code, URLs, Images, Audio
- Remote Storage - Read from S3, GCS, Azure, HDFS, SFTP with glob patterns
- Web Crawling - Recursively crawl websites with /** syntax
- Auto-Updates - Background refresh, queries never blocked
- Smart Citations - Every answer includes sources
- Pluggable Stores - LanceDB, PostgreSQL, Pinecone, Supabase, or custom
- Advanced Retrieval - HyDE, hybrid search, cross-encoder reranking
- Semantic Chunking - Context-aware and hierarchical chunking
- Knowledge Graph - Entity/relationship extraction for better answers
- Async Support - Non-blocking API for web frameworks
Quick Start
from piragi import Ragi
# Local files
kb = Ragi("./docs")
# Multiple sources with globs
kb = Ragi(["./docs/*.pdf", "https://api.docs.com", "./code/**/*.py"])
# Remote filesystems
kb = Ragi("s3://bucket/docs/**/*.pdf")
kb = Ragi("gs://bucket/reports/*.md")
# Ask questions
answer = kb.ask("What is the API rate limit?")
print(answer.text)
# View citations
for cite in answer.citations:
    print(f"{cite.source}: {cite.score:.0%}")
Remote Filesystems
Read files from cloud storage using glob patterns:
# S3
kb = Ragi("s3://my-bucket/docs/**/*.pdf")
# Google Cloud Storage
kb = Ragi("gs://my-bucket/reports/*.md")
# Azure Blob Storage
kb = Ragi("az://my-container/files/*.txt")
# Mix local and remote
kb = Ragi([
    "./local-docs",
    "s3://bucket/remote-docs/**/*.pdf",
    "https://example.com/api-docs"
])
Requires optional extras: pip install piragi[s3], piragi[gcs], or piragi[azure]
Web Crawling
Recursively crawl websites using the /** suffix:
# Crawl entire site
kb = Ragi("https://docs.example.com/**")
# Crawl specific section
kb = Ragi("https://docs.example.com/api/**")
# Mix with other sources
kb = Ragi([
    "./local-docs",
    "https://docs.example.com/**",
    "s3://bucket/data/*.pdf"
])
Crawls same-domain links up to depth 3, max 100 pages by default.
Requires: pip install piragi[crawler]
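The default limits above (same domain, depth 3, 100 pages) can be illustrated with a small breadth-first crawl. This is a minimal sketch and not piragi's crawler: bounded_crawl, get_links, and the SITE link graph are hypothetical names, and it assumes the start page counts as depth 0.

```python
from collections import deque
from urllib.parse import urlparse


def bounded_crawl(start_url, get_links, max_depth=3, max_pages=100):
    """Breadth-first crawl limited by domain, depth, and page count.

    get_links is a caller-supplied function (hypothetical here) that
    returns the outgoing links of a URL.
    """
    domain = urlparse(start_url).netloc
    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            # Skip already-queued pages and links that leave the start domain.
            if link not in seen and urlparse(link).netloc == domain:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Toy link graph standing in for real HTTP fetches.
SITE = {
    "https://docs.example.com/": [
        "https://docs.example.com/a",
        "https://other.example.org/x",
    ],
    "https://docs.example.com/a": ["https://docs.example.com/b"],
    "https://docs.example.com/b": ["https://docs.example.com/c"],
    "https://docs.example.com/c": ["https://docs.example.com/d"],
    "https://docs.example.com/d": [],
}

pages = bounded_crawl("https://docs.example.com/", lambda u: SITE.get(u, []))
print(pages)
```

In this toy graph the page at /d sits four links from the start, so it is never fetched, and the link to other.example.org is skipped because it is on a different domain.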
Vector Store Backends
from piragi import Ragi
from piragi.stores import PineconeStore, SupabaseStore
# LanceDB (default) - local or S3-backed
kb = Ragi("./docs")
kb = Ragi("./docs", store="s3://bucket/indices")
# PostgreSQL with pgvector
kb = Ragi("./docs", store="postgres://user:pass@localhost/db")
# Pinecone
kb = Ragi("./docs", store=PineconeStore(api_key="...", index_name="my-index"))
# Supabase
kb = Ragi("./docs", store=SupabaseStore(url="https://xxx.supabase.co", key="..."))
Advanced Retrieval
kb = Ragi("./docs", config={
    "retrieval": {
        "use_hyde": True,           # Hypothetical document embeddings
        "use_hybrid_search": True,  # BM25 + vector search
        "use_cross_encoder": True,  # Neural reranking
    }
})
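Hybrid search has to merge a keyword (BM25) ranking with a vector ranking. How piragi fuses the two is not stated here; one common technique is reciprocal rank fusion, sketched below. The function name, the constant k=60, and the document ids are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword results
vector_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Rank-based fusion avoids having to normalize BM25 scores and cosine similarities onto a common scale, which is why it is a frequent default for hybrid retrieval.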
Chunking Strategies
# Semantic - splits at topic boundaries
kb = Ragi("./docs", config={"chunk": {"strategy": "semantic"}})
# Hierarchical - parent-child for context + precision
kb = Ragi("./docs", config={"chunk": {"strategy": "hierarchical"}})
# Contextual - LLM-generated context per chunk
kb = Ragi("./docs", config={"chunk": {"strategy": "contextual"}})
Knowledge Graph
Extract entities and relationships for better multi-hop reasoning:
# Enable with a single flag
kb = Ragi("./docs", graph=True)
# Automatic - extracts entities/relationships during ingestion
# Uses them to augment retrieval for relationship questions
answer = kb.ask("Who reports to Alice?")
# Direct graph access
kb.graph.entities() # ["alice", "bob", "project x"]
kb.graph.neighbors("alice") # ["bob", "engineering team"]
kb.graph.triples() # [("alice", "manages", "bob"), ...]
Requires: pip install piragi[graph]
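To make the entities, neighbors, and triples calls above concrete, here is a tiny in-memory graph built from (subject, relation, object) triples. It is a sketch only: TripleGraph is a hypothetical class, not piragi's graph implementation, and it assumes neighbor lookups ignore edge direction.

```python
from collections import defaultdict


class TripleGraph:
    """Tiny in-memory graph built from (subject, relation, object) triples."""

    def __init__(self, triples):
        self._triples = list(triples)
        self._adj = defaultdict(set)
        for subj, _rel, obj in self._triples:
            # Assumption: edges are treated as undirected for neighbor lookups.
            self._adj[subj].add(obj)
            self._adj[obj].add(subj)

    def entities(self):
        return sorted(self._adj)

    def neighbors(self, entity):
        return sorted(self._adj.get(entity, ()))

    def triples(self):
        return list(self._triples)


graph = TripleGraph([
    ("alice", "manages", "bob"),
    ("alice", "leads", "engineering team"),
    ("bob", "works on", "project x"),
])

# A relationship question such as "Who reports to Alice?" reduces to a
# filter over the triples instead of a similarity search over text.
reports = [obj for subj, rel, obj in graph.triples()
           if subj == "alice" and rel == "manages"]
print(graph.neighbors("alice"), reports)
```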
Configuration
config = {
    "llm": {
        "model": "llama3.2",
        "base_url": "http://localhost:11434/v1"
    },
    "embedding": {
        "model": "all-mpnet-base-v2",
        "batch_size": 32
    },
    "chunk": {
        "strategy": "fixed",
        "size": 512,
        "overlap": 50
    },
    "retrieval": {
        "use_hyde": False,
        "use_hybrid_search": False,
        "use_cross_encoder": False
    },
    "auto_update": {
        "enabled": True,
        "interval": 300
    }
}
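The "fixed" chunk strategy with a size and an overlap can be pictured as a sliding window. The sketch below is an illustration under stated assumptions, not piragi's chunker: fixed_chunks is a hypothetical helper, and it measures size and overlap in characters, whereas the library may count tokens.

```python
def fixed_chunks(text, size=512, overlap=50):
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reached the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(1200))  # 1200 characters of sample text
chunks = fixed_chunks(text)
print([len(c) for c in chunks])
```

The overlap means the tail of each chunk is repeated at the head of the next, so a sentence cut by a window boundary still appears whole in at least one chunk when it is shorter than the overlap.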
Async Support
Use AsyncRagi for non-blocking operations in async web frameworks:
from piragi import AsyncRagi
kb = AsyncRagi("./docs")
# Simple await
await kb.add("./more-docs")
answer = await kb.ask("What is X?")
# With progress tracking
async for progress in kb.add("./large-docs", progress=True):
    print(progress)
# "Discovering files..."
# "Found 10 documents"
# "Chunking 1/10: doc1.md"
# ...
# "Generating embeddings for 150 chunks..."
# "Embedded 32/150 chunks"
# "Embedded 64/150 chunks"
# ...
# "Embeddings complete"
# "Done"
# With FastAPI
from fastapi import FastAPI
app = FastAPI()
@app.post("/ingest")
async def ingest(files: list[str]):
    await kb.add(files)
    return {"status": "done"}
All methods are async: add(), ask(), retrieve(), refresh(), count(), clear().
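Because the methods are awaitable, several queries can run concurrently instead of one after another. The sketch below shows the pattern with asyncio.gather. StubKB is a hypothetical stand-in so the example runs without piragi installed; it does not reflect how AsyncRagi is implemented.

```python
import asyncio


class StubKB:
    """Stand-in for an async knowledge base; not piragi's AsyncRagi."""

    async def ask(self, query):
        await asyncio.sleep(0.01)  # simulate I/O-bound work
        return f"answer to: {query}"


async def ask_many(kb, queries):
    # gather schedules all coroutines concurrently and preserves input order.
    return await asyncio.gather(*(kb.ask(q) for q in queries))


answers = asyncio.run(ask_many(StubKB(), ["What is X?", "What is Y?"]))
print(answers)
```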
Retrieval Only
Use piragi as a retrieval layer without an LLM:
query = "How does auth work?"
chunks = kb.retrieve(query, top_k=5)
for chunk in chunks:
    print(chunk.chunk, chunk.source, chunk.score)
# Use with your own LLM (your_llm is a placeholder for your own client call)
context = "\n".join(c.chunk for c in chunks)
response = your_llm(f"Context:\n{context}\n\nQuestion: {query}")
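When feeding retrieved chunks to your own LLM, it can help to drop weak matches and label each chunk with its source so the model can cite it. The sketch below assumes only the chunk, source, and score attributes shown above. Hit, build_prompt, the min_score threshold, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Stand-in for a retrieved chunk with the three attributes used above."""
    chunk: str
    source: str
    score: float


def build_prompt(query, hits, min_score=0.0):
    """Keep hits at or above min_score and number them for citation."""
    kept = [h for h in hits if h.score >= min_score]
    context = "\n".join(
        f"[{i}] ({h.source}) {h.chunk}" for i, h in enumerate(kept, start=1)
    )
    return f"Context:\n{context}\n\nQuestion: {query}"


# Made-up sample results for illustration only.
hits = [
    Hit("Tokens expire after 1 hour.", "auth.md", 0.91),
    Hit("Use the /login endpoint.", "api.md", 0.42),
]
prompt = build_prompt("How does auth work?", hits, min_score=0.5)
print(prompt)
```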
API
# Sync API
kb = Ragi(sources, persist_dir=".piragi", config=None, store=None, graph=False)
kb.add("./more-docs")
kb.ask(query, top_k=5)
kb.retrieve(query, top_k=5)
kb.filter(**metadata).ask(query)
kb.refresh("./docs")
kb.count()
kb.clear()
# Async API (same methods, just await them)
kb = AsyncRagi(sources, persist_dir=".piragi", config=None, store=None, graph=False)
await kb.add("./more-docs")
await kb.ask(query, top_k=5)
Full docs: API.md
License
MIT