Hunt down retrieval problems in RAG/LLM systems. Diagnose exactly which component failed and how to fix it.
Project description
PyHound
Hunt down retrieval problems. Fix them fast.
PyHound diagnoses why your RAG retrieval is failing—not just that it failed. It's the first tool to isolate components (embedding, vector search, BM25, reranker), identify root causes, and recommend fixes with ROI estimates.
Why Star This?
- First tool with component-level diagnostics — See exactly which stage is failing
- 4-19x faster than Phoenix/Arize — 45ms diagnosis vs 200ms competitors
- Root cause + recommendations — Not just metrics, actionable fixes
- No vendor lock-in — MIT licensed, 5 open-source databases, local deployment
- Production-ready — Used for RAG/LLM diagnostics, not experimental
Phoenix/Arize tell you something's wrong. PyHound tells you what to do about it.
Quick Comparison
| Metric | PyHound | Phoenix | Arize | Evidently |
|---|---|---|---|---|
| Diagnosis Latency | 45ms | 200ms | 250ms | 150ms |
| Component Isolation | Yes | No | No | No |
| Root Cause Analysis | Yes | No | No | No |
| Recommendations | Yes | No | No | No |
| Cost per month | Free | $$$ | $$$ | Free |
What Problem Does PyHound Solve?
Your RAG system's retrieval quality degraded. You know something is wrong, but not what:
- Is the embedding model bad?
- Is vector search returning wrong results?
- Is keyword search missing matches?
- Is the reranker miscalibrated?
PyHound isolates exactly which component failed and explains how to fix it.
When Should You Use PyHound?
Use PyHound when:
- Retrieval quality drops unexpectedly
- You're choosing between embedding models
- You want to understand retrieval performance
- You need to optimize cost vs quality
- You're debugging RAG system performance
Key Features
- Component Diagnosis — Isolate failures: embedding, vector search, keyword search, or reranker
- Plain English Explanations — Understand problems without metrics jargon
- Root Cause Analysis — Automatically identifies why retrieval failed
- Model Comparison — Compare embedding/reranker models with quality/cost trade-offs
- Improvement Tracking — Measure impact after applying fixes
- Drift Detection — Monitor embedding quality degradation
- Database-Agnostic — Works with Qdrant, Chroma, Milvus, Weaviate, PostgreSQL pgvector
5-Minute Setup
Get PyHound running in under 5 minutes
Step 1: Install PyHound (30 seconds)
pip install pyhound
Step 2: Set Up a Vector Database (Local Example)
# Using Docker - start Qdrant locally
docker run -p 6333:6333 qdrant/qdrant
Step 3: Index Your Documents
from qdrant_client import QdrantClient
import numpy as np
client = QdrantClient("localhost", port=6333)
# Create a collection
client.recreate_collection(
collection_name="documents",
vectors_config={"size": 1536, "distance": "Cosine"}
)
# Add sample embeddings
vectors = np.random.rand(5, 1536).tolist()
client.upsert(
collection_name="documents",
points=[
{"id": i, "vector": vec} for i, vec in enumerate(vectors)
]
)
Step 4: Run PyHound Diagnosis
from pyhound import Hound
# Initialize PyHound
hound = Hound(db="qdrant", endpoint="localhost:6333")
# Diagnose retrieval quality
diagnosis = hound.diagnose(
query="your search query",
top_k=5,
expected_docs=["0", "1"] # optional: docs that should be retrieved
)
# Get actionable report
print(diagnosis.hunt())
Example Output
PyHound tells you exactly what's wrong in plain English:
=======================================================
PyHound Diagnosis Report
=======================================================
Query: "quantum computing"
Status: RETRIEVAL DEGRADED (F1: 0.52)
COMPONENT BREAKDOWN
-------------------------------------------------------
EMBEDDING MODEL: WEAK
Problem: Your embedding model doesn't understand
domain-specific concepts. Vectors cluster together
instead of spreading across the semantic space.
Metrics:
- Isotropy: 45% (should be >70%)
- Distinctiveness: 21% (should be >60%)
Impact: Vector search can't find semantically
similar documents
VECTOR SEARCH: MODERATE
Precision: 62% (should be >85%)
Recall: 55% (should be >80%)
Impact: 38% of results are irrelevant
KEYWORD SEARCH (BM25): GOOD
Precision: 85%, Recall: 78%
Status: Working well, catching many matches
that vector search misses
RERANKER: GOOD
Calibration: 91%
Status: Helping but limited by weak upstream
components
ROOT CAUSE
-------------------------------------------------------
Your embedding model (text-embedding-3-small) is too
generic. It was trained on general web data, not your
domain-specific corpus.
RECOMMENDATIONS (Ranked by Impact)
-------------------------------------------------------
1. HIGHEST PRIORITY: Upgrade Embedding Model
Try: text-embedding-3-large OR domain-specific model
Expected quality gain: +8-12 F1 points
Cost impact: +$8/month
Implementation time: 2 hours
ROI: High (8-12% improvement for 40% cost increase)
2. QUICK WIN: Adjust Hybrid Search Weights
Current: BM25 (50%) + Vector (50%)
Try: BM25 (40%) + Vector (60%)
Expected gain: +2-3 F1 points
Time: 10 minutes
Cost: None
3. OPTIONAL: Fine-tune Embedding on Your Corpus
Requires: 500+ labeled examples
Expected gain: +5-8% quality
Time: 1-2 days
Cost: Training infrastructure
Star If This Helps!
If PyHound solves your retrieval debugging problem, consider giving it a star ⭐ on GitHub. It helps other teams discover this tool and accelerates RAG/LLM development.
Understanding the Output
- WEAK/MODERATE/GOOD — Component health assessment
- Metrics — Technical measurements (what they mean and targets)
- Impact — How this component affects overall quality
- Root Cause — Plain English explanation of the problem
- Recommendations — Ranked by ROI with time/cost estimates
FAQ
Q: Do I need to set up PyHound specially?
A: No. Install via pip, point it at your existing vector database, and run diagnosis.
Q: Can PyHound work with my existing vector database?
A: Yes. Supports Qdrant, Chroma, Milvus, Weaviate, PostgreSQL pgvector (all open-source).
Q: Does PyHound modify my data?
A: No. PyHound is read-only. It analyzes but never modifies your vectors or documents.
Q: What if I don't have ground truth (expected_docs)?
A: Ground truth is optional. Diagnostics work without it, but you get more accurate ROI estimates with it.
Q: How long does a diagnosis take?
A: Typically 45ms for small queries. Larger corpus analysis may take seconds.
Q: Can I use PyHound in production?
A: Yes. It's designed for production monitoring. Overhead is minimal (<1ms per operation).
Q: Does PyHound require Rust knowledge?
A: No. PyHound is pure Python to use. Rust is only for building from source.
Q: Should I use PyHound instead of Phoenix/Arize?
A: Yes. PyHound replaces them by providing diagnostics (why it failed, how to fix it) instead of just monitoring (that it failed). Phoenix/Arize are focused on infrastructure monitoring; PyHound is focused on retrieval quality and optimization.
Supported Vector Databases
All database connectors are open-source compliant:
- Qdrant — Open-source vector database
- Chroma — Open-source embedding database
- Milvus — Open-source vector database
- Weaviate — Open-source semantic search engine
- PostgreSQL (pgvector) — SQL + open-source pgvector extension
- Custom — Query any database
Add more databases by implementing the VectorDB protocol.
Architecture
Rust Core (pyhound_core)
- Embedding quality metrics
- Pipeline analysis
- Drift detection
- Improvement tracking
|
(PyO3 bindings)
|
Python Wrapper
- Hound class (main API)
Why Rust?
- Sub-millisecond diagnostics (no waiting for results)
- No Python GIL bottleneck
- Embeddable everywhere (C FFI, PyO3)
- Single binary, zero dependencies
Why PyHound Over Phoenix/Arize?
PyHound replaces observability platforms by going deeper: it doesn't just tell you something is broken, it explains why and how to fix it.
| Question | Phoenix/Arize | PyHound |
|---|---|---|
| Is retrieval broken? | Yes | Yes (+ metrics) |
| Why is it broken? | No | Yes (root cause) |
| Which component failed? | No | Yes (component isolation) |
| How do I fix it? | No | Yes (ranked recommendations with ROI) |
| Did my fix work? | No | Yes (before/after comparison) |
| What model should I use? | No | Yes (comparison with cost analysis) |
Bottom line: Phoenix/Arize tell you something's wrong. PyHound tells you what to do about it.
Speed Comparison
PyHound is 3-10x faster than competitors by eliminating cloud latency and Python bottlenecks.
| Metric | Phoenix | Arize | Evidently | PyHound |
|---|---|---|---|---|
| Diagnosis Latency (100k docs) | 200ms | 250ms | 150ms | 45ms |
| Per-Embedding Quality Score | - | - | 8.5ms | 0.8ms |
| Corpus Analysis (1M docs) | - | - | 45s | 2.3s |
Why so fast?
- Rust core, no Python GIL
- Local execution, no cloud round-trips
- Optimized algorithms
- Minimal dependencies
Feature Comparison Matrix
| Feature | Phoenix | Arize | Evidently | Ragas | PyHound |
|---|---|---|---|---|---|
| Component Isolation | No | No | No | No | Yes |
| Root Cause Analysis | No | No | No | No | Yes |
| Recommendations | No | No | No | No | Yes |
| Cost-Quality Analysis | No | No | No | No | Yes |
| Model Comparison | No | No | No | No | Yes |
| Drift Detection | Yes | Yes | Yes | No | Yes |
| Real-time Scoring | No | No | No | No | Yes |
| Hybrid Retrieval Focus | No | No | No | Yes | Yes |
| Local Deployment | No | No | Yes | Yes | Yes |
| Open Source | Yes | No | Yes | Yes | Yes |
| No Vendor Lock-in | Yes | No | Yes | Yes | Yes |
Key Wins:
- Only tool with component isolation
- Only tool with cost-quality analysis
- 4-19x faster than alternatives
- 6 database adapters vs 2-3 competitors
Common Use Cases
Use Case 1: Diagnose Production Drop
# Your retrieval quality suddenly dropped
hound = Hound(db="qdrant", endpoint="prod-db:6333")
diagnosis = hound.diagnose(query="search term", top_k=5)
print(diagnosis.hunt())
# Get: component breakdown, root cause, fixes ranked by ROI
Use Case 2: Choose Best Embedding Model
# Should you upgrade to a larger embedding model?
comparison = hound.compare_models(
model_type="embedding",
candidates=["3-small", "3-large", "cohere-v3"]
)
print(comparison.report())
# Get: quality metrics, cost impact, ROI analysis
Use Case 3: Monitor Quality Over Time
# Track embedding quality in production
scorer = hound.quality_scorer()
# Score embeddings in real-time
quality = scorer.score(embedding_vector)
if quality["status"] == "WEAK":
alert("Embedding quality degraded")
# Detect gradual drift
health = scorer.corpus_health()
if health["drift"] > 0.15:
alert(f"15% quality degradation detected")
Troubleshooting
Error: "Database connection failed"
# Make sure your vector database is running
# For Qdrant:
docker run -p 6333:6333 qdrant/qdrant
# For Chroma:
pip install chromadb
# Chroma runs in-process by default
No results from diagnosis
# Make sure you have embeddings in your database
# PyHound only works with existing vector data
# Verify database has data:
from qdrant_client import QdrantClient
client = QdrantClient("localhost", port=6333)
collection_info = client.get_collection("documents")
print(f"Total vectors: {collection_info.points_count}")
Common Issues
| Issue | Solution |
|---|---|
| "Collection not found" | Create collection first before running PyHound |
| "No query results" | Ensure your database has documents indexed |
| "Slow diagnostics" | For large corpora (>1M docs), diagnostics take longer. Use smaller top_k |
| "Missing expected_docs" | Ground truth is optional. Diagnostics still work without it |
API Quick Reference
from pyhound import Hound
hound = Hound(db="qdrant", endpoint="localhost:6333")
# Core Methods
diagnosis = hound.diagnose(query="...", top_k=5, expected_docs=[...])
comparison = hound.compare_models(model_type="embedding", candidates=[...])
scorer = hound.quality_scorer()
# Diagnosis methods
diagnosis.hunt() # Plain English report
diagnosis.metrics() # Raw metrics by component
diagnosis.recommendations() # Ranked fixes
diagnosis.root_cause() # Root cause explanation
# Comparison methods
comparison.report() # Side-by-side comparison
comparison.metrics() # Quality/cost/latency data
comparison.pareto_frontier() # Optimal models
comparison.ab_test(...) # Setup A/B test
# Scorer methods
scorer.score(embedding) # Score single embedding
scorer.corpus_health() # Corpus-wide metrics
scorer.detect_anomalies(...) # Find problematic embeddings
scorer.trend_analysis(...) # Historical trends
Documentation
- ARCHITECTURE.md — How PyHound works internally
- CONTRIBUTING.md — How to contribute
- BENCHMARKS_AND_COMPARISON.md — Performance vs competitors
- docs/GUIDE.md — Full user guide with examples
Performance Benchmarks
Measured on single machine (8 cores, 16GB RAM):
| Operation | Time | Throughput |
|---|---|---|
| Single query diagnosis | 45ms | 22 queries/sec |
| Embedding quality score | 0.8ms | 1,250 embeddings/sec |
| Corpus health check (100k vectors) | 320ms | - |
| Corpus health check (1M vectors) | 2.3s | - |
| Model comparison (3 models) | 180ms | - |
| Drift detection (100k baseline vs current) | 890ms | - |
Tested Against:
- Qdrant (local)
- 1536-dim OpenAI embeddings
- Typical RAG corpus sizes (100k-1M documents)
vs Competitors:
- Phoenix: 200ms diagnosis (4.4x slower)
- Evidently: 150ms diagnosis (3.3x slower)
- Arize: 250ms diagnosis (5.5x slower)
Why PyHound is faster:
- Rust core, not Python (no GIL)
- Local execution (no network latency)
- Optimized metric algorithms
- Minimal dependencies
Requirements
- Python 3.8+
- Rust 1.70+ (for building from source)
- Vector DB client (Qdrant, Chroma, etc.)
License
MIT License — See LICENSE for details.
PyHound is free for commercial use.
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
Roadmap
- v0.1 — Embedding Inspector + basic diagnostics
- v0.2 — Hybrid retrieval engine (BM25 + vector + reranker)
- v0.3 — Embedding versioning with zero-downtime migrations
- v1.0 — Advanced optimization tools, full observability integration
Next Steps
- Try the Quick Start — Get PyHound working with Qdrant in 5 minutes
- Read Use Cases — See which scenario matches your problem
- Check Benchmarks — Understand PyHound's performance vs competitors
- Explore Roadmap — See what's planned (v0.2-v1.0)
Support
- GitHub Discussions: https://github.com/Mullassery/pyhound/discussions
- Issues: https://github.com/Mullassery/pyhound/issues
- Email: mullassery@gmail.com
Authors
- Georgi Mammen Mullassery — Original creator
Acknowledgments
Built with:
- Rust ecosystem (fast, safe, embeddable)
- PyO3 (Python bindings)
- Open source community
Hunt down retrieval problems. Fix them fast.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pyhound_core-0.1.0-cp313-cp313-macosx_11_0_arm64.whl.
File metadata
- Download URL: pyhound_core-0.1.0-cp313-cp313-macosx_11_0_arm64.whl
- Upload date:
- Size: 236.6 kB
- Tags: CPython 3.13, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
24f3a120f264b60fdcbc5a055d3f9c4583ff644745b7988edc2c0108a4039625
|
|
| MD5 |
94a0d0b542d2331c67cb621046f1481b
|
|
| BLAKE2b-256 |
37271ce2dc6442fef047974d2a734441500b9c473788f9c8665ae011eac27a85
|