Skip to main content

Hunt down retrieval problems in RAG/LLM systems. Diagnose exactly which component failed and how to fix it.

Project description

PyHound

PyPI version Python 3.8+ MIT License GitHub stars

Hunt down retrieval problems. Fix them fast.

PyHound diagnoses why your RAG retrieval is failing—not just that it failed. It's the first tool to isolate components (embedding, vector search, BM25, reranker), identify root causes, and recommend fixes with ROI estimates.

Why Star This?

  • First tool with component-level diagnostics — See exactly which stage is failing
  • 4-19x faster than Phoenix/Arize — 45ms diagnosis vs 200ms competitors
  • Root cause + recommendations — Not just metrics, actionable fixes
  • No vendor lock-in — MIT licensed, 5 open-source databases, local deployment
  • Production-ready — Used for RAG/LLM diagnostics, not experimental

Phoenix/Arize tell you something's wrong. PyHound tells you what to do about it.

Quick Comparison

Metric PyHound Phoenix Arize Evidently
Diagnosis Latency 45ms 200ms 250ms 150ms
Component Isolation Yes No No No
Root Cause Analysis Yes No No No
Recommendations Yes No No No
Cost per month Free $$$ $$$ Free

What Problem Does PyHound Solve?

Your RAG system's retrieval quality degraded. You know something is wrong, but not what:

  • Is the embedding model bad?
  • Is vector search returning wrong results?
  • Is keyword search missing matches?
  • Is the reranker miscalibrated?

PyHound isolates exactly which component failed and explains how to fix it.

When Should You Use PyHound?

Use PyHound when:

  • Retrieval quality drops unexpectedly
  • You're choosing between embedding models
  • You want to understand retrieval performance
  • You need to optimize cost vs quality
  • You're debugging RAG system performance

Key Features

  • Component Diagnosis — Isolate failures: embedding, vector search, keyword search, or reranker
  • Plain English Explanations — Understand problems without metrics jargon
  • Root Cause Analysis — Automatically identifies why retrieval failed
  • Model Comparison — Compare embedding/reranker models with quality/cost trade-offs
  • Improvement Tracking — Measure impact after applying fixes
  • Drift Detection — Monitor embedding quality degradation
  • Database-Agnostic — Works with Qdrant, Chroma, Milvus, Weaviate, PostgreSQL pgvector

5-Minute Setup

Get PyHound running in under 5 minutes

Step 1: Install PyHound (30 seconds)

pip install pyhound

Step 2: Set Up a Vector Database (Local Example)

# Using Docker - start Qdrant locally
docker run -p 6333:6333 qdrant/qdrant

Step 3: Index Your Documents

from qdrant_client import QdrantClient
import numpy as np

client = QdrantClient("localhost", port=6333)

# Create a collection
client.recreate_collection(
    collection_name="documents",
    vectors_config={"size": 1536, "distance": "Cosine"}
)

# Add sample embeddings
vectors = np.random.rand(5, 1536).tolist()
client.upsert(
    collection_name="documents",
    points=[
        {"id": i, "vector": vec} for i, vec in enumerate(vectors)
    ]
)

Step 4: Run PyHound Diagnosis

from pyhound import Hound

# Initialize PyHound
hound = Hound(db="qdrant", endpoint="localhost:6333")

# Diagnose retrieval quality
diagnosis = hound.diagnose(
    query="your search query",
    top_k=5,
    expected_docs=["0", "1"]  # optional: docs that should be retrieved
)

# Get actionable report
print(diagnosis.hunt())

Example Output

PyHound tells you exactly what's wrong in plain English:

=======================================================
              PyHound Diagnosis Report
=======================================================

Query: "quantum computing"
Status: RETRIEVAL DEGRADED (F1: 0.52)

COMPONENT BREAKDOWN
-------------------------------------------------------

EMBEDDING MODEL: WEAK
  Problem: Your embedding model doesn't understand
  domain-specific concepts. Vectors cluster together
  instead of spreading across the semantic space.
  
  Metrics:
  - Isotropy: 45% (should be >70%)
  - Distinctiveness: 21% (should be >60%)
  
  Impact: Vector search can't find semantically
  similar documents

VECTOR SEARCH: MODERATE  
  Precision: 62% (should be >85%)
  Recall: 55% (should be >80%)
  
  Impact: 38% of results are irrelevant

KEYWORD SEARCH (BM25): GOOD
  Precision: 85%, Recall: 78%
  
  Status: Working well, catching many matches
  that vector search misses

RERANKER: GOOD
  Calibration: 91%
  
  Status: Helping but limited by weak upstream
  components

ROOT CAUSE
-------------------------------------------------------
Your embedding model (text-embedding-3-small) is too
generic. It was trained on general web data, not your
domain-specific corpus.

RECOMMENDATIONS (Ranked by Impact)
-------------------------------------------------------
1. HIGHEST PRIORITY: Upgrade Embedding Model
   Try: text-embedding-3-large OR domain-specific model
   Expected quality gain: +8-12 F1 points
   Cost impact: +$8/month  
   Implementation time: 2 hours
   ROI: High (8-12% improvement for 40% cost increase)

2. QUICK WIN: Adjust Hybrid Search Weights
   Current: BM25 (50%) + Vector (50%)
   Try: BM25 (40%) + Vector (60%)
   Expected gain: +2-3 F1 points
   Time: 10 minutes
   Cost: None

3. OPTIONAL: Fine-tune Embedding on Your Corpus
   Requires: 500+ labeled examples
   Expected gain: +5-8% quality
   Time: 1-2 days
   Cost: Training infrastructure

Star If This Helps!

If PyHound solves your retrieval debugging problem, consider giving it a star ⭐ on GitHub. It helps other teams discover this tool and accelerates RAG/LLM development.

Understanding the Output

  • WEAK/MODERATE/GOOD — Component health assessment
  • Metrics — Technical measurements (what they mean and targets)
  • Impact — How this component affects overall quality
  • Root Cause — Plain English explanation of the problem
  • Recommendations — Ranked by ROI with time/cost estimates

FAQ

Q: Do I need to set up PyHound specially?
A: No. Install via pip, point it at your existing vector database, and run diagnosis.

Q: Can PyHound work with my existing vector database?
A: Yes. Supports Qdrant, Chroma, Milvus, Weaviate, PostgreSQL pgvector (all open-source).

Q: Does PyHound modify my data?
A: No. PyHound is read-only. It analyzes but never modifies your vectors or documents.

Q: What if I don't have ground truth (expected_docs)?
A: Ground truth is optional. Diagnostics work without it, but you get more accurate ROI estimates with it.

Q: How long does a diagnosis take?
A: Typically 45ms for small queries. Larger corpus analysis may take seconds.

Q: Can I use PyHound in production?
A: Yes. It's designed for production monitoring. Overhead is minimal (<1ms per operation).

Q: Does PyHound require Rust knowledge?
A: No. PyHound is pure Python to use. Rust is only for building from source.

Q: Should I use PyHound instead of Phoenix/Arize?
A: Yes. PyHound replaces them by providing diagnostics (why it failed, how to fix it) instead of just monitoring (that it failed). Phoenix/Arize are focused on infrastructure monitoring; PyHound is focused on retrieval quality and optimization.

Supported Vector Databases

All database connectors are open-source compliant:

  • Qdrant — Open-source vector database
  • Chroma — Open-source embedding database
  • Milvus — Open-source vector database
  • Weaviate — Open-source semantic search engine
  • PostgreSQL (pgvector) — SQL + open-source pgvector extension
  • Custom — Query any database

Add more databases by implementing the VectorDB protocol.

Architecture

Rust Core (pyhound_core)
- Embedding quality metrics
- Pipeline analysis
- Drift detection
- Improvement tracking
  |
  (PyO3 bindings)
  |
Python Wrapper
- Hound class (main API)

Why Rust?

  • Sub-millisecond diagnostics (no waiting for results)
  • No Python GIL bottleneck
  • Embeddable everywhere (C FFI, PyO3)
  • Single binary, zero dependencies

Why PyHound Over Phoenix/Arize?

PyHound replaces observability platforms by going deeper: it doesn't just tell you something is broken, it explains why and how to fix it.

Question Phoenix/Arize PyHound
Is retrieval broken? Yes Yes (+ metrics)
Why is it broken? No Yes (root cause)
Which component failed? No Yes (component isolation)
How do I fix it? No Yes (ranked recommendations with ROI)
Did my fix work? No Yes (before/after comparison)
What model should I use? No Yes (comparison with cost analysis)

Bottom line: Phoenix/Arize tell you something's wrong. PyHound tells you what to do about it.

Speed Comparison

PyHound is 3-10x faster than competitors by eliminating cloud latency and Python bottlenecks.

Metric Phoenix Arize Evidently PyHound
Diagnosis Latency (100k docs) 200ms 250ms 150ms 45ms
Per-Embedding Quality Score - - 8.5ms 0.8ms
Corpus Analysis (1M docs) - - 45s 2.3s

Why so fast?

  • Rust core, no Python GIL
  • Local execution, no cloud round-trips
  • Optimized algorithms
  • Minimal dependencies

Feature Comparison Matrix

Feature Phoenix Arize Evidently Ragas PyHound
Component Isolation No No No No Yes
Root Cause Analysis No No No No Yes
Recommendations No No No No Yes
Cost-Quality Analysis No No No No Yes
Model Comparison No No No No Yes
Drift Detection Yes Yes Yes No Yes
Real-time Scoring No No No No Yes
Hybrid Retrieval Focus No No No Yes Yes
Local Deployment No No Yes Yes Yes
Open Source Yes No Yes Yes Yes
No Vendor Lock-in Yes No Yes Yes Yes

Key Wins:

  • Only tool with component isolation
  • Only tool with cost-quality analysis
  • 4-19x faster than alternatives
  • 6 database adapters vs 2-3 competitors

Common Use Cases

Use Case 1: Diagnose Production Drop

# Your retrieval quality suddenly dropped
hound = Hound(db="qdrant", endpoint="prod-db:6333")
diagnosis = hound.diagnose(query="search term", top_k=5)
print(diagnosis.hunt())
# Get: component breakdown, root cause, fixes ranked by ROI

Use Case 2: Choose Best Embedding Model

# Should you upgrade to a larger embedding model?
comparison = hound.compare_models(
    model_type="embedding",
    candidates=["3-small", "3-large", "cohere-v3"]
)
print(comparison.report())
# Get: quality metrics, cost impact, ROI analysis

Use Case 3: Monitor Quality Over Time

# Track embedding quality in production
scorer = hound.quality_scorer()

# Score embeddings in real-time
quality = scorer.score(embedding_vector)
if quality["status"] == "WEAK":
    alert("Embedding quality degraded")

# Detect gradual drift
health = scorer.corpus_health()
if health["drift"] > 0.15:
    alert(f"15% quality degradation detected")

Troubleshooting

Error: "Database connection failed"

# Make sure your vector database is running
# For Qdrant:
docker run -p 6333:6333 qdrant/qdrant

# For Chroma:
pip install chromadb
# Chroma runs in-process by default

No results from diagnosis

# Make sure you have embeddings in your database
# PyHound only works with existing vector data

# Verify database has data:
from qdrant_client import QdrantClient
client = QdrantClient("localhost", port=6333)
collection_info = client.get_collection("documents")
print(f"Total vectors: {collection_info.points_count}")

Common Issues

Issue Solution
"Collection not found" Create collection first before running PyHound
"No query results" Ensure your database has documents indexed
"Slow diagnostics" For large corpora (>1M docs), diagnostics take longer. Use smaller top_k
"Missing expected_docs" Ground truth is optional. Diagnostics still work without it

API Quick Reference

from pyhound import Hound

hound = Hound(db="qdrant", endpoint="localhost:6333")

# Core Methods
diagnosis = hound.diagnose(query="...", top_k=5, expected_docs=[...])
comparison = hound.compare_models(model_type="embedding", candidates=[...])
scorer = hound.quality_scorer()

# Diagnosis methods
diagnosis.hunt()              # Plain English report
diagnosis.metrics()           # Raw metrics by component
diagnosis.recommendations()   # Ranked fixes
diagnosis.root_cause()        # Root cause explanation

# Comparison methods
comparison.report()           # Side-by-side comparison
comparison.metrics()          # Quality/cost/latency data
comparison.pareto_frontier()  # Optimal models
comparison.ab_test(...)       # Setup A/B test

# Scorer methods
scorer.score(embedding)       # Score single embedding
scorer.corpus_health()        # Corpus-wide metrics
scorer.detect_anomalies(...)  # Find problematic embeddings
scorer.trend_analysis(...)    # Historical trends

Documentation

Performance Benchmarks

Measured on single machine (8 cores, 16GB RAM):

Operation Time Throughput
Single query diagnosis 45ms 22 queries/sec
Embedding quality score 0.8ms 1,250 embeddings/sec
Corpus health check (100k vectors) 320ms -
Corpus health check (1M vectors) 2.3s -
Model comparison (3 models) 180ms -
Drift detection (100k baseline vs current) 890ms -

Tested Against:

  • Qdrant (local)
  • 1536-dim OpenAI embeddings
  • Typical RAG corpus sizes (100k-1M documents)

vs Competitors:

  • Phoenix: 200ms diagnosis (4.4x slower)
  • Evidently: 150ms diagnosis (3.3x slower)
  • Arize: 250ms diagnosis (5.5x slower)

Why PyHound is faster:

  • Rust core, not Python (no GIL)
  • Local execution (no network latency)
  • Optimized metric algorithms
  • Minimal dependencies

Requirements

  • Python 3.8+
  • Rust 1.70+ (for building from source)
  • Vector DB client (Qdrant, Chroma, etc.)

License

MIT License — See LICENSE for details.

PyHound is free for commercial use.

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Roadmap

  • v0.1 — Embedding Inspector + basic diagnostics
  • v0.2 — Hybrid retrieval engine (BM25 + vector + reranker)
  • v0.3 — Embedding versioning with zero-downtime migrations
  • v1.0 — Advanced optimization tools, full observability integration

Next Steps

  1. Try the Quick Start — Get PyHound working with Qdrant in 5 minutes
  2. Read Use Cases — See which scenario matches your problem
  3. Check Benchmarks — Understand PyHound's performance vs competitors
  4. Explore Roadmap — See what's planned (v0.2-v1.0)

Support

Authors

  • Georgi Mammen Mullassery — Original creator

Acknowledgments

Built with:

  • Rust ecosystem (fast, safe, embeddable)
  • PyO3 (Python bindings)
  • Open source community

Hunt down retrieval problems. Fix them fast.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyhound_core-0.1.0-cp313-cp313-macosx_11_0_arm64.whl (236.6 kB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

File details

Details for the file pyhound_core-0.1.0-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pyhound_core-0.1.0-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 24f3a120f264b60fdcbc5a055d3f9c4583ff644745b7988edc2c0108a4039625
MD5 94a0d0b542d2331c67cb621046f1481b
BLAKE2b-256 37271ce2dc6442fef047974d2a734441500b9c473788f9c8665ae011eac27a85

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page