Unified Hyperbolic Spectral Retrieval (UHSR) - a novel text retrieval algorithm combining lexical and semantic search.
Unified Hyperbolic Spectral Retrieval (UHSR)
Unified Hyperbolic Spectral Retrieval (UHSR) is a hybrid text retrieval model that combines lexical search (BM25) with semantic search (FAISS or Pinecone) and applies spectral re-ranking to produce interpretable relevance scores normalized to the [0,1] range.
🚀 Key Features
- 🔍 Hybrid Retrieval: Combines BM25 for lexical scoring and dense vector semantic similarity for contextual understanding.
- 🎯 Multi-Metric Similarity: Supports cosine, Euclidean, Mahalanobis, Manhattan, Chebyshev, Jaccard, and Hamming similarity.
- 🔬 Spectral Re-Ranking: Uses graph Laplacian & Fiedler vector to boost highly relevant candidates.
- ⚡ AI-powered Reranking: Supports Hugging Face Cross-Encoders & OpenAI API-based Reranking.
- 📈 Interpretable Scores: Final relevance scores are logistic-normalized in [0,1] for easy ranking.
- 🚀 Scalable & Efficient: Works with FAISS (local) for fast retrieval and Pinecone (cloud-based) for large-scale vector search.
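The multi-metric similarity feature can be illustrated with plain-Python versions of the listed metrics. This is a minimal sketch, not UHSR's API: the function names are hypothetical, and the description does not say how distances become similarities, so the sketch assumes the common mapping `1 / (1 + d)`. Jaccard and Hamming are shown on binary vectors, and Mahalanobis takes a caller-supplied inverse covariance matrix.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _to_similarity(distance: float) -> float:
    # Assumed (hypothetical) mapping from a distance in [0, inf) to a similarity in (0, 1].
    return 1.0 / (1.0 + distance)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean(a: Vector, b: Vector) -> float:
    return _to_similarity(math.dist(a, b))


def manhattan(a: Vector, b: Vector) -> float:
    return _to_similarity(sum(abs(x - y) for x, y in zip(a, b)))


def chebyshev(a: Vector, b: Vector) -> float:
    return _to_similarity(max(abs(x - y) for x, y in zip(a, b)))


def mahalanobis(a: Vector, b: Vector, inv_cov: Sequence[Vector]) -> float:
    # d = sqrt(diff^T * inv_cov * diff), where inv_cov is the inverse covariance matrix.
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    d2 = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(n) for j in range(n))
    return _to_similarity(math.sqrt(max(d2, 0.0)))


def jaccard(a: Vector, b: Vector) -> float:
    """Jaccard on binary vectors: |A and B| / |A or B|; two all-zero vectors count as identical."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


def hamming(a: Vector, b: Vector) -> float:
    """1 minus the fraction of positions where the two vectors differ."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 1.0 - mismatches / len(a)
```

With the identity matrix as `inv_cov`, `mahalanobis` reduces to `euclidean`; `exp(-d)` would be an equally reasonable distance-to-similarity mapping.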
🛠️ How It Works
UHSR enhances traditional retrieval by blending BM25-based keyword matching with semantic vector representations using the following pipeline:
| Step | Description |
|---|---|
| 1️⃣ Lexical Filtering | Uses BM25 to rank documents by keyword relevance |
| 2️⃣ Semantic Scoring | Computes similarity using FAISS or Pinecone |
| 3️⃣ Fusion Process | Blends scores via logistic normalization & harmonic fusion |
| 4️⃣ Spectral Re-Ranking | Uses graph Laplacian analysis to boost central candidates |
| 5️⃣ (Optional) AI Reranking | Uses OpenAI API or Hugging Face Cross-Encoders |
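Step 4 does not specify how the Fiedler vector is used to "boost central candidates". One plausible reading, sketched here purely as an assumption, is spectral partitioning: build a similarity graph W over the candidates, form the Laplacian L = D - W, take its Fiedler vector (the eigenvector of the second-smallest eigenvalue), and boost candidates that fall on the same side of the partition as the top-scoring one. The boost rule `s + beta * (1 - s)`, the power-iteration eigen-solver, and all function names are illustrative choices, not UHSR's documented internals.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def laplacian(weights: Sequence[Sequence[float]]) -> Matrix:
    """Unnormalized graph Laplacian L = D - W for a symmetric, non-negative W."""
    n = len(weights)
    lap = [[-float(weights[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        # Degree excludes self-loops, so every row of L sums to zero.
        lap[i][i] = float(sum(weights[i][j] for j in range(n) if j != i))
    return lap


def fiedler_vector(lap: Matrix, iterations: int = 500, seed: int = 0) -> List[float]:
    """Approximate the eigenvector for the second-smallest eigenvalue of lap."""
    n = len(lap)
    # Gershgorin bound: every eigenvalue of L is at most 2 * (max degree), so
    # M = c*I - L has non-negative eigenvalues in the reverse order of L's.
    c = 2.0 * max(lap[i][i] for i in range(n))
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        # Project out the constant vector (eigenvalue 0 of L) before each step.
        avg = sum(v) / n
        v = [x - avg for x in v]
        w = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def spectral_rerank(
    scores: Sequence[float], weights: Sequence[Sequence[float]], beta: float = 0.3
) -> List[float]:
    """Hypothetical rule: boost candidates in the top candidate's Fiedler partition."""
    n = len(scores)
    has_edges = any(weights[i][j] > 0 for i in range(n) for j in range(n) if i != j)
    if n < 2 or not has_edges:
        return list(scores)
    f = fiedler_vector(laplacian(weights))
    top = max(range(n), key=lambda i: scores[i])
    # A Fiedler vector's sign is arbitrary, so compare each sign against `top`.
    # s + beta * (1 - s) moves a boosted score toward 1.0 without leaving [0, 1].
    return [s + beta * (1.0 - s) if f[i] * f[top] >= 0 else s for i, s in enumerate(scores)]


if __name__ == "__main__":
    # Two candidate clusters, {0, 1, 2} and {3, 4}, joined by a weak 0.1 edge.
    W = [
        [0.0, 1.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.1, 0.0],
        [0.0, 0.0, 0.1, 0.0, 1.0],
        [0.0, 0.0, 0.0, 1.0, 0.0],
    ]
    print(spectral_rerank([0.9, 0.5, 0.4, 0.6, 0.55], W))
```

In this example candidates 1 and 2 share the top candidate's cluster and are lifted, so candidate 1 (0.5 → 0.65) now outranks candidate 3 (0.6), whose score is left unchanged.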
🌍 Supported Retrieval Methods
- ✅ BM25 (Lexical Matching)
- ✅ FAISS (Local Vector Search)
- ✅ Pinecone (Cloud Vector Search)
- ✅ Hugging Face Rerankers
- ✅ OpenAI API-based Reranking
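BM25, the lexical stage in the list above, is a standard ranking function. The sketch below is a self-contained Okapi BM25 with common defaults (k1 = 1.5, b = 0.75) and the non-negative IDF variant. The lowercase-whitespace tokenizer and the `BM25` class are hypothetical; UHSR may rely on a library implementation with different parameters or tokenization.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase and split on whitespace (no stemming).
    return text.lower().split()


class BM25:
    """Okapi BM25 with the non-negative IDF: ln((N - n + 0.5) / (n + 0.5) + 1)."""

    def __init__(self, corpus: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in corpus]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = (sum(self.lengths) / len(self.docs)) or 1.0
        n_docs = len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for d in self.docs for term in d)
        self.idf = {t: math.log((n_docs - c + 0.5) / (c + 0.5) + 1.0) for t, c in df.items()}

    def score(self, query: str, index: int) -> float:
        tf, length = self.docs[index], self.lengths[index]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = f + self.k1 * (1.0 - self.b + self.b * length / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1.0) / denom
        return total

    def top_k(self, query: str, k: int = 3) -> List[int]:
        scores = [self.score(query, i) for i in range(len(self.docs))]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "dogs chase cats",
        "spectral graph theory and the laplacian",
    ]
    print(BM25(corpus).top_k("graph laplacian", k=1))
```

Without stemming, "cat" and "cats" are different terms, which is one reason the semantic stage is paired with BM25.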
📌 Why UHSR?
- Better Search Results: Combines exact keyword matching (BM25) with contextual embeddings (Semantic Search).
- Scalable: Uses FAISS for fast local retrieval or Pinecone for cloud-based vector search.
- Interpretable Ranking: Outputs normalized scores in [0,1], making it easy to interpret.
- Multi-Metric Similarity: Supports cosine, Euclidean, Mahalanobis, Manhattan, Chebyshev, Jaccard, and Hamming similarity.
🎯 Intended Use
UHSR is designed for:
- Information Retrieval Research
- Search Engines & Recommendation Systems
- NLP Applications in AI & Machine Learning
- Academic & Industry-scale Document Ranking
📂 Code & Documentation
For complete documentation, usage examples, and implementation details, visit the GitHub repository.
🔥 Try UHSR today and revolutionize your search engine! 🚀
Download files
File details
Details for the file uhsr-0.2.6.tar.gz.
File metadata
- Download URL: uhsr-0.2.6.tar.gz
- Upload date:
- Size: 13.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b036d842ed37edb9cd8f4ba1aa0b588108f69c3091614931ddd981ae89e4cf9d |
| MD5 | 40081be4b51b3756d587acda204e5a5c |
| BLAKE2b-256 | e25ce13c4cc2ce6367c9d253f38b75c60f7968a74d6e16ae9111b2a86aceecee |
File details
Details for the file uhsr-0.2.6-py3-none-any.whl.
File metadata
- Download URL: uhsr-0.2.6-py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ed6a979ee08129cc7c72fa850512f3e8c115ade1e39b7eee6cfc050bf4861f34 |
| MD5 | 8c2bf4d57d89f339a62a9914f2deb6da |
| BLAKE2b-256 | c86a48b6f4406af0626d353d43b5896dddaa99ce4b5f30ff7b267ab6609215e4 |