
Unified Hyperbolic Spectral Retrieval (UHSR) - a novel text retrieval algorithm combining lexical and semantic search.

Unified Hyperbolic Spectral Retrieval (UHSR)

Unified Hyperbolic Spectral Retrieval (UHSR) is a novel text retrieval algorithm that fuses lexical search (BM25) with semantic search (dense embeddings) in a single pipeline. It combines three techniques: logistic normalization of the raw scores, harmonic fusion of the normalized scores, and spectral re-ranking based on graph Laplacian analysis. The result is an interpretable relevance score in the [0, 1] range for each document.

Key Features

  • Hybrid Retrieval: Combines BM25 for lexical scoring and dense vector semantic similarity for contextual understanding.
  • Advanced Fusion: Uses logistic normalization and harmonic fusion to integrate multiple scoring signals.
  • Spectral Re-Ranking: Employs spectral analysis (using the graph Laplacian and Fiedler vector) to boost central, highly relevant candidates.
  • Metric Flexibility: Supports multiple semantic similarity metrics (cosine, Euclidean, Mahalanobis) to suit various datasets.
  • Interpretable Scores: Final relevance scores are normalized to the [0,1] range.
  • Scalable: Designed to work with both small and large datasets using FAISS for fast approximate nearest neighbor search.
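The spectral re-ranking feature listed above can be illustrated with a minimal NumPy sketch. The project description does not say how UHSR builds its candidate graph or turns the Fiedler vector into a boost, so everything here is an assumption made for illustration: the function names `fiedler_vector` and `spectral_rerank`, the `boost` parameter, the use of clipped cosine similarity as edge weights, and the definition of "central" as being close to the score-weighted mean of the Fiedler coordinates. It is not the package's API.

```python
import numpy as np

def fiedler_vector(weights):
    """Eigenvector for the second-smallest eigenvalue of the graph Laplacian L = D - W."""
    degree = np.diag(weights.sum(axis=1))
    laplacian = degree - weights
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvectors[:, 1]

def spectral_rerank(embeddings, base_scores, boost=0.2):
    """Blend base scores in [0, 1] with a Fiedler-based centrality in [0, 1].

    Hypothetical scheme: needs at least two candidates and a positive score sum.
    """
    x = np.asarray(embeddings, dtype=float)
    scores = np.asarray(base_scores, dtype=float)
    # Edge weights: cosine similarity between candidates, negatives clipped to 0.
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    weights = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(weights, 0.0)
    f = fiedler_vector(weights)
    # Distance from the score-weighted mean is unaffected by the eigenvector's sign.
    distance = np.abs(f - np.average(f, weights=scores))
    span = distance.max()
    centrality = 1.0 - distance / span if span > 0 else np.ones_like(distance)
    # A convex combination keeps the result inside [0, 1].
    return (1.0 - boost) * scores + boost * centrality

# Three similar candidates and one outlier (index 3) with a mid-range base score.
emb = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
base = [0.9, 0.7, 0.6, 0.65]
reranked = spectral_rerank(emb, base)
```

In this toy input the outlier starts above the weakest cluster member but ends below it after re-ranking, which is the kind of effect the feature description suggests. For large candidate sets a sparse eigensolver would likely be a better fit than the dense `np.linalg.eigh` used here.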

Overview

UHSR provides an end-to-end text retrieval pipeline that starts with raw documents and ends with a ranked list of documents. It first applies BM25 for fast lexical filtering, then computes semantic similarity using dense embeddings. Both scores are passed through logistic normalization and then fused with a harmonic mean. Because a harmonic mean is pulled toward the smaller of its inputs, a document must score well on both the lexical and the semantic signal to rank highly. Finally, a spectral re-ranking step based on graph Laplacian analysis refines the ranking by boosting documents that are centrally located among the top candidates.
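The fusion step described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the package's implementation: the helper names (`logistic`, `logistic_normalize`, `harmonic_mean`, `fuse`) are hypothetical, and standardizing each score list to z-scores before applying the logistic function is one plausible choice that the description does not confirm.

```python
import math
from statistics import mean, pstdev

def logistic(z):
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_normalize(scores):
    """Standardize scores to z-scores (an assumption), then squash them into (0, 1)."""
    mu = mean(scores)
    sigma = pstdev(scores) or 1.0  # fall back to 1.0 when all scores are equal
    return [logistic((s - mu) / sigma) for s in scores]

def harmonic_mean(a, b):
    """Harmonic mean of two non-negative values; low whenever either input is low."""
    return 2.0 * a * b / (a + b) if (a + b) > 0 else 0.0

def fuse(bm25_scores, semantic_scores):
    """Fuse per-document lexical and semantic scores into one value in [0, 1]."""
    lexical = logistic_normalize(bm25_scores)
    semantic = logistic_normalize(semantic_scores)
    return [harmonic_mean(l, s) for l, s in zip(lexical, semantic)]

# Made-up scores for four documents; document 3 is semantically close
# to the query but has almost no lexical overlap with it.
bm25 = [12.4, 3.1, 7.8, 0.5]
cosine = [0.82, 0.15, 0.40, 0.78]
fused = fuse(bm25, cosine)
ranking = sorted(range(len(fused)), key=fused.__getitem__, reverse=True)
```

With these inputs, document 3 is ranked below document 2 even though its semantic score is much higher, because the harmonic mean penalizes its weak lexical score.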

Intended Use

UHSR is intended for research and educational purposes and can serve as a strong foundation for further development in text retrieval and natural language processing applications.

For more details, visit the GitHub repository.

Download files

  • Source Distribution: uhsr-0.1.4.tar.gz (9.6 kB)
  • Built Distribution: uhsr-0.1.4-py3-none-any.whl (7.0 kB, Python 3)

File details

Details for the file uhsr-0.1.4.tar.gz.

File metadata

  • Download URL: uhsr-0.1.4.tar.gz
  • Upload date:
  • Size: 9.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for uhsr-0.1.4.tar.gz
  • SHA256: edd0e9aa81df12ec59187d235c4f2ab4ecb2d5f650929308f2bd477b3bfbd7d3
  • MD5: 312dc0107e0a3a65aa07d1cca36a6cbf
  • BLAKE2b-256: 960eb6d85621d82f1754b7790a4d29fe684c39fc5623e7e6db8e1076b29fabd0


File details

Details for the file uhsr-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: uhsr-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 7.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for uhsr-0.1.4-py3-none-any.whl
  • SHA256: a3de64a6f4468f4008948d14397e40c4cd87336aa7ca4bb68ab93794fe5b527d
  • MD5: e1eeeed6d254713495b5d838dfd0bedc
  • BLAKE2b-256: b4d5dbd3743d8bb78a39a557bb1e94753219976a874a1e84f12d0a026246d89b

