Lightweight hybrid reranker with baked-in model artifact.

small-hybrid-reranker

small-hybrid-reranker is a lightweight reranker package with a baked-in trained model.

It reranks a list of passages for a query using a hybrid feature stack:

  • static embeddings (cnmoro/static-nomic-384-pten)
  • lexical overlap and token interaction sketches
  • BM25 and dense retrieval priors
  • listwise LightGBM ranker
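To make two of the lexical signals above concrete, here is a minimal sketch of a token-overlap feature and an Okapi BM25 score computed over a candidate list. It is not the package's feature code, which is not shown on this page: the function names (`tokenize`, `lexical_overlap`, `bm25_scores`), the regex tokenizer, and the `k1` and `b` defaults are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_overlap(query: str, passage: str) -> float:
    """Fraction of distinct query tokens that also occur in the passage."""
    q_tokens = set(tokenize(query))
    if not q_tokens:
        return 0.0
    return len(q_tokens & set(tokenize(passage))) / len(q_tokens)


def bm25_scores(query: str, passages: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 per passage, using the candidate list itself as the corpus."""
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0
    # Document frequency: in how many passages each token appears.
    doc_freq = Counter(tok for d in docs for tok in set(d))
    q_tokens = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for tok in q_tokens:
            if tok not in tf:
                continue
            idf = math.log(1.0 + (n_docs - doc_freq[tok] + 0.5) / (doc_freq[tok] + 0.5))
            norm = tf[tok] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[tok] * (k1 + 1.0) / norm
        scores.append(score)
    return scores
```

In a hybrid setup, values like these would typically sit alongside embedding similarities as columns in the feature matrix passed to the ranker.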

The model artifact is included in the package, so there is no separate checkpoint download.

Model In This Release

  • Version 0.2.0 packages an updated model trained on all SciFact splits available in this repository (train + test) for maximum fit.
  • Training used strict BM25 top-100 candidates and a LightGBM LambdaRank objective over the hybrid features.
  • In-sample metrics from the training run, computed on the same train + test data the model was fit on, so they are not a held-out estimate:
    • ndcg@10: 0.89999
    • recall@10: 0.89830
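For readers unfamiliar with the two metrics, the sketch below shows one common way to compute ndcg@10 and recall@10 for a single query. The evaluator actually used for the numbers above is not stated, so the linear-gain DCG and the helper names (`dcg_at_k`, `ndcg_at_k`, `recall_at_k`) are assumptions; some toolkits use an exponential gain instead, which gives the same result for binary relevance labels.

```python
import math


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: list[str], relevance: dict[str, float], k: int = 10) -> float:
    """DCG of the ranking divided by the DCG of an ideal ordering."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranked_ids: list[str], relevance: dict[str, float], k: int = 10) -> float:
    """Share of relevant documents that appear in the top k."""
    relevant_ids = {doc_id for doc_id, g in relevance.items() if g > 0}
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)
```

Dataset-level figures are usually the mean of these per-query values.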

Inference remains lightweight and CPU-friendly: the API is still a single HybridReranker().rerank(query, passages) call.

Install

pip install small-hybrid-reranker

Quickstart

from small_hybrid_reranker import HybridReranker

reranker = HybridReranker()

query = "What is the speed of light?"
passages = [
    "The speed of light in a vacuum is about 299,792 km/s.",
    "Earth orbits the Sun in about 365 days.",
    "Newton described laws of motion.",
]

ranked = reranker.rerank(query, passages)
print(ranked[0])
# {'passage': 'The speed of light in a vacuum is about 299,792 km/s.', 'score': 100.0}

API

HybridReranker(model_path: str | None = None)

  • model_path=None: uses the model baked into the package.
  • model_path="...joblib": loads your own compatible artifact.

rerank(query: str, passages: list[str], top_k: int | None = None) -> list[dict]

Returns:

[
  {"passage": "...", "score": 82.31},
  {"passage": "...", "score": 40.87},
]

Scores are floats in [0, 100], and results are sorted by score in descending order.
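Because the return value is a plain list of dicts, it can be post-processed with ordinary Python. The sketch below keeps only results at or above a cutoff; `filter_by_score` is a hypothetical helper, not part of the package, and the cutoff of 50.0 is arbitrary. Whether scores are comparable across different queries is not stated here, so treat any fixed threshold as an assumption to validate on your own data.

```python
def filter_by_score(ranked: list[dict], min_score: float) -> list[dict]:
    """Keep results whose score is at least min_score, preserving their order."""
    return [item for item in ranked if item["score"] >= min_score]


# Sample data in the documented return shape.
ranked = [
    {"passage": "...", "score": 82.31},
    {"passage": "...", "score": 40.87},
]
kept = filter_by_score(ranked, min_score=50.0)
print(len(kept))  # 1
```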

Notes

  • This package is optimized for reranking a provided candidate list.
  • It is not a full retrieval system by itself.
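One way to picture where the reranker fits is a two-stage pipeline: a cheap first stage selects candidates from the corpus, and the reranker orders only those. The sketch below is an illustration under assumptions: `first_stage` is a toy token-overlap retriever standing in for a real index (BM25, vector search, and so on), `search` is a hypothetical wrapper, and the reranker is passed in as any callable that returns the documented list of dicts.

```python
import re
from typing import Callable

# Any callable with the shape rerank(query, passages) -> list of result dicts.
RerankFn = Callable[[str, list[str]], list[dict]]


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def first_stage(query: str, corpus: list[str], n_candidates: int = 100) -> list[str]:
    """Toy retriever: order passages by how many query tokens they share."""
    q = _tokens(query)
    ordered = sorted(corpus, key=lambda p: len(q & _tokens(p)), reverse=True)
    return ordered[:n_candidates]


def search(query: str, corpus: list[str], rerank: RerankFn,
           n_candidates: int = 100, top_k: int = 10) -> list[dict]:
    """Retrieve candidates cheaply, then let the reranker order them."""
    candidates = first_stage(query, corpus, n_candidates)
    return rerank(query, candidates)[:top_k]


# With the package installed, the reranker could be plugged in like this:
#   from small_hybrid_reranker import HybridReranker
#   results = search(query, corpus, HybridReranker().rerank)
```

Keeping the reranker behind a callable also makes it easy to swap in a stub during tests.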
