A unified API for various document re-ranking models.

TL;DR

UNDER CONSTRUCTION

Load any reranker, no matter the architecture:

from rerankers import Reranker

# Cross-encoder default
ranker = Reranker('cross-encoder')

# Specific cross-encoder
ranker = Reranker('mixedbread-ai/mxbai-rerank-xlarge-v1')

# T5 Seq2Seq reranker
ranker = Reranker("t5")

# Specific T5 Seq2Seq reranker
ranker = Reranker("unicamp-dl/InRanker-base")

# API (Cohere); lang can be 'en' or 'other'
ranker = Reranker("cohere", lang='en', api_key=API_KEY)

# API (Jina)
ranker = Reranker("jina", api_key=API_KEY)

# RankGPT4-turbo
ranker = Reranker("rankgpt", api_key=API_KEY)

# RankGPT3-turbo
ranker = Reranker("rankgpt3", api_key=API_KEY)

Then:

results = ranker.rank(query="I love you", docs=["I hate you", "I really like you", "I like you", "I despise you"])

You can also pass a list of doc_ids to rank(). If you don't, they'll be auto-generated, and each doc_id will be the index of the corresponding document in docs.
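The default doc_id rule can be pictured with a short sketch. Note that `resolve_doc_ids` is a hypothetical helper written for illustration and is not part of rerankers; it only mirrors the documented behaviour that missing doc_ids fall back to each document's index. Whether the library accepts string ids or validates the list length is an assumption here.

```python
from typing import List, Optional, Union

DocId = Union[int, str]


def resolve_doc_ids(docs: List[str], doc_ids: Optional[List[DocId]] = None) -> List[DocId]:
    """Hypothetical helper: return the given doc_ids, or fall back to list indices."""
    if doc_ids is None:
        # No ids supplied: each document is identified by its position in docs.
        return list(range(len(docs)))
    if len(doc_ids) != len(docs):
        # Assumed sanity check: one id per document.
        raise ValueError("doc_ids must contain exactly one id per document")
    return list(doc_ids)


docs = ["I hate you", "I really like you", "I like you", "I despise you"]
print(resolve_doc_ids(docs))                        # [0, 1, 2, 3]
print(resolve_doc_ids(docs, ["a", "b", "c", "d"]))  # ['a', 'b', 'c', 'd']
```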

The call to rank() always returns a RankedResults pydantic object containing a list of Result objects:

RankedResults(
    results=[
        Result(doc_id=2, text='I like you', score=0.13376183807849884, rank=1),
        Result(doc_id=1, text='I really like you', score=0.002901385771110654, rank=2),
        Result(doc_id=0, text='I hate you', score=-2.278848886489868, rank=3),
        Result(doc_id=3, text='I despise you', score=-3.1964476108551025, rank=4)
    ],
    query='I love you',
    has_scores=True
)

You can retrieve the top k results by calling .top_k() on a RankedResults object:

> results.top_k(1)
[Result(doc_id=2, text='I like you', score=0.13376183807849884, rank=1)]
> results.top_k(1)[0].text
'I like you'
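For intuition, here is a minimal sketch of what a top-k selection over ranked results amounts to. The `Result` dataclass and the free function `top_k` below are stand-ins defined for this example (the library's own Result is a pydantic object and `top_k` is a method), so this shows the idea rather than the actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """Stand-in with the same fields as the Result objects shown above."""
    doc_id: int
    text: str
    score: float
    rank: int


def top_k(results: List[Result], k: int) -> List[Result]:
    """Return the k best results, ordered by ascending rank (rank 1 is best)."""
    return sorted(results, key=lambda r: r.rank)[:k]


# Deliberately shuffled to show that selection relies on rank, not list order.
results = [
    Result(doc_id=0, text="I hate you", score=-2.278848886489868, rank=3),
    Result(doc_id=2, text="I like you", score=0.13376183807849884, rank=1),
    Result(doc_id=3, text="I despise you", score=-3.1964476108551025, rank=4),
    Result(doc_id=1, text="I really like you", score=0.002901385771110654, rank=2),
]

best = top_k(results, 2)
print([r.text for r in best])  # ['I like you', 'I really like you']
```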

You can also retrieve the score for a given doc_id. This is useful if you're scoring documents to use for knowledge distillation:

> results.get_score_by_docid(3)
-3.1964476108551025

For the same purpose, you can also use ranker.score() to score a single Query-Document pair:

> ranker.score(query="I love you", doc="I hate you")
-2.278848886489868
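As a rough illustration of the knowledge distillation use case, the sketch below pairs each document with a teacher score and serialises one JSON object per line. The function `build_distillation_rows` and the `teacher_score` field name are hypothetical choices for this example, not part of rerankers. The scores are copied from the example output above; in practice they would come from get_score_by_docid() or ranker.score().

```python
import json
from typing import Dict, List


def build_distillation_rows(query: str, docs: List[str], scores_by_doc_id: Dict[int, float]) -> List[str]:
    """Hypothetical helper: one JSON line per (query, doc, teacher_score) triple."""
    rows = []
    for doc_id, doc in enumerate(docs):
        record = {"query": query, "doc": doc, "teacher_score": scores_by_doc_id[doc_id]}
        rows.append(json.dumps(record))
    return rows


docs = ["I hate you", "I really like you", "I like you", "I despise you"]

# Teacher scores keyed by doc_id (the index of each document in docs).
teacher_scores = {
    0: -2.278848886489868,
    1: 0.002901385771110654,
    2: 0.13376183807849884,
    3: -3.1964476108551025,
}

rows = build_distillation_rows("I love you", docs, teacher_scores)
for row in rows:
    print(row)
```

Each line can then be written to a JSONL file and consumed by whichever training library you use for the student model.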

Note that score() is not available for RankGPT rerankers: they return a ranked list of results rather than individual relevance scores.
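Code that consumes results from several kinds of rerankers may want to check the has_scores field before reading scores. The sketch below uses stand-in dataclasses rather than the library's pydantic objects, and it assumes that a result without a score carries `None`; both the stand-ins and the `scores_by_doc_id` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Result:
    doc_id: int
    text: str
    rank: int
    score: Optional[float] = None  # assumption: no score for listwise rerankers


@dataclass
class RankedResults:
    results: List[Result]
    query: str
    has_scores: bool


def scores_by_doc_id(ranked: RankedResults) -> Dict[int, Optional[float]]:
    """Return {doc_id: score}, refusing inputs that only carry an ordering."""
    if not ranked.has_scores:
        raise ValueError("no relevance scores available; use the rank order instead")
    return {r.doc_id: r.score for r in ranked.results}


# A listwise-style output: an ordering without scores.
listwise = RankedResults(
    results=[
        Result(doc_id=2, text="I like you", rank=1),
        Result(doc_id=0, text="I hate you", rank=2),
    ],
    query="I love you",
    has_scores=False,
)

try:
    scores_by_doc_id(listwise)
except ValueError as err:
    print(err)

# The ordering itself is still usable.
print([r.doc_id for r in sorted(listwise.results, key=lambda r: r.rank)])  # [2, 0]
```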

Features

Legend:

  • ✅ Supported
  • 🟠 Implemented, but not fully fledged
  • 📍 Not yet supported, but planned
  • ❌ Not supported & not currently planned

Supported models:

  • ✅ Any standard SentenceTransformer or Transformers cross-encoder
  • 🟠 RankGPT (Implemented using the original repo, but missing the improvements from the RankLLM repo)
  • ✅ T5-based pointwise rankers (InRanker, MonoT5...)
  • ✅ Cohere API rerankers
  • ✅ Jina API rerankers
  • 📍 MixedBread API (Reranking API not yet released)
  • 📍 RankLLM/RankZephyr (Proper RankLLM implementation should replace the unsafe RankGPT one)
  • 📍 LiT5

Supported features:

  • ✅ Reranking
  • 📍 Training on Python >=3.10 (via interfacing with other libraries)
  • 📍 ONNX runtime support (quantised rankers, more efficient inference, etc.)
  • ❌ (📍 Maybe?) Training via rerankers directly

Usage
