A unified API for various document re-ranking models.
TL;DR
UNDER CONSTRUCTION
Load any reranker, no matter the architecture:
from rerankers import Reranker
# Cross-encoder default
ranker = Reranker('cross-encoder')
# Specific cross-encoder
ranker = Reranker('mixedbread-ai/mxbai-rerank-xlarge-v1')
# T5 Seq2Seq reranker
ranker = Reranker("t5")
# Specific T5 Seq2Seq reranker
ranker = Reranker("unicamp-dl/InRanker-base")
# API (Cohere)
ranker = Reranker("cohere", lang='en', api_key = API_KEY)  # lang is 'en' or 'other'
# API (Jina)
ranker = Reranker("jina", api_key = API_KEY)
# RankGPT4-turbo
ranker = Reranker("rankgpt", api_key = API_KEY)
# RankGPT3-turbo
ranker = Reranker("rankgpt3", api_key = API_KEY)
Then:
results = ranker.rank(query="I love you", docs=["I hate you", "I really like you", "I like you", "I despise you"])
You can also pass a list of doc_ids to rank(). If you don't, they will be auto-generated, and each doc_id will correspond to the index of its document in docs.
rank() always returns a RankedResults pydantic object, containing a list of Result objects:
RankedResults(results=[Result(doc_id=2, text='I like you', score=0.13376183807849884, rank=1), Result(doc_id=1, text='I really like you', score=0.002901385771110654, rank=2), Result(doc_id=0, text='I hate you', score=-2.278848886489868, rank=3), Result(doc_id=3, text='I despise you', score=-3.1964476108551025, rank=4)], query='I love you', has_scores=True)
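To make the doc_ids default concrete, here is a minimal sketch of the behaviour described above. `resolve_doc_ids` is a hypothetical helper written for illustration, not a function from rerankers; it assumes only what the text states, namely that missing doc_ids fall back to each document's index in docs. The length check is an extra assumption added for safety.

```python
from typing import List, Optional, Sequence, Union

DocId = Union[int, str]


def resolve_doc_ids(
    docs: Sequence[str], doc_ids: Optional[Sequence[DocId]] = None
) -> List[DocId]:
    """Hypothetical helper: return explicit doc_ids, or fall back to indices."""
    if doc_ids is None:
        # No ids supplied: each document is identified by its index in `docs`.
        return list(range(len(docs)))
    if len(doc_ids) != len(docs):
        # Assumption: one id per document is required.
        raise ValueError("doc_ids must contain exactly one id per document")
    return list(doc_ids)


docs = ["I hate you", "I really like you", "I like you", "I despise you"]
auto_ids = resolve_doc_ids(docs)
custom_ids = resolve_doc_ids(docs, ["a", "b", "c", "d"])
print(auto_ids)    # [0, 1, 2, 3]
print(custom_ids)  # ['a', 'b', 'c', 'd']
```

This is why doc_id=2 in the output above refers to 'I like you': it is the third entry of docs.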
You can retrieve any number of top results by calling .top_k() on a RankedResults object:
> results.top_k(1)
[Result(doc_id=2, text='I like you', score=0.13376183807849884, rank=1)]
> results.top_k(1)[0].text
'I like you'
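The selection that top_k performs can be sketched without the library. The `Result` dataclass and `top_k` function below are stand-ins that mirror the fields shown in the output above; they are assumptions for illustration, not the rerankers implementation, which may work differently internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """Stand-in mirroring the fields printed above (not the library's class)."""
    doc_id: int
    text: str
    score: float
    rank: int


def top_k(results: List[Result], k: int) -> List[Result]:
    """Return the k best results, ordered by ascending rank (rank 1 is best)."""
    return sorted(results, key=lambda r: r.rank)[:k]


# Same values as the example output, deliberately listed out of order.
results = [
    Result(doc_id=0, text="I hate you", score=-2.278848886489868, rank=3),
    Result(doc_id=2, text="I like you", score=0.13376183807849884, rank=1),
    Result(doc_id=3, text="I despise you", score=-3.1964476108551025, rank=4),
    Result(doc_id=1, text="I really like you", score=0.002901385771110654, rank=2),
]

best_two = top_k(results, 2)
print([r.text for r in best_two])  # ['I like you', 'I really like you']
```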
You can also retrieve the score for a given doc_id. This is useful if you're scoring documents to use for knowledge distillation:
> results.get_score_by_docid(3)
-3.1964476108551025
For the same purpose, you can also use ranker.score() to score a single query-document pair:
> ranker.score(query="I love you", doc="I hate you")
-2.278848886489868
Please note: score() is not available for RankGPT rerankers, because they return a ranked list of results rather than individual relevance scores.
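For knowledge distillation, the scores are typically collected into (query, document, teacher score) triples. The sketch below shows one way to do that from data shaped like the example output; `build_distillation_pairs` and the plain-dict layout are hypothetical and not part of rerankers. It also guards on has_scores, the flag visible in the RankedResults output, since a ranker that only returns an ordering has no scores to distil from.

```python
from typing import Dict, List, Tuple

# Teacher output copied from the example above: doc_id -> (text, score).
teacher_results: Dict[int, Tuple[str, float]] = {
    2: ("I like you", 0.13376183807849884),
    1: ("I really like you", 0.002901385771110654),
    0: ("I hate you", -2.278848886489868),
    3: ("I despise you", -3.1964476108551025),
}


def build_distillation_pairs(
    query: str,
    results: Dict[int, Tuple[str, float]],
    has_scores: bool,
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: turn scored results into (query, doc, score) triples."""
    if not has_scores:
        # Rank-only output carries no per-document scores to use as targets.
        raise ValueError("this ranker produced no scores to distil from")
    # Sort by doc_id so the triples follow the original document order.
    return [(query, text, score) for _, (text, score) in sorted(results.items())]


pairs = build_distillation_pairs("I love you", teacher_results, has_scores=True)
print(pairs[0])  # ('I love you', 'I hate you', -2.278848886489868)
```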
Features
Legend:
- ✅ Supported
- 🟠 Implemented, but not fully fledged
- 📍 Not supported yet, but planned
- ❌ Not supported & not currently planned
Supported models:
- ✅ Any standard SentenceTransformer or Transformers cross-encoder
- 🟠 RankGPT (Implemented using the original repo, but missing the improvements from the rankllm repo)
- ✅ T5-based pointwise rankers (InRanker, MonoT5...)
- ✅ Cohere API rerankers
- ✅ Jina API rerankers
- 📍 MixedBread API (Reranking API not yet released)
- 📍 RankLLM/RankZephyr (Proper RankLLM implementation should replace the unsafe RankGPT one)
- 📍 LiT5
Supported features:
- ✅ Reranking
- 📍 Training on Python >=3.10 (via interfacing with other libraries)
- 📍 ONNX runtime support (quantised rankers, more efficient inference, etc.)
- ❌ (📍 Maybe?) Training via rerankers directly
Usage
Hashes for rerankers-0.0.1.post2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5326385abb72337aa143ae4d8bbe5412a059d40ed12ecad7abc2f38bfb58cd38
MD5 | 60465b24994e7b4f6f4fe2a47e8b4001
BLAKE2b-256 | dcd466a2f4b5df853256ca2be5c86d3e75d229cf397699b11d9867b9ab39824f