A unified API for various document re-ranking models.
TL;DR
UNDER CONSTRUCTION
Load any reranker, no matter the architecture:
from rerankers import Reranker
# Cross-encoder default
ranker = Reranker('cross-encoder')
# Specific cross-encoder
ranker = Reranker('mixedbread-ai/mxbai-rerank-xlarge-v1')
# T5 Seq2Seq reranker
ranker = Reranker("t5")
# Specific T5 Seq2Seq reranker
ranker = Reranker("unicamp-dl/InRanker-base")
# API (Cohere)
ranker = Reranker("cohere", lang='en', api_key = API_KEY)  # lang can be 'en' or 'other'
# API (Jina)
ranker = Reranker("jina", api_key = API_KEY)
# RankGPT4-turbo
ranker = Reranker("rankgpt", api_key = API_KEY)
# RankGPT3-turbo
ranker = Reranker("rankgpt3", api_key = API_KEY)
Then:
results = ranker.rank(query="I love you", docs=["I hate you", "I really like you", "I like you", "I despise you"])
You can also pass a list of doc_ids to rank(). If you don't, they are auto-generated, and each doc_id is the index of the corresponding document in docs.
rank() always returns a RankedResults pydantic object, containing a list of Results:
RankedResults(results=[Result(doc_id=2, text='I like you', score=0.13376183807849884, rank=1), Result(doc_id=1, text='I really like you', score=0.002901385771110654, rank=2), Result(doc_id=0, text='I hate you', score=-2.278848886489868, rank=3), Result(doc_id=3, text='I despise you', score=-3.1964476108551025, rank=4)], query='I love you', has_scores=True)
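As a rough illustration of how such a result can be assembled, the sketch below scores each document, sorts by score, and assigns 1-based ranks, falling back to list indices when no doc_ids are given. It is not the library's implementation: `SketchResult`, `rank_documents`, and the `score_fn` argument are hypothetical names, a plain dataclass stands in for the pydantic objects, and the scores are copied from the example output rather than produced by a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Union


@dataclass
class SketchResult:
    """Stand-in for one ranked document (hypothetical, not the library's class)."""
    doc_id: Union[int, str]
    text: str
    score: float
    rank: int


def rank_documents(
    query: str,
    docs: Sequence[str],
    score_fn: Callable[[str, str], float],
    doc_ids: Optional[Sequence[Union[int, str]]] = None,
) -> List[SketchResult]:
    """Score every document against the query and return them best-first."""
    # Without explicit ids, each doc_id is the document's index in docs.
    if doc_ids is None:
        doc_ids = list(range(len(docs)))
    if len(doc_ids) != len(docs):
        raise ValueError("doc_ids and docs must have the same length")

    scored = [(score_fn(query, doc), doc_id, doc) for doc_id, doc in zip(doc_ids, docs)]
    # Higher score means more relevant; the sort is stable, so ties keep input order.
    scored = sorted(scored, key=lambda item: item[0], reverse=True)
    return [
        SketchResult(doc_id=doc_id, text=doc, score=score, rank=position)
        for position, (score, doc_id, doc) in enumerate(scored, start=1)
    ]


# Scores copied from the example output, standing in for a real model.
example_scores = {
    "I hate you": -2.278848886489868,
    "I really like you": 0.002901385771110654,
    "I like you": 0.13376183807849884,
    "I despise you": -3.1964476108551025,
}
ranked = rank_documents(
    query="I love you",
    docs=list(example_scores),
    score_fn=lambda query, doc: example_scores[doc],
)
for result in ranked:
    print(result.rank, result.doc_id, result.text)
```

Passing `doc_ids=["a", "b", "c", "d"]` to the same function would carry those identifiers through instead of the list indices.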
You can retrieve the top k results by calling .top_k() on a RankedResults object:
> results.top_k(1)
[Result(doc_id=2, text='I like you', score=0.13376183807849884, rank=1)]
> results.top_k(1)[0]['text']
'I like you'
You can also retrieve the score for a given doc_id. This is useful if you're scoring documents to use for knowledge distillation:
> results.get_score_by_docid(3)
-3.1964476108551025
For the same purpose, you can also use ranker.score() to score a single query-document pair:
> ranker.score(query="I love you", doc="I hate you")
-2.278848886489868
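For distillation, pairwise scores like this are typically collected into a set of teacher labels. The sketch below shows one way to do that under stated assumptions: `build_distillation_records` and `overlap_score` are hypothetical helpers, and the word-overlap scorer is only a stand-in so the example runs without a model. In practice a callable such as `ranker.score` would be passed instead.

```python
from typing import Callable, Dict, Iterable, List, Tuple, Union


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in scorer: Jaccard overlap of lowercased word sets."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    union = query_words | doc_words
    return len(query_words & doc_words) / len(union) if union else 0.0


def build_distillation_records(
    pairs: Iterable[Tuple[str, str]],
    score_fn: Callable[..., float],
) -> List[Dict[str, Union[str, float]]]:
    """Attach a teacher score to every (query, doc) pair."""
    records = []
    for query, doc in pairs:
        # Keyword arguments mirror the score(query=..., doc=...) call shown earlier.
        teacher_score = float(score_fn(query=query, doc=doc))
        records.append({"query": query, "doc": doc, "teacher_score": teacher_score})
    return records


pairs = [("I love you", "I hate you"), ("I love you", "I love you")]
records = build_distillation_records(pairs, overlap_score)
for record in records:
    print(record)
```

Keeping the scorer as a parameter means the same loop works with any object exposing a compatible `score` method.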
Please note that score() is not available for RankGPT rerankers: they return a ranked list of results rather than individual relevance scores.
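To see why no per-document score exists in that case, consider a listwise reranker that replies with an ordering rather than numbers. The sketch below assumes the reply is a string such as `[3] > [2] > [1] > [4]` with 1-based document markers. That reply format and the helper `parse_permutation` are assumptions for illustration, not the library's internals.

```python
import re
from typing import List


def parse_permutation(response: str, num_docs: int) -> List[int]:
    """Convert '[3] > [2] > [1]' into zero-based indices [2, 1, 0].

    Out-of-range and repeated markers are ignored, and any document the
    reply did not mention is appended in its original order.
    """
    order: List[int] = []
    for marker in re.findall(r"\[(\d+)\]", response):
        index = int(marker) - 1  # markers are assumed to be 1-based
        if 0 <= index < num_docs and index not in order:
            order.append(index)
    missing = [index for index in range(num_docs) if index not in order]
    return order + missing


docs = ["I hate you", "I really like you", "I like you", "I despise you"]
order = parse_permutation("[3] > [2] > [1] > [4]", num_docs=len(docs))
for rank, index in enumerate(order, start=1):
    # A rank is available for every document, but there is no score to report.
    print(rank, docs[index])
```

The only information recovered is the relative order, which is why a method returning a single relevance number has nothing to draw on here.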
Features
Legend:
- ✅ Supported
- 🟠 Implemented, but not fully fledged
- 📍 Not supported yet, but planned
- ❌ Not supported & not currently planned
Supported models:
- ✅ Any standard SentenceTransformer or Transformers cross-encoder
- 🟠 RankGPT (Implemented using the original repo, but missing the improvements from the RankLLM repo)
- ✅ T5-based pointwise rankers (InRanker, MonoT5...)
- ✅ Cohere API rerankers
- ✅ Jina API rerankers
- 📍 MixedBread API (Reranking API not yet released)
- 📍 RankLLM/RankZephyr (Proper RankLLM implementation should replace the unsafe RankGPT one)
- 📍 LiT5
Supported features:
- ✅ Reranking
- 📍 Training on Python >=3.10 (via interfacing with other libraries)
- 📍 ONNX runtime support (quantised rankers, more efficient inference, etc.)
- ❌ (📍 Maybe?) Training via rerankers directly
Usage