A unified API for various document re-ranking models.
TL;DR
UNDER CONSTRUCTION
Load any reranker, no matter the architecture:
```python
from rerankers import Reranker

# Cross-encoder default
ranker = Reranker('cross-encoder')

# Specific cross-encoder
ranker = Reranker('mixedbread-ai/mxbai-rerank-xlarge-v1')

# T5 Seq2Seq reranker
ranker = Reranker("t5")

# Specific T5 Seq2Seq reranker
ranker = Reranker("unicamp-dl/InRanker-base")

# API (Cohere)
ranker = Reranker("cohere", lang='en', api_key=API_KEY)  # or lang='other' for non-English

# API (Jina)
ranker = Reranker("jina", api_key=API_KEY)

# RankGPT4-turbo
ranker = Reranker("rankgpt", api_key=API_KEY)

# RankGPT3-turbo
ranker = Reranker("rankgpt3", api_key=API_KEY)
```
Then:

```python
results = ranker.rank(query="I love you", docs=["I hate you", "I really like you", "I like you", "I despise you"])
```
You can also pass a list of `doc_ids` to `rank()`. If you don't, they'll be auto-generated, and each `doc_id` will correspond to the index of the given document in `docs`.
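As a quick illustration of that fallback (plain Python, not the library itself), the auto-generated ids are simply the document indices:

```python
# Plain-Python illustration of the described behaviour: when no doc_ids
# are passed, each document's id is its index in the docs list.
docs = ["I hate you", "I really like you", "I like you", "I despise you"]
auto_doc_ids = list(range(len(docs)))
print(auto_doc_ids)  # [0, 1, 2, 3]
```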
`rank()` always returns a `RankedResults` pydantic object, containing a list of `Result`s:

```python
RankedResults(results=[Result(doc_id=2, text='I like you', score=0.13376183807849884, rank=1), Result(doc_id=1, text='I really like you', score=0.002901385771110654, rank=2), Result(doc_id=0, text='I hate you', score=-2.278848886489868, rank=3), Result(doc_id=3, text='I despise you', score=-3.1964476108551025, rank=4)], query='I love you', has_scores=True)
```
You can retrieve the top k results by calling `.top_k()` on a `RankedResults` object:

```python
> results.top_k(1)
[Result(doc_id=2, text='I like you', score=0.13376183807849884, rank=1)]
> results.top_k(1)[0].text
'I like you'
```
You can also retrieve the score for a given `doc_id`. This is useful if you're scoring documents to use for knowledge distillation:

```python
> results.get_score_by_docid(0)
-2.278848886489868
```
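As a sketch of how those scores might be collected as distillation targets, here is a plain-Python illustration. The `Result` dataclass below is a stand-in mirroring the fields shown in the output above, not the library's own pydantic model:

```python
from dataclasses import dataclass

# Stand-in for illustration only; the library's Result is a pydantic model.
@dataclass
class Result:
    doc_id: int
    text: str
    score: float
    rank: int

results = [
    Result(2, 'I like you', 0.13376183807849884, 1),
    Result(1, 'I really like you', 0.002901385771110654, 2),
    Result(0, 'I hate you', -2.278848886489868, 3),
    Result(3, 'I despise you', -3.1964476108551025, 4),
]

# Teacher scores keyed by doc_id, usable as soft labels for distillation.
soft_labels = {r.doc_id: r.score for r in results}
print(soft_labels[0])  # -2.278848886489868
```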
For the same purpose, you can also use `ranker.score()` to score a single query-document pair:

```python
> ranker.score(query="I love you", doc="I hate you")
-2.278848886489868
```
Please note that `score()` is not available for RankGPT rerankers, as they don't issue individual relevance scores but a list of ranked results!
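Since listwise rerankers like RankGPT only return an ordering, one way to keep distillation code safe is to guard on the `has_scores` flag visible in the `RankedResults` output above. A minimal sketch with stand-in objects (not the library's own classes):

```python
from types import SimpleNamespace

def distillation_targets(ranked):
    """Return {doc_id: score}, refusing listwise results without scores."""
    if not ranked.has_scores:
        raise ValueError("This reranker returns an ordering only; "
                         "there are no per-document scores to distil from.")
    return {r.doc_id: r.score for r in ranked.results}

# Stand-in objects mimicking the RankedResults output shown earlier:
ranked = SimpleNamespace(
    has_scores=True,
    results=[SimpleNamespace(doc_id=2, score=0.1338),
             SimpleNamespace(doc_id=0, score=-2.2788)],
)
print(distillation_targets(ranked))  # {2: 0.1338, 0: -2.2788}
```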
Features
TODO PRE-RELEASE:
- Allow the use of RankGPT with other LLMs (but no RankZephyr codebase yet)
- Allow easier model_type specification via inference (will also fix the above)
- LangChain export as Compressor
- Llama-index integration (maybe?)
Legend:
- ✅ Supported
- 🟠 Implemented, but not fully fledged
- 📍 Not supported but intended to be in the future
- ❌ Not supported & not currently planned
Supported models:
- ✅ Any standard SentenceTransformer or Transformers cross-encoder
- 🟠 RankGPT (implemented using the original repo, but missing the rankllm repo's improvements)
- ✅ T5-based pointwise rankers (InRanker, MonoT5...)
- ✅ Cohere API rerankers
- ✅ Jina API rerankers
- 📍 MixedBread API (reranking API not yet released)
- 📍 RankLLM/RankZephyr (a proper RankLLM implementation should replace the unsafe RankGPT one)
- 📍 LiT5
Supported features:
- ✅ Reranking
- 📍 Training on Python >=3.10 (via interfacing with other libraries)
- 📍 ONNX runtime support (quantised rankers, more efficient inference, etc...)
- ❌ (📍 Maybe?) Training via rerankers directly
Usage