Forge better rankings from candidate documents with LLM reranking.

Maintainers: skiiwoo

Project description

ranksmith


Forge better rankings from candidate documents.

Korean documentation

ranksmith is a small Python package for LLM-based reranking. Version 1 focuses on zero-shot reranking of candidate documents with Azure OpenAI, using either listwise (RankGPT) or pairwise (PRP) prompting.

Install

pip install ranksmith

Quick Start

from ranksmith import AzureOpenAIReranker, Document

reranker = AzureOpenAIReranker(
    api_key="...",
    azure_endpoint="https://example.openai.azure.com",
    azure_deployment="gpt-4o-mini",
)

results = reranker.rerank(
    query="What is listwise reranking?",
    documents=[
        Document(id="a", text="Listwise reranking compares candidates together."),
        Document(id="b", text="Vector search retrieves candidate documents."),
    ],
    top_k=2,
)

for result in results:
    print(result.rank, result.original_index, result.document.id)

rank is 1-based for display. original_index is 0-based so it maps back to the input list.
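The two conventions can be illustrated with a small, self-contained sketch. It assumes the reranker produces its new order as a list of 0-based input indices; `RankedItem` and `to_ranked_items` are hypothetical names for illustration only, not part of ranksmith's API.

```python
from dataclasses import dataclass


@dataclass
class RankedItem:
    """Hypothetical stand-in for a rerank result; not ranksmith's class."""

    rank: int            # 1-based display position
    original_index: int  # 0-based index into the input list
    doc_id: str


def to_ranked_items(doc_ids, order):
    """Convert a reranked order of 0-based input indices into ranked items."""
    return [
        RankedItem(rank=position, original_index=index, doc_id=doc_ids[index])
        for position, index in enumerate(order, start=1)
    ]


doc_ids = ["a", "b", "c"]
items = to_ranked_items(doc_ids, order=[2, 0, 1])
for item in items:
    # original_index points straight back into the input list
    assert doc_ids[item.original_index] == item.doc_id
    print(item.rank, item.original_index, item.doc_id)
```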

Supported Strategies & Algorithms

ranksmith separates the evaluation methodology (Strategy) from its specific execution logic (Algorithm). Version 1 supports listwise reranking and pairwise PRP reranking.

1. ListwiseStrategy (RankGPT)

This strategy places multiple documents into a single prompt and asks the LLM to rank them all at once.

rankgpt_sliding_window Algorithm (Default)
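- Implements the RankGPT-style back-to-front sliding window with bubble-up behavior: windows are ranked from the bottom of the list upward, and overlapping windows let strong candidates climb toward the top.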
- Useful when you want RankGPT's window traversal semantics while keeping ranksmith's strict JSON output validation.
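The back-to-front traversal can be sketched in plain Python. This is a simplified illustration rather than ranksmith's implementation: `rank_window` is a hypothetical callable standing in for one listwise LLM call, `step` plays the role of ranksmith's `stride`, and clamping the final partial window to the front of the list is an assumption about edge handling.

```python
import random


def window_bounds(n, window_size=20, step=10):
    """Back-to-front (start, end) windows; the last one is clamped to the front."""
    bounds = []
    end = n
    while end > 0:
        start = max(end - window_size, 0)
        bounds.append((start, end))
        if start == 0:
            break
        end -= step
    return bounds


def sliding_window_rerank(items, rank_window, window_size=20, step=10):
    """Rerank each window in place, starting from the bottom of the list."""
    ranking = list(items)
    for start, end in window_bounds(len(ranking), window_size, step):
        # Writing the window's order back lets strong items sitting in the
        # overlap bubble up into the next (higher) window.
        ranking[start:end] = rank_window(ranking[start:end])
    return ranking


# Toy "LLM": relevance is the integer itself, so a window is ranked by sorting.
def oracle(window):
    return sorted(window, reverse=True)


items = list(range(100))
random.Random(0).shuffle(items)
reranked = sliding_window_rerank(items, oracle)
print(len(window_bounds(100)))  # number of windows for 100 candidates
print(reranked[:10])            # the 10 most relevant items, best first
```

With a perfect toy ranker, each window's best `window_size - step` items land in the overlap with the next window up, so the final window at the front ends up holding the overall top 10.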

2. PairwiseStrategy (PRP)

This strategy compares two documents at a time using Pairwise Ranking Prompting.

prp_sliding_k Algorithm
- Starts from the bottom of the current ranking and compares adjacent pairs.
- Calls the provider twice per pair, swapping A/B order to reduce position bias.
- Conflicting valid comparisons are treated as ties and keep the current order.
- Default passes=10, matching the PRP-Sliding-10 setting from the reference paper.
- Expected provider calls per query: 2 * passes * max(document_count - 1, 0).
- AsyncPairwiseStrategy can run each pair's A/B and B/A calls concurrently with pair_order_parallelism=2 without changing PRP traversal or call count.
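The bullets above map onto a compact loop. This sketch follows that description (bottom-to-top adjacent pairs, two order-swapped calls per pair, conflicting verdicts kept as ties) but is not ranksmith's code; `judge` is a hypothetical callable standing in for one LLM comparison that answers "A" or "B".

```python
import random


def prp_sliding_k(items, judge, passes=10):
    """PRP-Sliding-K sketch: bottom-to-top adjacent swaps, repeated `passes` times.

    `judge(first, second)` is a hypothetical stand-in for one LLM call that
    returns "A" if `first` looks more relevant and "B" otherwise.
    """
    ranking = list(items)
    calls = 0
    for _ in range(passes):
        for i in range(len(ranking) - 2, -1, -1):
            upper, lower = ranking[i], ranking[i + 1]
            forward = judge(upper, lower)   # upper shown as A
            backward = judge(lower, upper)  # order swapped: lower shown as A
            calls += 2
            if forward == "B" and backward == "A":
                # Both orders prefer the lower document: move it up.
                ranking[i], ranking[i + 1] = lower, upper
            # Upper preferred twice, or conflicting verdicts (a tie): keep order.
    return ranking, calls


def exact_judge(first, second):
    return "A" if first >= second else "B"


def biased_judge(first, second):
    return "A"  # always prefers whatever is shown first


items = list(range(20))
random.Random(1).shuffle(items)
ranked, calls = prp_sliding_k(items, exact_judge)
print(ranked[:10], calls)  # top 10 in order; 2 * 10 * (20 - 1) calls

unchanged, _ = prp_sliding_k(items, biased_judge)
print(unchanged == items)  # every verdict conflicts, so nothing moves
```

The `biased_judge` run shows why the tie rule matters: a judge that always favors the first slot contradicts itself on every swapped pair, so pure position bias cannot reorder anything.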

How to Apply a Strategy

You can configure and inject a custom strategy into the AzureOpenAIReranker.

from ranksmith import AzureOpenAIReranker, Document, ListwiseStrategy, PairwiseStrategy

# 1. Configure the strategy and algorithm
strategy = ListwiseStrategy(
    algorithm="rankgpt_sliding_window",
    window_size=20,             # Number of documents ranked per LLM call
    stride=10,                  # Window step; consecutive windows overlap by window_size - stride
    max_document_chars=4000,    # Longer documents are rejected, not truncated
)

# 2. Inject into the Reranker
reranker = AzureOpenAIReranker(
    api_key="...",
    azure_endpoint="https://example.openai.azure.com",
    azure_deployment="gpt-4o-mini",
    strategy=strategy,  # <-- Inject the strategy here
)

documents = [
    Document(id="a", text="Listwise reranking compares candidates together."),
    Document(id="b", text="Vector search retrieves candidate documents."),
]
results = reranker.rerank("query", documents)

Pairwise PRP can be injected the same way:

strategy = PairwiseStrategy(
    algorithm="prp_sliding_k",
    passes=10,
    max_document_chars=4000,
)

reranker = AzureOpenAIReranker(
    api_key="...",
    azure_endpoint="https://example.openai.azure.com",
    azure_deployment="gpt-4o-mini",
    strategy=strategy,
)

Note: If strategy is not provided, it defaults to ListwiseStrategy(algorithm="rankgpt_sliding_window"). Pairwise PRP uses many more LLM calls than listwise reranking, so check call estimates before live benchmarks.

For lower PRP wall time, use the async strategy. This preserves the PRP-Sliding-K method: adjacent pairs are still processed bottom-to-top, while only the two order-swapped calls for the same pair are concurrent.

from ranksmith import AsyncAzureOpenAIReranker, AsyncPairwiseStrategy

reranker = AsyncAzureOpenAIReranker(
    api_key="...",
    azure_endpoint="https://example.openai.azure.com",
    azure_deployment="gpt-4o-mini",
    strategy=AsyncPairwiseStrategy(
        passes=10,
        pair_order_parallelism=2,
    ),
)
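The concurrency boundary can be illustrated with a minimal asyncio sketch: the two order-swapped calls for one pair are gathered, while the pair loop itself stays sequential. `fake_judge`, `should_swap`, and `prp_pass` are hypothetical helpers, not ranksmith internals, and only one pass is shown; PRP-Sliding-K repeats it `passes` times.

```python
import asyncio

in_flight = 0
peak_in_flight = 0


async def fake_judge(first, second):
    """Hypothetical async LLM call: "A" if `first` looks more relevant, else "B"."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)  # simulated request latency
    in_flight -= 1
    return "A" if first >= second else "B"


async def should_swap(upper, lower, judge):
    # The A/B and B/A prompts for one pair are independent, so send them together.
    forward, backward = await asyncio.gather(judge(upper, lower), judge(lower, upper))
    return forward == "B" and backward == "A"  # conflicts count as ties: no swap


async def prp_pass(ranking, judge):
    # Pairs are still visited one at a time, bottom to top, because each
    # comparison sees the result of the previous swap.
    for i in range(len(ranking) - 2, -1, -1):
        if await should_swap(ranking[i], ranking[i + 1], judge):
            ranking[i], ranking[i + 1] = ranking[i + 1], ranking[i]
    return ranking


ranking = asyncio.run(prp_pass([3, 1, 4, 2], fake_judge))
print(ranking)         # [4, 3, 1, 2] after one pass
print(peak_in_flight)  # 2: never more than the two order-swapped calls
```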

Async Support

ranksmith provides first-class asynchronous support for high-throughput environments like FastAPI.

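import asyncio

from ranksmith import AsyncAzureOpenAIReranker, Document

reranker = AsyncAzureOpenAIReranker(
    api_key="...",
    azure_endpoint="https://example.openai.azure.com",
    azure_deployment="gpt-4o-mini",
)

documents = [Document(id="a", text="..."), Document(id="b", text="...")]

async def main():
    # rerank() must be awaited inside a coroutine (e.g. an async FastAPI endpoint)
    return await reranker.rerank("query", documents)

results = asyncio.run(main())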

Examples

The examples/ directory contains ready-to-use code for integrating RankGPT reranking into a production service:

examples/rankgpt_sync.py: Synchronous RankGPT integration guide
examples/rankgpt_async.py: High-performance asynchronous RankGPT integration guide

Benchmarking

ranksmith includes a qrels-backed comparison runner for reranking algorithms. It can run against the committed smoke fixture or a local BEIR/SciFact cache. BEIR mode requires a first-stage candidate TSV, because qrels alone are not a valid reranking benchmark.

Expected BEIR/SciFact cache layout:

.benchmark-cache/scifact/
  corpus.jsonl
  queries.jsonl
  qrels/test.tsv
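The files in this layout follow the standard BEIR formats: `corpus.jsonl` and `queries.jsonl` hold one JSON object per line with `_id` and `text` (the corpus also has `title`), and `qrels/test.tsv` is tab-separated with a `query-id`, `corpus-id`, `score` header. The sketch below shows one way to read them; the function names are hypothetical, and ranksmith's own loader may differ, for example in whether titles are prepended to the text.

```python
import csv
import json


def parse_jsonl_texts(lines):
    """Map BEIR `_id` to text, prefixing the title when one is present."""
    texts = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        title = record.get("title") or ""
        text = record.get("text") or ""
        texts[record["_id"]] = f"{title}\n{text}" if title else text
    return texts


def parse_qrels(lines):
    """Read a BEIR qrels TSV (header: query-id, corpus-id, score) into nested dicts."""
    qrels = {}
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue
        query_id, doc_id, score = row[:3]
        qrels.setdefault(query_id, {})[doc_id] = int(score)
    return qrels


# Usage against the cache layout above (file objects are iterables of lines):
# with open(".benchmark-cache/scifact/qrels/test.tsv", encoding="utf-8", newline="") as f:
#     qrels = parse_qrels(f)
qrels = parse_qrels(["query-id\tcorpus-id\tscore\n", "q1\td1\t1\n", "q1\td2\t1\n"])
print(qrels)  # {'q1': {'d1': 1, 'd2': 1}}
```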

Candidate TSV rows must start with query_id and document_id:

query_id    document_id    rank
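A minimal parser for this format might group rows by query while keeping file order, which is assumed here to be the first-stage ranking. `parse_candidates` is a hypothetical helper; treating a first field of `query_id` as an optional header row is also an assumption.

```python
import csv


def parse_candidates(lines):
    """Group candidate rows by query_id, preserving file (first-stage) order.

    Assumes tab-separated rows whose first two columns are query_id and
    document_id; an optional header row and extra columns are ignored.
    """
    candidates = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0] == "query_id":
            continue  # blank line or header row
        if len(row) < 2:
            raise ValueError(f"expected query_id and document_id, got {row!r}")
        query_id, document_id = row[0], row[1]
        candidates.setdefault(query_id, []).append(document_id)
    return candidates


rows = ["query_id\tdocument_id\trank\n", "q1\td3\t1\n", "q1\td7\t2\n", "q2\td1\t1\n"]
print(parse_candidates(rows))  # {'q1': ['d3', 'd7'], 'q2': ['d1']}
```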

Run a live Azure comparison and write a JSON artifact:

python scripts/compare_reranking.py \
  --dataset beir-scifact \
  --cache-dir .benchmark-cache/scifact \
  --split test \
  --candidates path/to/candidates.tsv \
  --algorithm all \
  --top-k 10 \
  --window-size 20 \
  --stride 10 \
  --output benchmark-results/scifact.json \
  --allow-live

The JSON report includes per-query metrics and macro-averaged NDCG@k, MRR@k, and Recall@k. Raw benchmark artifacts are intentionally ignored by git; publish only reviewed summaries. The committed smoke fixture currently checks that the deterministic offline RankGPT path scores 1.000 on NDCG@3, MRR@3, and Recall@3.
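For reference, the three metrics can be computed per query as follows and then macro-averaged. This is a minimal sketch using the linear-gain form of NDCG (gain = relevance label); ranksmith's report may use a different gain or cutoff convention, and the helper names are hypothetical.

```python
import math


def ndcg_at_k(ranked_ids, qrels, k):
    """NDCG@k with linear gains: sum(rel / log2(rank + 1))."""
    dcg = sum(
        max(qrels.get(doc_id, 0), 0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
    )
    ideal = sorted((rel for rel in qrels.values() if rel > 0), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def mrr_at_k(ranked_ids, qrels, k):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if qrels.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, qrels, k):
    """Fraction of all relevant documents that appear in the top k."""
    relevant = {doc_id for doc_id, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return sum(doc_id in relevant for doc_id in ranked_ids[:k]) / len(relevant)


def macro_average(scores):
    """Mean of per-query scores, each query weighted equally."""
    return sum(scores) / len(scores) if scores else 0.0


qrels = {"d1": 1, "d2": 1}
perfect = ["d1", "d2", "d3"]
print(ndcg_at_k(perfect, qrels, 3), mrr_at_k(perfect, qrels, 3), recall_at_k(perfect, qrels, 3))
```

A perfect ordering scores 1.0 on all three metrics at k=3, which is the property the smoke fixture checks.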

Call accounting

compare_reranking.py estimates and prints the number of live LLM reranking calls before execution. The count depends on the number of benchmark cases, the selected algorithms, window_size, stride, passes, and candidate count per query:

rankgpt_sliding_window: one LLM call per back-to-front RankGPT window.
prp_sliding_k: 2 * passes * max(document_count - 1, 0) pairwise LLM calls per query.
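These formulas are easy to evaluate ahead of a run. The PRP count is taken directly from the text above; the RankGPT count assumes a final partial window is clamped to the front of the list, so treat it as an estimate and rely on the number compare_reranking.py prints. Function names are hypothetical.

```python
import math


def rankgpt_window_calls(candidate_count, window_size=20, stride=10):
    """Listwise calls for one query under back-to-front sliding windows."""
    if candidate_count <= 0:
        return 0
    if candidate_count <= window_size:
        return 1  # a single window covers every candidate
    # Last window, plus one more per stride until a window reaches the front.
    return 1 + math.ceil((candidate_count - window_size) / stride)


def prp_sliding_k_calls(candidate_count, passes=10):
    """Pairwise calls for one query: two order-swapped prompts per adjacent pair per pass."""
    return 2 * passes * max(candidate_count - 1, 0)


def total_calls(candidate_counts, per_query):
    """Sum a per-query estimator over every benchmark case."""
    return sum(per_query(count) for count in candidate_counts)


print(rankgpt_window_calls(100), prp_sliding_k_calls(100))  # 9 1980
print(total_calls([20] * 30, prp_sliding_k_calls))          # 11400
```

For 100 candidates this gives 9 listwise calls versus 1,980 pairwise calls per query, and 30 queries with 20 candidates each give 11,400 PRP calls, matching the figures in the PRP vs RankGPT snapshot.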

The runner does not create first-stage candidates, embeddings, or communities. If your candidate TSV is produced by an upstream retrieval or community-building pipeline, account for those calls separately. A typical full pipeline has two cost surfaces:

Candidate generation: embedding calls for corpus/query vectors, plus any LLM calls used to create or summarize communities.
Reranking: LLM calls made by ranksmith for the selected reranking algorithms.

Benchmark summaries should report both numbers when community retrieval is part of the experiment, for example: embedding calls=<n>, community LLM calls=<n>, and reranking LLM calls=<n>.

Result Model

result.document        # Document
result.rank            # 1-based rank
result.original_index  # 0-based input index
result.metadata        # strategy-specific metadata

Error Handling

ranksmith fails fast. It does not silently truncate long documents, repair invalid rankings, or return unvalidated LLM output.

from ranksmith import DocumentTooLongError, RerankParseError, RerankProviderError

try:
    results = reranker.rerank("query", documents)
except DocumentTooLongError:
    ...
except RerankParseError:
    ...
except RerankProviderError:
    ...
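The fail-fast contract can be illustrated with a small validator that rejects rather than repairs. This is a sketch, not ranksmith's parser: the `{"ranking": [...]}` output shape, `InvalidRankingError`, and both helpers are hypothetical; in ranksmith the corresponding failures surface as the exceptions imported above.

```python
import json


class InvalidRankingError(ValueError):
    """Hypothetical error type for this sketch; not ranksmith's RerankParseError."""


def check_document_length(text, max_document_chars):
    # Fail fast instead of silently truncating.
    if len(text) > max_document_chars:
        raise ValueError(f"document has {len(text)} chars; limit is {max_document_chars}")


def parse_ranking(raw_output, window_length):
    """Accept only JSON like {"ranking": [2, 0, 1]} that is an exact permutation."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise InvalidRankingError(f"output is not JSON: {exc}") from exc
    ranking = payload.get("ranking") if isinstance(payload, dict) else None
    if not isinstance(ranking, list) or not all(type(i) is int for i in ranking):
        raise InvalidRankingError('expected {"ranking": [int, ...]}')
    if sorted(ranking) != list(range(window_length)):
        # Missing or duplicated indices are rejected, never repaired.
        raise InvalidRankingError(f"not a permutation of 0..{window_length - 1}: {ranking}")
    return ranking


print(parse_ranking('{"ranking": [2, 0, 1]}', 3))  # [2, 0, 1]
for bad in ['{"ranking": [0, 0, 1]}', '{"ranking": [0, 1]}', "Sure! Here is the ranking"]:
    try:
        parse_ranking(bad, 3)
    except InvalidRankingError as err:
        print("rejected:", err)
```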

MTEB Reranking Reference Evaluation

These results are intended as practical reference points, not a universal ranking. Results depend on dataset, model, candidate count, latency budget, and invalid output rate. This benchmark measures reranking over fixed native MTEB candidate sets, not first-stage retrieval.

uv run python scripts/evaluate_mteb_reranking.py \
  --tasks AskUbuntuDupQuestions SciDocsRR StackOverflowDupQuestions \
  --methods original rankgpt_sliding_window@20 prp_sliding_k@20 \
  --output-dir benchmark-results/mteb-reranking/example \
  --max-queries 50 \
  --max-document-chars 4000 \
  --shuffle-candidates --shuffle-seed 13 \
  --rankgpt-window-size 20 --rankgpt-step 10 \
  --prp-passes 10 \
  --concurrency 4 \
  --input-token-price-per-1m 2.50 \
  --output-token-price-per-1m 10.00 \
  --allow-live
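The two pricing flags imply a simple cost model: token counts multiplied by a per-million-token rate. The sketch assumes that is how the script converts usage into the reported cost; the defaults mirror the example flags above, the token counts are invented for illustration, and the helper names are hypothetical.

```python
def llm_cost_usd(input_tokens, output_tokens, input_price_per_1m=2.50, output_price_per_1m=10.00):
    """Cost of one or more calls given token counts and per-1M-token prices."""
    return (input_tokens * input_price_per_1m + output_tokens * output_price_per_1m) / 1_000_000


def mean_cost_per_query(per_query_costs):
    """Average cost across queries (0.0 when there are none)."""
    return sum(per_query_costs) / len(per_query_costs) if per_query_costs else 0.0


# Hypothetical token counts, for illustration only.
print(llm_cost_usd(input_tokens=10_000, output_tokens=1_000))  # 0.035
```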

Current MTEB snapshot

The committed reference snapshot below is from benchmark-results/mteb-reranking/n30-ask-fixed.

Scope:

Task: AskUbuntuDupQuestions
Split: test
Queries: 30
Candidate order: shuffled with seed 13
Max document length: 4000 characters
Validation: strict JSON validation, invalid outputs score 0
Measured methods: original, rankgpt_sliding_window@20

Method	NDCG@10	MRR@10	MAP	Recall@10	p50 latency	p95 latency	Invalid rate	Queries
`original`	0.4431	0.5668	0.3895	0.5871	0.0 ms	0.0 ms	0.000	30
`rankgpt_sliding_window@20`	0.6825	0.6753	0.6424	0.7870	1953.3 ms	2893.9 ms	0.000	30

On this small snapshot, rankgpt_sliding_window@20 improved NDCG@10 and Recall@10 over the original candidate order. This is not a general claim about all datasets; it is a smoke-sized reference result for this task and configuration.

PRP vs RankGPT Snapshot

The PRP comparison run below uses the same AskUbuntuDupQuestions setup and is saved under benchmark-results/mteb-reranking/n30-prp-vs-rankgpt-rerun. This is a native MTEB candidate-set benchmark: this task exposes 20 candidates per query, so it is not the standard top-100 RankGPT setting.

Method	NDCG@10	MRR@10	MAP	Recall@10	p50 latency	p95 latency	Invalid rate	LLM calls/query	Total LLM calls	Mean cost/query	Queries
`original`	0.4431	0.5668	0.3895	0.5871	0.0 ms	0.0 ms	0.000	0	0	-	30
`rankgpt_sliding_window@20`	0.6830	0.6834	0.6400	0.7706	1842.6 ms	2542.6 ms	0.033	1	30	$0.001530	30
`prp_sliding_k@20`	0.6714	0.7837	0.6132	0.7451	213583.6 ms	230670.9 ms	0.000	380	11,400	$0.172772	30

RankGPT listwise led on NDCG@10, MAP, Recall@10, latency, and cost. PRP led on MRR@10, but it required about 380 pairwise LLM calls per query with passes=10 and 20 candidates. Strict validation is applied: the RankGPT row includes one invalid LLM output scored as zero.

For the common top-100 RankGPT setup with window_size=20 and step=10, rankgpt_sliding_window@100 would use 9 listwise LLM calls per query. The matching prp_sliding_k@100 setting would use 2 * 10 * (100 - 1) = 1,980 pairwise LLM calls per query.

Project details


Release history

0.5.1 (Jun 1, 2026)
0.5.0 (May 29, 2026)
0.4.0 (May 20, 2026)
0.3.2 (May 20, 2026)
0.3.1 (May 20, 2026)
0.3.0 (May 20, 2026)
0.2.0 (May 19, 2026; this version)
0.1.0 (May 13, 2026)

Download files


Source Distribution

ranksmith-0.2.0.tar.gz (1.8 MB)

Uploaded May 19, 2026 Source

Built Distribution


ranksmith-0.2.0-py3-none-any.whl (19.7 kB)

Uploaded May 19, 2026 Python 3

File details

Details for the file ranksmith-0.2.0.tar.gz.

File metadata

Download URL: ranksmith-0.2.0.tar.gz
Upload date: May 19, 2026
Size: 1.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ranksmith-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`2daf784143e5688a346d5e3812e3f2cc8782f79cabfd04dcf8a623fb523fe06a`
MD5	`af02ff93460e614c2cbef5be9d4ac608`
BLAKE2b-256	`d96f12401f63da8a096a20aa0363b9e395363dd66e5c3b18ae630b8e73739d95`


Provenance

The following attestation bundles were made for ranksmith-0.2.0.tar.gz:

Publisher: ci.yml on pko89403/ranksmith

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ranksmith-0.2.0.tar.gz
- Subject digest: 2daf784143e5688a346d5e3812e3f2cc8782f79cabfd04dcf8a623fb523fe06a
- Sigstore transparency entry: 1572486984
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: pko89403/ranksmith@01a016f8a35f3920aafa60278d5190d51d4c3058
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/pko89403
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@01a016f8a35f3920aafa60278d5190d51d4c3058
- Trigger Event: push

File details

Details for the file ranksmith-0.2.0-py3-none-any.whl.

File metadata

Download URL: ranksmith-0.2.0-py3-none-any.whl
Upload date: May 19, 2026
Size: 19.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ranksmith-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cbb16259353cb37b9c04710f3669ce058261e46b0e361f9228981e0e7eec584d`
MD5	`e42472b0cb1040d806cb4a7339cf1c59`
BLAKE2b-256	`c3f4b2a92ebb7f2b957fcac2a18f57761dd99f0fcd9730df775d429a5ff67250`


Provenance

The following attestation bundles were made for ranksmith-0.2.0-py3-none-any.whl:

Publisher: ci.yml on pko89403/ranksmith

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ranksmith-0.2.0-py3-none-any.whl
- Subject digest: cbb16259353cb37b9c04710f3669ce058261e46b0e361f9228981e0e7eec584d
- Sigstore transparency entry: 1572486997
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: pko89403/ranksmith@01a016f8a35f3920aafa60278d5190d51d4c3058
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/pko89403
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@01a016f8a35f3920aafa60278d5190d51d4c3058
- Trigger Event: push
