
CoREB: Code Retrieval and Reranking Benchmark


CoREB is a graded-relevance benchmark for evaluating code retrieval and reranking models across three tasks:

| Task | Query | Target | Example |
|------|-------|--------|---------|
| Text-to-Code (T2C) | Natural language description | Code solution | "Find the longest substring without repeating characters" → Python solution |
| Code-to-Code (C2C) | Code in language A | Equivalent code in language B | Python solution → Java translation |
| Code-to-Text (C2T) | Code snippet | Problem description | Python solution → problem statement |

Key Features

  • Graded relevance: 3-level qrel scheme (rel=2: positive, rel=1: hard negative, rel=0: irrelevant) — hard negatives are same-problem distractors that penalize nDCG when retrieved above true positives
  • 5 programming languages: Python, C++, Java, Go, Ruby
  • Problem-disjoint train/test splits: v202602 (training) and v202603 (testing) cover non-overlapping contest windows
  • Drop-in evaluation: compatible with standard IR evaluation (pytrec_eval) with relevance_level=2
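
The problem-disjoint guarantee can be spot-checked directly once both releases are loaded. A minimal sketch, assuming each corpus row carries a `problem_id` field (the field name is an assumption — check the dataset card for the actual schema):

```python
# Verify that two releases share no problems. The `problem_id` field
# used here is hypothetical; substitute the real field name.
def assert_problem_disjoint(train_rows, test_rows, key="problem_id"):
    train_ids = {row[key] for row in train_rows}
    test_ids = {row[key] for row in test_rows}
    overlap = train_ids & test_ids
    if overlap:
        raise ValueError(f"Splits share {len(overlap)} problems: {sorted(overlap)[:5]}")
    return True

# Toy rows standing in for the v202602 / v202603 releases
train = [{"problem_id": "c2602_a"}, {"problem_id": "c2602_b"}]
test = [{"problem_id": "c2603_a"}]
assert_problem_disjoint(train, test)  # passes: no shared problems
```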

Installation

pip install coreb

For HuggingFace model support:

pip install coreb[hf]        # transformers backend
pip install coreb[gemini]    # Google Gemini API
pip install coreb[all]       # everything

Quick Start

Load the Dataset

from datasets import load_dataset

# Load v202603 release (latest)
code_corpus = load_dataset("hq-bench/coreb", "code_corpus", split="release_v2603")
text_corpus = load_dataset("hq-bench/coreb", "text_corpus", split="release_v2603")

# Load task-specific queries and qrels
t2c_queries = load_dataset("hq-bench/coreb", "text2code_queries", split="release_v2603")
t2c_qrels = load_dataset("hq-bench/coreb", "text2code_qrels", split="release_v2603")

print(f"Code corpus: {len(code_corpus)} documents")
print(f"T2C queries: {len(t2c_queries)} queries, {len(t2c_qrels)} qrels")
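
If you load from HuggingFace rather than local JSONL files, the rows need reshaping into the nested dicts that BEIR-style runners expect. A minimal sketch of that bridge, assuming BEIR-like field names (`_id`, `text`, `query_id`, `corpus_id`, `score`) — verify these against the dataset card before relying on them:

```python
# Reshape flat rows into {doc_id: {...}}, {query_id: text}, and
# {query_id: {doc_id: rel}} layouts. Field names are assumptions.
def rows_to_corpus(rows):
    return {r["_id"]: {"text": r["text"]} for r in rows}

def rows_to_queries(rows):
    return {r["_id"]: r["text"] for r in rows}

def rows_to_qrels(rows):
    qrels = {}
    for r in rows:
        qrels.setdefault(r["query_id"], {})[r["corpus_id"]] = int(r["score"])
    return qrels

corpus = rows_to_corpus([{"_id": "d1", "text": "def solve(): ..."}])
qrels = rows_to_qrels([{"query_id": "q1", "corpus_id": "d1", "score": 2}])
```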

Run Evaluation

from coreb_runner.benchmark import (
    load_jsonl,
    convert_corpus_to_coir_format,
    convert_queries_to_coir_format,
    convert_qrels_to_coir_format,
    EvaluateRetrieval,
    DenseRetrievalExactSearch,
    create_model_wrapper,
)

# Load data (from local JSONL files or convert from HF datasets)
corpus = convert_corpus_to_coir_format(load_jsonl("code_corpus.jsonl"))
queries = convert_queries_to_coir_format(load_jsonl("text2code_queries.jsonl"))
qrels = convert_qrels_to_coir_format(load_jsonl("text2code_qrels.jsonl"))

# Create model wrapper
model = create_model_wrapper("jinaai/jina-embeddings-v3", model_type="huggingface")

# Run retrieval + evaluation
retriever = DenseRetrievalExactSearch(model, batch_size=64)
evaluator = EvaluateRetrieval(retriever, k_values=[1, 3, 5, 10])
results = evaluator.retrieve(corpus, queries)
ndcg, _map, recall, precision = evaluator.evaluate(qrels, results, evaluator.k_values)

print(f"nDCG@10: {ndcg['NDCG@10']:.4f}")
print(f"Recall@10: {recall['Recall@10']:.4f}")
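
Conceptually, exact dense search just embeds both sides and ranks every document by similarity to the query. A toy sketch of that ranking step with hand-made 2-D vectors (no model involved — not the library's actual implementation):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def exact_search(query_vec, doc_vecs, k=2):
    """Score every document against the query; return top-k doc ids."""
    scored = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

docs = {"d1": [1.0, 0.0], "d2": [0.7, 0.7], "d3": [0.0, 1.0]}
print(exact_search([1.0, 0.1], docs))  # ['d1', 'd2']: d1 is nearest
```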

Evaluation with Graded Relevance

CoREB uses relevance_level=2 — only rel>=2 items count as relevant for binary metrics (Recall, MAP, Precision). Hard negatives (rel=1) penalize nDCG by occupying top ranks with zero gain but do not inflate Recall/MRR.

# The EvaluateRetrieval class handles this automatically:
# - rel=1 (hard negatives) are zeroed out for nDCG computation
# - relevance_level=2 is set for pytrec_eval binary metrics
print(f"Relevance threshold: {EvaluateRetrieval.RELEVANCE_LEVEL}")  # 2
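
The effect of the graded scheme can be reproduced by hand. A minimal sketch of nDCG@k under the convention described above (gains below rel=2 are zeroed, so a hard negative contributes nothing but still displaces positives to lower ranks):

```python
import math

def ndcg_at_k(ranked_rels, k=10):
    """nDCG@k with sub-positive grades (rel<2) zeroed out."""
    gains = [(r if r >= 2 else 0) for r in ranked_rels[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(((r if r >= 2 else 0) for r in ranked_rels), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([2, 1, 0]))  # 1.0: the positive is ranked first
print(ndcg_at_k([1, 2, 0]))  # ~0.631: a hard negative outranks the positive
```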

Dataset Structure

Available on HuggingFace: hq-bench/coreb

8 configs × 2 splits (release_v2602, release_v2603):

| Config | Rows (v2603) | Description |
|--------|--------------|-------------|
| code_corpus | 1,744 | Code solutions (5 languages, 2 generator models) |
| text_corpus | 875 | Problem descriptions (175 original + 700 LLM noise) |
| text2code_queries | 1,123 | T2C queries (canonical, full, search subtasks) |
| text2code_qrels | 5,950 | T2C relevance judgments (2,814 pos + 3,136 hard neg) |
| code2code_queries | 278 | C2C queries (cross-language) |
| code2code_qrels | 1,457 | C2C relevance judgments (623 pos + 834 hard neg) |
| code2text_queries | 1,200 | C2T queries (canonical, full, match subtasks) |
| code2text_qrels | 4,610 | C2T relevance judgments (820 pos + 2,650 hard neg) |
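
The pos/hard-neg counts above can be recomputed directly from the qrels rows. A minimal sketch, assuming each row carries its grade in a `score` field (the field name is an assumption):

```python
from collections import Counter

def qrel_breakdown(rows, key="score"):
    """Count judgments per grade: rel=2 positives, rel=1 hard negatives."""
    counts = Counter(int(r[key]) for r in rows)
    return {"pos": counts.get(2, 0), "hard_neg": counts.get(1, 0)}

# Toy rows; on the real text2code_qrels this should report
# 2,814 positives and 3,136 hard negatives.
toy_rows = [{"score": 2}, {"score": 1}, {"score": 1}]
print(qrel_breakdown(toy_rows))  # {'pos': 1, 'hard_neg': 2}
```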

Benchmark Results (v202603, nDCG@10)

| Rank | Model | Avg | T2C | C2C | C2T |
|------|-------|-----|-----|-----|-----|
| 1 | GemEmb-2 | 0.639 | 0.434 | 0.698 | 0.784 |
| 2 | C2LLM-7B | 0.623 | 0.443 | 0.659 | 0.766 |
| 3 | jina-code-1.5b | 0.607 | 0.414 | 0.671 | 0.735 |
| 4 | C2LLM-0.5B | 0.604 | 0.430 | 0.657 | 0.725 |
| 5 | jina-code-0.5b | 0.596 | 0.386 | 0.677 | 0.725 |
| 6 | F2LLM-4B | 0.547 | 0.407 | 0.500 | 0.735 |
| 7 | Qwen3-Emb-4B | 0.495 | 0.390 | 0.392 | 0.704 |
| 8 | F2LLM-1.7B | 0.485 | 0.383 | 0.383 | 0.690 |
| 9 | Qwen3-Emb-0.6B | 0.443 | 0.349 | 0.384 | 0.597 |
| 10 | F2LLM-0.6B | 0.439 | 0.344 | 0.334 | 0.641 |
| 11 | Qwen3-Emb-8B | 0.428 | 0.328 | 0.320 | 0.635 |

Citation

Coming soon.

License

This project is licensed under the Apache License 2.0 — see LICENSE for details.
