labelbank

Retrieve + rerank over a closed label bank: LLM bi-encoders with self-mined hard negatives and a generative listwise reranker

Maintainers

lidaoyuan

Project description

labelbank — retrieve and rerank over a closed label bank: LLM bi-encoders, self-mined hard negatives, generative listwise reranking. Kaggle Silver, Eedi.

labelbank is the generalized core of a silver-medal (top 5%) solution to Kaggle's Eedi — Mining Misconceptions in Mathematics, extracted into a small, tested library you can run on your own label catalog with any Hugging Face backbone. The exact competition artifacts are preserved untouched in competition/, and golden tests pin the library's default behavior to the medal-winning code byte for byte.

Use it when your problem looks like this: given a piece of free text, find the matching entry in a fixed catalog of labels — a few hundred to a few tens of thousands of entries that all look frustratingly similar. Support tickets → known-issue KB, error logs → root-cause catalog, symptoms → diagnosis codes, content → policy categories, student mistakes → misconception taxonomies (the original task: 2,587 fine-grained math misconceptions).

Why not just an off-the-shelf embedding model?

Generic embedders retrieve "something related". In a fine-grained bank, related isn't enough — "ignores order of operations" and "evaluates left to right" are nearly identical sentences and different labels. Three design choices close that gap, and they are exactly what this library packages:

1. No in-batch negatives — mined pools instead. Standard contrastive recipes use other in-batch examples as negatives. In a closed bank that's poison: another query's positive is often a sibling label of your gold (a false negative), and random negatives are trivially easy. labelbank trains on explicit per-query pools — [gold, hard negatives…] — with cross-entropy over the group (no_in_batch_neg_loss, temperature 0.01).

2. The hard negatives come from the model itself. Train round N → rank the whole bank for every training query → take each query's own top-k as round N+1's negative pool, gold forced to the front (gold_first_pool). A self-bootstrapping curriculum: every round, the negatives are precisely the mistakes the current model still makes. This loop was decisive for the medal.

flowchart LR
    T["labeled pairs<br>(text → label id)"] --> R1["bi-encoder round N<br>(LoRA fine-tune)"]
    R1 -- "rank full bank<br>per training query" --> M["top-k pools<br>gold first"]
    M -- "hard negatives" --> R2["bi-encoder round N+1"]
    R2 -- "top-k candidates" --> RR["generative listwise reranker<br>(letters A–E, completion-only SFT)"]
    RR --> O["final ranking"]

3. A generative listwise reranker with no position prior. The retriever's top-k candidates are inlined into one prompt as lettered options; a causal LLM is fine-tuned (completion-only) to answer the letter. The gold's position is shuffled at training time — the reranker must judge content, not slot — and at inference the next-token logits over A…E re-order the candidates (ListwiseReranker).
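The pooled cross-entropy from point 1 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the library's `no_in_batch_neg_loss`: it assumes the scores are cosine similarities between one query and its pool, that the gold label sits at index 0, and the function name `pool_cross_entropy` is hypothetical.

```python
import math

def pool_cross_entropy(similarities, temperature=0.01):
    """Cross-entropy over one query's pool, with the gold label at index 0.

    `similarities` is assumed to hold the query's cosine similarity to each
    pool entry, in the order [gold, negative_1, negative_2, ...].
    """
    logits = [s / temperature for s in similarities]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[0]  # equals -log softmax(logits)[0]

# An easy pool versus a pool that contains a near-tied sibling label.
easy = pool_cross_entropy([0.80, 0.30, 0.25])
hard = pool_cross_entropy([0.80, 0.78, 0.25])
print(f"easy pool loss: {easy:.4f}")
print(f"hard pool loss: {hard:.4f}")
```

With a temperature this low, negatives that sit far below the gold contribute almost nothing to the loss, so the gradient comes mainly from the near-ties. That is one way to see why the quality of the pool matters more than its size.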

Install

pip install labelbank                # core: metrics, mining, formatting, data (no torch)
pip install "labelbank[retrieve]"    # + bi-encoder retrieval (torch, transformers, peft)
pip install "labelbank[rerank]"      # + the generative listwise reranker (adds trl)
pip install "labelbank[train]"       # everything needed to train both stages
# The quotes keep shells such as zsh from treating [..] as a glob pattern.

60 seconds

from labelbank import LabelBank, BiEncoderRetriever, gold_first_pool

# 1. Your closed catalog, and some labeled (text -> label id) pairs.
bank = LabelBank.from_csv("catalog.csv", id_col="LabelId", text_col="LabelText")
queries = ["my failing log line…", "another report…"]   # free text
gold_ids = [1042, 17]                                    # matching catalog ids

# 2. Retrieve with any HF backbone (last-token pooling + L2 norm).
retriever = BiEncoderRetriever.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", trainable=True,
    query_prefix="<instruct>Match the text to the best catalog entry.\n<query>",
)
ranked = retriever.retrieve(queries, bank, top_k=25)

# 3. Mine hard negatives from the model's own rankings, then retrain.
pools = [gold_first_pool(r, g, top_k=25) for r, g in zip(ranked, gold_ids)]

from labelbank import RetrieverTrainConfig, train_retriever
train_retriever(retriever, queries, [bank.texts_of(p) for p in pools],
                RetrieverTrainConfig(epochs=1, temperature=0.01))

# 4. Evaluate against the whole bank.
metrics = retriever.evaluate(queries, gold_ids, bank)   # map@25 + recall@{1,10,25,50,100}
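The pool-building step in part 3 can be pictured with a small stand-alone sketch. This is an assumption about the semantics, not the library's `gold_first_pool`: here the pool holds at most `top_k` entries including the gold, the gold is moved to the front whether or not the model retrieved it, and the name `gold_first_pool_sketch` is hypothetical.

```python
def gold_first_pool_sketch(ranked_ids, gold_id, top_k):
    """Return [gold, hardest negatives...] with at most `top_k` entries."""
    # Everything the model ranked highly that is not the gold is a hard negative.
    negatives = [label_id for label_id in ranked_ids if label_id != gold_id]
    return [gold_id] + negatives[: top_k - 1]

ranked = [7, 1042, 3, 99, 12]   # model's ranking, best first
pool = gold_first_pool_sketch(ranked, gold_id=1042, top_k=4)
print(pool)                     # gold first, then the top-ranked wrong labels
```

Keeping the gold at a fixed index lets the training loss treat position 0 as the target without carrying a separate label tensor.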

Rerank the top-5 with a generative judge:

from labelbank import build_training_rows, ListwiseReranker

rows = build_training_rows(queries, candidate_texts, gold_texts, k=5)   # gold position shuffled
reranker = ListwiseReranker.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
reranker.train(rows, output_dir="out/reranker", lora={"r": 16})
order = reranker.rerank(query_text, candidate_texts)                     # letter-logit reorder
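The two ideas behind the reranker, shuffling the gold's position at training time and reordering by letter logits at inference, can be sketched without a model. Everything here is illustrative: `build_options` and `reorder_by_letter_logits` are hypothetical names, the prompt layout is an assumption, the third candidate text is made up, and `fake_logits` stands in for the next-token logits a fine-tuned causal LLM would produce.

```python
import random
import string

def build_options(candidates, gold, rng):
    """Shuffle candidates so the gold answer lands on a random letter."""
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    letters = string.ascii_uppercase[: len(shuffled)]
    prompt_lines = [f"{letter}. {text}" for letter, text in zip(letters, shuffled)]
    answer = letters[shuffled.index(gold)]  # the completion target
    return prompt_lines, answer

def reorder_by_letter_logits(candidates, letter_logits):
    """Sort candidates by the logit of the letter each one was shown under."""
    letters = string.ascii_uppercase[: len(candidates)]
    scored = sorted(
        zip(letters, candidates),
        key=lambda pair: letter_logits[pair[0]],
        reverse=True,
    )
    return [text for _, text in scored]

candidates = [
    "ignores order of operations",
    "evaluates left to right",
    "adds before multiplying",
]
# Training time: the gold may end up under any letter.
lines, answer = build_options(candidates, gold="evaluates left to right", rng=random.Random(0))
# Inference time: candidates are shown in retriever order as A, B, C.
fake_logits = {"A": 0.1, "B": 2.3, "C": -1.0}
print(reorder_by_letter_logits(candidates, fake_logits))
```

Because the target letter changes from row to row, a model that always answers "A" gains nothing, which is the point of removing the position prior.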

Or run the whole loop — zero-shot eval → rank the bank → mine gold-first pools → retrain → re-evaluate, for mining_rounds rounds — from one YAML:

python -m labelbank.run --cfg examples/configs/quickstart.yaml             # 0.5B, one consumer GPU
python -m labelbank.run --cfg examples/configs/reproduce_competition.yaml  # the medal setup (32B + NF4)

The retriever stage writes the adapter, per-split rankings.parquet and metrics.json to output_dir; the reranker stage (stage: reranker) consumes that parquet and trains the listwise judge on it.

Measured: do mined negatives beat random ones?

The library's central claim, measured end to end through its public API on a public dataset — banking77 (a real closed bank of 77 customer intents), Qwen2.5-0.5B-Instruct + LoRA bi-encoder, 2,000 training pairs, 1,000 held-out test queries, pools of 8, one epoch per arm, one RTX 4080, ~1 h (examples/mined_negatives_experiment.py):

| arm (identical budgets) | MAP@25 | R@1 | R@3 | R@5 | R@10 |
| --- | --- | --- | --- | --- | --- |
| zero-shot backbone | 0.069 | 1.9% | 6.0% | 9.7% | 17.2% |
| random negatives (bootstrap round) | 0.788 | 67.6% | 87.8% | 94.5% | 97.6% |
| + self-mined, round 1 | 0.838 | 76.2% | 89.6% | 93.3% | 97.5% |
| + self-mined, round 2 | 0.839 | 75.7% | 90.5% | 95.0% | 97.9% |

Mining adds 5.0 points of MAP@25 (0.788 to 0.838) and 8.6 points of R@1 (67.6% to 76.2%) over random negatives at the same budget, and the gain concentrates exactly where fine-grained banks hurt: top-1, where sibling labels collide (R@10 is saturated for both arms). Round 2 plateaus on this small bank; the competition iterated rounds over a 2,587-entry bank (next section).

One honest caveat the ablation makes measurable: hard negatives are only as good as the model that mines them. Mining round 1 from the zero-shot model's rankings instead of the bootstrap model's collapses to MAP 0.430 — far below plain random negatives. That is why the pipeline (and the competition protocol preserved in competition/) trains a bootstrap round first and mines from it. Reproduce both:

pip install -e ".[retrieve]" datasets
python examples/mined_negatives_experiment.py               # bootstrap protocol (table above)
python examples/mined_negatives_experiment.py --cold-start  # the ablation: mine from zero-shot
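For contrast with mined pools, the bootstrap round's random pools can be sketched as follows. How the experiment script actually samples its random negatives is not stated here, so this is an assumption: uniform sampling without replacement from the rest of the bank, gold first, with a hypothetical helper name `random_negative_pool`.

```python
import random

def random_negative_pool(bank_ids, gold_id, pool_size, rng):
    """Return [gold, random negatives...] drawn without replacement from the bank."""
    others = [label_id for label_id in bank_ids if label_id != gold_id]
    negatives = rng.sample(others, k=min(pool_size - 1, len(others)))
    return [gold_id] + negatives

bank_ids = list(range(77))   # a 77-entry bank, the size of banking77
pool = random_negative_pool(bank_ids, gold_id=5, pool_size=8, rng=random.Random(0))
print(pool)
```

A pool like this is easy on average, which is why it works as a starting point: the model trained on it is already good enough that its own top-k mistakes are informative in the next round.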

Measured: the competition run

These numbers come from the preserved training logs (competition/stage1_train.log) for the retriever stage, Qwen2.5-32B-Instruct + LoRA over a 2,587-entry bank, scored on a held-out fold:

| metric | value |
| --- | --- |
| MAP@25 | 0.4238 |
| Recall@1 | 0.3017 |
| Recall@10 | 0.6906 |
| Recall@25 | 0.8126 |
| Recall@50 | 0.8978 |
| Recall@100 | 0.9391 |

With the listwise reranker on top, the full two-stage system scored 0.50 on the private leaderboard — silver medal, top 5%. For intuition: Recall@25 of 0.81 means the retriever alone puts the right label among 25 candidates four times out of five — out of 2,587 that all describe subtly different math mistakes.
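To make the metrics concrete, here is a minimal sketch of MAP@25 and Recall@k under the assumption that every query has exactly one gold label, in which case average precision reduces to the reciprocal rank of the gold within the cutoff. The function names are hypothetical and this is not the library's evaluation code.

```python
def average_precision_at_k(ranked_ids, gold_id, k=25):
    """With a single gold label, AP@k is 1/rank if the gold is in the top k, else 0."""
    top = ranked_ids[:k]
    return 1.0 / (top.index(gold_id) + 1) if gold_id in top else 0.0

def recall_at_k(ranked_ids, gold_id, k):
    """1 if the gold label appears among the first k candidates, else 0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Three toy queries: gold at rank 2, gold at rank 1, gold not retrieved.
rankings = [[3, 1, 2], [5, 4, 6], [9, 8, 7]]
golds = [1, 5, 0]
map_at_25 = mean(average_precision_at_k(r, g) for r, g in zip(rankings, golds))
recall_at_1 = mean(recall_at_k(r, g, 1) for r, g in zip(rankings, golds))
print(f"MAP@25 = {map_at_25:.3f}, Recall@1 = {recall_at_1:.3f}")
```

Read this way, MAP@25 rewards moving the gold toward rank 1, while Recall@25 only asks whether the gold reaches the candidate list that the reranker will see.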

How it relates to existing tools

| | sentence-transformers / BGE | RAG over a corpus | `labelbank` |
| --- | --- | --- | --- |
| Target | open-ended similarity | open document collection | closed catalog (can re-embed every eval) |
| Negatives | in-batch by default | n/a | explicit mined pools, no in-batch |
| Mining loop | bring your own | n/a | built in, gold-first, iterative |
| Reranker | cross-encoder (pointwise) | LLM reads retrieved docs | generative listwise letters, position-shuffled |
| Backbone | encoder models | any | any HF causal model as bi-encoder (last-token pool, LoRA, 4-bit) |

If you need general-purpose embeddings, use sentence-transformers. If your labels are a fixed, fine-grained catalog and generic embeddings keep confusing siblings, this is the recipe that medaled on exactly that problem.

Provenance & validation

The competition scripts, configs, training logs, inference notebook, certificate and the full original write-up are preserved verbatim in competition/.
Golden tests pin the library to the medal-winning code: the contrastive loss, last-token pooling, hard-negative pool construction, both prompt templates, and the Eedi data pipeline are each fuzz-tested against verbatim copies of the originals (tests/reference_impl.py) and assert identical output — the library is the competition code, not a reimplementation of it.
Final result: silver medal (top 5%), private LB 0.50 (certificate).

Citation

@misc{li2024labelbank,
  author = {Daoyuan Li},
  title  = {labelbank: retrieval and listwise reranking over closed label banks with self-mined hard negatives},
  year   = {2024},
  url    = {https://github.com/DaoyuanLi2816/labelbank},
  note   = {Generalized from a silver-medal solution, Kaggle Eedi — Mining Misconceptions in Mathematics}
}

License

MIT — see LICENSE.

Author

Daoyuan Li — Kaggle (distiller) · lidaoyuan2816@gmail.com

Project details

Release history

- 0.2.0 (this version): Jun 11, 2026
- 0.1.0: Jun 10, 2026

Download files

Source Distribution

labelbank-0.2.0.tar.gz (31.3 kB view details)

Uploaded Jun 11, 2026 Source

Built Distribution

labelbank-0.2.0-py3-none-any.whl (25.7 kB view details)

Uploaded Jun 11, 2026 Python 3

File details

Details for the file labelbank-0.2.0.tar.gz.

File metadata

Download URL: labelbank-0.2.0.tar.gz
Upload date: Jun 11, 2026
Size: 31.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for labelbank-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`94a76da869e2ed50c9c596329e353f620a60f268909925d5982465bb514e3f40`
MD5	`c67a4eb549a58023ad8d6395b70a20a4`
BLAKE2b-256	`0f35980fa91e1bcb11c316cf7d8215bb460a7f26e3db7a20acb2e122a486c20e`

Provenance

The following attestation bundles were made for labelbank-0.2.0.tar.gz:

Publisher: release.yml on DaoyuanLi2816/labelbank

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: labelbank-0.2.0.tar.gz
- Subject digest: 94a76da869e2ed50c9c596329e353f620a60f268909925d5982465bb514e3f40
- Sigstore transparency entry: 1787468733
- Sigstore integration time: Jun 11, 2026
Source repository:
- Permalink: DaoyuanLi2816/labelbank@d0b3ba2aebb85f5c827a8f227a31ad7241c6be14
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/DaoyuanLi2816
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d0b3ba2aebb85f5c827a8f227a31ad7241c6be14
- Trigger Event: release

File details

Details for the file labelbank-0.2.0-py3-none-any.whl.

File metadata

Download URL: labelbank-0.2.0-py3-none-any.whl
Upload date: Jun 11, 2026
Size: 25.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for labelbank-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f9fc9cf695101081d5afb6685e68a315b563e11f59833981fad0d9d8ddee69a6`
MD5	`1d6b26634e668d39a1d378317ebccebf`
BLAKE2b-256	`514d814f46c2fee2ab18dcd634de109c52b507fc4739762a921de6712d75d097`

Provenance

The following attestation bundles were made for labelbank-0.2.0-py3-none-any.whl:

Publisher: release.yml on DaoyuanLi2816/labelbank

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: labelbank-0.2.0-py3-none-any.whl
- Subject digest: f9fc9cf695101081d5afb6685e68a315b563e11f59833981fad0d9d8ddee69a6
- Sigstore transparency entry: 1787469709
- Sigstore integration time: Jun 11, 2026
Source repository:
- Permalink: DaoyuanLi2816/labelbank@d0b3ba2aebb85f5c827a8f227a31ad7241c6be14
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/DaoyuanLi2816
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d0b3ba2aebb85f5c827a8f227a31ad7241c6be14
- Trigger Event: release
