Fine-tune BAAI/bge-m3 for retrieval + reranking on a specific domain
🎯 BGE Auto-Tune
Fine-tune BAAI/bge-m3 on your domain, from scratch.
BGE Auto-Tune automates the entire pipeline: generates a training dataset from your chunks in Qdrant, runs unified fine-tuning (dense + sparse + ColBERT), tests retrieval quality, re-indexes your collections, and publishes the model to HuggingFace.
bge-auto-tune generate → create dataset from Qdrant + local LLM
bge-auto-tune finetune → train the model
bge-auto-tune test → compare base vs fine-tuned
bge-auto-tune reindex → re-embed and update vectors in Qdrant
bge-auto-tune publish → upload model to HuggingFace Hub
bge-auto-tune run → generate + finetune + test in sequence
Prerequisites
BGE Auto-Tune is not a standalone tool — it relies on three services that must be running.
1. Qdrant
A Qdrant instance with a collection populated with text chunks. Chunks must have a text field in their payload (configurable with --text-field).
http://localhost:6333 (default)
2. Local LLM (OpenAI-compatible API)
A language model accessible via an OpenAI-compatible API. Used for two things: filtering low-quality chunks and generating realistic synthetic queries.
Any model served by vLLM, Ollama (with OpenAI endpoint), llama.cpp, TGI, etc. will work.
http://localhost:8001/v1 (default)
Tested with Qwen/Qwen3-4B-Instruct-2507 — 4-8B models work great for this task.
3. BGE-M3 embedding server
A server exposing the BGE-M3 model for generating embeddings. Used to search for hard negatives during dataset generation and to re-index collections after fine-tuning.
http://localhost:8004 (default)
The server must respond on POST /v1/embeddings with the standard format.
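As a reference point, the request and response shapes below sketch the standard OpenAI-compatible embeddings exchange the server is expected to implement. The `"bge-m3"` model name and the two-element vector are illustrative, not values the tool mandates.

```python
# Sketch of the OpenAI-compatible POST /v1/embeddings exchange.
# Shapes follow the standard OpenAI embeddings schema; values are illustrative.
import json

request_body = {
    "model": "bge-m3",                       # illustrative model name
    "input": ["What is the refund policy?"]  # one or more texts to embed
}

# A conforming server replies with the standard envelope:
example_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.01, -0.02]},  # truncated vector
    ],
    "model": "bge-m3",
}

vector = example_response["data"][0]["embedding"]
print(json.dumps(request_body), len(vector))
```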
Services overview
| Service | Default endpoint | Env var | Purpose |
|---|---|---|---|
| Qdrant | http://localhost:6333 | `QDRANT_URL` | Source of text chunks |
| LLM | http://localhost:8001/v1 | `LLM_URL` | Quality filter + query generation |
| Embedding | http://localhost:8004 | `EMBED_URL` | Hard negatives search + re-indexing |
All endpoints can be configured via env vars or CLI flags.
Hardware
Fine-tuning requires a GPU with at least 16 GB VRAM (24 GB recommended). Dataset generation and testing can run on CPU but are much faster on GPU.
Installation
pip install bge-auto-tune
Recommended workflow (manual pipeline)
If this is your first time, use the individual commands. This lets you inspect and validate each step before moving on.
Step 1 — Generate the dataset
bge-auto-tune generate \
--collection my_docs \
--min-pairs 3000 \
--queries-per-chunk 3 \
--hard-negatives 7
This produces a bge_m3_training.jsonl file. Open it and review it before proceeding:
# How many pairs?
wc -l bge_m3_training.jsonl
# Look at some examples
head -5 bge_m3_training.jsonl | python -m json.tool --json-lines
Check that:
- Queries are realistic and diverse
- Positives are actually relevant to the query
- Negatives are "hard" (similar but incorrect)
If something looks off, adjust parameters and regenerate. It's much better to spend time here than to fine-tune on dirty data.
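A quick script can catch the obvious structural problems before you eyeball examples. The check below assumes the common FlagEmbedding row schema (`{"query": str, "pos": [str], "neg": [str]}`); adjust the field names if your generated file differs.

```python
# Sanity-check dataset rows. Assumes FlagEmbedding-style fields
# (query / pos / neg); adapt if your schema differs.
import json

def check_rows(path):
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            row = json.loads(line)
            if not row.get("query", "").strip():
                problems.append((lineno, "empty query"))
            if not row.get("pos"):
                problems.append((lineno, "no positives"))
            if len(row.get("neg", [])) < 1:
                problems.append((lineno, "no hard negatives"))
    return problems

# Demo on a tiny synthetic file instead of the real dataset:
sample = [
    {"query": "how do I reset my password?", "pos": ["To reset it, open..."], "neg": ["Pricing tiers are..."]},
    {"query": "", "pos": ["Some chunk text."], "neg": []},
]
with open("sample.jsonl", "w", encoding="utf-8") as f:
    for row in sample:
        f.write(json.dumps(row) + "\n")

print(check_rows("sample.jsonl"))  # → [(2, 'empty query'), (2, 'no hard negatives')]
```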
How many pairs do you need?
| Pairs | Expected quality |
|---|---|
| < 500 | Nearly useless — too little signal |
| 500 – 1,000 | Works only with a very specific domain and clean data |
| 1,000 – 3,000 | Safe zone for most use cases |
| 3,000 – 10,000 | Ideal — robust results |
| > 10,000 | Diminishing returns |
The absolute number matters less than coverage: if your corpus has 5,000 chunks but the dataset only covers 200 of them, you'll have huge blind spots. Aim to cover at least 30-50% of your chunks.
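One way to measure that coverage, assuming each dataset row carries some identifier of its source chunk (the `chunk_id` field here is hypothetical; if your rows only store the positive text, key on that text instead):

```python
# Estimate chunk coverage: the fraction of corpus chunks that appear
# as a positive in the dataset. "chunk_id" is a hypothetical field.
def coverage(dataset_rows, total_chunks):
    covered = {row["chunk_id"] for row in dataset_rows}
    return len(covered) / total_chunks

# 600 pairs drawn from only 200 distinct chunks, out of a 5,000-chunk corpus:
rows = [{"chunk_id": i % 200} for i in range(600)]
print(f"{coverage(rows, 5000):.0%}")  # → 4% — far below the 30-50% target
```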
Step 2 — Fine-tune
bge-auto-tune finetune \
--dataset bge_m3_training.jsonl \
--epochs 2 \
--lr 1e-5 \
--batch-size 4
The model is saved to ./bge-m3-finetuned/.
Note: fine-tuning always starts from the base model
BAAI/bge-m3. If you rerun the command, the previous output is overwritten. To keep different versions, change `--output`:
bge-auto-tune finetune --output ./bge-m3-finetuned-v2
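Training uses a contrastive InfoNCE objective (the `--temperature` flag in the parameter tables below, default 0.02). The toy calculation that follows shows what the temperature controls: similarities are divided by it before the softmax, so a small value sharpens the distribution and penalizes hard negatives more. This is an illustration, not the trainer's actual code.

```python
# Toy InfoNCE loss: one positive similarity vs. several negatives.
# Lower temperature sharpens the softmax over similarity scores.
import math

def info_nce(pos_sim, neg_sims, temperature=0.02):
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[0] / sum(exps))

# Cosine similarities: one positive, three hard negatives.
loss_sharp = info_nce(0.82, [0.78, 0.75, 0.60], temperature=0.02)
loss_soft = info_nce(0.82, [0.78, 0.75, 0.60], temperature=1.0)
print(loss_sharp, loss_soft)  # ≈ 0.153 vs ≈ 1.307
```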
Step 3 — Test
bge-auto-tune test \
--model ./bge-m3-finetuned \
--test-queries 200 \
--top-k 10
The test automatically holds out 20% of the dataset as unseen queries and compares the base model against the fine-tuned one:
- Recall@K — is the positive in the top K results?
- MRR — at what average position does the positive end up?
- NDCG@10 — ranking quality considering position
- Rerank accuracy — multi-mode (dense+sparse+colbert) scoring
- Qualitative examples — queries where fine-tuning improved (or worsened) the ranking
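For intuition, the first two metrics can be sketched over ranked lists of document ids. This is illustrative only, not the implementation behind `bge-auto-tune test`:

```python
# Toy Recall@K and MRR over ranked result lists.
def recall_at_k(ranked, positive, k=10):
    return 1.0 if positive in ranked[:k] else 0.0

def mrr(results):
    # results: list of (ranked_ids, positive_id) pairs
    total = 0.0
    for ranked, positive in results:
        if positive in ranked:
            total += 1.0 / (ranked.index(positive) + 1)
    return total / len(results)

runs = [
    (["d3", "d1", "d9"], "d1"),  # positive at rank 2 → reciprocal rank 0.5
    (["d7", "d2", "d4"], "d4"),  # rank 3 → 1/3
    (["d5", "d8", "d6"], "d0"),  # missed entirely → 0
]
print(recall_at_k(["d3", "d1", "d9"], "d1", k=2))  # → 1.0
print(round(mrr(runs), 4))                         # → 0.2778
```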
If results aren't good, go back to Step 1: more data, better data, or different parameters.
Step 4 — Re-index Qdrant
After testing, you need to re-embed all documents with the new model. Point your BGE-M3 server to the fine-tuned model, then:
# Test first — creates a separate collection (my_docs_test_finetuned)
bge-auto-tune reindex --collections my_docs --test
# When satisfied, update the original collection in-place
bge-auto-tune reindex --collections my_docs
Multiple collections at once:
bge-auto-tune reindex --collections docs,faq,policies --test
The command iterates all points, reads the text field, calls the embedding service, and updates the vectors. It supports named vectors (dense + sparse), unnamed vectors, and ColBERT.
Step 5 — Publish (optional)
Share the fine-tuned model on HuggingFace Hub:
# Login once
huggingface-cli login
# Publish
bge-auto-tune publish --repo your-user/bge-m3-your-domain
A model card is generated automatically from your test results. Add --private for private repos.
Anyone can then use your model:
from FlagEmbedding import BGEM3FlagModel
model = BGEM3FlagModel("your-user/bge-m3-your-domain")
run command (automatic pipeline)
For users who have already calibrated their parameters and want to run generate → finetune → test in one shot:
bge-auto-tune run \
--collection my_docs \
--min-pairs 3000 \
--epochs 2 \
--lr 1e-5 \
--verbose
If any step fails, it stops. Re-indexing and publishing are not included in run — they require service configuration and confirmation.
⚠️ Use this only when you know your services are working and your parameters are dialed in. For the first time, use the manual pipeline.
All parameters
bge-auto-tune generate
| Parameter | Default | Description |
|---|---|---|
| `--qdrant-url` | http://localhost:6333 | Qdrant instance URL |
| `--collection` | docs | Collection name |
| `--text-field` | text | Payload field containing text |
| `--llm-url` | http://localhost:8001/v1 | Local LLM endpoint |
| `--llm-model` | Qwen/Qwen3-4B-Instruct-2507 | LLM model name |
| `--embed-url` | http://localhost:8004 | BGE-M3 embedding server |
| `--output` | bge_m3_training.jsonl | Output dataset file |
| `--queries-per-chunk` | 3 | Queries generated per chunk |
| `--hard-negatives` | 7 | Hard negatives per query |
| `--min-pairs` | 2000 | Minimum target pairs |
| `--max-chunks` | (all) | Limit chunks processed (for testing) |
| `--min-chunk-length` | 100 | Minimum chunk length (chars) |
| `--min-words` | 20 | Minimum words per chunk |
| `--min-alpha-ratio` | 0.4 | Minimum alphabetic character ratio |
| `--skip-llm-filter` | false | Skip LLM quality filter |
| `--seed` | 42 | Random seed |
| `--resume` | false | Resume from partial output |
| `--vector-name` | (none) | Named vector in Qdrant (e.g. dense) |
bge-auto-tune finetune
| Parameter | Default | Description |
|---|---|---|
| `--dataset` | bge_m3_training.jsonl | Input dataset |
| `--model` | BAAI/bge-m3 | Base model |
| `--output` | ./bge-m3-finetuned | Output directory |
| `--epochs` | 2 | Training epochs |
| `--lr` | 1e-5 | Learning rate |
| `--batch-size` | 4 | Batch size per GPU |
| `--temperature` | 0.02 | InfoNCE temperature |
| `--warmup-ratio` | 0.05 | Warmup ratio |
| `--max-passage-len` | 1024 | Max passage length |
| `--max-query-len` | 256 | Max query length |
| `--train-group-size` | 8 | Passages per query in batch |
| `--save-steps` | 500 | Save checkpoint every N steps |
| `--gradient-checkpointing` | true | Save VRAM |
bge-auto-tune test
| Parameter | Default | Description |
|---|---|---|
| `--base-model` | BAAI/bge-m3 | Base model for comparison |
| `--model` | ./bge-m3-finetuned | Fine-tuned model |
| `--dataset` | bge_m3_training.jsonl | Dataset with query/positive pairs |
| `--test-queries` | 200 | Number of test queries |
| `--test-split` | 0.2 | Fraction held out for testing |
| `--top-k` | 10 | Recall/NDCG cutoff |
| `--dense-weight` | 0.30 | Dense weight for reranking |
| `--sparse-weight` | 0.65 | Sparse weight for reranking |
| `--colbert-weight` | 0.05 | ColBERT weight for reranking |
| `--batch-size` | 16 | Encoding batch size |
| `--output` | test_results.json | Detailed JSON report |
| `--verbose` | false | Show all examples |
| `--device` | auto | Device: auto, cuda, cpu, mps |
bge-auto-tune reindex
| Parameter | Default | Description |
|---|---|---|
| `--collections` | (required) | Collections to re-index (comma-separated) |
| `--qdrant-url` | http://localhost:6333 | Qdrant instance URL |
| `--embed-url` | http://localhost:8004 | BGE-M3 embedding server |
| `--text-field` | text | Payload field containing text |
| `--vectors` | dense,sparse | Vector types to generate (comma-separated) |
| `--vector-names` | (auto) | Mapping type:name, e.g. dense:dense,sparse:sparse |
| `--unnamed-vectors` | false | Use unnamed vectors (dense only) |
| `--test` | false | Create test collection instead of updating in-place |
| `--embed-batch` | 32 | Batch size for embedding calls |
| `--scroll-batch` | 100 | Batch size for Qdrant scroll |
| `--yes` / `-y` | false | Skip confirmation prompt |
Re-index examples
# Default: named vectors dense + sparse (most common setup)
bge-auto-tune reindex --collections docs
# Test mode: creates docs_test_finetuned
bge-auto-tune reindex --collections docs --test
# Multiple collections
bge-auto-tune reindex --collections docs,faq,policies
# Only dense, unnamed vectors
bge-auto-tune reindex --collections docs --vectors dense --unnamed-vectors
# Dense + sparse + ColBERT with custom names
bge-auto-tune reindex --collections docs --vectors dense,sparse,colbert \
--vector-names "dense:emb,sparse:lex,colbert:col"
# Different text field
bge-auto-tune reindex --collections docs --text-field content
# Skip confirmation
bge-auto-tune reindex --collections docs -y
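For reference, the `--vector-names` mapping syntax shown above can be read with a one-liner like the sketch below (an illustrative parser, not the tool's internal code):

```python
# Parse a --vector-names style spec, e.g. "dense:emb,sparse:lex",
# into a {vector_type: vector_name} mapping. Illustrative only.
def parse_vector_names(spec):
    return dict(pair.split(":", 1) for pair in spec.split(","))

print(parse_vector_names("dense:emb,sparse:lex,colbert:col"))
# → {'dense': 'emb', 'sparse': 'lex', 'colbert': 'col'}
```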
bge-auto-tune publish
| Parameter | Default | Description |
|---|---|---|
| `--model` | ./bge-m3-finetuned | Path to fine-tuned model |
| `--repo` | (required) | HuggingFace repo (user/model-name) |
| `--private` | false | Create private repo |
| `--branch` | main | Target branch |
| `--message` | (auto) | Custom commit message |
| `--yes` / `-y` | false | Skip confirmation prompt |
bge-auto-tune run
Accepts all parameters from generate, finetune, and test. Notable additions:
| Parameter | Default | Description |
|---|---|---|
| `--model-output` | ./bge-m3-finetuned | Output directory for fine-tuned model |
| `--verbose` | false | Detailed output for every step |
Environment variables
All endpoints can be set via env vars so you don't have to repeat them:
export QDRANT_URL=http://10.0.0.1:6333
export QDRANT_COLLECTION=my_docs
export LLM_URL=http://10.0.0.1:8001/v1
export LLM_MODEL=Qwen/Qwen3-8B
export LLM_API_KEY=none
export EMBED_URL=http://10.0.0.1:8004
# Now just:
bge-auto-tune generate
bge-auto-tune finetune
bge-auto-tune test
bge-auto-tune reindex --collections my_docs
Full example: your domain
# 1. Generate 2000 pairs from banking docs
bge-auto-tune generate --collection docs --min-pairs 2000
# 2. Fine-tune for 4 epochs (batch 1 for 24GB GPU)
bge-auto-tune finetune --dataset bge_m3_training.jsonl --epochs 4 --batch-size 1
# 3. Test on held-out queries
bge-auto-tune test --model ./bge-m3-finetuned --test-queries 200
# 4. Point your BGE server to the fine-tuned model, then re-index
bge-auto-tune reindex --collections docs --test # test first
bge-auto-tune reindex --collections docs # production
# 5. Publish to HuggingFace
bge-auto-tune publish --repo your-user/bge-m3-bank-it
License
MIT