beir · PyPI

A Heterogeneous Benchmark for Information Retrieval

:beers: What is it?

BEIR is a heterogeneous benchmark containing diverse IR tasks. It also provides a common and easy framework for evaluation of your NLP-based retrieval models within the benchmark.

For an overview, check out our new wiki page: https://github.com/beir-cellar/beir/wiki.

For models and datasets, check out our Hugging Face (HF) page: https://huggingface.co/BeIR.

For more information, check out our publications:

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models (NeurIPS 2021, Datasets and Benchmarks Track)
Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard (SIGIR 2024 Resource Track)

:beers: Installation

Install via pip:

pip install beir

If you want to build from source, use:

$ git clone https://github.com/beir-cellar/beir.git
$ cd beir
$ pip install -e .

Tested with Python 3.9+.
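
To confirm the installation from Python, a minimal sketch using only the standard library might look like the following. The helper name `installed_version` is hypothetical and not part of BEIR.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# BEIR is tested with Python 3.9+, so check the interpreter first.
python_ok = sys.version_info >= (3, 9)
print(f"Python >= 3.9: {python_ok}")
print(f"beir version: {installed_version('beir')}")
```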

:beers: Features

Preprocess your own IR dataset or use one of the 17 already-preprocessed benchmark datasets
Covers a wide range of settings, with diverse benchmarks useful for both academia and industry
Evaluates well-known retrieval architectures (lexical, dense, sparse and reranking-based)
Add and evaluate your own model in an easy framework using different state-of-the-art evaluation metrics
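
As a rough sketch of the first feature, the snippet below writes a tiny custom dataset in the folder layout that `GenericDataLoader` appears to read by default, as described in the BEIR wiki: `corpus.jsonl`, `queries.jsonl`, and `qrels/test.tsv`. The documents, IDs, and the `write_beir_dataset` helper are invented for illustration and are not part of BEIR.

```python
import csv
import json
import os
import tempfile


def write_beir_dataset(data_dir, corpus, queries, qrels, split="test"):
    """Write corpus, queries and qrels in a BEIR-style folder layout (assumed)."""
    os.makedirs(os.path.join(data_dir, "qrels"), exist_ok=True)

    # corpus.jsonl: one JSON object per document with _id, title and text
    with open(os.path.join(data_dir, "corpus.jsonl"), "w", encoding="utf-8") as f:
        for doc_id, doc in corpus.items():
            row = {"_id": doc_id, "title": doc.get("title", ""), "text": doc["text"]}
            f.write(json.dumps(row) + "\n")

    # queries.jsonl: one JSON object per query with _id and text
    with open(os.path.join(data_dir, "queries.jsonl"), "w", encoding="utf-8") as f:
        for query_id, text in queries.items():
            f.write(json.dumps({"_id": query_id, "text": text}) + "\n")

    # qrels/<split>.tsv: tab-separated relevance judgements with a header row
    qrels_path = os.path.join(data_dir, "qrels", f"{split}.tsv")
    with open(qrels_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["query-id", "corpus-id", "score"])
        for query_id, judged in qrels.items():
            for doc_id, score in judged.items():
                writer.writerow([query_id, doc_id, score])


# Toy data in the same dictionary shapes used throughout the examples
corpus = {
    "d1": {"title": "Rain", "text": "Rain forms when water vapour condenses."},
    "d2": {"title": "Snow", "text": "Snow is frozen precipitation."},
}
queries = {"q1": "What causes rain?"}
qrels = {"q1": {"d1": 1}}

data_dir = tempfile.mkdtemp()
write_beir_dataset(data_dir, corpus, queries, qrels)
print(sorted(os.listdir(data_dir)))
```

If the layout matches what your BEIR version expects, `GenericDataLoader(data_folder=data_dir).load(split="test")` should return the same three dictionaries.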

:beers: Quick Example

For more code examples, please refer to our Examples and Tutorials Wiki page.

Quick Example with Sentence-BERT

from beir import util, LoggingHandler
from beir.retrieval import models
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

import logging
import pathlib, os

#### Just some code to print debug information to stdout
logging.basicConfig(format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    level=logging.INFO,
                    handlers=[LoggingHandler()])
#### /print debug information to stdout

#### Download scifact.zip dataset and unzip the dataset
dataset = "scifact"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"
out_dir = os.path.join(pathlib.Path(__file__).parent.absolute(), "datasets")
data_path = util.download_and_unzip(url, out_dir)

#### Provide the data_path where scifact has been downloaded and unzipped
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

#### Load the SBERT model and retrieve using cosine-similarity
model = DRES(models.SentenceBERT("Alibaba-NLP/gte-modernbert-base"), batch_size=16)

retriever = EvaluateRetrieval(model, score_function="cos_sim") # or "dot" for dot product
results = retriever.retrieve(corpus, queries)

#### Evaluate your model with NDCG@k, MAP@k, Recall@k and Precision@k, where k = [1,3,5,10,100,1000]
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
mrr = retriever.evaluate_custom(qrels, results, retriever.k_values, metric="mrr")

### If you want to save your results and runfile (useful for reranking)
results_dir = os.path.join(pathlib.Path(__file__).parent.absolute(), "results")
os.makedirs(results_dir, exist_ok=True)

#### Save the evaluation runfile & results
util.save_runfile(os.path.join(results_dir, f"{dataset}.run.trec"), results)
util.save_results(os.path.join(results_dir, f"{dataset}.json"), ndcg, _map, recall, precision, mrr)
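
To make the inputs to `retriever.evaluate` concrete: `qrels` maps each query ID to its judged documents and their relevance labels, and `results` maps each query ID to the retrieved documents and their scores. The sketch below computes nDCG@k by hand over those two dictionaries. It is a simplified illustration that assumes linear gain and a log2 discount, not BEIR's own implementation, and the toy IDs and scores are invented.

```python
import math


def ndcg_at_k(qrels, results, k):
    """Mean nDCG@k over all queries in `qrels` (linear gain, log2 discount)."""
    scores = []
    for query_id, judged in qrels.items():
        # Rank retrieved documents by descending score and keep the top k
        retrieved = results.get(query_id, {})
        ranked = sorted(retrieved.items(), key=lambda item: item[1], reverse=True)[:k]
        dcg = sum(
            judged.get(doc_id, 0) / math.log2(rank + 2)
            for rank, (doc_id, _) in enumerate(ranked)
        )
        # Ideal DCG: judged documents sorted by descending relevance
        ideal = sorted(judged.values(), reverse=True)[:k]
        idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# One query with two relevant documents; the system ranks d1 first and d3 third
qrels = {"q1": {"d1": 1, "d3": 1}}
results = {"q1": {"d1": 0.9, "d2": 0.8, "d3": 0.1}}
print(round(ndcg_at_k(qrels, results, k=3), 4))
```

The score drops below 1.0 here because the second relevant document is ranked third instead of second.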

Quick Example with LoRA & vLLM

First install peft, accelerate, and vllm:

pip install peft
pip install accelerate
pip install vllm

from beir import util, LoggingHandler
from beir.retrieval import models
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

import logging
import pathlib, os

#### Just some code to print debug information to stdout
logging.basicConfig(format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    level=logging.INFO,
                    handlers=[LoggingHandler()])
#### /print debug information to stdout

#### Download scifact.zip dataset and unzip the dataset
dataset = "scifact"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"
out_dir = os.path.join(pathlib.Path(__file__).parent.absolute(), "datasets")
data_path = util.download_and_unzip(url, out_dir)

#### Provide the data_path where scifact has been downloaded and unzipped
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

#### You can also merge the LoRA model weights into the original base model for faster inference.
#### See: https://github.com/beir-cellar/beir/blob/main/examples/retrieval/evaluation/dense/evaluate_lora_vllm.py

#### Load the vLLM embed model and retrieve using cosine-similarity
model = DRES(
    models.VLLMEmbed(
        model_path="Qwen/Qwen2.5-7B",
        lora_name_or_path="rlhn/Qwen2.5-7B-rlhn-400K",
        max_length=512,
        lora_r=16,
        pooling="eos",
        append_eos_token=True,
        normalize=True,
        prompts={"query": "query: ", "passage": "passage: "},
        convert_to_numpy=True
    ),
    batch_size=128,
)

retriever = EvaluateRetrieval(model, score_function="cos_sim") # or "dot" for dot product
results = retriever.encode_and_retrieve(corpus, queries, encode_output_path="./qwen_embeddings/")

#### Evaluate your model with NDCG@k, MAP@k, Recall@k and Precision@k, where k = [1,3,5,10,100,1000]
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
mrr = retriever.evaluate_custom(qrels, results, retriever.k_values, metric="mrr")

### If you want to save your results and runfile (useful for reranking)
results_dir = os.path.join(pathlib.Path(__file__).parent.absolute(), "results")
os.makedirs(results_dir, exist_ok=True)

#### Save the evaluation runfile & results
util.save_runfile(os.path.join(results_dir, f"{dataset}.run.trec"), results)
util.save_results(os.path.join(results_dir, f"{dataset}.json"), ndcg, _map, recall, precision, mrr)
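
The `pooling="eos"` and `normalize=True` arguments above can be pictured with a small plain-Python sketch. It assumes that EOS pooling means taking the hidden state of the last non-padding token and that inputs are right-padded; the function names and toy numbers are illustrative and are not BEIR's API.

```python
import math


def eos_pool(token_embeddings, attention_mask):
    """Pick the embedding of the last non-padding token of each sequence."""
    pooled = []
    for vectors, mask in zip(token_embeddings, attention_mask):
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(vectors[last])
    return pooled


def l2_normalize(vector):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


# Two toy sequences of three token embeddings each.
# The second sequence is right-padded by one token (mask value 0).
token_embeddings = [
    [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]],
    [[1.0, 1.0], [0.0, 2.0], [9.0, 9.0]],
]
attention_mask = [[1, 1, 1], [1, 1, 0]]

embeddings = [l2_normalize(v) for v in eos_pool(token_embeddings, attention_mask)]
print(embeddings)
```

With unit-length embeddings, `score_function="cos_sim"` and `"dot"` produce the same ranking.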

Quick Example with HuggingFace

If you use `encode_and_retrieve()`, make sure to install faiss with `pip install faiss-cpu`.

from beir import util, LoggingHandler
from beir.retrieval import models
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

import logging
import pathlib, os

#### Just some code to print debug information to stdout
logging.basicConfig(format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    level=logging.INFO,
                    handlers=[LoggingHandler()])
#### /print debug information to stdout

#### Download scifact.zip dataset and unzip the dataset
dataset = "scifact"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"
out_dir = os.path.join(pathlib.Path(__file__).parent.absolute(), "datasets")
data_path = util.download_and_unzip(url, out_dir)

#### Provide the data_path where scifact has been downloaded and unzipped
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

#### Load the Huggingface model and retrieve using cosine-similarity
query_prompt = "Instruct: Given a question, retrieve relevant documents that best answer the question\nQuery: "

model = DRES(
    models.HuggingFace(
        model_path="intfloat/e5-mistral-7b-instruct",
        max_length=512,
        pooling="eos",
        append_eos_token=True,
        normalize=True,
        prompts={"query": query_prompt, "passage": ""},
        attn_implementation="flash_attention_2",
        torch_dtype="bfloat16"
    ),
    batch_size=128,
)

retriever = EvaluateRetrieval(model, score_function="cos_sim") # or "dot" for dot product
results = retriever.encode_and_retrieve(corpus, queries, encode_output_path="./embeddings/")

#### Evaluate your model with NDCG@k, MAP@k, Recall@k and Precision@k, where k = [1,3,5,10,100,1000]
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
mrr = retriever.evaluate_custom(qrels, results, retriever.k_values, metric="mrr")

### If you want to save your results and runfile (useful for reranking)
results_dir = os.path.join(pathlib.Path(__file__).parent.absolute(), "results")
os.makedirs(results_dir, exist_ok=True)

#### Save the evaluation runfile & results
util.save_runfile(os.path.join(results_dir, f"{dataset}.run.trec"), results)
util.save_results(os.path.join(results_dir, f"{dataset}.json"), ndcg, _map, recall, precision, mrr)
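
Conceptually, exact dense search scores every query against every document and keeps the top hits. The following is a minimal brute-force sketch of that idea in plain Python, returning the `{query_id: {doc_id: score}}` shape that `retriever.evaluate` consumes. The function names and toy vectors are illustrative and are not BEIR's API.

```python
import heapq
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cos_sim(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom > 0 else 0.0


def exact_search(query_vecs, doc_vecs, top_k=2, score_function="cos_sim"):
    """Score each query against all documents and keep the top_k per query."""
    score = cos_sim if score_function == "cos_sim" else dot
    results = {}
    for query_id, q in query_vecs.items():
        scored = ((score(q, d), doc_id) for doc_id, d in doc_vecs.items())
        best = heapq.nlargest(top_k, scored)  # highest scores first
        results[query_id] = {doc_id: s for s, doc_id in best}
    return results


doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}
query_vecs = {"q1": [2.0, 0.0]}

results = exact_search(query_vecs, doc_vecs, top_k=2)
print(results)
```

Brute-force search is exact but scales linearly with corpus size, which is why large corpora are usually encoded once and searched through an index.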

Quick Example with APIs, e.g. Cohere

Install the Cohere client with `pip install cohere`. If you use `encode_and_retrieve()`, also install faiss with `pip install faiss-cpu`.

from beir import util, LoggingHandler
from beir.retrieval import apis
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

import logging
import pathlib, os

#### Just some code to print debug information to stdout
logging.basicConfig(format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    level=logging.INFO,
                    handlers=[LoggingHandler()])
#### /print debug information to stdout

#### Download scifact.zip dataset and unzip the dataset
dataset = "scifact"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"
out_dir = os.path.join(pathlib.Path(__file__).parent.absolute(), "datasets")
data_path = util.download_and_unzip(url, out_dir)

#### Provide the data_path where scifact has been downloaded and unzipped
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

cohere_api_key = os.getenv("COHERE_API_KEY")
#### Load the Cohere API Embed model and retrieve using cosine-similarity
model = DRES(
    apis.CohereEmbedAPI(
        api_key=cohere_api_key,
        model_path="embed-v4.0",
        normalize=True,
        torch_dtype="float32"
    ),
    batch_size=96,
)

retriever = EvaluateRetrieval(model, score_function="cos_sim") # or "dot" for dot product
results = retriever.encode_and_retrieve(corpus, queries, encode_output_path="./cohere/embeddings/")

#### Evaluate your model with NDCG@k, MAP@k, Recall@k and Precision@k, where k = [1,3,5,10,100,1000]
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
mrr = retriever.evaluate_custom(qrels, results, retriever.k_values, metric="mrr")

### If you want to save your results and runfile (useful for reranking)
results_dir = os.path.join(pathlib.Path(__file__).parent.absolute(), "results")
os.makedirs(results_dir, exist_ok=True)

#### Save the evaluation runfile & results
util.save_runfile(os.path.join(results_dir, f"{dataset}.run.trec"), results)
util.save_results(os.path.join(results_dir, f"{dataset}.json"), ndcg, _map, recall, precision, mrr)

:beers: Available Datasets

To generate an MD5 hash from a terminal, run `md5sum filename.zip`.

You can view all datasets available here or on Hugging Face.

Dataset	Website	BEIR-Name	Public?	Splits	Queries	Corpus	Avg. Rel. Docs/Query	Download	MD5
MSMARCO	Homepage	`msmarco`	✅	`train` `dev` `test`	6,980	8.84M	1.1	Link	`444067daf65d982533ea17ebd59501e4`
TREC-COVID	Homepage	`trec-covid`	✅	`test`	50	171K	493.5	Link	`ce62140cb23feb9becf6270d0d1fe6d1`
NFCorpus	Homepage	`nfcorpus`	✅	`train` `dev` `test`	323	3.6K	38.2	Link	`a89dba18a62ef92f7d323ec890a0d38d`
BioASQ	Homepage	`bioasq`	❌	`train` `test`	500	14.91M	4.7	No	How to Reproduce?
NQ	Homepage	`nq`	✅	`train` `test`	3,452	2.68M	1.2	Link	`d4d3d2e48787a744b6f6e691ff534307`
HotpotQA	Homepage	`hotpotqa`	✅	`train` `dev` `test`	7,405	5.23M	2.0	Link	`f412724f78b0d91183a0e86805e16114`
FiQA-2018	Homepage	`fiqa`	✅	`train` `dev` `test`	648	57K	2.6	Link	`17918ed23cd04fb15047f73e6c3bd9d9`
Signal-1M(RT)	Homepage	`signal1m`	❌	`test`	97	2.86M	19.6	No	How to Reproduce?
TREC-NEWS	Homepage	`trec-news`	❌	`test`	57	595K	19.6	No	How to Reproduce?
Robust04	Homepage	`robust04`	❌	`test`	249	528K	69.9	No	How to Reproduce?
ArguAna	Homepage	`arguana`	✅	`test`	1,406	8.67K	1.0	Link	`8ad3e3c2a5867cdced806d6503f29b99`
Touche-2020	Homepage	`webis-touche2020`	✅	`test`	49	382K	19.0	Link	`46f650ba5a527fc69e0a6521c5a23563`
CQADupstack	Homepage	`cqadupstack`	✅	`test`	13,145	457K	1.4	Link	`4e41456d7df8ee7760a7f866133bda78`
Quora	Homepage	`quora`	✅	`dev` `test`	10,000	523K	1.6	Link	`18fb154900ba42a600f84b839c173167`
DBPedia	Homepage	`dbpedia-entity`	✅	`dev` `test`	400	4.63M	38.2	Link	`c2a39eb420a3164af735795df012ac2c`
SCIDOCS	Homepage	`scidocs`	✅	`test`	1,000	25K	4.9	Link	`38121350fc3a4d2f48850f6aff52e4a9`
FEVER	Homepage	`fever`	✅	`train` `dev` `test`	6,666	5.42M	1.2	Link	`5a818580227bfb4b35bb6fa46d9b6c03`
Climate-FEVER	Homepage	`climate-fever`	✅	`test`	1,535	5.42M	3.0	Link	`8b66f0a9126c521bae2bde127b4dc99d`
SciFact	Homepage	`scifact`	✅	`train` `test`	300	5K	1.1	Link	`5f7d1de60b170fc8027bb7898e2efca1`
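
The same check can be done from Python. The sketch below hashes a file in chunks with `hashlib` and compares the digest against an expected value such as one from the md5 column of the table above. The helper names are illustrative, and the demo runs on a throwaway temporary file rather than a real dataset archive.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it fully into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected.strip().lower()


# Demo on a temporary file containing b"hello".
# For a real check, pass the downloaded zip and the md5 value listed for that dataset.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_ok = verify_md5(tmp.name, "5d41402abc4b2a76b9719d911017c592")
os.remove(tmp.name)
print(demo_ok)
```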

:beers: Additional Information

We also provide a variety of additional information in our Wiki page. Please refer to these pages for the following:

Quick Start

Datasets

Models

Metrics

Metrics Available

Miscellaneous

:beers: Disclaimer

Similar to TensorFlow Datasets or Hugging Face's datasets library, we just downloaded and prepared public datasets. We only distribute these datasets in a specific format, but we do not vouch for their quality or fairness, or claim that you have a license to use them. It remains your responsibility as a user to determine whether you have permission to use a dataset under its license and to cite the right owner of the dataset.

If you're a dataset owner and wish to update any part of it, or do not want your dataset to be included in this library, feel free to post an issue here or make a pull request!

If you're a dataset owner and wish to include your dataset or model in this library, feel free to post an issue here or make a pull request!

:beers: Citing & Authors

If you find this repository helpful, feel free to cite our publication BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models:

@inproceedings{
    thakur2021beir,
    title={{BEIR}: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models},
    author={Nandan Thakur and Nils Reimers and Andreas R{\"u}ckl{\'e} and Abhishek Srivastava and Iryna Gurevych},
    booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
    year={2021},
    url={https://openreview.net/forum?id=wCu6T5xFjeJ}
}

If you use any baseline score from the BEIR leaderboard, feel free to cite our publication Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard:

@inproceedings{kamalloo:2024,
    author = {Kamalloo, Ehsan and Thakur, Nandan and Lassance, Carlos and Ma, Xueguang and Yang, Jheng-Hong and Lin, Jimmy},
    title = {Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses},
    year = {2024},
    isbn = {9798400704314},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3626772.3657862},
    doi = {10.1145/3626772.3657862},
    abstract = {BEIR is a benchmark dataset originally designed for zero-shot evaluation of retrieval models across 18 different domain/task combinations. In recent years, we have witnessed the growing popularity of models based on representation learning, which naturally begs the question: How effective are these models when presented with queries and documents that differ from the training data? While BEIR was designed to answer this question, our work addresses two shortcomings that prevent the benchmark from achieving its full potential: First, the sophistication of modern neural methods and the complexity of current software infrastructure create barriers to entry for newcomers. To this end, we provide reproducible reference implementations that cover learned dense and sparse models. Second, comparisons on BEIR are performed by reducing scores from heterogeneous datasets into a single average that is difficult to interpret. To remedy this, we present meta-analyses focusing on effect sizes across datasets that are able to accurately quantify model differences. By addressing both shortcomings, our work facilitates future explorations in a range of interesting research questions.},
    booktitle = {Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval},
    pages = {1431–1440},
    numpages = {10},
    keywords = {domain generalization, evaluation, reproducibility},
    location = {Washington DC, USA},
    series = {SIGIR '24}
}

The main contributors of this repository are:

Nandan Thakur, Personal Website: thakur-nandan.github.io

Contact person: Nandan Thakur, nandant@gmail.com

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

:beers: Collaboration

The BEIR Benchmark has been made possible due to a collaborative effort of the following universities and organizations:

:beers: Contributors

Thanks go to all these wonderful collaborators for their contributions to the BEIR benchmark:

Nandan Thakur
Nils Reimers
Iryna Gurevych
Jimmy Lin
Andreas Rücklé
Abhishek Srivastava

Release history

Version	Release date
2.2.0 (this version)	Jun 4, 2025
2.1.0	Feb 25, 2025
2.0.0	Jul 21, 2023
1.0.1	Jun 30, 2022
1.0.0	Mar 21, 2022
0.2.3	Oct 22, 2021
0.2.2	Aug 17, 2021
0.2.1	Jul 19, 2021
0.2.0	Jul 6, 2021
0.1.8	Jun 16, 2021
0.1.7	May 28, 2021
0.1.6	May 26, 2021
0.1.5	May 7, 2021
0.1.3	May 1, 2021
0.1.2	Apr 26, 2021
0.1.1	Apr 20, 2021
0.1.0	Apr 19, 2021
0.0.14	Feb 25, 2021
0.0.13	Feb 16, 2021
0.0.12	Feb 9, 2021
0.0.11	Feb 6, 2021
0.0.10	Feb 2, 2021
0.0.9	Feb 1, 2021
0.0.8	Jan 29, 2021
0.0.7	Jan 29, 2021
0.0.6	Jan 28, 2021
0.0.5	Jan 26, 2021
0.0.4	Jan 25, 2021
0.0.3	Jan 25, 2021
0.0.2	Jan 25, 2021
0.0.1	Jan 25, 2021

Download files

Source Distribution

beir-2.2.0.tar.gz (64.0 kB view details)

Uploaded Jun 4, 2025 Source

Built Distribution

beir-2.2.0-py3-none-any.whl (77.4 kB view details)

Uploaded Jun 4, 2025 Python 3

File details

Details for the file beir-2.2.0.tar.gz.

File metadata

Download URL: beir-2.2.0.tar.gz
Upload date: Jun 4, 2025
Size: 64.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.4

File hashes

Hashes for beir-2.2.0.tar.gz
Algorithm	Hash digest
SHA256	`3bef26652cf9fa209190c3b3b9e9ff684343d66cf39ec637998a6a57e523f786`
MD5	`6190e9dc2b3abbcf4370aebd4eb26d7c`
BLAKE2b-256	`142310e19fa9601fe50c71f65408847e9eccdf3c32a5ed7e382d5bf51de16ebb`

File details

Details for the file beir-2.2.0-py3-none-any.whl.

File metadata

Download URL: beir-2.2.0-py3-none-any.whl
Upload date: Jun 4, 2025
Size: 77.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.4

File hashes

Hashes for beir-2.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`46fe4d0a3a6e719090eabc0d1f8aa0c51c6c1379639e6381de5b49e445ab36d1`
MD5	`126ac1d3df7595d9d084284982036cee`
BLAKE2b-256	`8bbcfa7702f4d37e4821a55b26a87f95641ce12d09252bc4202025b34bef44a0`
