Extension for accessing the LongEval test collections via ir_datasets.
💾 ir-datasets-longeval
Extension for accessing the LongEval datasets via ir_datasets.
Installation
Install the package from PyPI:
pip install ir-datasets-longeval
Usage
The ir_datasets_longeval extension provides a load function that returns a LongEval ir_dataset. With it, you can load official versions of the LongEval datasets as well as modified versions stored on your local filesystem:
from ir_datasets_longeval import load
# load an official version of the LongEval dataset.
dataset = load("longeval-web/2022-06")
# load a local copy of a LongEval dataset.
# E.g., so that you can easily run your approach on modified data.
dataset = load("<PATH-TO-A-DIRECTORY-ON-YOUR-MACHINE>")
# From now on, you can use dataset as any ir_dataset
LongEval datasets additionally expose temporal information that you can use:
# At what time does/did a dataset take place?
dataset.get_timestamp()
# Each dataset can have a list of zero or more past datasets/interactions.
# You can incorporate them in your retrieval system:
for past_dataset in dataset.get_prior_datasets():
    # `past_dataset` is a LongEval `ir_dataset` with the same functionality as `dataset`
    past_dataset.get_timestamp()
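The two temporal methods can be combined to process all available snapshots in chronological order. The sketch below relies only on `get_timestamp()` and `get_prior_datasets()` as described above. It assumes that timestamps are comparable values (a `datetime` is used here) and that `get_prior_datasets()` returns every earlier dataset in a flat list; both are assumptions, not confirmed behavior. The `timeline` helper and the `_DemoSnapshot` class are hypothetical and stand in for real LongEval datasets so that the example runs offline.

```python
from datetime import datetime


class _DemoSnapshot:
    """Hypothetical stand-in exposing the two temporal methods described above."""

    def __init__(self, timestamp, prior=()):
        self._timestamp = timestamp
        self._prior = list(prior)

    def get_timestamp(self):
        return self._timestamp

    def get_prior_datasets(self):
        return self._prior


def timeline(dataset):
    """Return the dataset and all of its prior datasets, oldest first."""
    snapshots = [dataset, *dataset.get_prior_datasets()]
    return sorted(snapshots, key=lambda d: d.get_timestamp())


# Three made-up snapshots; the latest one knows about the two earlier ones.
june = _DemoSnapshot(datetime(2022, 6, 1))
july = _DemoSnapshot(datetime(2022, 7, 1), prior=[june])
september = _DemoSnapshot(datetime(2022, 9, 1), prior=[july, june])

for snapshot in timeline(september):
    print(snapshot.get_timestamp().date())
```

Sorting explicitly avoids depending on the order in which prior datasets are returned.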
To use the CLI, run ir_datasets_longeval instead of ir_datasets. All CLI commands work as usual, e.g., to list the officially available datasets:
ir_datasets_longeval list
Citation
If you use this package, please cite the original ir_datasets paper and this extension:
@inproceedings{ir_datasets_longeval,
author = {J{\"{u}}ri Keller and Maik Fr{\"{o}}be and Gijs Hendriksen and Daria Alexander and Martin Potthast and Philipp Schaer},
title = {Simplified Longitudinal Retrieval Experiments: A Case Study on Query Expansion and Document Boosting},
booktitle = {Experimental {IR} Meets Multilinguality, Multimodality, and Interaction - 16th International Conference of the {CLEF} Association, {CLEF} 2025, Madrid, Spain, September 9-12, 2025, Proceedings, Part {I}},
series = {Lecture Notes in Computer Science},
publisher = {Springer},
year = {2025}
}
Development
To build this package and contribute to its development you need to install the build, setuptools, and wheel packages (pre-installed on most systems):
pip install build setuptools wheel
Create and activate a virtual environment:
python3.10 -m venv venv/
source venv/bin/activate
Dependencies
Install the package and test dependencies:
pip install -e ".[tests]"
Testing
Verify your changes against the test suite:
ruff check . # Code format and LINT
mypy . # Static typing
bandit -c pyproject.toml -r . # Security
pytest . # Unit tests
Please also add tests for your newly developed code.
Build wheels
Wheels for this package can be built with:
python -m build
Support
If you have any problems using this package, please file an issue. We're happy to help!
Fork Notice
This repository is a fork of ir-datasets-clueweb22, originally developed by Jan Heinrich Merker. All credit for the original work goes to him, and this fork retains the original MIT License. The changes made in this fork include an adaptation from the clueweb22 dataset to the LongEval datasets.
License
This repository is released under the MIT license.