Extension for accessing the LongEval test collections via ir_datasets.

💾 ir-datasets-longeval

Extension for accessing the LongEval datasets via ir_datasets.

🚧 Under Construction

This project is currently under development. To install the latest development version directly from GitHub, use:

pip install git+https://github.com/jueri/ir-datasets-longeval.git

Installation

Install the package from PyPI:

pip install ir-datasets-longeval

Usage

The ir_datasets_longeval extension provides a load method that returns a LongEval ir_dataset. Use it to load official versions of the LongEval datasets as well as modified versions stored on your local filesystem:

from ir_datasets_longeval import load

# load an official version of the LongEval dataset.
dataset = load("longeval-web/2022-06")

# load a local copy of a LongEval dataset.
# E.g., so that you can easily run your approach on modified data.
dataset = load("<PATH-TO-A-DIRECTORY-ON-YOUR-MACHINE>")

# From now on, you can use dataset as any ir_dataset

LongEval datasets carry temporal information that you can access:

# Which point in time does the dataset correspond to?
dataset.get_timestamp()

# Each dataset can have a list of zero or more past datasets/interactions.
# You can incorporate them in your retrieval system:
for past_dataset in dataset.get_past_datasets():
    # `past_dataset` is a LongEval `ir_dataset` with the same functionality as `dataset`
    past_dataset.get_timestamp()

If you want to use the CLI, use the ir_datasets_longeval command instead of ir_datasets. All CLI commands work as usual. For example, to list the officially available datasets:

ir_datasets_longeval list

Development

To build this package and contribute to its development you need to install the build, setuptools, and wheel packages (pre-installed on most systems):

pip install build setuptools wheel

Create and activate a virtual environment:

python3.10 -m venv venv/
source venv/bin/activate

Dependencies

Install the package and test dependencies:

pip install -e .[tests]

Testing

Run the test suite and the other checks to verify your changes:

ruff check .                   # Code formatting and linting
mypy .                         # Static typing
bandit -c pyproject.toml -r .  # Security
pytest .                       # Unit tests

Please also add tests for your newly developed code.

Build wheels

Wheels for this package can be built with:

python -m build

Support

If you have any problems using this package, please file an issue. We're happy to help!

Fork Notice

This repository is a fork of ir-datasets-clueweb22, originally developed by Jan Heinrich Merker. All credit for the original work goes to him, and this fork retains the original MIT License. The changes made in this fork include an adaptation from the clueweb22 dataset to the LongEval datasets.

License

This repository is released under the MIT license.
