💾 ir-datasets-clueweb22
Extension for accessing the ClueWeb22 corpus via ir_datasets.
Installation
Install the package from PyPI:
pip install ir-datasets-clueweb22
Usage
Using this extension is simple. Just register the additional datasets by calling register(). Then you can load the datasets with ir_datasets as usual:
from ir_datasets import load
from ir_datasets_clueweb22 import register
# Register the ClueWeb22 datasets.
register()
# Use ir_datasets as usual.
dataset = load("clueweb22/b")
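As a rough sketch of what "as usual" can look like, the helper below peeks at the first few document IDs of a registered dataset. It assumes that documents expose a doc_id field (the common ir_datasets convention) and that the ClueWeb22 files are available locally. The function names peek_doc_ids and peek_clueweb22 are made up for illustration and are not part of this package.

```python
from itertools import islice
from typing import Any, Iterable, List


def peek_doc_ids(docs: Iterable[Any], limit: int = 3) -> List[str]:
    """Return the IDs of the first `limit` documents without consuming the rest."""
    # Assumption: each document has a `doc_id` attribute (ir_datasets convention).
    return [doc.doc_id for doc in islice(docs, limit)]


def peek_clueweb22(dataset_id: str = "clueweb22/b", limit: int = 3) -> List[str]:
    """Hypothetical helper: register the extension, load a dataset, peek at it."""
    # Imported lazily so this module can be imported without the packages installed.
    from ir_datasets import load
    from ir_datasets_clueweb22 import register

    register()
    dataset = load(dataset_id)
    return peek_doc_ids(dataset.docs_iter(), limit)
```

Calling peek_clueweb22() should then return up to three document IDs from clueweb22/b, provided the corpus is accessible on your machine.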
If you want to use the CLI, just use the ir_datasets_clueweb22 command instead of ir_datasets. All CLI commands will work as usual, e.g., to list the available datasets:
ir_datasets_clueweb22 list
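If you would rather call the CLI from a Python script, a minimal sketch could look like the following. The helper name list_dataset_ids is hypothetical, and the code assumes that the list command prints one dataset ID per line.

```python
import shutil
import subprocess
from typing import List, Optional


def list_dataset_ids(cli: str = "ir_datasets_clueweb22") -> Optional[List[str]]:
    """Run `<cli> list` and return its non-empty output lines, or None if the CLI is missing."""
    if shutil.which(cli) is None:
        return None  # CLI not installed or not on PATH
    result = subprocess.run(
        [cli, "list"], capture_output=True, text=True, check=True
    )
    # Assumption: the command prints one dataset ID per line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Returning None when the executable is not found keeps the helper usable in environments where the package has not been installed yet.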
Development
To build this package and contribute to its development you need to install the build, setuptools, and wheel packages (pre-installed on most systems):
pip install build setuptools wheel
Create and activate a virtual environment:
python3.10 -m venv venv/
source venv/bin/activate
Dependencies
Install the package and test dependencies:
pip install -e ".[tests]"
Testing
Verify your changes against the test suite:
ruff check . # Code format and LINT
mypy . # Static typing
bandit -c pyproject.toml -r . # Security
pytest . # Unit tests
Please also add tests for your newly developed code.
Build wheels
Wheels for this package can be built with:
python -m build
Support
If you have any problems using this package, please file an issue. We're happy to help!
License
This repository is released under the MIT license.
Download files
File details
Details for the file ir_datasets_clueweb22-0.1.0.tar.gz.
File metadata
- Download URL: ir_datasets_clueweb22-0.1.0.tar.gz
- Upload date:
- Size: 28.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | b22f4caf022b8ec7b2da406bb8f59ff33112964e4da85db40c2d635c216d7677
MD5 | 0be9c9cc9a38152c97ed0ebbe5e5e04d
BLAKE2b-256 | c0289385006e6226c7dd0b525059238f55e2a4db8148837ecd8523dcd786c30e
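The published digests can be used to check a downloaded file. The following is a minimal sketch using only the Python standard library; the helper names sha256_of and matches_digest are made up for illustration, and the file path in the usage note assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest against a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_digest(Path("ir_datasets_clueweb22-0.1.0.tar.gz"), "b22f4caf022b8ec7b2da406bb8f59ff33112964e4da85db40c2d635c216d7677") should be True for an intact copy of the source distribution.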
File details
Details for the file ir_datasets_clueweb22-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ir_datasets_clueweb22-0.1.0-py3-none-any.whl
- Upload date:
- Size: 27.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | c0163d85c1f997409e127b7a36808d07af5a94ce2112615147d172cda82c0857
MD5 | 06d14212b5dce0e625774d59462879b0
BLAKE2b-256 | ba578b7b426131e38afe0c2cc48bb058b02bc4ba0c1b66121f7930df22a03012