💾 ir-datasets-clueweb22

Extension for accessing the ClueWeb22 web corpus via ir_datasets.

Installation

Install the package from PyPI:

pip install ir-datasets-clueweb22

Usage

To use this extension, first register the additional datasets by calling register(). Then load the datasets with ir_datasets as usual:

from ir_datasets import load
from ir_datasets_clueweb22 import register

# Register the ClueWeb22 datasets.
register()
# Use ir_datasets as usual.
dataset = load("clueweb22/b")
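Once loaded, the dataset object supports the standard ir_datasets accessors. The following minimal sketch uses a hypothetical helper, first_doc_ids, to read only the first few documents through docs_iter(). It assumes that ClueWeb22 document records have a doc_id field, as most ir_datasets document types do. Because the dataset is passed in as a parameter, the helper itself does not require the corpus files.

```python
from itertools import islice
from typing import Any


def first_doc_ids(dataset: Any, n: int = 5) -> list[str]:
    """Return the IDs of the first n documents of an ir_datasets dataset.

    Hypothetical helper: assumes `dataset` provides docs_iter() (the usual
    ir_datasets accessor) and that each document record has a doc_id field.
    """
    # islice stops after n documents, so the rest of the iterator is never consumed.
    return [doc.doc_id for doc in islice(dataset.docs_iter(), n)]
```

With the corpus set up locally, calling first_doc_ids(load("clueweb22/b"), 3) after register() would return the first three document IDs. Since docs_iter() yields documents lazily, slicing avoids scanning the whole corpus.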

To use the command-line interface, run ir_datasets_clueweb22 instead of ir_datasets. All CLI commands work as usual. For example, to list the available datasets:

ir_datasets_clueweb22 list
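You can also call the CLI from a script. This sketch runs the list command through subprocess and keeps only the ClueWeb22 entries. It assumes that the list output contains one dataset ID per line. The helper names clueweb22_ids and list_clueweb22_ids are hypothetical.

```python
import subprocess


def clueweb22_ids(list_output: str) -> list[str]:
    """Keep only ClueWeb22 dataset IDs from the CLI's list output.

    Assumes one dataset ID per line in `list_output`.
    """
    ids = []
    for line in list_output.splitlines():
        dataset_id = line.strip()
        if dataset_id.startswith("clueweb22"):
            ids.append(dataset_id)
    return ids


def list_clueweb22_ids() -> list[str]:
    # Requires the package to be installed so that the CLI is on the PATH.
    result = subprocess.run(
        ["ir_datasets_clueweb22", "list"],
        capture_output=True,
        text=True,
        check=True,
    )
    return clueweb22_ids(result.stdout)
```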

Development

To build this package and contribute to its development, you need the build, setuptools, and wheel packages (pre-installed on most systems):

pip install build setuptools wheel

Create and activate a virtual environment:

python3.10 -m venv venv/
source venv/bin/activate

Dependencies

Install the package and test dependencies:

pip install -e .[tests]

Testing

Verify your changes with the following checks:

ruff check .                   # Linting and code style
mypy .                         # Static typing
bandit -c pyproject.toml -r .  # Security
pytest .                       # Unit tests

Please also add tests for your newly developed code.

Build wheels

Wheels for this package can be built with:

python -m build
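Before you publish or install a build, it can help to inspect the generated wheel. A wheel is a ZIP archive, so the standard zipfile module is enough. This sketch assumes that python -m build wrote its output to the default dist/ directory. The helper names wheel_files and wheel_contents are hypothetical.

```python
import zipfile
from pathlib import Path


def wheel_files(dist_dir: str = "dist") -> list[Path]:
    """Return the wheel files found in dist_dir, sorted by path."""
    return sorted(Path(dist_dir).glob("*.whl"))


def wheel_contents(wheel_path: Path) -> list[str]:
    """List the archive members of a wheel (wheels are ZIP files)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return wheel.namelist()
```

For example, if some name in wheel_contents(wheel_files()[0]) starts with ir_datasets_clueweb22/, the package directory was included in the wheel.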

Support

If you have any problems using this package, please file an issue. We're happy to help!

License

This repository is released under the MIT license.

Download files


Source Distribution

ir_datasets_clueweb22-0.1.0.tar.gz (28.8 kB)

Built Distribution

ir_datasets_clueweb22-0.1.0-py3-none-any.whl (27.3 kB)

File details

Details for the file ir_datasets_clueweb22-0.1.0.tar.gz.

File metadata

  • Download URL: ir_datasets_clueweb22-0.1.0.tar.gz
  • Upload date:
  • Size: 28.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for ir_datasets_clueweb22-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b22f4caf022b8ec7b2da406bb8f59ff33112964e4da85db40c2d635c216d7677
MD5 0be9c9cc9a38152c97ed0ebbe5e5e04d
BLAKE2b-256 c0289385006e6226c7dd0b525059238f55e2a4db8148837ecd8523dcd786c30e


File details

Details for the file ir_datasets_clueweb22-0.1.0-py3-none-any.whl.

File hashes

Hashes for ir_datasets_clueweb22-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c0163d85c1f997409e127b7a36808d07af5a94ce2112615147d172cda82c0857
MD5 06d14212b5dce0e625774d59462879b0
BLAKE2b-256 ba578b7b426131e38afe0c2cc48bb058b02bc4ba0c1b66121f7930df22a03012
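You can check the SHA256 digests listed above against a downloaded file. Here is a minimal sketch that uses only hashlib, with hypothetical helper names. It reads the file in chunks, so large archives do not need to fit in memory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For an intact download of the source distribution, matches_published("ir_datasets_clueweb22-0.1.0.tar.gz", "b22f4caf022b8ec7b2da406bb8f59ff33112964e4da85db40c2d635c216d7677") should return True. pip can also enforce hashes itself through its hash-checking mode (--require-hashes).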

