Lightning IR

Your one-stop shop for fine-tuning and running neural ranking models.

Lightning IR is a library for fine-tuning and running neural ranking models. It is built on top of PyTorch Lightning to provide a simple and flexible interface to interact with neural ranking models.

Want to:

fine-tune your own cross- or bi-encoder models?
index and search through a collection of documents with ColBERT or SPLADE?
re-rank documents with state-of-the-art models?

Lightning IR has you covered!

Installation

Lightning IR can be installed using pip:

pip install lightning-ir

Getting Started

See the Quickstart guide for an introduction to Lightning IR. The Documentation provides a detailed overview of the library's functionality.

The easiest way to use Lightning IR is via the CLI. It builds on the PyTorch Lightning CLI and adds options that provide a unified interface for fine-tuning and running neural ranking models.

The behavior of the CLI can be customized using YAML configuration files. See the configs directory for several example configuration files. For example, the following command re-ranks the official TREC DL 19/20 re-ranking set with an already fine-tuned cross-encoder model. It automatically downloads the model and data, runs the re-ranking, writes the results to a TREC-style run file, and reports the nDCG@10 score.

lightning-ir re_rank \
  --config ./configs/trainer/inference.yaml \
  --config ./configs/callbacks/rank.yaml \
  --config ./configs/data/re-rank-trec-dl.yaml \
  --config ./configs/models/monoelectra.yaml

For more details, see the Usage section.
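
For readers unfamiliar with the reported metric, the following is a minimal sketch of nDCG@10 for a single query. It assumes the linear-gain formulation (relevance label divided by log2 of rank + 1); some implementations use an exponential gain instead, and this sketch is not taken from Lightning IR itself. The function names and the toy relevance labels are illustrative only.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids in ranked order (best first).
    qrels: dict mapping document id -> graded relevance label.
    Documents without a label are treated as non-relevant (assumption).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy relevance judgments for one query (made up for illustration).
qrels = {"d1": 3, "d2": 0, "d3": 1}
perfect = ndcg_at_k(["d1", "d3", "d2"], qrels)
reversed_order = ndcg_at_k(["d2", "d3", "d1"], qrels)
print(perfect, reversed_order)
```

A ranking that places documents in order of decreasing relevance scores 1.0; any other ordering scores lower.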

Usage

Command Line Interface

The CLI offers four subcommands:

$ lightning-ir -h
Lightning Trainer command line tool

subcommands:
  For more details of each subcommand, add it as an argument followed by --help.

  Available subcommands:
    fit                 Runs the full optimization routine.
    index               Index a collection of documents.
    search              Search for relevant documents.
    re_rank             Re-rank a set of retrieved documents.

Configuration files need to be provided to specify the model, data, and fine-tuning/inference parameters. See the configs directory for examples. Four types of configuration exist:

trainer: Specifies the fine-tuning/inference parameters and callbacks.
model: Specifies the model to use and its parameters.
data: Specifies the dataset(s) to use and their parameters.
optimizer: Specifies the optimizer parameters (only needed for fine-tuning).
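
Because each configuration type sits under its own top-level key, the pieces can be split across several files and passed with repeated --config flags, as in the re-ranking command in Getting Started. The sketch below illustrates one plausible way such files combine: a recursive merge in which later files override earlier ones. This is an assumption about the merge behavior, not a description of the actual CLI internals; the dictionaries stand in for parsed YAML files, and deep_merge is a hypothetical helper.

```python
def deep_merge(base, override):
    """Return a new dict in which values from `override` win over `base`.

    Nested dicts are merged recursively; any other value (including lists)
    is replaced wholesale. Neither input is modified.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-ins for three parsed config files (contents are illustrative).
trainer_cfg = {"trainer": {"max_steps": 100000}}
model_cfg = {
    "model": {
        "class_path": "BiEncoderModule",
        "init_args": {"model_name_or_path": "bert-base-uncased"},
    }
}
override_cfg = {"trainer": {"max_steps": 1000}}

config = {}
for part in (trainer_cfg, model_cfg, override_cfg):
    config = deep_merge(config, part)

print(config["trainer"]["max_steps"])  # 1000: the last file wins
print(sorted(config))                  # ['model', 'trainer']
```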

Example

The following example demonstrates how to fine-tune a BERT-based single-vector bi-encoder model using the official MS MARCO triples. The fine-tuned model is then used to index the MS MARCO passage collection and search for relevant passages. Finally, we show how to re-rank the retrieved passages.

Fine-tuning

To fine-tune a bi-encoder model on the MS MARCO triples dataset, use the following configuration file and command:

bi-encoder-fit.yaml

trainer:
  callbacks:
  - class_path: ModelCheckpoint
  max_epochs: 1
  max_steps: 100000
data:
  class_path: LightningIRDataModule
  init_args:
    train_batch_size: 32
    train_dataset:
      class_path: TupleDataset
      init_args:
        tuples_dataset: msmarco-passage/train/triples-small
model:
  class_path: BiEncoderModule
  init_args:
    model_name_or_path: bert-base-uncased
    config:
      class_path: BiEncoderConfig
    loss_functions:
    - class_path: RankNet
optimizer:
  class_path: AdamW
  init_args:
    lr: 1e-5

lightning-ir fit --config bi-encoder-fit.yaml

The fine-tuned model is saved in the directory lightning_logs/version_X/huggingface_checkpoint/.
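
The configuration above selects RankNet as the loss function. As a rough illustration of what that objective optimizes on (query, positive, negative) triples, here is a minimal pure-Python sketch of the standard RankNet pairwise logistic loss, log(1 + exp(-(s_pos - s_neg))). It is a textbook formulation written for this example; Lightning IR's own implementation operates on tensors and may differ in details such as reduction.

```python
import math


def ranknet_loss(pos_score, neg_score):
    """Pairwise logistic loss for one (positive, negative) score pair.

    Computes log(1 + exp(-(pos_score - neg_score))) in a numerically
    stable way, so large score differences do not overflow.
    """
    diff = pos_score - neg_score
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def mean_ranknet_loss(score_pairs):
    """Average the pairwise loss over a batch of (pos_score, neg_score) pairs."""
    return sum(ranknet_loss(pos, neg) for pos, neg in score_pairs) / len(score_pairs)


# The loss shrinks as the positive passage is scored further above the negative.
print(ranknet_loss(5.0, 0.0))  # small: correct order with a large margin
print(ranknet_loss(0.0, 0.0))  # log(2): the model cannot tell the two apart
print(ranknet_loss(0.0, 5.0))  # large: the negative is scored higher
```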

Indexing

We now assume the model from the previous fine-tuning step was moved to the directory models/bi-encoder. To index the MS MARCO passage collection with faiss using the fine-tuned model, use the following configuration file and command:

bi-encoder-index.yaml

trainer:
  callbacks:
  - class_path: IndexCallback
    init_args:
      index_config:
        class_path: FaissFlatIndexConfig
model:
  class_path: BiEncoderModule
  init_args:
    model_name_or_path: models/bi-encoder
data:
  class_path: LightningIRDataModule
  init_args:
    num_workers: 1
    inference_batch_size: 256
    inference_datasets:
    - class_path: DocDataset
      init_args:
        doc_dataset: msmarco-passage

lightning-ir index --config bi-encoder-index.yaml

The index is saved in the directory models/bi-encoder/indexes/msmarco-passage.
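
A flat faiss index stores every document vector uncompressed and answers a query by comparing it against all of them. The sketch below mimics that exhaustive search in pure Python to show what the index is later used for. It assumes dot-product similarity and uses tiny made-up vectors; the function name flat_search is hypothetical and the code does not call faiss or Lightning IR.

```python
import heapq


def flat_search(query_vec, doc_vecs, k=100):
    """Exhaustive top-k search by dot product.

    query_vec: list of floats.
    doc_vecs: dict mapping document id -> list of floats of the same length.
    Returns up to k (doc_id, score) pairs, highest score first.
    """
    scored = (
        (sum(q * d for q, d in zip(query_vec, vec)), doc_id)
        for doc_id, vec in doc_vecs.items()
    )
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


# Toy two-dimensional "embeddings" (made up for illustration).
doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = flat_search([1.0, 0.5], doc_vecs, k=2)
print(hits)  # [('c', 1.5), ('a', 1.0)]
```

Exhaustive search is exact but scales linearly with the collection size, which is why approximate index types exist as alternatives.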

Searching

To search for relevant documents in the MS MARCO passage collection using the bi-encoder and index, use the following configuration file and command:

bi-encoder-search.yaml

trainer:
  callbacks:
  - class_path: RankCallback
model:
  class_path: BiEncoderModule
  init_args:
    model_name_or_path: models/bi-encoder
    index_dir: models/bi-encoder/indexes/msmarco-passage
    search_config:
      class_path: FaissFlatSearchConfig
      init_args:
        k: 100
    evaluation_metrics:
    - nDCG@10
data:
  class_path: LightningIRDataModule
  init_args:
    num_workers: 1
    inference_batch_size: 4
    inference_datasets:
    - class_path: QueryDataset
      init_args:
        query_dataset: msmarco-passage/trec-dl-2019/judged
    - class_path: QueryDataset
      init_args:
        query_dataset: msmarco-passage/trec-dl-2020/judged

lightning-ir search --config bi-encoder-search.yaml

The run files are saved as models/bi-encoder/runs/msmarco-passage-trec-dl-20XX.run. Additionally, the nDCG@10 scores are printed to the console.
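
For orientation, a TREC-style run file conventionally holds one line per retrieved document with six whitespace-separated columns: query id, the literal Q0, document id, rank, score, and a run tag. The sketch below writes and reads that layout. It assumes the conventional six-column format; the exact tag and score formatting that Lightning IR writes is not specified here, and the helper names, ids, and scores are illustrative.

```python
from collections import defaultdict


def format_run(results, tag="bi-encoder"):
    """Turn {query_id: {doc_id: score}} into TREC run lines, best score first."""
    lines = []
    for query_id, doc_scores in results.items():
        ranked = sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{query_id} Q0 {doc_id} {rank} {score} {tag}")
    return lines


def parse_run(lines):
    """Parse TREC run lines into {query_id: [(doc_id, rank, score), ...]}."""
    run = defaultdict(list)
    for line in lines:
        query_id, _q0, doc_id, rank, score, _tag = line.split()
        run[query_id].append((doc_id, int(rank), float(score)))
    return dict(run)


# Made-up query and document ids with made-up scores.
results = {"q1": {"d7": 12.5, "d3": 14.0}}
run_lines = format_run(results)
print(run_lines[0])  # q1 Q0 d3 1 14.0 bi-encoder
parsed = parse_run(run_lines)
```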

Re-ranking

Assuming we've also fine-tuned a cross-encoder that is saved in the directory models/cross-encoder, we can re-rank the retrieved documents using the following configuration file and command:

cross-encoder-re-rank.yaml

trainer:
  callbacks:
  - class_path: RankCallback
model:
  class_path: CrossEncoderModule
  init_args:
    model_name_or_path: models/cross-encoder
    evaluation_metrics:
    - nDCG@10
data:
  class_path: LightningIRDataModule
  init_args:
    num_workers: 1
    inference_batch_size: 4
    inference_datasets:
    - class_path: RunDataset
      init_args:
        run_path_or_id: models/bi-encoder/runs/msmarco-passage-trec-dl-2019.run
        depth: 100
        sample_size: 100
        sampling_strategy: top
    - class_path: RunDataset
      init_args:
        run_path_or_id: models/bi-encoder/runs/msmarco-passage-trec-dl-2020.run
        depth: 100
        sample_size: 100
        sampling_strategy: top

lightning-ir re_rank --config cross-encoder-re-rank.yaml

The run files are saved as models/cross-encoder/runs/msmarco-passage-trec-dl-20XX.run. Additionally, the nDCG@10 scores are printed to the console.

Release history

0.0.6: Nov 12, 2025
0.0.5: Aug 25, 2025
0.0.4: May 7, 2025 (this version)
0.0.3: Apr 4, 2025
0.0.2: Nov 8, 2024
0.0.1: Sep 20, 2024

Download files

Source Distribution

lightning_ir-0.0.4.tar.gz (89.9 kB)

Uploaded May 7, 2025 Source

Built Distribution

lightning_ir-0.0.4-py3-none-any.whl (113.1 kB)

Uploaded May 7, 2025 Python 3

File details

Details for the file lightning_ir-0.0.4.tar.gz.

File metadata

Download URL: lightning_ir-0.0.4.tar.gz
Upload date: May 7, 2025
Size: 89.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for lightning_ir-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`9edd001779c682ae79463119932b4a332c7a79c41e7cb6c6616673e3f70d5072`
MD5	`be6d3543e9ea20143068e27d503c602f`
BLAKE2b-256	`40e2dd4dc97872845a1fb7f359b169a0ca1478fcad6998e7fb5367e4538688fb`

Provenance

The following attestation bundles were made for lightning_ir-0.0.4.tar.gz:

Publisher: python-publish.yml on webis-de/lightning-ir

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lightning_ir-0.0.4.tar.gz
- Subject digest: 9edd001779c682ae79463119932b4a332c7a79c41e7cb6c6616673e3f70d5072
- Sigstore transparency entry: 207906295
- Sigstore integration time: May 7, 2025
Source repository:
- Permalink: webis-de/lightning-ir@c444b6f30d1ab4f037b423fb9db82e7f643e5e18
- Branch / Tag: refs/tags/v0.0.4
- Owner: https://github.com/webis-de
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@c444b6f30d1ab4f037b423fb9db82e7f643e5e18
- Trigger Event: release

File details

Details for the file lightning_ir-0.0.4-py3-none-any.whl.

File metadata

Download URL: lightning_ir-0.0.4-py3-none-any.whl
Upload date: May 7, 2025
Size: 113.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for lightning_ir-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`59535337056509ee5863871f48d3d212500d28de2b313b4febe7929d0df33088`
MD5	`1d1d954f5492091a648c7962af7a03a6`
BLAKE2b-256	`0f5a6cad7a0e555b5e1a2597270153ebe5db08023fb709d73a231bf134f83c99`

Provenance

The following attestation bundles were made for lightning_ir-0.0.4-py3-none-any.whl:

Publisher: python-publish.yml on webis-de/lightning-ir

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lightning_ir-0.0.4-py3-none-any.whl
- Subject digest: 59535337056509ee5863871f48d3d212500d28de2b313b4febe7929d0df33088
- Sigstore transparency entry: 207906296
- Sigstore integration time: May 7, 2025
Source repository:
- Permalink: webis-de/lightning-ir@c444b6f30d1ab4f037b423fb9db82e7f643e5e18
- Branch / Tag: refs/tags/v0.0.4
- Owner: https://github.com/webis-de
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@c444b6f30d1ab4f037b423fb9db82e7f643e5e18
- Trigger Event: release
