Skip to main content

Sparse Embeddings for Neural Search.

Project description

Neural-Cherche

Neural Search

documentation license

Neural-Cherche is a library designed to fine-tune neural search models such as Splade, ColBERT, and SparseEmbed on a specific dataset. Neural-Cherche also provide classes to run efficient inference on a fine-tuned retriever or ranker. Neural-Cherche aims to offer a straightforward and effective method for fine-tuning and utilizing neural search models in both offline and online settings. It also enables users to save all computed embeddings to prevent redundant computations.

Installation

We can install neural_cherche using:

pip install neural-cherche

If we plan to evaluate our model while training install:

pip install "neural-cherche[eval]"

Documentation

The complete documentation is available here.

Quick Start

Your training dataset must be made out of triples (anchor, positive, negative) where anchor is a query, positive is a document that is directly linked to the anchor and negative is a document that is not relevant for the anchor.

X = [
    ("anchor 1", "positive 1", "negative 1"),
    ("anchor 2", "positive 2", "negative 2"),
    ("anchor 3", "positive 3", "negative 3"),
]

And here is how to fine-tune ColBERT using neural-cherche:

import torch

from neural_cherche import models, utils, train

model = models.ColBERT(
    model_name_or_path="sentence-transformers/all-mpnet-base-v2",
    device="cuda" if torch.cuda.is_available() else "cpu"
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

X = [
    ("query", "positive document", "negative document"),
    ("query", "positive document", "negative document"),
    ("query", "positive document", "negative document"),
]

for anchor, positive, negative in utils.iter(
        X,
        epochs=1,
        batch_size=32,
        shuffle=True
    ):

    loss = train.train_colbert(
        model=model,
        optimizer=optimizer,
        anchor=anchor,
        positive=positive,
        negative=negative,
    )

model.save_pretrained("checkpoint")

References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

neural_cherche-1.0.0.tar.gz (23.0 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page