
Vespa Python API

Project description

Vespa library for data analysis

Provides data analysis support for Vespa applications

Install

pip install pyvespa

Connect to a Vespa app

Connect to a running Vespa application

from vespa.application import Vespa

app = Vespa(url="https://api.cord19.vespa.ai")
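
The same constructor reaches self-hosted deployments as well; a minimal sketch, assuming a local Vespa container is listening on port 8080 (the url and port here are placeholders):

# Hypothetical local instance; replace url/port with your deployment's.
local_app = Vespa(url="http://localhost", port=8080)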

Define a Query model

Easily define matching and ranking criteria

from vespa.query import Query, Union, WeakAnd, ANN, RankProfile
from random import random

# Match phase: union of WeakAnd term matching and approximate nearest
# neighbor search over the title embeddings. The lambda stands in for a
# real embedding model and returns a random 768-dimensional query vector.
match_phase = Union(
    WeakAnd(hits=10),
    ANN(
        doc_vector="title_embedding",
        query_vector="title_vector",
        embedding_model=lambda x: [random() for _ in range(768)],
        hits=10,
        label="title"
    )
)

# Rank with BM25 and return all rank features with each hit.
rank_profile = RankProfile(name="bm25", list_features=True)

query_model = Query(match_phase=match_phase, rank_profile=rank_profile)
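
The match phase and rank profile compose freely, so a purely term-based model is just as easy; a sketch, assuming the application schema defines a rank profile named "default":

# Term matching only: WeakAnd without the ANN operator.
term_model = Query(
    match_phase=WeakAnd(hits=10),
    rank_profile=RankProfile(name="default")
)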

Query the Vespa app

Send queries via the query API. See the query page for more examples.

query_result = app.query(
    query="Is remdesivir an effective treatment for COVID-19?", 
    query_model=query_model
)
query_result.number_documents_retrieved
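
Beyond the retrieved-document count, the result carries the hits themselves. A sketch of inspecting them, assuming each hit follows the Vespa query API's JSON shape with "id" and "relevance" fields:

# Print the document id and relevance score of each hit.
for hit in query_result.hits:
    print(hit["id"], hit["relevance"])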

Labelled data

How to structure labelled data

labelled_data = [
    {
        "query_id": 0, 
        "query": "Intrauterine virus infections and congenital heart disease",
        "relevant_docs": [{"id": 0, "score": 1}, {"id": 3, "score": 1}]
    },
    {
        "query_id": 1, 
        "query": "Clinical and immunologic studies in identical twins discordant for systemic lupus erythematosus",
        "relevant_docs": [{"id": 1, "score": 1}, {"id": 5, "score": 1}]
    }
]

Non-relevant documents are assigned "score": 0 by default. Relevant documents are assigned "score": 1 by default when the field is missing from the labelled data. Both defaults can be changed via arguments to the methods that consume labelled data.
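
For example, a relevant document listed without an explicit score picks up the default of 1 (the document id here is arbitrary):

# "score" omitted below, so it defaults to 1.
labelled_data_minimal = [
    {
        "query_id": 2,
        "query": "Is remdesivir an effective treatment for COVID-19?",
        "relevant_docs": [{"id": 7}]
    }
]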

Collect training data

Collect training data to analyse and/or improve ranking functions. See the collect training data page for more examples.

training_data_batch = app.collect_training_data(
    labelled_data=labelled_data,
    id_field="id",
    query_model=query_model,
    number_additional_docs=2
)
training_data_batch
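
A quick sketch of inspecting the batch, assuming collect_training_data returns a pandas DataFrame with one row per query-document pair and one column per rank feature:

# Hypothetical inspection; the feature columns depend on the rank profile.
print(training_data_batch.shape)
print(training_data_batch.columns)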

Evaluating a query model

Define metrics and evaluate query models. See the evaluation page for more examples.

We will define the following evaluation metrics:

  • % of documents retrieved per query
  • recall @ 10 per query
  • MRR @ 10 per query

from vespa.evaluation import MatchRatio, Recall, ReciprocalRank

eval_metrics = [MatchRatio(), Recall(at=10), ReciprocalRank(at=10)]

Evaluate:

evaluation = app.evaluate(
    labelled_data=labelled_data,
    eval_metrics=eval_metrics,
    query_model=query_model,
    id_field="id",
)
evaluation
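
A sketch of summarizing the output, assuming evaluate likewise returns a pandas DataFrame with one row per query and one column per metric:

# Hypothetical summary: average each metric across the labelled queries.
print(evaluation.mean(numeric_only=True))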
