Skip to main content

Sparse-Dense Embeddings for Pinecone in Haystack

Project description

haystack-hybrid-embedding

Recently, Pinecone announced support for Sparse-dense embeddings, allowing for hybrid vector search (both semantic and keyword search).

haystack is a fantastic NLP framework that does not yet support hybrid vectors for Retrievers.

This little library helps temporarily bridge the gap!

Installation

$ pip install haystack-hybrid-embedding

Usage

from haystack_hybrid_embedding import SpladeEmbeddingEncoder
from haystack_hybrid_embedding.pinecone import PineconeHybridDocumentStore, SparseDenseRetriever

document_store = PineconeHybridDocumentStore(...)

retriever = SparseDenseRetriever(
  sparse_encoder=SpladeEmbeddingEncoder(),
  alpha=0.8,
  ...
)

Replacing EmbeddingRetriever

Simply replace your imports of PineconeDocumentStore and EmbeddingRetriever/MultihopEmbeddingRetriever.

1,2c1,3
< from haystack.document_stores.pinecone import PineconeDocumentStore
< from haystack.nodes import EmbeddingRetriever, MultihopEmbeddingRetriever
---
> from haystack_hybrid_embedding import SpladeEmbeddingEncoder
> from haystack_hybrid_embedding.pinecone import PineconeHybridDocumentStore, SparseDenseRetriever, SparseDenseMultihopRetriever
>
4c5
< document_store = PineconeDocumentStore(...)
---
> document_store = PineconeHybridDocumentStore(...)
6c7
< retriever = EmbeddingRetriever(
---
> retriever = SparseDenseRetriever(
7a9,10
>     sparse_encoder=SpladeEmbeddingEncoder(),
>     alpha=0.8,

Config

There are only two additional parameters exposed on SparseDenseRetriever over EmbeddingRetriever:

  • sparse_encoder embeds both queries and documents into sparse vectors
  • alpha controls the weighting between the sparse and dense vectors (0 is all sparse, and 1 is all dense)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

haystack_hybrid_embedding-0.1.0.tar.gz (3.5 kB view hashes)

Uploaded Source

Built Distribution

haystack_hybrid_embedding-0.1.0-py3-none-any.whl (5.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page