Skip to main content

Sparse-Dense Embeddings for Pinecone in Haystack

Project description

haystack-hybrid-embedding

Recently, Pinecone announced support for Sparse-dense embeddings, allowing for hybrid vector search (both semantic and keyword search).

haystack is a fantastic NLP framework that does not yet support hybrid vectors for Retrievers.

This little library helps temporarily bridge the gap!

Installation

$ pip install haystack-hybrid-embedding

Usage

from haystack_hybrid_embedding import SpladeEmbeddingEncoder
from haystack_hybrid_embedding.pinecone import PineconeHybridDocumentStore, SparseDenseRetriever

document_store = PineconeHybridDocumentStore(...)

retriever = SparseDenseRetriever(
  sparse_encoder=SpladeEmbeddingEncoder(),
  alpha=0.8,
  ...
)

Replacing EmbeddingRetriever

Simply replace your imports of PineconeDocumentStore and EmbeddingRetriever/MultihopEmbeddingRetriever.

1,2c1,3
< from haystack.document_stores.pinecone import PineconeDocumentStore
< from haystack.nodes import EmbeddingRetriever, MultihopEmbeddingRetriever
---
> from haystack_hybrid_embedding import SpladeEmbeddingEncoder
> from haystack_hybrid_embedding.pinecone import PineconeHybridDocumentStore, SparseDenseRetriever, SparseDenseMultihopRetriever
>
4c5
< document_store = PineconeDocumentStore(...)
---
> document_store = PineconeHybridDocumentStore(...)
6c7
< retriever = EmbeddingRetriever(
---
> retriever = SparseDenseRetriever(
7a9,10
>     sparse_encoder=SpladeEmbeddingEncoder(),
>     alpha=0.8,

Config

There are only two additional parameters exposed on SparseDenseRetriever over EmbeddingRetriever:

  • sparse_encoder embeds both queries and documents into sparse vectors
  • alpha controls the weighting between the sparse and dense vectors (0 is all sparse, and 1 is all dense)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

haystack_hybrid_embedding-0.1.0.tar.gz (3.5 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file haystack_hybrid_embedding-0.1.0.tar.gz.

File metadata

  • Download URL: haystack_hybrid_embedding-0.1.0.tar.gz
  • Upload date:
  • Size: 3.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.15 Darwin/22.1.0

File hashes

Hashes for haystack_hybrid_embedding-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ec856acbfaf83f99bd93bf3430b190a8af4c73ac5dbcde6b5428ad8ab9bc40f4
MD5 287d74d16975190bd78bbc6a2c453195
BLAKE2b-256 eba1d3d578594ddc882bc62b142a46855503a273c8bea9fc2b466301c1ad4a4e

See more details on using hashes here.

File details

Details for the file haystack_hybrid_embedding-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for haystack_hybrid_embedding-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a53edf9e185ac4150bbb33f30aeee99d6cbe82c1a5f9da25e86bca0d76cc316e
MD5 f2d730a9cb306a3f4eb379c52cc687fb
BLAKE2b-256 47c017df5360ffe16c846c37d3a925987266cb1a5a1d0dbac6a86d6f754c3fde

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page