Text utilities library by Pinecone.io

Pinecone text client

Text utilities to work with Pinecone.

Sparse encoding

To convert your own text corpus to sparse vectors, use either BM25 or Splade. For more information, see the Pinecone documentation.

BM25

from pinecone_text.sparse import BM25

corpus = ["The quick brown fox jumps over the lazy dog",
          "The lazy dog is brown",
          "The fox is brown"]

# Initialize BM25 and fit the corpus
bm25 = BM25(tokenizer=lambda x: x.split())
bm25.fit(corpus)

# Encode a new document (for upsert to Pinecone index)
doc_sparse_vector = bm25.encode_document("The brown fox is quick")
# {"indices": [102, 18, 12, ...], "values": [0.21, 0.38, 0.15, ...]}

# Encode a query (for search in Pinecone index)
query_sparse_vector = bm25.encode_query("Which fox is brown?")
# {"indices": [102, 16, 18, ...], "values": [0.21, 0.11, 0.15, ...]}

# Store the fitted BM25 params as JSON
bm25.store_params("bm25_params.json")

# Load BM25 params from JSON
bm25.load_params("bm25_params.json")
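
As a rough illustration of what fitting and encoding involve, the sketch below computes BM25-style weights in plain Python. It is not the pinecone_text implementation: the function names (fit_stats, doc_weights, query_weights), the k1 and b defaults, and the exact IDF formula are assumptions taken from the common textbook form of BM25, and it returns token-to-weight dictionaries rather than the indices/values format shown above.

```python
import math
from collections import Counter

corpus = ["The quick brown fox jumps over the lazy dog",
          "The lazy dog is brown",
          "The fox is brown"]

def fit_stats(corpus, tokenizer=str.split):
    """Collect corpus statistics: document count, average length, document frequencies."""
    doc_freq = Counter()
    total_len = 0
    for text in corpus:
        tokens = tokenizer(text)
        total_len += len(tokens)
        doc_freq.update(set(tokens))  # count each token once per document
    return {"n_docs": len(corpus),
            "avgdl": total_len / len(corpus),
            "doc_freq": doc_freq}

def doc_weights(text, stats, tokenizer=str.split, k1=1.2, b=0.75):
    """Length-normalized term-frequency weights for a document (hypothetical helper)."""
    tokens = tokenizer(text)
    tf = Counter(tokens)
    norm = k1 * (1.0 - b + b * len(tokens) / stats["avgdl"])
    return {tok: freq / (freq + norm) for tok, freq in tf.items()}

def query_weights(text, stats, tokenizer=str.split):
    """Inverse-document-frequency weights for a query (hypothetical helper)."""
    n_docs = stats["n_docs"]
    weights = {}
    for tok in set(tokenizer(text)):
        n = stats["doc_freq"].get(tok, 0)  # unseen tokens have document frequency 0
        weights[tok] = math.log(1.0 + (n_docs - n + 0.5) / (n + 0.5))
    return weights

stats = fit_stats(corpus)
doc_w = doc_weights("The brown fox is quick", stats)
query_w = query_weights("Which fox is brown?", stats)
```

In this sketch the corpus statistics live in a plain dictionary, so persisting them as JSON (as store_params does for the library's own parameters) would be straightforward. Note that whitespace tokenization treats "brown?" and "brown" as different tokens, which is one reason a real tokenizer usually normalizes punctuation and case.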

Splade

from pinecone_text.sparse import Splade

corpus = ["The quick brown fox jumps over the lazy dog",
          "The lazy dog is brown",
          "The fox is brown"]

# Initialize Splade
splade = Splade()

# encode a batch of documents/queries
sparse_vectors = splade(corpus)
# [{"indices": [102, 18, 12, ...], "values": [0.21, 0.38, 0.15, ...]}, ...]
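
Sparse vectors in this indices/values form are commonly compared with a dot product over the indices they share. The helper below is a minimal sketch of that idea in plain Python; sparse_dot is a hypothetical name rather than part of pinecone_text, and the example vectors are made-up values chosen for easy arithmetic.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors given as {"indices": [...], "values": [...]}."""
    b_lookup = dict(zip(b["indices"], b["values"]))
    # Only indices present in both vectors contribute to the sum.
    return sum(value * b_lookup.get(index, 0.0)
               for index, value in zip(a["indices"], a["values"]))

# Illustrative vectors (not real encoder output)
doc_vec = {"indices": [102, 18, 12], "values": [0.5, 0.25, 1.0]}
query_vec = {"indices": [102, 16, 18], "values": [2.0, 1.0, 4.0]}

score = sparse_dot(doc_vec, query_vec)  # 0.5 * 2.0 + 0.25 * 4.0 = 2.0
```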

Download files

Source Distribution

pinecone_text-0.1.1.tar.gz (4.4 kB)

Uploaded Source

Built Distribution

pinecone_text-0.1.1-py3-none-any.whl (5.0 kB)

Uploaded Python 3
