Text utilities library by Pinecone.io
Pinecone text client
Text utilities to work with Pinecone.
Sparse encoding
To convert your own text corpus to sparse vectors, you can use either BM25 or SPLADE. For more information, see the Pinecone documentation.
BM25
from pinecone_text.sparse import BM25
corpus = ["The quick brown fox jumps over the lazy dog",
          "The lazy dog is brown",
          "The fox is brown"]
# Initialize BM25 and fit the corpus
bm25 = BM25(tokenizer=lambda x: x.split())
bm25.fit(corpus)
# Encode a new document (for upsert to Pinecone index)
doc_sparse_vector = bm25.encode_document("The brown fox is quick")
# {"indices": [102, 18, 12, ...], "values": [0.21, 0.38, 0.15, ...]}
# Encode a query (for search in Pinecone index)
query_sparse_vector = bm25.encode_query("Which fox is brown?")
# {"indices": [102, 16, 18, ...], "values": [0.21, 0.11, 0.15, ...]}
# Store the fitted BM25 parameters as JSON
bm25.store_params("bm25_params.json")
# Load previously stored BM25 parameters from JSON
bm25.load_params("bm25_params.json")
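For intuition about what fitting on a corpus provides, here is a minimal sketch of classic Okapi BM25 scoring in plain Python. It is not the `pinecone_text` implementation: the function name `bm25_scores`, the lowercasing and whitespace tokenization, and the `k1` and `b` values are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(corpus, query, k1=1.2, b=0.75):
    """Score each document in `corpus` against `query` with classic Okapi BM25.

    Hypothetical helper for illustration only; k1 and b are common textbook
    defaults, not values taken from pinecone_text.
    """
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents that contain each token.
    doc_freq = Counter(tok for d in docs for tok in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Longer-than-average documents are penalized through this term.
        length_norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for tok in set(query.lower().split()):
            if tok not in tf:
                continue
            n_t = doc_freq[tok]
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
            score += idf * tf[tok] * (k1 + 1) / (tf[tok] + length_norm)
        scores.append(score)
    return scores


corpus = ["The quick brown fox jumps over the lazy dog",
          "The lazy dog is brown",
          "The fox is brown"]
scores = bm25_scores(corpus, "quick fox")
```

Only the first and third documents contain a query term, so the second document scores zero. The corpus statistics used here (document frequencies and average length) are the kind of state a fitted BM25 model would need to persist between runs.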
Splade
from pinecone_text.sparse import Splade
corpus = ["The quick brown fox jumps over the lazy dog",
          "The lazy dog is brown",
          "The fox is brown"]
# Initialize Splade
splade = Splade()
# Encode a batch of documents or queries
sparse_vectors = splade(corpus)
# [{"indices": [102, 18, 12, ...], "values": [0.21, 0.38, 0.15, ...]}, ...]
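Sparse vectors in the `{"indices": [...], "values": [...]}` format shown above can be compared with a dot product over their shared indices. The sketch below assumes that this is how a query is scored against a document and that indices are unique within each vector; `sparse_dot` is a hypothetical helper, not part of `pinecone_text`, and the numbers are made up for illustration.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors given as {"indices": [...], "values": [...]}.

    Hypothetical helper; assumes each index appears at most once per vector.
    """
    weights = dict(zip(a["indices"], a["values"]))
    # Only indices present in both vectors contribute to the sum.
    return sum(weights.get(i, 0.0) * v for i, v in zip(b["indices"], b["values"]))


# Illustrative values only, not real encoder output.
doc = {"indices": [102, 18, 12], "values": [0.5, 0.25, 2.0]}
query = {"indices": [102, 16, 18], "values": [2.0, 1.0, 4.0]}

print(sparse_dot(query, doc))  # 2.0 (0.5 * 2.0 + 0.25 * 4.0)
```

Indices 16 and 12 each appear in only one vector, so they contribute nothing to the score.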