Pinecone Datasets lets you easily load datasets into your Pinecone index.
Pinecone Datasets
Usage
You can use Pinecone Datasets to load our public datasets or your own datasets.
Loading Pinecone Public Datasets
```python
from pinecone_datasets import list_datasets, load_dataset

list_datasets()
# ["cc-news_msmarco-MiniLM-L6-cos-v5", ... ]

dataset = load_dataset("cc-news_msmarco-MiniLM-L6-cos-v5")

dataset.head()
```

Prints:

```
┌─────┬───────────────────────────┬─────────────────────────────────────┬───────────────────┬──────┐
│ id ┆ values ┆ sparse_values ┆ metadata ┆ blob │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ list[f32] ┆ struct[2] ┆ struct[3] ┆ │
╞═════╪═══════════════════════════╪═════════════════════════════════════╪═══════════════════╪══════╡
│ 0 ┆ [0.118014, -0.069717, ... ┆ {[470065541, 52922727, ... 22364... ┆ {2017,12,"other"} ┆ .... │
│ ┆ 0.0060... ┆ ┆ ┆ │
└─────┴───────────────────────────┴─────────────────────────────────────┴───────────────────┴──────┘
```
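The head() output shows the shape of each record. As a rough illustration only, one record can be pictured as the plain Python dict below. Two assumptions are involved: the sparse_values field names indices and values follow Pinecone's sparse-vector format, and the metadata keys year, month and source are hypothetical, since the printout shows only the values {2017,12,"other"}. The helper check_record is likewise hypothetical and not part of pinecone_datasets.

```python
from typing import Any

# Hypothetical record mirroring the columns printed by dataset.head().
record: dict[str, Any] = {
    "id": "0",
    # Dense vector: first two values from the printout, padded with placeholder zeros to 384
    "values": [0.118014, -0.069717] + [0.0] * 382,
    # Assumed field names; the weights are placeholders, not data from the dataset
    "sparse_values": {"indices": [470065541, 52922727], "values": [1.0, 1.0]},
    # Hypothetical metadata keys; only the values appear in the printout
    "metadata": {"year": 2017, "month": 12, "source": "other"},
}


def check_record(rec: dict[str, Any], dimension: int) -> None:
    """Raise ValueError if rec does not match the expected record shape (hypothetical helper)."""
    if not isinstance(rec["id"], str):
        raise ValueError("id must be a string")
    if len(rec["values"]) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(rec['values'])}")
    sparse = rec.get("sparse_values")
    if sparse is not None and len(sparse["indices"]) != len(sparse["values"]):
        raise ValueError("sparse indices and values must have the same length")


check_record(record, dimension=384)  # passes silently
```

A check like this is mainly useful when preparing your own dataset, since the vector length has to match the dimension of the index you upsert into.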
Iterating over a Dataset's documents

```python
n = 100  # batch size

# Yields lists of n dicts, each with the keys "id", "values", "sparse_values" and "metadata"
for batch in dataset.iter_documents(batch_size=n):
    print(len(batch))
```
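Conceptually, iter_documents splits the dataset into consecutive lists of documents. The sketch below is a hypothetical pure-Python stand-in (iter_batches is not part of pinecone_datasets) that shows this batching behaviour on a plain list of dicts, assuming the final batch simply holds whatever documents remain.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def iter_batches(docs: Iterable[dict[str, Any]], batch_size: int) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive lists of at most batch_size documents (hypothetical helper).

    The batch_size check runs when iteration starts, because this is a generator.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))  # take up to batch_size items
        if not batch:
            return  # input exhausted
        yield batch


docs = [{"id": str(i), "values": [0.0, 0.0], "metadata": {}} for i in range(250)]
sizes = [len(b) for b in iter_batches(docs, batch_size=100)]
print(sizes)  # [100, 100, 50]
```

Because islice pulls from a single shared iterator, documents keep their original order and no batch ever holds more than batch_size items.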
Upserting to an Index
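```bash
pip install pinecone-client
```

```python
import pinecone

# Connect to Pinecone (replace API_KEY and the environment with your own values)
pinecone.init(api_key="API_KEY", environment="us-west1-gcp")

# The dimension must match the dataset's vectors (384 for cc-news_msmarco-MiniLM-L6-cos-v5)
pinecone.create_index(name="my-index", dimension=384, pod_type="s1")
index = pinecone.Index("my-index")

# Iterate over the documents in batches and upsert each batch
for batch in dataset.iter_documents(batch_size=100):
    index.upsert(vectors=batch)
```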
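Since create_index fixes the dimension, a batch whose vectors have a different length is rejected by the index. A minimal sketch of checking each batch before sending it is shown below. upsert_checked is a hypothetical helper, and the upsert argument is assumed to accept a vectors= keyword, as index.upsert does in the example above; here a stand-in lambda records the calls instead of contacting Pinecone.

```python
from typing import Any, Callable, Iterable


def upsert_checked(
    batches: Iterable[list[dict[str, Any]]],
    upsert: Callable[..., Any],
    dimension: int,
) -> int:
    """Check each batch's vector lengths, then upsert it; return the number of vectors sent.

    Batches sent before a failing batch are not rolled back.
    """
    total = 0
    for batch in batches:
        for doc in batch:
            if len(doc["values"]) != dimension:
                raise ValueError(
                    f"document {doc['id']!r} has {len(doc['values'])} values, index expects {dimension}"
                )
        upsert(vectors=batch)
        total += len(batch)
    return total


# Stand-in for index.upsert that only records the size of each batch it receives
calls: list[int] = []
sent = upsert_checked(
    [[{"id": "a", "values": [0.1] * 384}], [{"id": "b", "values": [0.2] * 384}]],
    upsert=lambda vectors: calls.append(len(vectors)),
    dimension=384,
)
```

With a real index, the call would presumably look like upsert_checked(dataset.iter_documents(batch_size=100), index.upsert, dimension=384).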
Upserting to an Index with gRPC
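To upsert over gRPC, use GRPCIndex in place of Index (the gRPC dependencies are installed with pip install "pinecone-client[grpc]"):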
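```python
import pinecone

# Assumes pinecone.init(...) has already been called and "my-index" exists
index = pinecone.GRPCIndex("my-index")

# Iterate over the documents in batches and upsert each batch
for batch in dataset.iter_documents(batch_size=100):
    index.upsert(vectors=batch)
```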
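gRPC is usually chosen for upload throughput. One generic way to keep several requests in flight is a thread pool, sketched below with concurrent.futures. upsert_concurrently is a hypothetical helper and fake_upsert is a stand-in for index.upsert; the README does not say whether a GRPCIndex can safely be shared across threads, so treat that as an assumption to verify against the client documentation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


def upsert_concurrently(
    batches: Iterable[list[dict[str, Any]]],
    upsert: Callable[..., Any],
    max_workers: int = 4,
) -> int:
    """Submit each batch to upsert from a thread pool; return the number of vectors sent.

    All batches are submitted up front, so every batch is held in memory until its call finishes.
    """
    sent = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for batch in batches:
            futures.append(pool.submit(upsert, vectors=batch))
            sent += len(batch)
        for future in futures:
            future.result()  # re-raises any exception raised inside a worker thread
    return sent


# Stand-in for index.upsert: records the ids it receives, guarded by a lock
received: list[str] = []
lock = threading.Lock()


def fake_upsert(vectors: list[dict[str, Any]]) -> None:
    with lock:
        received.extend(doc["id"] for doc in vectors)


batches = [[{"id": f"{b}-{i}", "values": [0.0]} for i in range(3)] for b in range(5)]
sent = upsert_concurrently(batches, fake_upsert, max_workers=2)
```

The call order across threads is not deterministic, so only the set of received ids, not their order, matches the input.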
Download files
Hashes for pinecone_datasets-0.2.3a0.tar.gz

Algorithm | Hash digest
---|---
SHA256 | 71f1b8f020046f7c6a54409acaaec86d61a5d9b6652d8b172fc96f3c915c683b
MD5 | 1336f9d29aca302c1c670cf74500d4ce
BLAKE2b-256 | 1f86b065372a41553633b0c2f03a8ff85aba5d73658d9a40c748ae259ac6f846
Hashes for pinecone_datasets-0.2.3a0-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | e12ff2392895c38e8fdbfd58299741e91fc938d54d3de40d4b0d6977862d62c4
MD5 | f1b99d9de3f263bbcf2e9f3e5e97e79b
BLAKE2b-256 | e0963f6223b1cb79738fe292241cf0ccf24dacd623adcf8119ba5a44c0c9b499