Pinecone Datasets

Pinecone Datasets lets you easily load datasets into your Pinecone index.
Usage
You can use Pinecone Datasets to load one of our public datasets or your own datasets.
Loading Pinecone Public Datasets
from pinecone_datasets import list_datasets, load_dataset
list_datasets()
# ["cc-news_msmarco-MiniLM-L6-cos-v5", ... ]
dataset = load_dataset("cc-news_msmarco-MiniLM-L6-cos-v5")
dataset.head()
# Prints
┌─────┬───────────────────────────┬─────────────────────────────────────┬───────────────────┬──────┐
│ id ┆ values ┆ sparse_values ┆ metadata ┆ blob │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ list[f32] ┆ struct[2] ┆ struct[3] ┆ │
╞═════╪═══════════════════════════╪═════════════════════════════════════╪═══════════════════╪══════╡
│ 0 ┆ [0.118014, -0.069717, ... ┆ {[470065541, 52922727, ... 22364... ┆ {2017,12,"other"} ┆ .... │
│ ┆ 0.0060... ┆ ┆ ┆ │
└─────┴───────────────────────────┴─────────────────────────────────────┴───────────────────┴──────┘
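Each row of the table above is one document record. Written out as a plain Python dict, a record has the shape below (the field names follow the columns above; the concrete values and metadata keys here are illustrative, not taken from the real dataset):

```python
# An illustrative document record matching the columns above
# (metadata keys are hypothetical examples).
doc = {
    "id": "0",
    "values": [0.118014, -0.069717, 0.006],  # dense embedding vector
    "sparse_values": {                       # struct[2]: parallel index/value lists
        "indices": [470065541, 52922727],
        "values": [0.27, 0.31],
    },
    "metadata": {"year": 2017, "month": 12, "source": "other"},  # struct[3]
}
```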
Iterating over a Dataset's documents

# Returns an iterator over the dataset's documents; each item is a list of up to
# batch_size dicts with the keys ("id", "values", "sparse_values", "metadata")
dataset.iter_documents(batch_size=n)
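The batching behavior can be sketched with a small stand-in helper (a hypothetical illustration, not the library's actual implementation):

```python
from typing import Dict, Iterator, List

def iter_batches(docs: List[Dict], batch_size: int) -> Iterator[List[Dict]]:
    # Yield successive lists of at most batch_size documents,
    # mirroring the shape iter_documents produces.
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]

docs = [{"id": str(i), "values": [0.0]} for i in range(250)]
sizes = [len(batch) for batch in iter_batches(docs, 100)]
# sizes == [100, 100, 50]
```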
Upserting to a Pinecone Index
pip install pinecone-client

import pinecone

pinecone.init(api_key="API_KEY", environment="us-west1-gcp")
# The index dimension must match the dataset's embedding size
# (384 for the MiniLM-L6 model used above)
pinecone.create_index(name="my-index", dimension=384, pod_type='s1')
index = pinecone.Index("my-index")
# Iterate over the dataset in batches and upsert each batch
for batch in dataset.iter_documents(batch_size=100):
index.upsert(vectors=batch)
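Upserts over the network can fail transiently, so it is common to wrap the call in a retry loop. A minimal sketch (an illustrative helper, not part of either library):

```python
import time

def upsert_with_retry(upsert_fn, batch, retries=3, backoff=1.0):
    # Call upsert_fn(vectors=batch), retrying failed attempts
    # with exponential backoff before giving up.
    for attempt in range(retries):
        try:
            return upsert_fn(vectors=batch)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)

# Usage with the loop above:
# for batch in dataset.iter_documents(batch_size=100):
#     upsert_with_retry(index.upsert, batch)
```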
Upserting to an Index with GRPC

Simply use GRPCIndex instead:
index = pinecone.GRPCIndex("my-index")
# Iterating over documents in batches
for batch in dataset.iter_documents(batch_size=100):
index.upsert(vectors=batch)