Unofficial demo datasets for Weaviate
UNOFFICIAL Weaviate demo data uploader
This is an educational project that aims to make it easy to upload demo data to your instance of Weaviate. The target audience is developers learning how to use Weaviate.
Usage
pip install -U weaviate-demo-datasets
Each dataset includes a default vectorizer configuration for convenience. The target Weaviate instance must include the specified vectorizer module.
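One way to check the module requirement before uploading is to inspect the instance metadata. The sketch below assumes the metadata is a dict with a "modules" mapping keyed by module name, as returned by client.get_meta() in the Weaviate Python client v4. The helper name has_module and the example values are hypothetical and not part of this package.

```python
def has_module(meta: dict, module_name: str) -> bool:
    """Return True if `module_name` appears under the "modules" key of `meta`."""
    # Assumption: `meta` looks like {"version": ..., "modules": {name: {...}}}.
    modules = meta.get("modules") or {}
    return module_name in modules


# Hand-written metadata for illustration only; a real instance would supply
# this through something like `client.get_meta()`.
example_meta = {"version": "0.0.0", "modules": {"text2vec-openai": {}}}

print(has_module(example_meta, "text2vec-openai"))  # True
print(has_module(example_meta, "text2vec-cohere"))  # False
```

If the check fails, the dataset's default vectorizer would likely need to be changed or the module enabled on the instance.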
Once you instantiate a dataset, you can upload it to Weaviate with the following:
import weaviate_datasets as wd
dataset = wd.JeopardyQuestions1k() # Instantiate dataset
dataset.upload_dataset(client) # Pass the Weaviate client instance
Here, client is an instantiated weaviate.WeaviateClient object, such as:
import weaviate
import os
client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)
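Because a weaviate.WeaviateClient holds open connections, it is usually worth closing it once the upload finishes, even if the upload raises. The following is a minimal sketch of that pattern. The helper name upload_then_close is hypothetical; it only assumes that the dataset exposes upload_dataset(client) as described here and that the client exposes close().

```python
def upload_then_close(dataset, client):
    """Upload `dataset` through `client`, then close the client.

    The client is closed even if the upload raises an exception.
    """
    try:
        return dataset.upload_dataset(client)
    finally:
        client.close()


# Intended usage (not executed here, since it needs a running Weaviate instance):
#
#   import weaviate
#   import weaviate_datasets as wd
#
#   client = weaviate.connect_to_local()
#   upload_then_close(wd.WineReviews(), client)
```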
To use a weaviate.Client object, use version 0.5.x or older of this package.
Built-in methods
- .upload_dataset(client): adds the defined classes to the schema, then adds the objects
- .get_sample(): yields sample data object(s)
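Since .get_sample() yields objects rather than returning a list, a common way to peek at a few of them is itertools.islice. The sketch below uses a stand-in generator, fake_get_sample, because the shape of the real sample objects and any arguments .get_sample() may accept are not described here.

```python
from itertools import islice


def take(iterable, n):
    """Return the first `n` items of `iterable` as a list."""
    return list(islice(iterable, n))


def fake_get_sample():
    # Stand-in for `dataset.get_sample()`; the field name is illustrative only.
    for i in range(5):
        yield {"id": i}


first_two = take(fake_get_sample(), 2)
print(first_two)  # [{'id': 0}, {'id': 1}]
```

With a real dataset, the same helper would be called as take(dataset.get_sample(), 2), assuming the method can be called without arguments.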
Available classes
- Wiki100 (Top 100 Wikipedia articles)
  - WikiChunk collection
  - Various chunking options available:
    - Default: wiki_sections (sections of the Wikipedia article)
    - wiki_section_chunked (sections of the Wikipedia article, chunked into 200 character chunks)
    - wiki_heading_only (only the headings of the Wikipedia article sections)
    - fixed (fixed length chunks of 200 characters)
  - Use it as follows:
    d = wd.Wiki100()
    d.collection_name = "WikiChunk"
    d.set_chunking("wiki_section_chunked")
    upload_responses = d.upload_dataset(client, overwrite=True)
- WineReviews (50 wine reviews)
  - WineReview collection
- WineReviewsNV (50 wine reviews)
  - WineReviewNV collection, with named vectors ("title", "review_body", and "title_country")
    - "title_country" -> Vector from concatenation of "title" + "country"
- WineReviewsMT (50 wine reviews)
  - WineReviewMT collection, tenants tenantA and tenantB
- JeopardyQuestions1k (1,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
  - JeopardyQuestion and JeopardyCategory collections
- JeopardyQuestions10k (10,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
  - JeopardyQuestion and JeopardyCategory collections
- NewsArticles (News articles, including their corresponding publications, authors & categories, vectorized with OpenAI text-embedding-ada-002)
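To make the "fixed" chunking option concrete, here is a minimal sketch of splitting text into fixed-length chunks of 200 characters. It illustrates the general technique only; the function name chunk_fixed is hypothetical, and the package's own implementation may differ, for example in how it treats word boundaries or the final short chunk.

```python
def chunk_fixed(text: str, size: int = 200) -> list[str]:
    """Split `text` into consecutive chunks of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # Step through the text in strides of `size`; the last chunk may be shorter.
    return [text[i:i + size] for i in range(0, len(text), size)]


sample_text = "a" * 450
chunks = chunk_fixed(sample_text)
print([len(c) for c in chunks])  # [200, 200, 50]
```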
Available classes - V3 collection
These are available with a V3 suffix, and are compatible with the Weaviate Python client v3.x.
Not including vectors
- WineReviews (50 wine reviews)
- WineReviewsMT (50 wine reviews, multi-tenancy enabled)
Including vectors
- JeopardyQuestions1k (1,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
- JeopardyQuestions10k (10,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
- JeopardyQuestions1kMT (1,000 Jeopardy questions & answers, multi-tenancy enabled, vectorized with OpenAI text-embedding-ada-002)
- NewsArticles (News articles, including their corresponding publications, authors & categories, vectorized with OpenAI text-embedding-ada-002)
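Code that needs to work with either client generation could pick the class by name. The sketch below assumes the V3 variants are exposed as module attributes named with a literal "V3" suffix (for example WineReviewsV3), which is how the description above reads. The helper resolve_dataset_class and the stand-in namespace fake_wd are hypothetical.

```python
from types import SimpleNamespace


def resolve_dataset_class(module, base_name: str, use_v3: bool):
    """Look up `base_name` on `module`, appending "V3" when `use_v3` is True."""
    name = f"{base_name}V3" if use_v3 else base_name
    return getattr(module, name)


# Stand-in for the `weaviate_datasets` module, so the sketch runs on its own.
fake_wd = SimpleNamespace(WineReviews="v4-style class", WineReviewsV3="v3-style class")

print(resolve_dataset_class(fake_wd, "WineReviews", use_v3=False))  # v4-style class
print(resolve_dataset_class(fake_wd, "WineReviews", use_v3=True))   # v3-style class
```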
Data sources
- https://www.kaggle.com/datasets/zynicide/wine-reviews
- https://www.kaggle.com/datasets/tunguz/200000-jeopardy-questions
- https://github.com/weaviate/DEMO-NewsPublications
File details
Details for the file weaviate_demo_datasets-0.8.0.tar.gz.
File metadata
- Download URL: weaviate_demo_datasets-0.8.0.tar.gz
- Upload date:
- Size: 69.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a1ae1f6108d3d87374c0d961b98ef40248bb4dd38bfa92d53096746e77e3dfe |
| MD5 | 242b5047a33d945ccd72db03c7ff0ee5 |
| BLAKE2b-256 | 31870b11db08ea274685d7ccd38492ccb06d4524d947e7bb060e0dd6cd529cff |
File details
Details for the file weaviate_demo_datasets-0.8.0-py3-none-any.whl.
File metadata
- Download URL: weaviate_demo_datasets-0.8.0-py3-none-any.whl
- Upload date:
- Size: 73.9 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9a209b63b5f7854677555b966a0a1aa46ef8685c5d3e86a1cbb8d9f997da2607 |
| MD5 | d9904e0a96aaa8e288f1a7c68aa64fa3 |
| BLAKE2b-256 | 6cc44774dc93ec8e3e0e6a0ba6895c573fddef33ae003f144da8305a383989bc |
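A downloaded file can be checked against the SHA256 digests listed above using Python's standard hashlib module. This is a minimal sketch; the helper name sha256_of_file is hypothetical, and the path in the usage comment assumes the file was saved to the current directory under its published name.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Reading in blocks keeps memory use low for large archives.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Intended usage (not executed here, since it needs the downloaded file):
#
#   expected = "9a209b63b5f7854677555b966a0a1aa46ef8685c5d3e86a1cbb8d9f997da2607"
#   actual = sha256_of_file("weaviate_demo_datasets-0.8.0-py3-none-any.whl")
#   print(actual == expected)
```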