Unofficial demo datasets for Weaviate

Project description

UNOFFICIAL Weaviate demo data uploader

This is an educational project that aims to make it easy to upload demo data to your instance of Weaviate. The target audience is developers learning how to use Weaviate.

Usage

pip install weaviate-demo-datasets

Each dataset includes a default vectorizer configuration for convenience. The target Weaviate instance must include the specified vectorizer module.
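One way to confirm the module requirement before uploading is to inspect the instance metadata. The sketch below assumes the v4 Python client, whose client.get_meta() returns a dictionary with a "modules" entry keyed by module name. The helper name missing_modules and the module name text2vec-openai are illustrative assumptions, not part of this package.

```python
def missing_modules(meta, required):
    """Return the required module names that are absent from the metadata.

    `meta` is expected to look like the dict returned by client.get_meta(),
    i.e. {"modules": {"text2vec-openai": {...}, ...}, ...}.
    """
    enabled = meta.get("modules") or {}
    return [name for name in required if name not in enabled]


# Hypothetical usage with a connected client (module name is an assumption):
#   missing = missing_modules(client.get_meta(), ["text2vec-openai"])
#   if missing:
#       raise RuntimeError(f"Enable these modules first: {missing}")
```

The helper only reads a plain dictionary, so it can be checked without a running Weaviate instance.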

Once you instantiate a dataset, you can upload it to Weaviate with the following:

import weaviate_datasets as wd
dataset = wd.JeopardyQuestions1k()  # Instantiate dataset
dataset.upload_dataset(client)  # Pass the Weaviate client instance

Where client is the instantiated weaviate.WeaviateClient object, such as:

import weaviate
import os

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)
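The connection step can be made a little more defensive. The following is a minimal sketch, not part of this package: openai_headers fails early when the OPENAI_APIKEY environment variable is unset, and upload_with_cleanup closes the client once the upload finishes or fails. Both helper names are hypothetical, and the sketch assumes the client exposes a close() method, as the v4 weaviate.WeaviateClient does.

```python
import os
from contextlib import closing


def openai_headers(env=os.environ):
    """Build the request headers, failing early if the API key is unset."""
    api_key = env.get("OPENAI_APIKEY")
    if not api_key:
        raise RuntimeError("Set the OPENAI_APIKEY environment variable first")
    return {"X-OpenAI-Api-Key": api_key}


def upload_with_cleanup(client, dataset):
    """Upload the dataset, then close the client even if the upload fails."""
    with closing(client):  # calls client.close() on exit
        return dataset.upload_dataset(client)


# Hypothetical usage, assuming a local Weaviate instance:
#   import weaviate
#   import weaviate_datasets as wd
#   client = weaviate.connect_to_local(headers=openai_headers())
#   upload_with_cleanup(client, wd.JeopardyQuestions1k())
```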

To use a weaviate.Client object, as used in the Weaviate Python client v3.x, import the dataset class from the v3 module instead (weaviate_datasets.v3_datasets in the example below):

import weaviate_datasets.v3_datasets as wd_v3
dataset = wd_v3.JeopardyQuestions1k()  # Instantiate dataset
dataset.upload_dataset(client)  # Pass the Weaviate client instance

Built-in methods

  • .upload_dataset(client) - adds the defined classes to the schema, then adds the objects
  • .get_sample() - yields sample data object(s)
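To look at a few sample objects before uploading anything, the iterator returned by .get_sample() can be sliced. This is a sketch under the assumption that .get_sample() returns a generator or other iterable, as "yields" suggests. The helper name peek is hypothetical, and the shape of each yielded object is not specified here.

```python
from itertools import islice


def peek(dataset, n=3):
    """Return up to `n` sample objects without consuming the whole iterator."""
    return list(islice(dataset.get_sample(), n))


# Hypothetical usage:
#   import weaviate_datasets as wd
#   for sample in peek(wd.WineReviews(), n=2):
#       print(sample)
```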

Available classes

  • Wiki100 (Top 100 Wikipedia articles) (WikiChunk collection)
  • WineReviews (50 wine reviews) (WineReview collection)
  • JeopardyQuestions1k (1,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002) (JeopardyQuestion and JeopardyCategory collections)
  • JeopardyQuestions10k (10,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002) (JeopardyQuestion and JeopardyCategory collections)
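After an upload, the collections listed above can be checked for content. The sketch below assumes the v4 Python client, where client.collections.get(name) returns a collection handle and aggregate.over_all(total_count=True) reports the object count. The COLLECTIONS mapping and the object_counts helper are illustrative and not part of this package.

```python
# Collection names as listed above for each dataset class.
COLLECTIONS = {
    "Wiki100": ["WikiChunk"],
    "WineReviews": ["WineReview"],
    "JeopardyQuestions1k": ["JeopardyQuestion", "JeopardyCategory"],
    "JeopardyQuestions10k": ["JeopardyQuestion", "JeopardyCategory"],
}


def object_counts(client, dataset_name):
    """Map each collection of a dataset to its current object count."""
    counts = {}
    for name in COLLECTIONS[dataset_name]:
        collection = client.collections.get(name)
        result = collection.aggregate.over_all(total_count=True)
        counts[name] = result.total_count
    return counts


# Hypothetical usage with a connected v4 client:
#   print(object_counts(client, "JeopardyQuestions1k"))
```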

Available classes - V3 collection

Not including vectors

  • WikiArticles (A handful of Wikipedia summaries)
  • WineReviews (50 wine reviews)
  • WineReviewsMT (50 wine reviews, multi-tenancy enabled)

Including vectors

  • JeopardyQuestions1k (1,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
  • JeopardyQuestions1kMT (1,000 Jeopardy questions & answers, multi-tenancy enabled, vectorized with OpenAI text-embedding-ada-002)
  • JeopardyQuestions10k (10,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
  • NewsArticles (News articles, including their corresponding publications, authors & categories, vectorized with OpenAI text-embedding-ada-002)

Data sources

  • https://www.kaggle.com/datasets/zynicide/wine-reviews
  • https://www.kaggle.com/datasets/tunguz/200000-jeopardy-questions
  • https://github.com/weaviate/DEMO-NewsPublications

Source code

https://github.com/databyjp/wv_demo_uploader

Download files

Source distribution: weaviate-demo-datasets-0.3.0.tar.gz (71.2 MB)

Built distribution: weaviate_demo_datasets-0.3.0-py3-none-any.whl (75.8 MB, Python 3)

Hashes for weaviate-demo-datasets-0.3.0.tar.gz

  • SHA256: 35d8d8b3f25f35294a786dac74d67893d8b628d182bc0d2f4e96b9aee79fd398
  • MD5: 95fa617434d37d54ed6931369560c69f
  • BLAKE2b-256: c7a66247479282c715aa0260a5b201192e3a9fa75d0818304b794131e6f767e7

Hashes for weaviate_demo_datasets-0.3.0-py3-none-any.whl

  • SHA256: c5dd5b4cac832a0074580add80d3c3c86676914004258a71ffb549103b5a58a3
  • MD5: de36394d8d2b8b81f32f597b92fd8504
  • BLAKE2b-256: bc9cd82fd407a3b4d769bb135d635c0fc4ab3f2e7fce27c405e64f39d8a39ca3
