
Unofficial demo datasets for Weaviate


UNOFFICIAL Weaviate demo data uploader

This is an educational project that aims to make it easy to upload demo data to your instance of Weaviate. The target audience is developers learning how to use Weaviate.

Usage

pip install weaviate-demo-datasets

All datasets are based on the Dataset superclass, which includes a number of built-in methods that make the datasets easier to work with.

Each dataset includes a default vectorizer configuration for convenience, which can be:

  • viewed via the .get_class_definitions method and
  • changed via the .set_vectorizer method. The target Weaviate instance must include the specified vectorizer module.
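As a minimal sketch of how the two methods above might be combined, the helper below snapshots the class definitions, applies a new vectorizer, and returns both versions for comparison. The helper name `switch_vectorizer` is hypothetical and not part of the package, and the return type of `.get_class_definitions` is not documented here, so the code only assumes it can be deep-copied.

```python
import copy


def switch_vectorizer(dataset, vectorizer_name, module_config):
    """Apply a new vectorizer and return (before, after) class definitions.

    `dataset` is any object exposing the documented .get_class_definitions()
    and .set_vectorizer() methods. This helper is illustrative only.
    """
    # Deep-copy first, in case the dataset returns its internal (mutable) state.
    before = copy.deepcopy(dataset.get_class_definitions())
    dataset.set_vectorizer(vectorizer_name, module_config)
    after = dataset.get_class_definitions()
    return before, after


# Possible usage (the module name and config shape are assumptions and must
# match a vectorizer module enabled on your Weaviate instance):
# before, after = switch_vectorizer(
#     dataset, "text2vec-cohere", {"text2vec-cohere": {}}
# )
```

Comparing `before` and `after` is a quick way to confirm the change took effect before anything is uploaded.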

Once you instantiate a dataset, you can upload it to Weaviate with the following:

import weaviate_datasets
dataset = weaviate_datasets.JeopardyQuestions10k()  # Instantiate dataset
dataset.upload_dataset(client)  # Add class to schema & upload objects (uses batch uploads by default)

Here, client is an instantiated weaviate.Client object, for example:

import weaviate
import os

wv_url = "https://some-endpoint.weaviate.network"
api_key = os.environ.get("OPENAI_API_KEY")

# If authentication required (e.g. using WCS)
auth = weaviate.AuthClientPassword(
    username=os.environ.get("WCS_USER"),
    password=os.environ.get("WCS_PASS"),
)

client = weaviate.Client(
    url=wv_url,
    auth_client_secret=auth,  # If authentication required
    additional_headers={"X-OpenAI-Api-Key": api_key},  # If using OpenAI inference
)
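Because authentication and the OpenAI header are both optional, one possible approach is to assemble the client arguments conditionally. The sketch below is an illustration, not part of the package: the helper name `build_client_kwargs`, the `WEAVIATE_URL` environment variable, and the `http://localhost:8080` fallback are all assumptions.

```python
import os


def build_client_kwargs(env=os.environ):
    """Collect keyword arguments for weaviate.Client from a mapping of settings.

    WEAVIATE_URL is a hypothetical variable name; WCS_USER, WCS_PASS and
    OPENAI_API_KEY match the names used in the example above.
    """
    kwargs = {"url": env.get("WEAVIATE_URL", "http://localhost:8080")}

    # Only send the OpenAI header when a key is actually configured.
    openai_key = env.get("OPENAI_API_KEY")
    if openai_key:
        kwargs["additional_headers"] = {"X-OpenAI-Api-Key": openai_key}

    # Only build credentials when both parts are present.
    user, password = env.get("WCS_USER"), env.get("WCS_PASS")
    if user and password:
        import weaviate  # imported lazily; only needed for authenticated setups

        kwargs["auth_client_secret"] = weaviate.AuthClientPassword(
            username=user, password=password
        )
    return kwargs


# Possible usage:
# import weaviate
# client = weaviate.Client(**build_client_kwargs())
```

Omitting unset options keeps the same code usable against both a local, unauthenticated instance and a hosted one.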

Built-in methods

  • .upload_dataset(client): Add the defined classes to the schema and upload the objects

  • .get_class_definitions(): See the schema definition to be added

  • .get_class_names(): See class names in the dataset

  • .get_sample(): See a sample data object

  • .classes_in_schema(client): Check whether each class is already in the Weaviate schema

  • .delete_existing_dataset_classes(client): Delete any of the dataset's classes that already exist in the Weaviate instance.

  • .set_vectorizer(vectorizer_name, module_config): Set the vectorizer and corresponding module configuration for the dataset. Datasets come pre-configured with a vectorizer & module configuration.
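The methods above can be chained to re-upload a dataset from a clean state. The following is a rough sketch under stated assumptions: `refresh_dataset` is a hypothetical helper, and since the return value of `.classes_in_schema` is not specified here, it is passed through untouched rather than interpreted.

```python
def refresh_dataset(dataset, client):
    """Delete any existing copies of the dataset's classes, then upload again.

    Uses only the documented Dataset methods. Returns whatever
    .classes_in_schema() reports before and after, for inspection.
    """
    status_before = dataset.classes_in_schema(client)

    # Remove stale classes so the upload starts from an empty schema entry.
    dataset.delete_existing_dataset_classes(client)
    dataset.upload_dataset(client)

    status_after = dataset.classes_in_schema(client)
    return status_before, status_after


# Possible usage, with `dataset` and `client` created as shown earlier:
# before, after = refresh_dataset(dataset, client)
# print(before, after)
```

Note that this deletes data in the target instance, so it is best suited to throwaway learning environments.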

Available classes

Not including vectors

  • WikiArticles (A handful of Wikipedia summaries)
  • WineReviews (50 wine reviews)

Including vectors

  • JeopardyQuestions1k (1,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
  • JeopardyQuestions10k (10,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
  • NewsArticles (News articles, including their corresponding publications, authors & categories, vectorized with OpenAI text-embedding-ada-002)

Data sources

  • https://www.kaggle.com/datasets/zynicide/wine-reviews
  • https://www.kaggle.com/datasets/tunguz/200000-jeopardy-questions
  • https://github.com/weaviate/DEMO-NewsPublications

Source code

https://github.com/databyjp/wv_demo_uploader
