Unofficial demo datasets for Weaviate

UNOFFICIAL Weaviate demo data uploader

This is an educational project that aims to make it easy to upload demo data to your instance of Weaviate. The target audience is developers learning how to use Weaviate.

Usage

pip install weaviate-demo-datasets

All datasets are based on the Dataset superclass, which provides a number of built-in methods that make the datasets easier to work with.

Each dataset includes a default vectorizer configuration for convenience, which can be:

  • viewed via the .get_class_definitions method and
  • changed via the .set_vectorizer method. The target Weaviate instance must include the specified vectorizer module.
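
Before uploading, it can help to inspect the default configuration. The following is a minimal sketch rather than part of the package: describe_dataset is a hypothetical helper, and it assumes only that the documented inspection methods return values that json.dumps can render (anything else falls back to str).

```python
import json


def describe_dataset(dataset):
    """Return a readable summary of a dataset's schema, class names and a sample.

    `dataset` is any object exposing the documented .get_class_definitions(),
    .get_class_names() and .get_sample() methods. Their return shapes are not
    specified in this README, so non-JSON values are rendered with str().
    """
    summary = {
        "class_names": dataset.get_class_names(),
        "class_definitions": dataset.get_class_definitions(),
        "sample": dataset.get_sample(),
    }
    return json.dumps(summary, indent=2, default=str)
```

For example, print(describe_dataset(weaviate_datasets.WineReviews())) should show the vectorizer settings that a later call to .set_vectorizer would replace.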

Once you instantiate a dataset, you can upload it to Weaviate with the following:

import weaviate_datasets
dataset = weaviate_datasets.JeopardyQuestions10k()  # Instantiate dataset
dataset.upload_dataset(client)  # Add class to schema & upload objects (uses batch uploads by default)

Here, client is an instantiated weaviate.Client object, such as:

import os

import weaviate

wv_url = "https://some-endpoint.weaviate.network"  # Replace with your instance URL
api_key = os.environ.get("OPENAI_API_KEY")  # Forwarded to the OpenAI vectorizer module

# Credentials for the Weaviate instance, read from the environment
auth = weaviate.AuthClientPassword(
    username=os.environ.get("WCS_USER"),
    password=os.environ.get("WCS_PASS"),
)

client = weaviate.Client(
    url=wv_url,
    auth_client_secret=auth,
    additional_headers={"X-OpenAI-Api-Key": api_key},
)
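
Note that os.environ.get returns None when a variable is unset, so a missing credential tends to surface only later, when the client connects or a vectorizer is called. A small sketch of failing early instead; require_env is a hypothetical helper, not part of weaviate or this package.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for each name, raising if any is unset or empty."""
    # `environ` defaults to the real process environment; passing a dict
    # makes the helper easy to try out without touching real settings.
    environ = os.environ if environ is None else environ
    values = {name: environ.get(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values
```

Calling require_env(["WCS_USER", "WCS_PASS", "OPENAI_API_KEY"]) before building the client would then supply the values passed to weaviate.AuthClientPassword and additional_headers.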

Built-in methods

  • .add_to_schema(client): Add the dataset's classes to the schema; returns the status and any classes already present.

  • .upload_objects(client, batch_size): Upload the objects; the batch size must be specified.

  • .upload_dataset(client): Run .add_to_schema and .upload_objects; the default batch size is 100.

  • .get_class_definitions(): See the schema definition to be added

  • .get_class_names(): See class names in the dataset

  • .get_sample(): See a sample data object

  • .classes_in_schema(client): Check whether each class is already in the Weaviate schema

  • .delete_existing_dataset_classes(client): Delete any of the dataset's classes that already exist in the Weaviate instance.

  • .set_vectorizer(vectorizer_name, module_config): Set the vectorizer and corresponding module configuration for the dataset. Datasets come pre-configured with a vectorizer & module configuration.
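
The methods above can be combined to re-run an upload from a clean state, which is handy while experimenting. A minimal sketch, assuming the methods behave as described in this list; reload_dataset is a hypothetical helper name, not part of the package.

```python
def reload_dataset(dataset, client):
    """Drop any existing copies of the dataset's classes, then upload afresh.

    Uses only methods listed above. The return value of .upload_dataset is
    passed through unchanged because its exact shape is not documented here.
    """
    # Described above as deleting the dataset's classes only if they exist.
    dataset.delete_existing_dataset_classes(client)
    # Runs .add_to_schema and .upload_objects with the default batch size.
    return dataset.upload_dataset(client)
```

Because deleting a class in Weaviate also deletes its objects, a helper like this is only suitable for throwaway demo instances.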

Available classes

Not including vectors

  • WikiArticles (A handful of Wikipedia summaries)
  • WineReviews (50 wine reviews)

Including vectors

  • WikiCities (500 large cities + Wikipedia summaries)
  • JeopardyQuestions1k (1,000 Jeopardy questions & answers)
  • JeopardyQuestions10k (10,000 Jeopardy questions & answers)

Source code

https://github.com/databyjp/wv_demo_uploader
