
Unofficial demo datasets for Weaviate


UNOFFICIAL Weaviate demo data uploader

This is an educational project that makes it easy to upload demo data to your own instance of Weaviate. It is intended for users who are learning how to use Weaviate.

Usage

All datasets are based on the Dataset superclass, which includes a number of built-in methods that make the datasets easier to work with.

Once you have instantiated a dataset, upload it to Weaviate as follows:

import wv_datasets
dataset = wv_datasets.JeopardyQuestionsSmall()  # Instantiate dataset
dataset.upload_dataset(client)  # Add class to schema & Upload objects (uses batch uploads by default)

Here, client is an instantiated weaviate.Client object, created for example as follows:

import os

import weaviate

wv_url = "https://some-endpoint.weaviate.network"  # Replace with your own instance URL
api_key = os.environ.get("OPENAI_API_KEY")  # Sent to Weaviate in the X-OpenAI-Api-Key header

auth = weaviate.AuthClientPassword(
    username=os.environ.get("WCS_USER"),
    password=os.environ.get("WCS_PASS"),
)

client = weaviate.Client(
    url=wv_url,
    auth_client_secret=auth,
    additional_headers={"X-OpenAI-Api-Key": api_key},
)
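
The snippet above reads three environment variables, and a missing one would only surface later as an authentication or vectorization error. A minimal sketch of an up-front check follows. The helper name `read_settings` and the shape of the returned dictionary are illustrative assumptions; they are not part of weaviate or of this package.

```python
import os

# The three variables read by the client setup above.
REQUIRED_VARS = ("WCS_USER", "WCS_PASS", "OPENAI_API_KEY")


def read_settings(env=None):
    """Return the required settings as a dict, or raise if any is missing.

    Hypothetical helper for illustration only; not part of weaviate
    or of this package.
    """
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values could then be passed to weaviate.AuthClientPassword and to the X-OpenAI-Api-Key header in place of the direct os.environ.get calls.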

Built-in methods

  • .add_to_schema(client): Add the dataset's class definitions to the schema; returns the status and any classes already present

  • .upload_objects(client, batch_size): Upload the objects; the batch size must be specified

  • .upload_dataset(client): Run .add_to_schema and .upload_objects, with a default batch size of 100

  • .get_class_definitions(): See the schema definition to be added

  • .get_class_names(): See class names in the dataset

  • .classes_in_schema(client): Check whether each class is already in the Weaviate schema

  • .delete_existing_dataset_classes(client): Delete any dataset classes that already exist in the Weaviate instance.

  • .set_vectorizer(vectorizer_name, module_config): Set the vectorizer and corresponding module configuration for the dataset. Datasets come pre-configured with a vectorizer & module configuration.
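
To make the batch_size parameter more concrete, here is a minimal sketch of how a list of objects can be split into fixed-size batches. It illustrates the general idea only: the function name `iter_batches` is hypothetical, and the package's actual upload code may work differently (for example, by delegating batching to the Weaviate client).

```python
from typing import Iterator, List, Sequence


def iter_batches(objects: Sequence[dict], batch_size: int = 100) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `batch_size` objects.

    Hypothetical illustration of batching; not the package's implementation.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(objects), batch_size):
        yield list(objects[start:start + batch_size])


# 250 placeholder objects with a batch size of 100 give batches of 100, 100 and 50.
sizes = [len(batch) for batch in iter_batches([{"id": i} for i in range(250)])]
print(sizes)  # [100, 100, 50]
```

Larger batches mean fewer requests per upload, while smaller batches keep each request lighter; which works better depends on the instance and the vectorizer in use.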

Available classes

  • WikiArticles
  • WineReviews
  • JeopardyQuestions1k
  • JeopardyQuestions10k
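
Because every class listed above shares the same built-in methods, a clean re-upload can be written once and reused for any of them. The sketch below relies only on the method names documented in this README. `reset_and_upload` is a hypothetical helper, and it assumes that .delete_existing_dataset_classes(client) is safe to call when the classes are absent, which the description above suggests but does not guarantee.

```python
def reset_and_upload(dataset, client):
    """Delete any existing copies of the dataset's classes, then upload afresh.

    Hypothetical helper for illustration. `dataset` is any dataset instance
    from this package and `client` is an instantiated weaviate.Client.
    """
    # Remove stale classes so the upload starts from a clean state.
    # This also deletes the objects stored under those classes.
    dataset.delete_existing_dataset_classes(client)
    # Add the classes to the schema and upload the objects
    # (batch size 100 by default, per the method list above).
    return dataset.upload_dataset(client)
```

For example, reset_and_upload(wv_datasets.WineReviews(), client) would replace any previously uploaded wine reviews with a fresh copy.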

Source code:

https://github.com/databyjp/wv_demo_uploader

