Unofficial demo datasets for Weaviate
Project description
UNOFFICIAL Weaviate demo data uploader
This is an educational project that aims to make it easy to upload demo data to your instance of Weaviate. The target audience is developers learning how to use Weaviate.
Usage
pip install weaviate-demo-datasets
All datasets are based on the Dataset superclass, which includes a number of built-in methods that make the datasets easier to work with.
Each dataset includes a default vectorizer configuration for convenience, which can be:
- viewed via the .get_class_definitions method, and
- changed via the .set_vectorizer method.
The target Weaviate instance must include the specified vectorizer module.
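To make the idea of a vectorizer configuration concrete, the sketch below shows how the vectorizer and module settings sit inside a Weaviate class definition, and what swapping them amounts to. It is an illustration only, not the package's implementation: the helper with_vectorizer, the class name JeopardyQuestion, its single property, and the example module settings are all assumptions made for this example.

```python
import copy


def with_vectorizer(class_def, vectorizer_name, module_config):
    """Return a copy of a class definition that uses a different vectorizer.

    Hypothetical helper for illustration; it is not part of weaviate-demo-datasets.
    """
    updated = copy.deepcopy(class_def)  # leave the caller's definition untouched
    updated["vectorizer"] = vectorizer_name
    # Module settings are keyed by the module name in a Weaviate class definition.
    updated["moduleConfig"] = {vectorizer_name: module_config}
    return updated


# ASSUMED example definition; the real one comes from .get_class_definitions().
example_class = {
    "class": "JeopardyQuestion",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {"model": "ada", "modelVersion": "002", "type": "text"}
    },
    "properties": [{"name": "question", "dataType": ["text"]}],
}

cohere_class = with_vectorizer(
    example_class,
    "text2vec-cohere",
    {"model": "embed-multilingual-v2.0"},  # ASSUMED module settings
)
```

Whichever vectorizer name ends up in the definition, the corresponding module has to be enabled on the Weaviate instance that receives the schema.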
Once you instantiate a dataset, you can upload it to Weaviate with the following:
import weaviate_datasets
dataset = weaviate_datasets.JeopardyQuestions10k() # Instantiate dataset
dataset.upload_dataset(client) # Add class to schema & upload objects (uses batch uploads by default)
Where client is the instantiated weaviate.Client object, such as:
import weaviate
import os
import json
wv_url = "https://some-endpoint.weaviate.network"
api_key = os.environ.get("OPENAI_API_KEY")
# If authentication required (e.g. using WCS)
auth = weaviate.AuthClientPassword(
    username=os.environ.get("WCS_USER"),
    password=os.environ.get("WCS_PASS"),
)
client = weaviate.Client(
    url=wv_url,
    auth_client_secret=auth,  # If authentication required
    additional_headers={"X-OpenAI-Api-Key": api_key},  # If using OpenAI inference
)
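If the OpenAI key is not set in the environment, the snippet above would pass None as a header value. One way to avoid that is to assemble the client arguments conditionally. This is a minimal sketch: the helper client_kwargs, the WEAVIATE_URL environment variable, and the localhost default are assumptions for illustration and are not part of this package or of the Weaviate client.

```python
import os


def client_kwargs(env=os.environ):
    """Build keyword arguments for weaviate.Client from environment variables.

    Hypothetical helper; WEAVIATE_URL and its default are assumed names/values.
    """
    kwargs = {"url": env.get("WEAVIATE_URL", "http://localhost:8080")}
    api_key = env.get("OPENAI_API_KEY")
    if api_key:
        # Only send the inference header when a key is actually available.
        kwargs["additional_headers"] = {"X-OpenAI-Api-Key": api_key}
    return kwargs
```

The result can then be unpacked into the constructor, for example weaviate.Client(**client_kwargs()), with auth_client_secret added separately when authentication is required.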
Built-in methods
- .upload_dataset(client): Add the defined classes to the schema and upload the objects
- .get_class_definitions(): See the schema definition to be added
- .get_class_names(): See class names in the dataset
- .get_sample(): See a sample data object
- .classes_in_schema(client): Check whether each class is already in the Weaviate schema
- .delete_existing_dataset_classes(client): Delete the dataset classes from the Weaviate instance if they already exist there
- .set_vectorizer(vectorizer_name, module_config): Set the vectorizer and corresponding module configuration for the dataset. Datasets come pre-configured with a vectorizer & module configuration.
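The methods above can be combined into a "start fresh" routine, sketched below. The function reupload is a hypothetical helper written for this example, and it assumes that .classes_in_schema(client) returns a mapping of class name to a boolean; that return shape is not confirmed here, so check it before relying on this.

```python
def reupload(dataset, client):
    """Remove any existing copies of the dataset's classes, then upload again.

    Hypothetical helper; uses only the method names listed above.
    """
    # ASSUMPTION: a dict such as {"JeopardyQuestion": True, ...}
    present = dataset.classes_in_schema(client)
    if any(present.values()):
        # At least one class already exists, so clear the old copies first.
        dataset.delete_existing_dataset_classes(client)
    dataset.upload_dataset(client)
    return present
```

With a real dataset this would be called as reupload(weaviate_datasets.JeopardyQuestions10k(), client). Deleting classes also removes their objects, so this is only suitable for demo instances.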
Available classes
Not including vectors
- WikiArticles (A handful of Wikipedia summaries)
- WineReviews (50 wine reviews)
Including vectors
- JeopardyQuestions1k (1,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
- JeopardyQuestions10k (10,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
- NewsArticles (News articles, including their corresponding publications, authors & categories, vectorized with OpenAI text-embedding-ada-002)
Data sources
- https://www.kaggle.com/datasets/zynicide/wine-reviews
- https://www.kaggle.com/datasets/tunguz/200000-jeopardy-questions
- https://github.com/weaviate/DEMO-NewsPublications
Source code
File details
Details for the file weaviate-demo-datasets-0.1.0.tar.gz.
File metadata
- Download URL: weaviate-demo-datasets-0.1.0.tar.gz
- Upload date:
- Size: 71.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 65e3b56ed71af66a8273c95b3bcdc7a295ed74a1bca478a4c381091646dff2d6
MD5 | a6bacabdb60bec200567aa3133ec227b
BLAKE2b-256 | d7ce7187edf00d6c8ba2a6b69b029d6f80359d8c20fe424e7e668b01297613e1
File details
Details for the file weaviate_demo_datasets-0.1.0-py3-none-any.whl.
File metadata
- Download URL: weaviate_demo_datasets-0.1.0-py3-none-any.whl
- Upload date:
- Size: 75.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9ab1ead9a05fe919e0b461c94ffc117d65f29390a8ea9ec11a6de06d8bf27589
MD5 | 7954617199bafbbd81a1e7f5bd6ed86e
BLAKE2b-256 | 16e04c9ef8ee8566923a1ed25db82920249e038b80b5d63913e30cefb746a54a
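A downloaded file can be checked against the SHA256 digests listed above using Python's standard hashlib module. The following is a minimal sketch; the function names sha256_of and matches_published_hash are made up for this example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read 1 MiB at a time so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True when the file's SHA256 digest equals the published value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("weaviate_demo_datasets-0.1.0-py3-none-any.whl", "9ab1ead9a05fe919e0b461c94ffc117d65f29390a8ea9ec11a6de06d8bf27589") should return True for an intact copy of the wheel.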