Unofficial demo datasets for Weaviate

UNOFFICIAL Weaviate demo data uploader

This is an educational project that aims to make it easy to upload demo data to your instance of Weaviate. The target audience is developers learning how to use Weaviate.

Usage

pip install -U weaviate-demo-datasets

Each dataset includes a default vectorizer configuration for convenience. The target Weaviate instance must have the specified vectorizer module enabled.

Once you instantiate a dataset, you can upload it to Weaviate with the following:

import weaviate_datasets as wd
dataset = wd.JeopardyQuestions1k()  # Instantiate dataset
dataset.upload_dataset(client)  # Pass the Weaviate client instance

Here, client is an instantiated weaviate.WeaviateClient object, for example:

import os

import weaviate

# Connect to a local Weaviate instance, passing the OpenAI API key as a request header
client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}
)
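
Note that os.getenv returns None when the variable is unset, so a missing key may only surface later, at vectorization time. Below is a minimal sketch of a guard, assuming the key is stored in OPENAI_APIKEY as above. The helper name openai_headers is hypothetical and is not part of weaviate or this package.

```python
import os


def openai_headers(env_var: str = "OPENAI_APIKEY") -> dict[str, str]:
    """Build the header dict for the client, failing early if the key is unset.

    Hypothetical helper for illustration only.
    """
    api_key = os.getenv(env_var)
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {"X-OpenAI-Api-Key": api_key}
```

It could then be used as weaviate.connect_to_local(headers=openai_headers()), with client.close() called once the upload is finished.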

To use a weaviate.Client object instead, use version 0.5.x or older of this package.

Built-in methods

  • .upload_dataset(client) - adds the dataset's defined classes to the schema, then adds the objects
  • .get_sample() - yields sample data object(s)
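
The two methods can be combined to inspect a few objects before uploading. The sketch below assumes that .get_sample() returns an iterator (the list above says it "yields" objects) and that client is an already connected weaviate.WeaviateClient. The function name preview_then_upload is hypothetical and not part of this package.

```python
from itertools import islice


def preview_then_upload(dataset, client, n_preview: int = 1):
    """Print up to n_preview sample objects, then upload the dataset.

    Hypothetical helper; it only calls the two built-in methods listed above.
    """
    for obj in islice(dataset.get_sample(), n_preview):
        print(obj)
    # Return whatever upload_dataset returns so the caller can inspect it
    return dataset.upload_dataset(client)
```

For example, preview_then_upload(wd.JeopardyQuestions1k(), client) would print one sample object and then run the upload.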

Available classes

  • Wiki100 (Top 100 Wikipedia articles)

    • WikiChunk collection
    • Various chunking options available:
      • Default: wiki_sections (sections of the Wikipedia article)
      • wiki_section_chunked (sections of the Wikipedia article, chunked into 200 character chunks)
      • wiki_heading_only (only the headings of the Wikipedia article sections)
      • fixed (fixed length chunks of 200 characters)
    • Use it as follows:
      d = wd.Wiki100()
      d.collection_name = "WikiChunk"
      d.set_chunking("wiki_section_chunked")
      upload_responses = d.upload_dataset(client, overwrite=True)
      
  • WineReviews (50 wine reviews)

    • WineReview collection
  • WineReviewsNV (50 wine reviews)

    • WineReviewNV collection, with named vectors ("title", "review_body", and "title_country")
      • "title_country" -> Vector from concatenation of "title" + "country"
  • WineReviewsMT (50 wine reviews)

    • WineReviewMT collection, with multi-tenancy enabled and tenants tenantA and tenantB
  • JeopardyQuestions1k (1,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)

    • JeopardyQuestion and JeopardyCategory collections
  • JeopardyQuestions10k (10,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)

    • JeopardyQuestion and JeopardyCategory collections
  • NewsArticles (News articles, including their corresponding publications, authors & categories, vectorized with OpenAI text-embedding-ada-002)
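
To make the "fixed" chunking option of Wiki100 more concrete, here is a rough illustration of splitting text into fixed-length chunks of 200 characters. This is only a sketch of the general idea. The function fixed_chunks is hypothetical, and the package's own implementation may differ (for example in how it handles whitespace or overlap).

```python
def fixed_chunks(text: str, size: int = 200) -> list[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


chunks = fixed_chunks("a" * 450)
print([len(c) for c in chunks])  # [200, 200, 50]
```

The last chunk is simply shorter when the text length is not a multiple of the chunk size.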

Available classes - V3 collection

These classes are available with a V3 suffix and are compatible with the Weaviate Python client v3.x.

Not including vectors

  • WineReviews (50 wine reviews)
  • WineReviewsMT (50 wine reviews, multi-tenancy enabled)

Including vectors

  • JeopardyQuestions1k (1,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
  • JeopardyQuestions10k (10,000 Jeopardy questions & answers, vectorized with OpenAI text-embedding-ada-002)
  • JeopardyQuestions1kMT (1,000 Jeopardy questions & answers, multi-tenancy enabled, vectorized with OpenAI text-embedding-ada-002)
  • NewsArticles (News articles, including their corresponding publications, authors & categories, vectorized with OpenAI text-embedding-ada-002)

Data sources

  • https://www.kaggle.com/datasets/zynicide/wine-reviews
  • https://www.kaggle.com/datasets/tunguz/200000-jeopardy-questions
  • https://github.com/weaviate/DEMO-NewsPublications

Source code

https://github.com/databyjp/wv_demo_uploader

Download files

Source Distribution

weaviate_demo_datasets-0.8.1.tar.gz (69.5 MB)

Uploaded Source

Built Distribution

weaviate_demo_datasets-0.8.1-py3-none-any.whl (73.9 MB)

Uploaded Python 3

File details

Details for the file weaviate_demo_datasets-0.8.1.tar.gz.

File metadata

  • Download URL: weaviate_demo_datasets-0.8.1.tar.gz
  • Upload date:
  • Size: 69.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for weaviate_demo_datasets-0.8.1.tar.gz
Algorithm Hash digest
SHA256 d2b23f138cf4c5c3f9142869be093a8ac645cd3c266a5106fb64b8d9b1f045a4
MD5 6d658ec5dc1098ff1461c36cf1ff2298
BLAKE2b-256 ae6b558755735d030542b28c5a0a8c051691409930e4975983b97d2e8d926602
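
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library. The function names sha256_of and matches_published are hypothetical, and the file path is assumed to point at a local copy of the archive.

```python
import hashlib
from pathlib import Path


def sha256_of(path, block_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in blocks to limit memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published("weaviate_demo_datasets-0.8.1.tar.gz", "d2b23f138cf4c5c3f9142869be093a8ac645cd3c266a5106fb64b8d9b1f045a4") should return True for an intact download of the source distribution.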

File details

Details for the file weaviate_demo_datasets-0.8.1-py3-none-any.whl.

File hashes

Hashes for weaviate_demo_datasets-0.8.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1495bd908200f0dac88c6c2aa4a8bc8c5a0b1b4636b407232a79be7ffd9a1291
MD5 fb6d143f61fb1076d79b71f93985fb84
BLAKE2b-256 ffc9f78f61ea1745b16968b4ee343c7f00a8977d9577cf71e02ecdb8f518cd3f
