Generate an ideal question-answer dataset for testing your LLM.

FiddleCube - Generate ideal question-answers for testing RAG

FiddleCube generates an ideal question-answer dataset for testing your LLM. Run tests on this dataset before pushing any prompt or RAG upgrades.

Quickstart

Install FiddleCube

pip3 install fiddlecube

API Key

Get the API key here.

Usage

from fiddlecube import FiddleCube

fc = FiddleCube(api_key="<api-key>")
dataset = fc.generate(
    [
        "Wheat is mainly grown in the midlands and highlands of Ethiopia.",
        "Wheat covers most of the country's agricultural land next to teff, corn and sorghum and in the 2009/10 crop season 1.69 million hectares were covered by wheat crops",
        "46.42 million quintals of production was obtained and the average yield was 26.75 quintals per hectare.",
        "Bread wheat (Triticum aestivum L) and durum wheat (Triticum turgidum var durum L) are the types of wheat that are mainly produced in our country, and durum wheat is one of the native wheat crops.",
        "Ethiopia is known to be the primary source of durum wheat and a source of its biodiversity.",
        "Durum wheat is grown in high and medium altitude areas and clay and light soils, and its industrial demand is increasing from time to time.",
    ], # data chunks
    3, # number of rows to generate
)
dataset
{
    "results": [
        {
            "query": "Where is wheat primarily cultivated in Ethiopia?",
            "contexts": [
                "Wheat is mainly grown in the midlands and highlands of Ethiopia."
            ],
            "answer": "\"Wheat is primarily cultivated in the midlands and highlands of Ethiopia.\"",
            "score": 0.8,
            "question_type": "SIMPLE"
        },
        {
            "query": "If wheat, teff, corn and sorghum are the main crops, what was the coverage of wheat crops in the 2009/10 season?",
            "contexts": [
                "Wheat covers most of the country's agricultural land next to teff, corn and sorghum and in the 2009/10 crop season 1.69 million hectares were covered by wheat crops"
            ],
            "answer": "1.69 million hectares",
            "score": 0.8,
            "question_type": "CONDITIONAL"
        },
        {
            "query": "What was the total production obtained as mentioned in the context? A) 46.42 million quintals B) 26.75 million quintals C) 26.75 quintals D) 46.42 quintals per hectare",
            "contexts": [
                "46.42 million quintals of production was obtained and the average yield was 26.75 quintals per hectare."
            ],
            "answer": "Answer: A) 46.42 million quintals\n\nExplanation: The context information clearly states that \"46.42 million quintals of production was obtained,\" which directly corresponds to option A. The other options do not accurately reflect the total production mentioned in the context. Option B incorrectly combines the average yield figure with \"million quintals,\" option C provides only the average yield per hectare without the \"million\" scale, and option D incorrectly suggests that the production figure is a rate per hectare, rather than a total quantity.",
            "score": 0.8,
            "question_type": "MCQ"
        }
    ],
    "status": "COMPLETED",
    "num_tokens_generated": 44,
    "rate_limited": false
}
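
One way to use this output as a regression test is sketched below. It assumes only the result shape shown above ("results" entries with "query", "contexts" and "answer"). The names run_regression, recall and echo_context are hypothetical helpers written for this example and are not part of the fiddlecube package. The token-overlap check is a deliberately crude stand-in for whatever evaluation metric you prefer.

```python
import re
from typing import Callable

# Abbreviated sample in the same shape as the `dataset` output shown above.
dataset = {
    "results": [
        {
            "query": "Where is wheat primarily cultivated in Ethiopia?",
            "contexts": [
                "Wheat is mainly grown in the midlands and highlands of Ethiopia."
            ],
            "answer": "Wheat is primarily cultivated in the midlands and highlands of Ethiopia.",
        },
        {
            "query": "What was the coverage of wheat crops in the 2009/10 season?",
            "contexts": [
                "Wheat covers most of the country's agricultural land next to teff, corn and sorghum and in the 2009/10 crop season 1.69 million hectares were covered by wheat crops"
            ],
            "answer": "1.69 million hectares",
        },
    ],
    "status": "COMPLETED",
}


def tokens(text: str) -> set:
    """Lowercased word tokens; punctuation is dropped."""
    return set(re.findall(r"\w+", text.lower()))


def recall(expected: str, actual: str) -> float:
    """Fraction of expected-answer tokens that also appear in the actual answer."""
    want = tokens(expected)
    if not want:
        return 1.0
    return len(want & tokens(actual)) / len(want)


def run_regression(dataset: dict, answer_fn: Callable, threshold: float = 0.6) -> list:
    """Call answer_fn on every query and collect rows scoring below threshold."""
    failures = []
    for row in dataset["results"]:
        actual = answer_fn(row["query"], row["contexts"])
        score = recall(row["answer"], actual)
        if score < threshold:
            failures.append({"query": row["query"], "recall": score, "actual": actual})
    return failures


def echo_context(query: str, contexts: list) -> str:
    # Hypothetical stand-in for your own LLM or RAG pipeline.
    return " ".join(contexts)


failures = run_regression(dataset, echo_context)
print(f"{len(failures)} of {len(dataset['results'])} queries failed")
```

Replace echo_context with a call into your own application, and tune threshold (or swap recall for a stricter metric) to match how closely answers are expected to follow the reference.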

Ideal QnA datasets for testing, eval and training LLMs

Testing, evaluating or training LLMs requires an ideal QnA dataset, also known as a golden dataset.

This dataset needs to be diverse, covering a wide range of queries with accurate responses.

Creating such a dataset takes significant manual effort.

As the prompt or RAG contexts are updated, which is nearly all the time for early applications, the dataset needs to be updated to match.

FiddleCube generates ideal QnA from vector embeddings

  • The questions cover the entire RAG knowledge corpus.
  • Complex reasoning, safety alignment and 5 other question types are generated.
  • Filtered for correctness, context relevance and style.
  • Auto-updated with prompt and RAG updates.
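
The coverage and question-type claims above can be spot-checked on a generated dataset with a few lines of Python. This is a minimal sketch, not part of the fiddlecube package: uncovered_chunks and type_counts are hypothetical helpers. It assumes each entry in "contexts" is a verbatim copy of an input chunk, as in the sample output in the Usage section; that is an observation from the example rather than documented behavior.

```python
from collections import Counter

# Input chunks passed to the generator (abbreviated from the Usage example).
chunks = [
    "Wheat is mainly grown in the midlands and highlands of Ethiopia.",
    "Ethiopia is known to be the primary source of durum wheat and a source of its biodiversity.",
    "46.42 million quintals of production was obtained and the average yield was 26.75 quintals per hectare.",
]

# Generated rows, reduced to the fields this check needs.
results = [
    {"contexts": [chunks[0]], "score": 0.8, "question_type": "SIMPLE"},
    {"contexts": [chunks[2]], "score": 0.8, "question_type": "MCQ"},
]


def uncovered_chunks(chunks: list, results: list) -> list:
    """Return input chunks that no generated question cites as context."""
    used = {context for row in results for context in row["contexts"]}
    return [chunk for chunk in chunks if chunk not in used]


def type_counts(results: list) -> Counter:
    """Count generated rows per question_type."""
    return Counter(row["question_type"] for row in results)


missing = uncovered_chunks(chunks, results)
print(f"{len(missing)} chunk(s) have no generated question")
print(dict(type_counts(results)))
```

If chunks turn up as uncovered, generating more rows is one option; if contexts are not verbatim copies in your data, the exact-match comparison would need to be loosened.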

Roadmap

  • Question-answers, complex reasoning from RAG
  • Multi-turn conversations
  • Evaluation Setup - Integrate metrics
  • CI setup - Run as part of CI/CD pipeline
  • Diagnose failures - step-by-step analysis of failed queries

More Questions?

Book a demo
Contact us at founders@fiddlecube.ai for any feature requests, feedback or questions.
