The testing framework dedicated to ML models, from tabular to LLMs


The Evaluation & Testing framework for AI systems

Control risks of performance, bias and security issues in AI systems

Docs • Website • Community

Install Giskard 🐢

Install the latest version of Giskard from PyPI using pip:

pip install "giskard[llm]" -U

We officially support Python 3.9, 3.10 and 3.11.

Try in Colab 📙

Open Colab notebook

Giskard is an open-source Python library that automatically detects performance, bias & security issues in AI applications. The library covers LLM-based applications such as RAG agents, all the way to traditional ML models for tabular data.

Scan: Automatically assess your LLM-based agents for performance, bias & security issues ⤵️

Issues detected include:

Hallucinations
Harmful content generation
Prompt injection
Robustness issues
Sensitive information disclosure
Stereotypes & discrimination
many more...


RAG Evaluation Toolkit (RAGET): Automatically generate evaluation datasets & evaluate RAG application answers ⤵️

If you're testing a RAG application, you can get an even more in-depth assessment using RAGET, Giskard's RAG Evaluation Toolkit.

RAGET can automatically generate a list of questions, together with their reference answers (reference_answer) and reference contexts (reference_context), from the RAG's knowledge base. You can then use this generated test set to evaluate your RAG agent.
RAGET computes a score for each component of the RAG agent. The scores are computed by aggregating the correctness of the agent’s answers on different question types.
Here is the list of components evaluated with RAGET:
  - Generator: the LLM used inside the RAG to generate the answers
  - Retriever: fetches relevant documents from the knowledge base according to a user query
  - Rewriter: rewrites the user query to make it more relevant to the knowledge base or to account for chat history
  - Router: filters the user's query based on their intent
  - Knowledge Base: the set of documents given to the RAG to generate the answers
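To make the aggregation idea concrete, here is a minimal sketch in plain Python; it does not call Giskard. It assumes a hypothetical mapping from question types to the components each type exercises (`QUESTION_TYPE_TO_COMPONENTS` is illustrative, not RAGET's actual table) and scores each component as the fraction of correct answers among the questions whose type targets it. The Knowledge Base is left out of this sketch.

```python
from collections import defaultdict

# Hypothetical mapping: which RAG components each question type is assumed to exercise.
# Illustrative only; RAGET's real mapping may differ.
QUESTION_TYPE_TO_COMPONENTS = {
    "simple": ["Generator", "Retriever", "Router"],
    "complex": ["Generator"],
    "distracting": ["Generator", "Retriever", "Rewriter"],
    "conversational": ["Rewriter"],
}


def component_scores(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Aggregate per-answer correctness into one score per component.

    `results` holds (question_type, is_correct) pairs, one per evaluated answer.
    Each answer counts towards every component its question type exercises.
    """
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for question_type, is_correct in results:
        for component in QUESTION_TYPE_TO_COMPONENTS.get(question_type, []):
            total[component] += 1
            correct[component] += int(is_correct)
    # Fraction of correct answers among the questions that target each component
    return {component: correct[component] / total[component] for component in total}
```

For example, `component_scores([("simple", True), ("complex", False)])` gives the Generator 0.5 (one of its two questions was answered correctly) and the Retriever and Router 1.0, so a low Generator score alongside a high Retriever score would point at answer generation rather than document retrieval.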


Giskard works with any model, in any environment and integrates seamlessly with your favorite tools ⤵️

Looking for solutions to evaluate computer vision models? Check out giskard-vision, a library dedicated to computer vision tasks.

🤸‍♀️ Quickstart
- 1. 🏗️ Build an LLM agent
- 2. 🔎 Scan your model for issues
- 3. 🪄 Automatically generate an evaluation dataset for your RAG applications
👋 Community

🤸‍♀️ Quickstart

1. 🏗️ Build an LLM agent

Let's build an agent that answers questions about climate change, based on the 2023 Climate Change Synthesis Report by the IPCC.

Before starting, let's install the required libraries (faiss-cpu provides the FAISS vector store used below):

pip install langchain langchain-community langchain-openai tiktoken faiss-cpu "pypdf<=3.17.0"

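from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAIEmbeddings, OpenAI
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter

# OpenAIEmbeddings and OpenAI read your API key from the OPENAI_API_KEY environment variable

# Prepare vector store (FAISS) with the IPCC report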
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100, add_start_index=True)
loader = PyPDFLoader("https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf")
db = FAISS.from_documents(loader.load_and_split(text_splitter), OpenAIEmbeddings())

# Prepare QA chain
PROMPT_TEMPLATE = """You are the Climate Assistant, a helpful AI assistant made by Giskard.
Your task is to answer common questions on climate change.
You will be given a question and relevant excerpts from the IPCC Climate Change Synthesis Report (2023).
Please provide short and clear answers based on the provided context. Be polite and helpful.

Context:
{context}

Question:
{question}

Your answer:
"""

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
prompt = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["question", "context"])
climate_qa_chain = RetrievalQA.from_llm(llm=llm, retriever=db.as_retriever(), prompt=prompt)
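Conceptually, the chain splits the report into overlapping chunks, retrieves the chunks most relevant to the question, and stuffs them into the `{context}` slot of the prompt. The sketch below mimics those three steps in plain Python so they can be run without an API key. It is a simplification under stated assumptions: the chunker is a plain sliding window over characters (not LangChain's recursive splitter, although the defaults mirror `chunk_size=1000, chunk_overlap=100`), and retrieval ranks chunks by word overlap instead of FAISS embedding similarity. All function names are hypothetical.

```python
import re


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def _words(text: str) -> set[str]:
    """Lowercased word set, used here as a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the question (ties keep input order)."""
    query = _words(question)
    ranked = sorted(chunks, key=lambda chunk: len(query & _words(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(template: str, question: str, docs: list[str]) -> str:
    """Fill the {context} and {question} placeholders with the retrieved chunks and the question."""
    return template.format(context="\n\n".join(docs), question=question)
```

Plain `str.format` works for a template like `PROMPT_TEMPLATE` because it contains no braces other than the two placeholders; the overlap between consecutive chunks keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.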

2. 🔎 Scan your model for issues

Next, wrap your agent to prepare it for Giskard's scan:

import giskard
import pandas as pd

def model_predict(df: pd.DataFrame):
    """Wraps the LLM call in a simple Python function.

    The function takes a pandas.DataFrame containing the input variables needed
    by your model, and must return a list of the outputs (one for each row).
    """
    return [climate_qa_chain.run({"query": question}) for question in df["question"]]

# Don’t forget to fill in the `name` and `description`: Giskard uses them
# to generate domain-specific tests.
giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="Climate Change Question Answering",
    description="This model answers any question about climate change based on IPCC reports",
    feature_names=["question"],
)
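The wrapper's contract (a DataFrame in, one output per row out) can be checked without calling OpenAI. In the sketch below, `FakeQAChain` is a hypothetical stub, not part of LangChain or Giskard, that echoes the query in place of `climate_qa_chain`; it assumes pandas is installed.

```python
import pandas as pd


class FakeQAChain:
    """Hypothetical stand-in for climate_qa_chain: echoes the query instead of calling an LLM."""

    def run(self, inputs: dict) -> str:
        return f"Answer to: {inputs['query']}"


climate_qa_chain = FakeQAChain()


def model_predict(df: pd.DataFrame) -> list[str]:
    # Same wrapper as in the quickstart: one chain call per value of the "question" column
    return [climate_qa_chain.run({"query": question}) for question in df["question"]]


questions = pd.DataFrame({"question": ["What is the IPCC?", "Is sea level rising?"]})
outputs = model_predict(questions)
# The wrapper must return exactly one output per input row
assert len(outputs) == len(questions)
```

Swapping the real chain back in leaves `model_predict` unchanged, which is why the wrapper only touches the `question` column listed in `feature_names`.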

✨✨✨Then run Giskard's magical scan✨✨✨

scan_results = giskard.scan(giskard_model)

Once the scan completes, you can display the results directly in your notebook:

display(scan_results)

# Or save it to a file
scan_results.to_html("scan_results.html")

If you're facing issues, check out our docs for more information.

3. 🪄 Automatically generate an evaluation dataset for your RAG applications

If the scan found issues in your model, you can automatically generate a test suite from the issues found:

test_suite = scan_results.generate_test_suite("My first test suite")

To evaluate a RAG application, you can also use RAGET to generate an evaluation dataset from your knowledge base. By default, RAGET generates 6 different question types (these can be selected if needed, see advanced question generation), and the total number of questions is divided equally among them. To make question generation more relevant and accurate, you can also provide a description of your agent.
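As a quick check of the equal split, 60 questions over 6 question types gives 10 per type, which matches the `num_questions=60` example in this section. The sketch below computes the split in plain Python. The six type names are assumed for illustration, and the handling of a remainder (extra questions go to the first types) is our assumption, not necessarily what RAGET does.

```python
# Assumed question-type names, for illustration only
QUESTION_TYPES = ["simple", "complex", "distracting", "situational", "double", "conversational"]


def split_questions(num_questions: int, question_types: list[str]) -> dict[str, int]:
    """Divide num_questions as evenly as possible across question types.

    Assumption: leftover questions (when the division is not exact) go to the
    first types in the list; RAGET's own remainder handling may differ.
    """
    if not question_types:
        raise ValueError("question_types must not be empty")
    base, remainder = divmod(num_questions, len(question_types))
    return {
        question_type: base + (1 if i < remainder else 0)
        for i, question_type in enumerate(question_types)
    }
```

For example, `split_questions(60, QUESTION_TYPES)` assigns 10 questions to each type, while `split_questions(7, ["a", "b", "c"])` yields 3, 2 and 2.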

from giskard.rag import generate_testset, KnowledgeBase

# Load your data and initialize the KnowledgeBase
df = pd.read_csv("path/to/your/knowledge_base.csv")

knowledge_base = KnowledgeBase.from_pandas(df, columns=["column_1", "column_2"])

# Generate a testset with 10 questions & answers for each question type (this will take a while)
testset = generate_testset(
    knowledge_base,
    num_questions=60,
    language='en',  # optional, we'll auto-detect if not provided
    agent_description="A customer support chatbot for company X",  # helps generate better questions
)

Depending on how many questions you generate, this can take a while. Once you’re done, you can save this generated test set for future use:

# Save the generated testset
testset.save("my_testset.jsonl")

You can easily load it back:

from giskard.rag import QATestset

loaded_testset = QATestset.load("my_testset.jsonl")

# Convert it to a pandas dataframe
df = loaded_testset.to_pandas()

Here’s an example of a generated question:

question: For which countries can I track my shipping?
reference_context: Document 1: We offer free shipping on all orders over $50. For orders below $50, we charge a flat rate of $5.99. We offer shipping services to customers residing in all 50 states of the US, in addition to providing delivery options to Canada and Mexico. Document 2: Once your purchase has been successfully confirmed and shipped, you will receive a confirmation email containing your tracking number. You can simply click on the link provided in the email or visit our website’s order tracking page.
reference_answer: We ship to all 50 states in the US, as well as to Canada and Mexico. We offer tracking for all our shippings.
metadata: `{"question_type": "simple", "seed_document_id": 1, "topic": "Shipping policy"}`

Each row of the test set contains 5 columns:

question: the generated question
reference_context: the context that can be used to answer the question
reference_answer: the answer to the question (generated with GPT-4)
conversation_history: not shown in the example above; the history of the conversation with the agent, as a list. It is only relevant for conversational questions; otherwise it is an empty list.
metadata: a dictionary with various metadata about the question, including the question_type, the seed_document_id (the id of the document used to generate the question) and the topic of the question
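If you want to post-process a test set outside Giskard, each row can be modeled as a plain record. The sketch below is an illustration under assumptions, not Giskard's `QATestset` API: it defines a hypothetical `QARow` dataclass with the five columns listed above, serializes rows to JSON Lines (one JSON object per line) with the standard library, parses them back, and filters by `metadata["question_type"]`. The file actually written by `testset.save` may use different keys.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class QARow:
    """Hypothetical record mirroring the five test-set columns."""
    question: str
    reference_context: str
    reference_answer: str
    conversation_history: list = field(default_factory=list)  # empty unless the question is conversational
    metadata: dict = field(default_factory=dict)  # e.g. question_type, seed_document_id, topic


def to_jsonl(rows: list[QARow]) -> str:
    # JSON Lines: one JSON object per line, newline-terminated
    return "".join(json.dumps(asdict(row)) + "\n" for row in rows)


def from_jsonl(text: str) -> list[QARow]:
    # Skip blank lines so a trailing newline does not produce an empty record
    return [QARow(**json.loads(line)) for line in text.splitlines() if line.strip()]


def by_question_type(rows: list[QARow], question_type: str) -> list[QARow]:
    return [row for row in rows if row.metadata.get("question_type") == question_type]
```

To persist rows, write the string to disk, for example with `pathlib.Path("rows.jsonl").write_text(to_jsonl(rows))`; filtering by question type is a convenient way to inspect the questions behind a weak component score.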

👋 Community

We welcome contributions from the AI community! Read this guide to get started, and join our thriving community on Discord.

🌟 Leave us a star: it helps the project get discovered by others and keeps us motivated to build awesome open-source tools! 🌟

❤️ If you find our work useful, please consider sponsoring us on GitHub. With a monthly sponsorship, you can get a sponsor badge, display your company in this readme, and get your bug reports prioritized. We also offer one-time sponsorship if you want us to get involved in a consulting project, run a workshop, or give a talk at your company.

💚 Current sponsors

We thank the following companies, which sponsor our project with monthly donations:

Lunary

Biolevate

Release history

Version	Release date	Notes
2.19.2	Jul 6, 2026	this version
2.19.1	Feb 17, 2026
2.19.0	Feb 10, 2026
2.18.0	Aug 18, 2025
2.17.0	Jun 11, 2025
2.16.2	Mar 19, 2025
2.16.1	Feb 12, 2025
2.16.0	Nov 21, 2024
2.15.5	Nov 14, 2024
2.15.4	Nov 12, 2024
2.15.3	Oct 29, 2024
2.15.2	Oct 15, 2024
2.15.1	Sep 18, 2024
2.15.0	Sep 3, 2024
2.14.6	Aug 30, 2024
2.14.5	Aug 29, 2024
2.14.4	Aug 20, 2024
2.14.3	Jul 25, 2024
2.14.2	Jul 17, 2024
2.14.1	Jul 9, 2024
2.14.0	Jun 4, 2024
2.13.0	May 28, 2024
2.12.0	May 3, 2024
2.11.0	Apr 19, 2024
2.10.0	Apr 10, 2024
2.9.1	Apr 9, 2024
2.9.0	Apr 8, 2024
2.8.0	Mar 25, 2024
2.7.7	Mar 19, 2024
2.7.6	Mar 13, 2024
2.7.5	Mar 6, 2024
2.7.4	Feb 28, 2024
2.7.3	Feb 22, 2024
2.7.2	Feb 19, 2024
2.7.1	Feb 13, 2024
2.7.0	Jan 29, 2024
2.6.0	Jan 23, 2024
2.5.3	Jan 19, 2024
2.5.2	Jan 19, 2024
2.5.1	Jan 19, 2024
2.5.0	Jan 17, 2024
2.4.0	Jan 15, 2024
2.3.2	Jan 4, 2024
2.3.1	Jan 4, 2024
2.2.0	Jan 3, 2024
2.1.3	Jan 2, 2024
2.1.2	Dec 18, 2023
2.1.1	Dec 12, 2023
2.0.7	Dec 4, 2023
2.0.6	Dec 3, 2023	yanked (reason: hub db migration is broken)
2.0.5	Nov 24, 2023
2.0.4	Nov 23, 2023
2.0.3	Nov 15, 2023
2.0.2	Nov 7, 2023
2.0.1	Nov 6, 2023
2.0.0	Nov 6, 2023
2.0.0b34	Nov 6, 2023	pre-release
2.0.0b33	Nov 5, 2023	pre-release
2.0.0b32	Nov 4, 2023	pre-release
2.0.0b31	Nov 4, 2023	pre-release
2.0.0b30	Oct 31, 2023	pre-release
2.0.0b29	Oct 30, 2023	pre-release
2.0.0b28	Oct 25, 2023	pre-release
2.0.0b27	Oct 25, 2023	pre-release
2.0.0b26	Oct 12, 2023	pre-release
2.0.0b25	Oct 12, 2023	pre-release
2.0.0b22	Sep 25, 2023	pre-release
2.0.0b20	Sep 8, 2023	pre-release
2.0.0b19	Sep 7, 2023	pre-release
2.0.0b18	Sep 7, 2023	pre-release
2.0.0b17	Sep 5, 2023	pre-release
2.0.0b16	Sep 4, 2023	pre-release
2.0.0b14	Aug 9, 2023	pre-release
2.0.0b13	Aug 8, 2023	pre-release
2.0.0b12	Jul 24, 2023	pre-release
2.0.0b11	Jul 4, 2023	pre-release
2.0.0b10	Jun 21, 2023	pre-release
2.0.0b9	Jun 21, 2023	pre-release
2.0.0b8	Jun 14, 2023	pre-release
2.0.0b7	Jun 13, 2023	pre-release
2.0.0b6	Jun 12, 2023	pre-release
2.0.0b5	Jun 12, 2023	pre-release
2.0.0b4	Jun 7, 2023	pre-release
2.0.0b3	Jun 7, 2023	pre-release
2.0.0b2	Jun 6, 2023	pre-release
2.0.0b1	Jun 5, 2023	pre-release
1.9.4	Jun 30, 2023
1.9.3	Jun 6, 2023
1.9.1	Mar 17, 2023
1.9.0	Mar 16, 2023
1.8.0	Feb 13, 2023
1.7.3	Jan 30, 2023
1.7.2	Jan 23, 2023
1.7.1	Jan 20, 2023
1.7.0	Nov 22, 2022
1.7.0a7	Nov 2, 2022	pre-release
1.7.0a6	Nov 1, 2022	pre-release
1.7.0a5	Oct 24, 2022	pre-release
1.7.0a4	Oct 24, 2022	pre-release
1.7.0a3	Oct 10, 2022	pre-release
1.7.0a2	Oct 4, 2022	pre-release
1.7.0a1	Oct 4, 2022	pre-release
1.6.0	Sep 2, 2022
1.5.0	Aug 30, 2022
1.4.0	Aug 24, 2022
1.3.1	Aug 9, 2022
1.3.0	Jul 25, 2022
1.2.0	Jul 8, 2022
1.1.0	Jun 10, 2022
1.0.0	Jun 9, 2022
1.0.0a2	Jun 7, 2022	pre-release, yanked
1.0.0a1	Jun 5, 2022	pre-release, yanked
0.1.2	Apr 11, 2022

Download files

Source Distribution

giskard-2.19.2.tar.gz (473.9 kB)

Uploaded Jul 6, 2026 Source

Built Distribution

giskard-2.19.2-py3-none-any.whl (557.2 kB)

Uploaded Jul 6, 2026 Python 3

File details

Details for the file giskard-2.19.2.tar.gz.

File metadata

Download URL: giskard-2.19.2.tar.gz
Upload date: Jul 6, 2026
Size: 473.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.28 {"installer":{"name":"uv","version":"0.9.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for giskard-2.19.2.tar.gz
Algorithm	Hash digest
SHA256	`fc9cebbcfee35fd972f8f6ddffde56fe039608ebd0f291e902bded09b982fbf2`
MD5	`dc754bf84eec28caf4dd28636bcb8c29`
BLAKE2b-256	`cb24f068eeb2de39a7cae4e3361400dc21bb66e6ea39782834bec6ddee0e9f80`


File details

Details for the file giskard-2.19.2-py3-none-any.whl.

File metadata

Download URL: giskard-2.19.2-py3-none-any.whl
Upload date: Jul 6, 2026
Size: 557.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.28 {"installer":{"name":"uv","version":"0.9.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for giskard-2.19.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`344c60475cd338fc4c3391d07e27f2a3a0db6de4ff7b7696e0c6876c1e79ad9d`
MD5	`0bfe4882a176a7121296b31b786f573c`
BLAKE2b-256	`ca5d958df159ae1891343647f533e5152f22be4b431e984518754ee56d661cb4`

