SDK to write and run tests for your LLM app


Magik is an LLM output testing SDK + observability platform that helps you write tests and monitor your app in production.

Overview

Reliability of output is one of the biggest challenges for people trying to use LLM apps in production.

LLM responses are non-deterministic by nature. This makes it particularly challenging to use them for certain types of tasks:

If you're building an AI assistant that answers legal questions, you cannot afford hallucinations or misinformation.
If you're building a code generation AI, you might need to make sure the code is correct and works as expected.
If you're building a customer support agent, you might need to make sure it responds with accurate answers in a specified format and does not reveal sensitive information like PII.

We are trying to solve these problems with a test-driven approach towards LLM observability.

Use Cases

Who is this product meant for?

If you're in the early stages of building an LLM app
If you have an LLM app in production

If you're in the early stages of building an LLM app:

Test-driven development can speed up your development and help you engineer more robust prompts.

For example, assuming your prompt looks like this:

You are an AI customer support chatbot. You are trying to help a customer named {name} who needs some information.

Answer his questions in a polite tone. Be as respectful as possible. Do not mention that you are an AI. Do not refer to the customer by any name other than {name}. Do not use his email address or customer ID number.

You can write tests like this:

# contains_none, regex, restricted_keywords and email_matcher are assumed
# to be imported or defined earlier in your test file.
tests = [
    # Test that output does not contain restricted keywords
    {
        "description": "output does not contain restricted keywords",
        "eval_function": contains_none,
        "eval_function_args": [restricted_keywords],
        "prompt_vars": {"name": "Sara"},
        "failure_labels": ["contains_restricted_words"],
    },
    # Test that output does not contain an email
    {
        "description": "output does not contain email",
        "eval_function": regex,
        "eval_function_args": [email_matcher],
        "prompt_vars": {"name": "John"},
        "failure_labels": ["contains_email", "pii_leak", "critical"],
    },
]
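To make the shape of these tests concrete, here is a minimal, self-contained sketch of how such a list could be evaluated. It is not Magik's implementation: `run_tests`, `no_regex_match`, `fake_llm`, the keyword list, the email pattern, and this version of `contains_none` are all hypothetical stand-ins written for illustration, and the real SDK's functions and signatures may differ.

```python
import re

# Hypothetical stand-ins; the real Magik eval functions may behave differently.
RESTRICTED_KEYWORDS = ["language model", "customer id"]
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def contains_none(output, keywords):
    """Pass when none of the keywords appear in the output (case-insensitive)."""
    lowered = output.lower()
    return not any(keyword.lower() in lowered for keyword in keywords)


def no_regex_match(output, pattern):
    """Pass when the compiled pattern is not found anywhere in the output."""
    return pattern.search(output) is None


def run_tests(tests, generate):
    """Call generate(prompt_vars) for each test and record pass/fail with labels."""
    results = []
    for test in tests:
        output = generate(test["prompt_vars"])
        passed = test["eval_function"](output, *test["eval_function_args"])
        results.append({
            "description": test["description"],
            "passed": passed,
            "failure_labels": [] if passed else test["failure_labels"],
        })
    return results


def fake_llm(prompt_vars):
    """Deterministic stand-in for a real model call."""
    return f"Hello {prompt_vars['name']}, you can reach us at help@example.com."


tests = [
    {
        "description": "output does not contain restricted keywords",
        "eval_function": contains_none,
        "eval_function_args": [RESTRICTED_KEYWORDS],
        "prompt_vars": {"name": "Sara"},
        "failure_labels": ["contains_restricted_words"],
    },
    {
        "description": "output does not contain email",
        "eval_function": no_regex_match,
        "eval_function_args": [EMAIL_PATTERN],
        "prompt_vars": {"name": "John"},
        "failure_labels": ["contains_email", "pii_leak", "critical"],
    },
]

results = run_tests(tests, fake_llm)
for result in results:
    print(result["description"], result["passed"], result["failure_labels"])
```

With the canned response above, the keyword test passes and the email test fails, so only the second result carries failure labels. In practice you would replace `fake_llm` with a call to your own model.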

If you have an LLM app in production:

You can use our evaluation & monitoring platform to:

Observe prompt/response pairs in production, and analyze response times, cost, token usage, etc. for different prompts and date ranges.
Evaluate your production responses against your own tests to get a quantifiable understanding of how well your LLM app is performing.
- For example, you can run the tests you defined against the LLM responses you are getting in production to measure how your app is performing with real data.
Filter by failure labels, severity, prompt, etc. to identify different types of errors that are occurring in your LLM outputs.
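The following sketch illustrates the idea of evaluating logged production responses and grouping failures by label. It does not use the Magik platform or its API; the log record fields (`prompt_id`, `response`, `latency_ms`, `tokens`), the `CHECKS` list, and the `summarize` helper are assumptions made up for this example.

```python
import re
from collections import Counter
from statistics import mean

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Hypothetical checks: each returns True when the response passes.
CHECKS = [
    {
        "check": lambda response: EMAIL_PATTERN.search(response) is None,
        "failure_labels": ["contains_email", "pii_leak"],
    },
    {
        "check": lambda response: len(response) <= 200,
        "failure_labels": ["too_long"],
    },
]


def label_failures(record):
    """Collect the failure labels of every check the response does not pass."""
    labels = []
    for check in CHECKS:
        if not check["check"](record["response"]):
            labels.extend(check["failure_labels"])
    return labels


def summarize(records):
    """Aggregate pass rate, latency, token usage, and failures by label."""
    label_counts = Counter()
    failed = 0
    for record in records:
        labels = label_failures(record)
        label_counts.update(labels)
        failed += bool(labels)
    return {
        "pass_rate": 1 - failed / len(records),
        "mean_latency_ms": mean(r["latency_ms"] for r in records),
        "total_tokens": sum(r["tokens"] for r in records),
        "failures_by_label": dict(label_counts),
    }


# Made-up log records for illustration only.
logs = [
    {"prompt_id": "greeting", "response": "Hello Sara, how can I help?",
     "latency_ms": 400, "tokens": 30},
    {"prompt_id": "greeting", "response": "Write to john@example.com for help.",
     "latency_ms": 600, "tokens": 50},
]

summary = summarize(logs)
print(summary)
```

Counting failures per label gives a simple way to see which kinds of errors dominate, which is the same question the label filter is meant to answer.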

See https://magiklabs.app for more details, or contact us at hello@magiklabs.app

Upcoming Features

Soon, you will also be able to:

Fail bad outputs before they get to your users.
- For example, if the LLM response contains sensitive information like PII, you can detect that in real time and cut it off before it reaches the end user.
Set up alerts to notify you about critical errors in production.
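Since this feature is not yet available, the snippet below is only a rough sketch of the general idea of blocking a bad output before it is returned. `guard_output`, the fallback message, and the email pattern are hypothetical and say nothing about how Magik will implement it.

```python
import re

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
FALLBACK_MESSAGE = "Sorry, I can't share that information."


def guard_output(response):
    """Return (text_to_show, failure_labels).

    If the response appears to contain an email address, it is replaced
    with a fallback message and labeled; otherwise it passes through.
    """
    if EMAIL_PATTERN.search(response):
        return FALLBACK_MESSAGE, ["contains_email", "pii_leak"]
    return response, []


safe_text, safe_labels = guard_output("Your order has shipped.")
blocked_text, blocked_labels = guard_output("Contact jane@example.com for details.")
print(safe_text, safe_labels)
print(blocked_text, blocked_labels)
```

The returned labels could also feed an alerting step, for example by notifying you whenever a label you consider critical appears.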

Platform

Contact us at hello@magiklabs.app to get access to our LLM observability platform where you can run the tests you've defined here against your LLM responses in production.
