langcheck 0.8.1

Simple, Pythonic building blocks to evaluate LLM-based applications


Install

# Install English metrics only
pip install langcheck

# Install English and Japanese metrics (quote the extra so shells like zsh don't expand the brackets)
pip install "langcheck[ja]"

# Install metrics for all languages (requires pip 21.2+)
pip install --upgrade pip
pip install "langcheck[all]"

Having installation issues? See the FAQ.

Examples

Evaluate Text

Use LangCheck's suite of metrics to evaluate LLM-generated text.

import langcheck

# Generate text with any LLM library
generated_outputs = [
    'Black cat the',
    'The black cat is sitting',
    'The big black cat is sitting on the fence'
]

# Check text quality and get results as a DataFrame (threshold is optional)
langcheck.metrics.fluency(generated_outputs) > 0.5

[Screenshot: MetricValueWithThreshold output]

It's easy to turn LangCheck metrics into unit tests; just use `assert`:

assert langcheck.metrics.fluency(generated_outputs) > 0.5
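Comparing a metric result with a number, as in `fluency(generated_outputs) > 0.5`, attaches a threshold to every score, and that thresholded result can be used directly in `assert`. The sketch below shows the general operator-overloading pattern that makes this work in plain Python. The `Scores` and `ThresholdResult` classes are hypothetical stand-ins, not LangCheck's actual metric classes, and the rule that `assert` passes only when every output clears the threshold is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThresholdResult:
    """Per-output pass/fail flags from comparing scores against a threshold."""
    scores: List[float]
    passed: List[bool]

    def __bool__(self) -> bool:
        # Truthy only if every output passed, so `assert result` fails on any miss
        return all(self.passed)


@dataclass
class Scores:
    """Hypothetical stand-in for a metric result: one score per generated output."""
    values: List[float]

    def __gt__(self, threshold: float) -> ThresholdResult:
        return ThresholdResult(self.values, [v > threshold for v in self.values])

    def __lt__(self, threshold: float) -> ThresholdResult:
        return ThresholdResult(self.values, [v < threshold for v in self.values])


fluency_like = Scores([0.2, 0.7, 0.9])
result = fluency_like > 0.5
print(result.passed)  # [False, True, True]
print(bool(result))   # False, so `assert fluency_like > 0.5` would fail
```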

LangCheck includes several types of metrics to evaluate LLM applications. Some examples:

Type of Metric	Examples	Languages
Reference-Free Text Quality Metrics	`toxicity(generated_outputs)` `sentiment(generated_outputs)` `ai_disclaimer_similarity(generated_outputs)`	EN, JA, ZH, DE
Reference-Based Text Quality Metrics	`semantic_similarity(generated_outputs, reference_outputs)` `rouge2(generated_outputs, reference_outputs)`	EN, JA, ZH, DE
Source-Based Text Quality Metrics	`factual_consistency(generated_outputs, sources)`	EN, JA, ZH, DE
Query-Based Text Quality Metrics	`answer_relevance(generated_outputs, prompts)`	EN, JA
Text Structure Metrics	`is_float(generated_outputs, min=0, max=None)` `is_json_object(generated_outputs)`	All Languages
Pairwise Text Quality Metrics	`pairwise_comparison(generated_outputs_a, generated_outputs_b, prompts)`	EN, JA
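The text structure metrics in the table are essentially parsing checks rather than model-based scores, which fits their "All Languages" entry. As a rough illustration only, here is how checks similar in spirit to `is_float` and `is_json_object` could be written with the standard library. These functions and their `min_value`/`max_value` parameter names are hypothetical, not LangCheck's implementation; each returns 1 for a passing output and 0 otherwise.

```python
import json
from typing import List, Optional


def is_float(outputs: List[str], min_value: Optional[float] = None,
             max_value: Optional[float] = None) -> List[int]:
    """Return 1 for each output that parses as a float inside the bounds, else 0."""
    results = []
    for text in outputs:
        try:
            value = float(text)
        except ValueError:
            results.append(0)
            continue
        in_range = ((min_value is None or value >= min_value)
                    and (max_value is None or value <= max_value))
        results.append(int(in_range))
    return results


def is_json_object(outputs: List[str]) -> List[int]:
    """Return 1 for each output that parses as a JSON object (a dict), else 0."""
    results = []
    for text in outputs:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            results.append(0)
            continue
        results.append(int(isinstance(parsed, dict)))
    return results


print(is_float(["3.5", "-1", "abc"], min_value=0))         # [1, 0, 0]
print(is_json_object(['{"a": 1}', '[1, 2]', 'not json']))  # [1, 0, 0]
```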

Visualize Metrics

LangCheck comes with built-in, interactive visualizations of metrics.

# Choose some metrics
fluency_values = langcheck.metrics.fluency(generated_outputs)
sentiment_values = langcheck.metrics.sentiment(generated_outputs)

# Interactive scatter plot of one metric
fluency_values.scatter()

[Image: scatter plot for one metric]

# Interactive scatter plot of two metrics
langcheck.plot.scatter(fluency_values, sentiment_values)

[Image: scatter plot for two metrics]

# Interactive histogram of a single metric
fluency_values.histogram()

[Image: histogram for one metric]

Augment Data

Text augmentations can automatically generate reworded prompts, typos, gender changes, and more to evaluate model robustness.

For example, to measure how the model responds to different genders:

import langcheck

# `prompts` (your test prompts) and `my_llm_app` (your app's generation
# function) are placeholders you define yourself
male_prompts = langcheck.augment.gender(prompts, to_gender='male')
female_prompts = langcheck.augment.gender(prompts, to_gender='female')

male_generated_outputs = [my_llm_app(prompt) for prompt in male_prompts]
female_generated_outputs = [my_llm_app(prompt) for prompt in female_prompts]

# Compare the sentiment scores of the two groups
langcheck.metrics.sentiment(male_generated_outputs)
langcheck.metrics.sentiment(female_generated_outputs)
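LangCheck's `augment.gender` does the prompt rewriting for you. To show the underlying idea, here is a deliberately naive, hypothetical sketch that swaps whole-word English pronouns with a regular expression. It is not how LangCheck implements the augmentation, and it ignores names, titles, and the fact that "her" can correspond to either "him" or "his".

```python
import re
from typing import Dict, List

# Hypothetical word maps. "her" is ambiguous (him/his), so this toy always maps it to "his".
PRONOUN_MAPS: Dict[str, Dict[str, str]] = {
    "female": {"he": "she", "him": "her", "his": "her", "himself": "herself"},
    "male": {"she": "he", "her": "his", "hers": "his", "herself": "himself"},
}


def swap_pronouns(prompts: List[str], to_gender: str) -> List[str]:
    """Rewrite gendered pronouns in each prompt toward `to_gender` ("male" or "female")."""
    if to_gender not in PRONOUN_MAPS:
        raise ValueError(f"unsupported to_gender: {to_gender!r}")
    mapping = PRONOUN_MAPS[to_gender]
    # Whole-word, case-insensitive match so words like "the" or "shell" are left alone
    pattern = re.compile(r"\b(" + "|".join(mapping) + r")\b", re.IGNORECASE)

    def replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        swapped = mapping[word.lower()]
        # Keep a leading capital, e.g. "He" -> "She"
        return swapped.capitalize() if word[0].isupper() else swapped

    return [pattern.sub(replace, prompt) for prompt in prompts]


print(swap_pronouns(["He said his plan was ready."], to_gender="female"))
# ['She said her plan was ready.']
print(swap_pronouns(["She thanked her friend."], to_gender="male"))
# ['He thanked his friend.']
```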

Unit Testing

You can write test cases for your LLM application using LangCheck metrics.

For example, if you only have a list of prompts to test against:

import json

import langcheck
from langcheck.utils import load_json

# Run the LLM application once to generate text
# (`my_llm_app` is a placeholder for your application's generation function)
prompts = load_json('test_prompts.json')
generated_outputs = [my_llm_app(prompt) for prompt in prompts]

# Unit tests: they read the module-level `generated_outputs`, because pytest
# would treat a function parameter with that name as a missing fixture
def test_toxicity():
    assert langcheck.metrics.toxicity(generated_outputs) < 0.1

def test_fluency():
    assert langcheck.metrics.fluency(generated_outputs) > 0.9

def test_json_structure():
    assert langcheck.metrics.validation_fn(
        generated_outputs, lambda x: 'myKey' in json.loads(x)).all()
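One subtlety in `test_json_structure`: `json.loads` raises on output that isn't valid JSON, and `'myKey' in json.loads(x)` is also true for a JSON array such as `["myKey"]` or the JSON string `"myKey"`. The stdlib sketch below makes both cases explicit. `validate_outputs` and `has_my_key` are hypothetical helpers that only approximate what a validation metric does; check LangCheck's documentation for how `validation_fn` itself treats exceptions.

```python
import json
from typing import Callable, List


def validate_outputs(outputs: List[str], valid_fn: Callable[[str], bool]) -> List[int]:
    """Apply valid_fn to each output; an exception (e.g. invalid JSON) counts as 0."""
    results = []
    for text in outputs:
        try:
            results.append(int(bool(valid_fn(text))))
        except Exception:
            results.append(0)
    return results


def has_my_key(text: str) -> bool:
    """True only if the text is a JSON object that has a "myKey" field."""
    parsed = json.loads(text)
    # `"myKey" in parsed` alone would also accept '["myKey"]' or '"myKey"'
    return isinstance(parsed, dict) and "myKey" in parsed


outputs = ['{"myKey": 1}', '["myKey"]', '"myKey"', 'not json']
print(validate_outputs(outputs, has_my_key))  # [1, 0, 0, 0]
```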

Monitoring

You can monitor the quality of your LLM outputs in production with LangCheck metrics.

Just save the outputs and pass them into LangCheck.

import langcheck
from langcheck.utils import load_json

production_outputs = load_json('llm_logs_2023_10_02.json')['outputs']

# Evaluate and display toxic outputs in production logs
langcheck.metrics.toxicity(production_outputs) > 0.75

# Or if your app outputs structured text
langcheck.metrics.is_json_array(production_outputs)
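Monitoring usually boils down to scoring each logged output and pulling out the ones that cross a threshold. The sketch below shows that bookkeeping with the standard library only. It assumes the log is a JSON file with an `outputs` list (as the `['outputs']` lookup above suggests), and the scores are made-up placeholder values standing in for what a metric such as `toxicity` would return; `load_outputs` and `flag_above` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def load_outputs(path: Path) -> List[str]:
    """Read a log file assumed to look like {"outputs": ["...", ...]}."""
    return json.loads(path.read_text(encoding="utf-8"))["outputs"]


def flag_above(outputs: List[str], scores: List[float],
               threshold: float) -> List[Tuple[int, str, float]]:
    """Return (index, output, score) for each output whose score exceeds threshold."""
    if len(outputs) != len(scores):
        raise ValueError("expected exactly one score per output")
    return [(i, out, s) for i, (out, s) in enumerate(zip(outputs, scores)) if s > threshold]


# Write a tiny example log so the sketch runs end to end
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "llm_logs.json"
    log_path.write_text(json.dumps({"outputs": ["Hello!", "<toxic reply>", "Thanks!"]}),
                        encoding="utf-8")
    outputs = load_outputs(log_path)

# Hypothetical per-output scores, standing in for values a toxicity metric would return
scores = [0.01, 0.92, 0.03]
flagged = flag_above(outputs, scores, threshold=0.75)
print(flagged)                                           # [(1, '<toxic reply>', 0.92)]
print(f"{len(flagged)}/{len(outputs)} outputs flagged")  # 1/3 outputs flagged
```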

Guardrails

You can provide guardrails on LLM outputs with LangCheck metrics.

Just filter candidate outputs through LangCheck.

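# Get a candidate output from the LLM app
# (`my_llm_app`, `random_user_prompt`, and `blacklist_words` are placeholders you define)
raw_output = my_llm_app(random_user_prompt)

# Filter the output before it reaches the user: regenerate while it contains any blocked word
while langcheck.metrics.contains_any_strings(raw_output, blacklist_words).any():
    raw_output = my_llm_app(random_user_prompt)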
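The `while` loop above retries until the output is clean, so it can spin forever if the model keeps producing blocked words. A bounded variant is sketched below. `guarded_generate`, `contains_any`, the retry limit, and the fallback reply are all hypothetical; `contains_any` is a plain case-insensitive substring check standing in for `langcheck.metrics.contains_any_strings`, whose exact matching rules may differ.

```python
from typing import Callable, Iterable


def contains_any(text: str, blocked: Iterable[str]) -> bool:
    """Case-insensitive substring check against a list of blocked words."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in blocked)


def guarded_generate(generate: Callable[[str], str], prompt: str,
                     blocked: Iterable[str], max_attempts: int = 3,
                     fallback: str = "Sorry, I can't help with that.") -> str:
    """Regenerate until an output avoids every blocked word, up to max_attempts tries."""
    blocked = list(blocked)  # materialize so the words can be re-checked on every attempt
    for _ in range(max_attempts):
        output = generate(prompt)
        if not contains_any(output, blocked):
            return output
    # Every attempt was blocked: return a safe canned reply instead of looping forever
    return fallback


# A fake "LLM app" that returns a blocked reply first and a clean one second
replies = iter(["This contains badword.", "A clean answer."])
print(guarded_generate(lambda prompt: next(replies), "hello", ["BADWORD"]))  # A clean answer.
```

Because it matches substrings, `contains_any` would also flag a blocked word that appears inside a longer word; a word-boundary regex is one way to tighten that if needed.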


Release history

Version	Release date	Type
0.10.0.dev13	Nov 5, 2025	Pre-release
0.10.0.dev12	Nov 3, 2025	Pre-release
0.10.0.dev11	Oct 28, 2025	Pre-release
0.10.0.dev10	Sep 30, 2025	Pre-release
0.10.0.dev9	Sep 10, 2025	Pre-release
0.10.0.dev8	Aug 20, 2025	Pre-release
0.10.0.dev7	Jul 8, 2025	Pre-release
0.10.0.dev6	Jun 12, 2025	Pre-release
0.10.0.dev5	May 29, 2025	Pre-release
0.10.0.dev4	May 8, 2025	Pre-release
0.10.0.dev3	Apr 22, 2025	Pre-release
0.10.0.dev2	Apr 3, 2025	Pre-release
0.10.0.dev1	Feb 13, 2025	Pre-release
0.9.0	Dec 12, 2024	Release
0.9.0.dev1	Feb 13, 2025	Pre-release
0.8.1	Dec 5, 2024	Release (this version)
0.8.0	Oct 29, 2024	Release
0.8.0.dev6	Aug 23, 2024	Pre-release
0.8.0.dev5	Jul 12, 2024	Pre-release
0.8.0.dev4	Jul 10, 2024	Pre-release
0.8.0.dev3	Jun 18, 2024	Pre-release
0.8.0.dev2	Jun 13, 2024	Pre-release
0.8.0.dev1	May 14, 2024	Pre-release
0.7.1	May 8, 2024	Release
0.7.0	May 8, 2024	Yanked (bug affects OpenAI JA and DE metrics)
0.6.0	Apr 8, 2024	Release
0.5.0	Mar 11, 2024	Release
0.4.0	Jan 22, 2024	Release
0.3.0	Dec 6, 2023	Release
0.2.0	Nov 8, 2023	Release
0.1.0	Oct 11, 2023	Release
0.0.6	Oct 10, 2023	Release
0.0.5	Sep 27, 2023	Release
0.0.4	Sep 27, 2023	Release

Download files


Source Distribution

langcheck-0.8.1.tar.gz (102.1 kB)

Uploaded Dec 5, 2024 Source

Built Distribution


langcheck-0.8.1-py3-none-any.whl (167.9 kB)

Uploaded Dec 5, 2024 Python 3

File details

Details for the file langcheck-0.8.1.tar.gz.

File metadata

Download URL: langcheck-0.8.1.tar.gz
Upload date: Dec 5, 2024
Size: 102.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.9.19

File hashes

Hashes for langcheck-0.8.1.tar.gz
Algorithm	Hash digest
SHA256	`d080f693bbb7914aa10bca4b688a7c665440a83d6ea390252b249a126f402b89`
MD5	`ddf4365c5433d023e5272c26b569d164`
BLAKE2b-256	`bd902af9717701b7497bedaf4a4e416624cf5cc8a2481c1cf652a1e684777c30`
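To check that a downloaded archive matches the SHA256 digest listed above, you can hash it locally with Python's `hashlib`. This is a minimal sketch; `sha256_of` and `matches_digest` are hypothetical helper names, and the self-check uses the well-known SHA256 digest of `abc` rather than the real archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute a file's SHA256 hex digest, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_sha256: str) -> bool:
    """Compare against a published digest (hex, case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# SHA256 published above for langcheck-0.8.1.tar.gz; use it on a real download, e.g.
# matches_digest(Path("langcheck-0.8.1.tar.gz"), SDIST_SHA256)
SDIST_SHA256 = "d080f693bbb7914aa10bca4b688a7c665440a83d6ea390252b249a126f402b89"

# Self-check on a throwaway file containing b"abc" (a standard SHA256 test vector)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    sample_digest = sha256_of(sample)
    sample_ok = matches_digest(
        sample, "BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD")

print(sample_digest)  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(sample_ok)      # True
```

pip can also enforce digests itself through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).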


File details

Details for the file langcheck-0.8.1-py3-none-any.whl.

File metadata

Download URL: langcheck-0.8.1-py3-none-any.whl
Upload date: Dec 5, 2024
Size: 167.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.9.19

File hashes

Hashes for langcheck-0.8.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`27c48c35edfa9b0218ce9253936407f4d5b08faa8bfeac885d22a95e1e66270b`
MD5	`17d2cb53632cc626681a205dbd4c3e91`
BLAKE2b-256	`40849244efc313175a5780c1de45b8f4b4c95c1008fe44c4c6cbb0f7fd5ab3d5`

