LLM-Enhanced CheckList: AI-Powered Behavioral Testing of NLP Models

CheckList Plus

An LLM-enhanced extension of the original CheckList framework for behavioral testing of NLP models.

This project extends the original CheckList framework with modern LLM capabilities, making it easier to create and run behavioral tests for NLP models.

🆕 What's New in CheckList Plus

🤖 LLM-Powered Text Generation & Perturbations

LLM Text Generator: Complete LLMTextGenerator class with support for OpenAI models and structured Pydantic outputs
Smart Paraphrasing: Context-aware paraphrasing with style control (formal, casual, academic) and length preferences
Intelligent Negation: LLM-powered sentence negation that keeps the sentence grammatical while reversing its meaning
Entity Detection & Masking: Automatic entity detection with configurable entity types and intelligent masking capabilities
Template Completion: LLM-enhanced mask filling with contextual understanding and candidate suggestions

🎯 Enhanced Perturbations with Precision Control

Entity-Type Specific Number Changes: Target specific numerical entities using spaCy NER (MONEY, DATE, QUANTITY, CARDINAL, ORDINAL, PERCENT)
Configurable Abbreviation Handling: Optional control over changing numbers like '2' and '4' that might be abbreviations
Fallback Mechanisms: Automatic fallback from LLM to rule-based methods for reliability
Batch Processing: Efficient processing of multiple texts with structured outputs
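
The fallback behaviour listed above can be pictured as a small wrapper. This is a minimal sketch under stated assumptions, not the library's implementation: `with_fallback`, `flaky_llm_negate`, and `rule_based_negate` are hypothetical names, and the LLM call is simulated by a function that always fails.

```python
from typing import Callable, List


def with_fallback(
    primary: Callable[[str], List[str]],
    fallback: Callable[[str], List[str]],
) -> Callable[[str], List[str]]:
    """Return a function that tries `primary` and uses `fallback` on any error."""

    def run(text: str) -> List[str]:
        try:
            return primary(text)
        except Exception:
            # Any failure (network, quota, malformed response) triggers the rules.
            return fallback(text)

    return run


def flaky_llm_negate(text: str) -> List[str]:
    # Hypothetical stand-in for an LLM call; it always fails here.
    raise RuntimeError("LLM unavailable")


def rule_based_negate(text: str) -> List[str]:
    # Deliberately naive rule: negate the first " is " or " was ".
    for verb in (" is ", " was "):
        if verb in text:
            return [text.replace(verb, verb.rstrip() + " not ", 1)]
    return []


negate = with_fallback(flaky_llm_negate, rule_based_negate)
print(negate("The service was excellent"))
```

Catching every exception keeps the sketch short; a real wrapper would probably log the error and catch a narrower set.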

🛠 Developer Experience Improvements

Unified API: Consistent interface across all LLM-powered features
Rich Configuration: YAML-based prompt configuration with template variable support
Comprehensive Examples: Built-in examples for entity detection and other LLM tasks
Temperature Control: Deterministic vs creative outputs with configurable temperature settings
Error Handling: Graceful degradation and comprehensive error messaging
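
To illustrate prompt templates with variable substitution, here is a rough sketch that uses only the standard library. The `PROMPTS` dictionary stands in for a parsed YAML file; its keys, its wording, and the `render_prompt` helper are all assumptions made for the example and are not taken from the package.

```python
from string import Template

# Stand-in for a parsed YAML prompt file; keys and wording are hypothetical.
PROMPTS = {
    "paraphrase": Template(
        "Rewrite the sentence in a $style style. "
        "Return $n alternatives.\nSentence: $text"
    ),
}


def render_prompt(name: str, **variables: object) -> str:
    """Fill a named prompt template; raises KeyError if a variable is missing."""
    return PROMPTS[name].substitute(**variables)


prompt = render_prompt(
    "paraphrase", style="formal", n=3, text="The weather is nice today"
)
print(prompt)
```

Using `substitute` rather than `safe_substitute` makes a missing variable fail loudly instead of sending a half-filled prompt to the model.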

🔄 Backward Compatibility

100% Compatible: All original CheckList functionality preserved and enhanced
Seamless Integration: New LLM features integrate naturally with existing workflows
Optional Dependencies: LLM features are optional - core functionality works without API keys

📖 Original Research

Based on the research paper:

Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh
Association for Computational Linguistics (ACL), 2020

@inproceedings{checklist:acl20,
  author = {Marco Tulio Ribeiro and Tongshuang Wu and Carlos Guestrin and Sameer Singh},
  title = {Beyond Accuracy: Behavioral Testing of NLP models with CheckList},
  booktitle = {Association for Computational Linguistics (ACL)},
  year = {2020}
}

Advanced Use Cases

CheckList Plus extends behavioral testing beyond traditional NLP models to modern architectures:

Testing Embeddings Behavior - Evaluate embedding models by testing their ability to distinguish between paraphrases (should be similar) and negations (should be different). This notebook demonstrates how LLM-generated perturbations can reveal behavioral inconsistencies in embedding models.

Inspired by research on embedding evaluation methodologies: "Enhancing Negation Awareness in Universal Text Embeddings: A Data-efficient and Computational-efficient Approach"
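
The paraphrase-versus-negation check described above comes down to comparing two cosine similarities. The sketch below uses invented three-dimensional vectors in place of real embeddings, and `negation_aware` is a hypothetical helper name; a real test would embed the sentences with the model under test.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def negation_aware(
    anchor: Sequence[float],
    paraphrase: Sequence[float],
    negation: Sequence[float],
    margin: float = 0.0,
) -> bool:
    """Pass if the paraphrase is closer to the anchor than the negation is."""
    return cosine(anchor, paraphrase) - cosine(anchor, negation) > margin


# Toy vectors invented for illustration only.
anchor = [1.0, 0.0, 1.0]       # "I love this movie"
paraphrase = [0.9, 0.1, 1.0]   # "I really like this film"
negation = [1.0, 0.0, -1.0]    # "I don't love this movie"

print(negation_aware(anchor, paraphrase, negation))
```

Raising `margin` makes the test stricter: the model must separate negations from paraphrases by at least that much similarity.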

🚀 Quick Start

Installation

pip install checklist-plus

LLM-Enhanced Features

import checklist_plus
from checklist_plus.text_generation.llm import LLMTextGenerator
from checklist_plus.perturb import LLMPerturb
from checklist_plus.editor import Editor

# Initialize LLM text generator
tg = LLMTextGenerator(openai_api_key="your-api-key", model_name="gpt-4o-mini")

# Smart paraphrasing with style control
paraphrases = tg.paraphrase(
    "The weather is nice today",
    n_paraphrases=3,
    style="formal",
    length_preference="longer",
)
# → ["Today's meteorological conditions are quite favorable",
#    "The atmospheric conditions are particularly pleasant today", ...]

# Intelligent negation
negated = tg.negate_sentence("I love this movie", n_variations=2)
# → ["I hate this movie", "I don't love this movie"]

# Entity detection and masking
result = tg.detect_and_mask_entities(
    "I bought an iPhone for $999 yesterday", entity_type="brand names"
)
# → {
#     "original_text": "I bought an iPhone for $999 yesterday",
#     "masked_text": "I bought a [MASK] for $999 yesterday",
#     "contains_entities": True,
#     "entities": ["iPhone"]
# }

# Template completion with context
completions = tg.unmask(
    "The best [MASK] for data science is [MASK]",
    context="programming tools",
    n_completions=3,
)

Enhanced Perturbations

from checklist_plus.perturb import LLMPerturb, Perturb
import spacy

nlp = spacy.load("en_core_web_sm")
data = ["The meeting is at 10:30 on Sept 14, tickets cost $45"]
parsed_data = list(nlp.pipe(data))

# Target specific entity types for number changes
ret = Perturb.perturb(
    parsed_data,
    Perturb.change_number,
    entity_types=["DATE", "MONEY"],  # Only change dates and money
    skip_abbreviations=False,  # Include numbers like '2' and '4'
    n=3,
)
# → Changes "14" to "16", "$45" to "$54", but preserves "10:30"

# LLM-powered perturbations with fallback
llm_perturb = LLMPerturb(openai_api_key="your-api-key", fallback_to_rules=True)
negated = llm_perturb.add_negation_llm(
    ["The service was excellent", "I enjoyed the meal"], n_variations=2
)
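
To show what targeting entity types means in practice, here is a sketch that does not depend on spaCy. The spans are written by hand to mimic NER output (start offset, end offset, label); `bump_numbers` and `span_of` are hypothetical helpers, and adding a fixed `delta` is a simplification chosen so the result is deterministic. It is not how `Perturb.change_number` picks new values.

```python
import re
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def span_of(text: str, phrase: str, label: str) -> Span:
    """Locate `phrase` in `text` and tag it with an entity label."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def bump_numbers(
    text: str, spans: Iterable[Span], entity_types: Iterable[str], delta: int = 1
) -> str:
    """Add `delta` to every integer inside spans whose label is selected."""
    wanted = set(entity_types)
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])  # untouched text before the span
        chunk = text[start:end]
        if label in wanted:
            chunk = re.sub(r"\d+", lambda m: str(int(m.group()) + delta), chunk)
        out.append(chunk)
        cursor = end
    out.append(text[cursor:])  # untouched tail
    return "".join(out)


text = "The meeting is at 10:30 on Sept 14, tickets cost $45"
# Hand-written spans standing in for NER output.
spans = [
    span_of(text, "10:30", "TIME"),
    span_of(text, "Sept 14", "DATE"),
    span_of(text, "$45", "MONEY"),
]
changed = bump_numbers(text, spans, entity_types=["DATE", "MONEY"])
print(changed)
```

Numbers outside the selected spans, such as the time, are copied through unchanged.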

Editor with LLM Integration

# Initialize editor with LLM capabilities
editor = Editor()

# Traditional template generation (original feature)
templates = editor.template(
    "{first_name} is {a:profession} from {country}.",
    profession=["lawyer", "doctor", "accountant"],
)

# NEW: LLM-enhanced features through text generator
editor.tg = tg  # Attach LLM text generator

# Entity detection through editor
entities = editor.tg.detect_entities("Apple released the new MacBook", "brand names")
# → {"text": "Apple released the new MacBook", "contains_entities": True, "entities": ["Apple", "MacBook"]}

Key Innovations Summary

🎯 Precision Perturbations: Instead of changing all numbers, target specific entity types (MONEY, DATE, QUANTITY) with spaCy NER integration.

🤖 Structured LLM Outputs: All LLM responses use Pydantic models for type safety and consistent data structures.

🔄 Intelligent Fallbacks: LLM methods automatically fall back to rule-based approaches for reliability.

📝 Flexible Examples: New TextExample class supports structured examples with input/output/description for better prompt engineering.

🎨 Style-Aware Generation: Paraphrasing and text generation with style control (formal, casual, academic, business).

🔍 Entity Detection: LLM-powered entity detection with configurable entity types and automatic masking.

⚙️ Temperature Control: Deterministic outputs (temperature=0) for entity detection, creative outputs for paraphrasing.
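
The structured-output idea in the summary above can be sketched with a typed record that validates the raw model response before anything downstream uses it. The package states that it uses Pydantic models; the example below substitutes a standard-library dataclass so it runs without extra dependencies. The `EntityDetection` class and its checks are assumptions for illustration; the field names follow the entity-detection output shown in the Quick Start.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityDetection:
    text: str
    contains_entities: bool
    entities: List[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, raw: str) -> "EntityDetection":
        """Parse a JSON response and reject inconsistent payloads."""
        payload = json.loads(raw)
        result = cls(**payload)  # TypeError on missing or unexpected keys
        if not isinstance(result.contains_entities, bool):
            raise TypeError("contains_entities must be a boolean")
        if result.contains_entities != bool(result.entities):
            raise ValueError("contains_entities disagrees with entities")
        return result


raw_response = (
    '{"text": "Apple released the new MacBook", '
    '"contains_entities": true, "entities": ["Apple", "MacBook"]}'
)
detection = EntityDetection.from_json(raw_response)
print(detection.entities)
```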

Enhanced Features

Smart Perturbations: LLMPerturb for intelligent text transformations with fallback support
Structured Text Generation: LLMTextGenerator with Pydantic models for type-safe outputs
Entity-Aware Processing: Target specific numerical entities using spaCy's named entity recognition
Batch Processing: Efficient handling of multiple texts with structured responses
Configuration-Driven: YAML-based prompt templates with variable substitution

Installation

From PyPI:

pip install checklist-plus
jupyter nbextension install --py --sys-prefix checklist_plus.viewer
jupyter nbextension enable --py --sys-prefix checklist_plus.viewer

Note: --sys-prefix installs into Python's sys.prefix, which is useful in virtual environments such as conda or virtualenv. Outside such environments, switch to --user to install into the user's home Jupyter directories.

From source:

git clone git@github.com:cowana-ai/checklist-plus.git
cd checklist-plus
pip install -e .

Either way, you need to install PyTorch or TensorFlow if you want to use masked language model suggestions:

pip install torch

For most tutorials, you also need to download a spaCy model:

python -m spacy download en_core_web_sm

📚 Documentation

Tutorials

Examples from Original Paper

🔧 Advanced Installation

From PyPI (Recommended)

pip install checklist-plus

# For Jupyter visualizations
jupyter nbextension install --py --sys-prefix checklist_plus.viewer
jupyter nbextension enable --py --sys-prefix checklist_plus.viewer

From Source

git clone git@github.com:cowana-ai/checklist-plus.git
cd checklist-plus
pip install -e .

Optional Dependencies

# For masked language model suggestions
pip install torch

# For NLP processing
python -m spacy download en_core_web_sm

💡 Key Features

LLM-Enhanced Perturbations

from checklist_plus.perturb import LLMPerturb

perturb = LLMPerturb(openai_api_key="your-key")

# Advanced negation with context
negated = perturb.add_negation_llm(
    ["I love programming", "This is excellent"], n_variations=2, context="casual"
)

Enhanced Text Generation with LLM

from checklist_plus.editor import Editor

# Initialize editor with LLM capabilities
llm_editor = Editor(
    use_llm=True, model_name="gpt-4o-mini", openai_api_key="your-api-key"
)

# Smart template filling with context
templates = llm_editor.template(
    "The {mask} is very {adj}.",
    adj=["beautiful", "interesting", "amazing"],
    context="travel destinations",
    n_completions=3,
)

# LLM-powered paraphrasing
paraphrases = llm_editor.paraphrase_llm(
    "The weather is beautiful today",
    n_paraphrases=3,
    style="formal",
    length_preference="longer",
)

# Context-aware word suggestions
suggestions = llm_editor.suggest("This is a {mask} movie.", context="science fiction")

# Smart synonyms and antonyms
synonyms = llm_editor.synonyms("The food is hot.", "hot")
antonyms = llm_editor.antonyms("The weather is cold.", "cold")

Template Generation (Original Feature)

from checklist_plus.editor import Editor

editor = Editor()
ret = editor.template(
    "{first_name} is {a:profession} from {country}.",
    profession=["lawyer", "doctor", "accountant"],
)
# → ['Mary is a doctor from Afghanistan.', 'Jordan is an accountant from Indonesia.', ...]
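
As a rough mental model of template expansion, the sketch below fills every combination of supplied values into `{key}` slots and prepends an article for `{a:key}` slots. It is a simplified stand-in, not the Editor's implementation: `expand` and `with_article` are hypothetical, every value list must be passed explicitly (the example above relies on the Editor's built-in `first_name` and `country` lists), and the article rule is a crude vowel test.

```python
import itertools
import re
from typing import Dict, List


def with_article(word: str) -> str:
    # Crude vowel test; it ignores cases such as "hour" or "university".
    return ("an " if word[0].lower() in "aeiou" else "a ") + word


def expand(template: str, **fills: List[str]) -> List[str]:
    """Fill every combination of values into `{key}` and `{a:key}` slots."""
    keys = sorted(fills)
    results = []
    for combo in itertools.product(*(fills[k] for k in keys)):
        values: Dict[str, str] = dict(zip(keys, combo))

        def substitute(match: "re.Match[str]") -> str:
            article, key = match.group(1), match.group(2)
            return with_article(values[key]) if article else values[key]

        results.append(re.sub(r"\{(a:)?(\w+)\}", substitute, template))
    return results


sentences = expand(
    "{first_name} is {a:profession}.",
    first_name=["Mary", "Jordan"],
    profession=["doctor", "accountant"],
)
print(sentences)
```

The number of generated sentences is the product of the list lengths, which is why template-based test sets grow quickly.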

Smart Perturbations

from checklist_plus.perturb import Perturb
import spacy

nlp = spacy.load("en_core_web_sm")
data = ["John is a doctor", "Mary is a nurse"]
parsed_data = list(nlp.pipe(data))

# Rule-based perturbations (original)
ret = Perturb.perturb(parsed_data, Perturb.change_names, n=2)

# LLM-enhanced negation (reuses the LLMPerturb instance `perturb` created in the LLM-Enhanced Perturbations example)
ret_llm = perturb.add_negation_llm(["The service was good", "I liked the food"])
print(ret_llm)

Test Creation and Execution

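from checklist_plus.editor import Editor
from checklist_plus.expect import Expect
from checklist_plus.perturb import Perturb
from checklist_plus.test_types import MFT, INV, DIR

editor = Editor()
data = ["John is a doctor", "Mary is a nurse"]


def add_negative_phrase(x):
    # Illustrative perturbation: append a clearly negative phrase
    return [x + " Anyway, I thought it was bad."]


# Minimum Functionality Tests
test1 = MFT(
    editor.template("This is {a:adj} {mask}.", adj=["good", "great"]).data,
    labels=1,
    name="Positive sentiment",
)

# Invariance Tests
test2 = INV(**Perturb.perturb(data, Perturb.add_typos))

# Directional Expectation Tests
test3 = DIR(
    **Perturb.perturb(data, add_negative_phrase),
    expect=Expect.monotonic(label=1, increasing=False),
)

# Run tests; wrapped_model is your own prediction function, not defined here
test1.run(wrapped_model)
test1.summary()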
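
The pass/fail logic behind invariance and directional tests can be sketched without the library. Everything below is an assumption made for illustration: `toy_positive_prob` is an invented keyword scorer, and the two check functions are simplified readings of INV and of a decreasing monotonic expectation. In particular, this invariance check compares scores against a tolerance rather than comparing predicted labels.

```python
from typing import Callable, List


def toy_positive_prob(text: str) -> float:
    """Hypothetical sentiment scorer: probability that `text` is positive."""
    lowered = text.lower()
    score = 0.5 + 0.2 * lowered.count("good") - 0.3 * lowered.count("bad")
    return min(1.0, max(0.0, score))


def passes_invariance(
    model: Callable[[str], float],
    original: str,
    perturbed: List[str],
    tolerance: float = 0.1,
) -> bool:
    """INV-style check: no perturbed score moves more than `tolerance`."""
    base = model(original)
    return all(abs(model(p) - base) <= tolerance for p in perturbed)


def passes_monotonic_decrease(
    model: Callable[[str], float],
    original: str,
    perturbed: List[str],
    tolerance: float = 0.1,
) -> bool:
    """DIR-style check: no perturbed score rises more than `tolerance`."""
    base = model(original)
    return all(model(p) - base <= tolerance for p in perturbed)


original = "The food was good"
typo_ok = passes_invariance(toy_positive_prob, original, ["The food aws good"])
negative_ok = passes_monotonic_decrease(
    toy_positive_prob, original, ["The food was good. Anyway, it was bad."]
)
print(typo_ok, negative_ok)
```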

🔗 Resources

API Reference - Complete API documentation
Original CheckList - The foundational framework
Research Paper - Original ACL 2020 paper
Tutorial Notebooks - Step-by-step guides

🤝 Contributing

This project extends the original CheckList framework. We welcome contributions that enhance LLM integration and improve usability while maintaining backward compatibility.

📄 License

This project follows the same license as the original CheckList framework.

Note: This is an extended version of the original CheckList framework with added LLM capabilities. All original functionality is preserved and enhanced.

Release history

0.2.0 (this version): Sep 14, 2025
0.1.3: Sep 12, 2025
0.1.2: Sep 12, 2025
0.1.1: Sep 12, 2025
0.1.0: Sep 12, 2025
