LLM-Enhanced CheckList: AI-Powered Behavioral Testing of NLP Models

CheckList Plus

An LLM-enhanced extension of the original CheckList framework for behavioral testing of NLP models.

It adds LLM-backed perturbations and text generation on top of the original CheckList API, making behavioral tests for NLP models easier to create and run.

🆕 What's New in CheckList Plus

  • LLM-Powered Negation: Generate text negations using OpenAI GPT models
  • Enhanced Text Generation: Advanced paraphrasing, context-aware suggestions, and semantic word relations
  • Smart Template Filling: LLM-enhanced template completion with contextual understanding
  • Intelligent Word Relations: Context-aware synonyms, antonyms, hypernyms, and hyponyms
  • Simplified API: More intuitive interfaces for common testing scenarios
  • Backward Compatibility: Works with all original CheckList functionality

📖 Original Research

Based on the research paper:

Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh
Association for Computational Linguistics (ACL), 2020

@inproceedings{checklist:acl20,
  author = {Marco Tulio Ribeiro and Tongshuang Wu and Carlos Guestrin and Sameer Singh},
  title = {Beyond Accuracy: Behavioral Testing of NLP models with CheckList},
  booktitle = {Association for Computational Linguistics (ACL)},
  year = {2020}
}

🚀 Quick Start

Installation

pip install checklist-plus

Basic Usage

import checklist_plus
from checklist_plus.perturb import LLMPerturb
from checklist_plus.editor import Editor

# Initialize LLM-enhanced perturbations
perturb = LLMPerturb(openai_api_key="your-api-key")
data = ["I love this movie", "The food was great"]

# LLM-powered negation
negated = perturb.add_negation_llm(data, n_variations=2)
# → [["I hate this movie", "I don't love this movie"], ...]

# Initialize LLM-enhanced text generation
editor = Editor(use_llm=True, openai_api_key="your-api-key")

# Smart paraphrasing
paraphrases = editor.paraphrase_llm(
    "The weather is nice today", n_paraphrases=2, style="formal"
)
# → ["Today's weather conditions are quite pleasant", "The meteorological conditions are favorable today"]
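
The snippet above passes the API key as a string literal. A minimal sketch of reading it from the environment instead is shown here. The variable name OPENAI_API_KEY and the helper load_api_key are assumptions for illustration and are not part of checklist_plus.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Raises RuntimeError with a clear message if the variable is unset or empty.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Hypothetical usage with the constructor shown above:
# perturb = LLMPerturb(openai_api_key=load_api_key())
```

Keeping the key out of source files avoids committing it to version control by accident.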

Enhanced Features

  • Smart Perturbations: LLMPerturb for intelligent text transformations
  • LLM-powered Text Generation: Context-aware template filling and paraphrasing
  • Intelligent Word Relations: Smart synonyms, antonyms, and semantic suggestions
  • Batch Processing: Efficient handling of multiple texts
  • Fallback Support: Automatic fallback to rule-based methods
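
The fallback behavior listed above can be pictured with a small, library-independent sketch: try the LLM-backed function first and use a rule-based one if it fails or returns nothing. The names rule_based_negation and negate_with_fallback are hypothetical, and the rule used here (inserting "not" after the first form of "to be") is far simpler than whatever checklist_plus actually does.

```python
import re
from typing import Callable, List


def rule_based_negation(text: str) -> List[str]:
    """Toy rule: insert 'not' after the first 'is', 'was', 'are', or 'were'."""
    negated, count = re.subn(r"\b(is|was|are|were)\b", r"\1 not", text, count=1)
    return [negated] if count else []


def negate_with_fallback(
    text: str,
    llm_fn: Callable[[str], List[str]],
    fallback_fn: Callable[[str], List[str]] = rule_based_negation,
) -> List[str]:
    """Return llm_fn(text), or fallback_fn(text) if the LLM call fails or is empty."""
    try:
        result = llm_fn(text)
        if result:
            return result
    except Exception:
        # Broad on purpose in this sketch: any LLM failure triggers the fallback.
        pass
    return fallback_fn(text)


def failing_llm(text: str) -> List[str]:
    """Stand-in for an LLM call that cannot reach the API."""
    raise ConnectionError("API unreachable")


print(negate_with_fallback("The food was great", failing_llm))
# → ['The food was not great']
```

In real use, catching a narrower set of exceptions would make genuine bugs easier to spot.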

Installation

From PyPI:

pip install checklist-plus
jupyter nbextension install --py --sys-prefix checklist_plus.viewer
jupyter nbextension enable --py --sys-prefix checklist_plus.viewer

Note: --sys-prefix installs into Python’s sys.prefix, which is useful in virtual environments such as conda or virtualenv. Outside such an environment, use --user instead to install into the user’s home Jupyter directories.

From source:

git clone git@github.com:cowana-ai/checklist-plus.git
cd checklist-plus
pip install -e .

Either way, you need to install PyTorch or TensorFlow if you want to use masked language model suggestions:

pip install torch

For most tutorials, you also need to download a spaCy model:

python -m spacy download en_core_web_sm

📚 Documentation

Tutorials

  1. Generating data
  2. Perturbing data (with LLM enhancements)
  3. Test types and expectation functions
  4. The CheckList Plus process

Examples from Original Paper

💡 Key Features

LLM-Enhanced Perturbations

from checklist_plus.perturb import LLMPerturb

perturb = LLMPerturb(openai_api_key="your-key")

# Advanced negation with context
negated = perturb.add_negation_llm(
    ["I love programming", "This is excellent"], n_variations=2, context="casual"
)

Enhanced Text Generation with LLM

from checklist_plus.editor import Editor

# Initialize editor with LLM capabilities
llm_editor = Editor(
    use_llm=True, model_name="gpt-4o-mini", openai_api_key="your-api-key"
)

# Smart template filling with context
templates = llm_editor.template(
    "The {mask} is very {adj}.",
    adj=["beautiful", "interesting", "amazing"],
    context="travel destinations",
    n_completions=3,
)

# LLM-powered paraphrasing
paraphrases = llm_editor.paraphrase_llm(
    "The weather is beautiful today",
    n_paraphrases=3,
    style="formal",
    length_preference="longer",
)

# Context-aware word suggestions
suggestions = llm_editor.suggest("This is a {mask} movie.", context="science fiction")

# Smart synonyms and antonyms
synonyms = llm_editor.synonyms("The food is hot.", "hot")
antonyms = llm_editor.antonyms("The weather is cold.", "cold")

Template Generation (Original Feature)

from checklist_plus.editor import Editor

editor = Editor()
ret = editor.template(
    "{first_name} is {a:profession} from {country}.",
    profession=["lawyer", "doctor", "accountant"],
)
# → ['Mary is a doctor from Afghanistan.', 'Jordan is an accountant from Indonesia.', ...]
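
To make the idea behind template filling concrete, here is a minimal standard-library sketch that expands placeholders over every combination of supplied values and handles the {a:key} article form. It is an illustration only: expand_template and with_article are hypothetical names, every placeholder must be given an explicit list, and the real Editor additionally ships built-in lexicons such as first_name and country.

```python
import itertools
import re
from typing import List


def with_article(word: str) -> str:
    """Prefix a word with 'a' or 'an' using a simple first-letter vowel check."""
    article = "an" if word[0].lower() in "aeiou" else "a"
    return f"{article} {word}"


def expand_template(template: str, **fillers: List[str]) -> List[str]:
    """Fill every {key} or {a:key} placeholder with each combination of values."""
    keys = list(fillers)
    results = []
    for combo in itertools.product(*(fillers[k] for k in keys)):
        values = dict(zip(keys, combo))

        def substitute(match):
            # group(1) is 'a:' when an article was requested, otherwise None.
            prefix, key = match.group(1), match.group(2)
            return with_article(values[key]) if prefix else values[key]

        results.append(re.sub(r"\{(a:)?(\w+)\}", substitute, template))
    return results


print(
    expand_template(
        "{first_name} is {a:profession}.",
        first_name=["Mary", "Jordan"],
        profession=["doctor", "accountant"],
    )
)
# → ['Mary is a doctor.', 'Mary is an accountant.',
#    'Jordan is a doctor.', 'Jordan is an accountant.']
```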

Smart Perturbations

from checklist_plus.perturb import LLMPerturb, Perturb
import spacy

nlp = spacy.load("en_core_web_sm")
data = ["John is a doctor", "Mary is a nurse"]
parsed_data = list(nlp.pipe(data))

# Rule-based perturbations (original); change_names works on spaCy-parsed input
ret = Perturb.perturb(parsed_data, Perturb.change_names, n=2)

# LLM-enhanced negation
perturb = LLMPerturb(openai_api_key="your-key")
ret_llm = perturb.add_negation_llm(["The service was good", "I liked the food"])
print(ret_llm)

Test Creation and Execution

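from checklist_plus.editor import Editor
from checklist_plus.expect import Expect
from checklist_plus.perturb import Perturb
from checklist_plus.test_types import MFT, INV, DIR

editor = Editor()
data = ["The service was good", "I liked the food"]


def add_negative_phrase(text):
    # Append a clearly negative sentence; positive confidence should not go up
    return [f"{text} Anyway, I thought it was bad."]


# Minimum Functionality Tests
test1 = MFT(
    editor.template("This is {a:adj} {mask}.", adj=["good", "great"]).data,
    labels=1,
    name="Positive sentiment",
)

# Invariance Tests
test2 = INV(**Perturb.perturb(data, Perturb.add_typos))

# Directional Expectation Tests
test3 = DIR(
    **Perturb.perturb(data, add_negative_phrase),
    expect=Expect.monotonic(label=1, increasing=False)
)

# Run tests; wrapped_model is your own model's prediction function (not defined here)
test1.run(wrapped_model)
test1.summary()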
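
The idea behind an invariance test can be sketched without the library: a label-preserving perturbation such as a typo should not change the model's prediction, and the failure rate is the fraction of pairs where it does. Everything below is a hypothetical illustration. toy_sentiment_model stands in for a real model, and invariance_failure_rate is not a checklist_plus function.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[str]], Tuple[List[int], List[float]]]


def toy_sentiment_model(texts: Sequence[str]) -> Tuple[List[int], List[float]]:
    """Keyword-based stand-in for a real model: returns (labels, confidences)."""
    positive = {"good", "great", "love", "liked"}
    labels = [1 if set(t.lower().split()) & positive else 0 for t in texts]
    return labels, [1.0] * len(labels)


def invariance_failure_rate(pairs: Sequence[Tuple[str, str]], model: Model) -> float:
    """Fraction of (original, perturbed) pairs whose predicted label changes."""
    if not pairs:
        return 0.0
    orig_labels, _ = model([original for original, _ in pairs])
    pert_labels, _ = model([perturbed for _, perturbed in pairs])
    failures = sum(a != b for a, b in zip(orig_labels, pert_labels))
    return failures / len(pairs)


pairs = [
    ("The service was good", "The service was godo"),  # typo hits the keyword
    ("I liked the food", "I liked teh food"),  # typo misses the keyword
]
print(invariance_failure_rate(pairs, toy_sentiment_model))
# → 0.5
```

The toy model fails on the first pair because the typo lands on the only word it relies on, which is the kind of brittleness an invariance test is meant to surface.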

🔗 Resources

🤝 Contributing

This project extends the original CheckList framework. We welcome contributions that enhance LLM integration and improve usability while maintaining backward compatibility.

📄 License

This project follows the same license as the original CheckList framework.


Note: This is an extended version of the original CheckList framework with added LLM capabilities. All original functionality is preserved and enhanced.
