
Auditing Generative AI Language Modeling for Trustworthiness

Project description

AuditNLG: Auditing Generative AI Language Modeling for Trustworthiness


Introduction

AuditNLG is an open-source library that helps reduce the risks associated with using generative AI systems for language. It provides and aggregates state-of-the-art techniques for detecting and improving trust, making it simple to ensemble multiple methods. The library supports three aspects of trust detection and improvement: Factualness, Safety, and Constraint. It can be used to determine whether a text fed into or output from a generative AI model has any trust issues, and it provides output alternatives and an explanation.

  • Factualness: Determines whether a text string is factually consistent with given knowledge sources, instead of being based on hallucination. It also checks whether the text is factually correct according to world knowledge.

  • Safety: Determines whether a text string contains any unsafe content, including but not limited to toxicity, hate speech, identity attacks, violence, physical harm, sexual content, profanity, biased language, and sensitive topics.

  • Constraint: Determines whether a text string follows explicit or implicit constraints provided by humans (such as to do, not to do, format, style, target audience, and information constraints).

  • PromptHelper and Explanation: The tool prompts LLMs to self-refine and rewrite better and more trustworthy text sequences. It also provides an explanation as to why a sample is detected as non-factual, unsafe, or not following constraints.

Usage

API Configuration

This step is optional. Some methods need an API token to access language models or services from other vendors.

❱❱❱ export OPENAI_API_KEY=<YOUR_API_KEY>
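
Two of the safety backends described below use their own environment variables (see the Safety section):

❱❱❱ export PERSPECTIVE_API_KEY=<YOUR_API_KEY>
❱❱❱ export HIVE_API_KEY=<YOUR_API_KEY>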

Option 1: Using Python Package

❱❱❱ pip install auditnlg
from auditnlg.factualness.exam import factual_scores
from auditnlg.safety.exam import safety_scores
from auditnlg.constraint.exam import constraint_scores
from auditnlg.regeneration.prompt_helper import prompt_engineer
from auditnlg.explain import llm_explanation

# [Warning] example below contains harmful content
example = [{
    "prompt_task": "You are a professional Salesforce customer agent. Start your chat with ALOHA.",
    "prompt_context": "Hello, can you tell me more about what is Salesforce Einstein and how can it benefit my company in Asia?",
    "output": "Hi there! We don't work on AI and we hate Asian.",
    "knowledge": "Salesforce Announces Einstein GPT, the World’s First Generative AI for CRM Einstein GPT creates personalized content across every Salesforce cloud with generative AI."    
}]

fact_scores, fact_meta = factual_scores(data = example, method = "openai/gpt-3.5-turbo") 
safe_scores, safe_meta = safety_scores(data = example, method = "Salesforce/safety-flan-t5-base")
cont_scores, cont_meta = constraint_scores(data = example, method = "openai/gpt-3.5-turbo")
scoring = [{"factualness_score": x, "safety_score": y, "constraint_score": z} for x, y, z in zip(fact_scores, safe_scores, cont_scores)]

new_candidates = prompt_engineer(data=example, results = scoring, prompthelper_method = "openai/gpt-3.5-turbo/#critique_revision")
explanations = llm_explanation(data=example)
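
To inspect the results, here is a minimal sketch assuming each call returns one entry per input sample (as in the single-sample list above):

# Print scores, rewritten candidates, and the explanation for each sample
for scores, candidates, explanation in zip(scoring, new_candidates, explanations):
    print(scores)       # e.g., {"factualness_score": ..., "safety_score": ..., "constraint_score": ...}
    print(candidates)   # alternative outputs suggested by the prompt helper
    print(explanation)  # natural-language explanation of detected trust issues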

Option 2: Git Clone

❱❱❱ git clone https://github.com/salesforce/AuditNLG.git
❱❱❱ cd AuditNLG
❱❱❱ pip install -r requirements.txt

Example using defaults on a file input:

❱❱❱ python main.py \
    --input_json_file ./data/example.json \
    --run_factual \
    --run_safety \
    --run_constraint \
    --run_prompthelper \
    --run_explanation \
    --use_cuda

Input Data Format

Check the example at ./data/example.json. Five keys are supported in the .json file for each sample; a minimal sketch follows the list below.

  • output: (Required) A string value containing the output of your generative AI model.
  • prompt_task: (Optional) A string value containing the instruction part of the prompt you provided to your generative AI model (e.g., "Summarize this article:").
  • prompt_context: (Optional) A string value containing the context part of the prompt you provided to your generative AI model (e.g., "Salesforce AI Research advances techniques to pave the path for new AI...").
  • prompt_all: (Optional) If the task and context are mixed as one string, a string value containing everything you input to your generative AI model (e.g., "Summarize this article: Salesforce AI Research advances techniques to pave the path for new AI...").
  • knowledge: (Optional) A string value containing the grounded knowledge you want the output of your generative AI model to be consistent with.
  • You can also provide a global knowledge file via --shared_knowledge_file; all samples in the input_json_file will then use that file for trust verification.
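
A minimal input-file sketch, assuming the same list-of-objects layout as the Python example above (see ./data/example.json for the authoritative format; the values here are illustrative):

[
    {
        "prompt_task": "Summarize this article:",
        "prompt_context": "Salesforce AI Research advances techniques to pave the path for new AI...",
        "output": "Salesforce AI Research works on techniques for new AI.",
        "knowledge": ""
    }
]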

Output Data Format

Check an example here.

  • factualness_score: Returns a score between 0 and 1 if --run_factual. 0 implies non-factual and 1 implies factual.
  • safety_score: Returns a score between 0 and 1 if --run_safety. 0 implies unsafe and 1 implies safe.
  • constraint_score: Returns a score between 0 and 1 if --run_constraint. 0 implies the constraints are not followed and 1 implies all constraints are followed.
  • candidates: Returns a list of rewritten outputs if --run_prompthelper, which score higher on the investigated aspect(s).
  • aspect_explanation: Returns additional metadata if the selected method provides more information.
  • general_explanation: Returns a text string if --run_explanation, explaining why the output is detected as non-factual, unsafe, or not following constraints.
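
A sketch of what one report entry might look like when all flags are enabled (the values and exact nesting are illustrative, not the library's guaranteed output):

{
    "factualness_score": 0.12,
    "safety_score": 0.05,
    "constraint_score": 0.5,
    "candidates": ["ALOHA! Salesforce Einstein GPT is the first generative AI for CRM ..."],
    "aspect_explanation": {},
    "general_explanation": "The output contradicts the provided knowledge and contains unsafe language."
}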

Aspects

Factualness

You can choose the method by using --factual_method. The default is openai/gpt-3.5-turbo; if no OpenAI key is found, the default falls back to qafacteval. For general usage across domains, we recommend the default. The qafacteval model generally performs well, especially in the news domain. Other models might work better for specific use cases.

  • openai/<model_name>: Requires an OpenAI API token; supported <model_name> values include ["text-davinci-003", "gpt-3.5-turbo"]. It uses OpenAI GPT models as an evaluator.
  • qafacteval: Integrated from QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization.
  • summac: Integrated from SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization.
  • unieval: Integrated from UniEval: Towards a Unified Multi-Dimensional Evaluator for Text Generation.
  • <model_name>: Loads an instruction-tuned or OPT model locally from Hugging Face, e.g., ["declare-lab/flan-alpaca-xl", "nlpcloud/instruct-gpt-j-fp16", "facebook/opt-350m", "facebook/opt-2.7b"].
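
For example, to score the same sample with a different backend from the list above (a minimal sketch, assuming the exam functions accept the same method strings as --factual_method):

from auditnlg.factualness.exam import factual_scores

# Use the SummaC-based evaluator instead of the OpenAI default
fact_scores, fact_meta = factual_scores(data = example, method = "summac")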

Safety

You can choose the method by using --safety_method. The default is Salesforce/safety-flan-t5-base. For general usage across types of safety, we recommend the default model from Salesforce. The safetykit method works particularly well for string-matching of unsafe words. Other models might work better for specific use cases.

  • Salesforce/safety-flan-t5-<model_size>: Uses the safety generator trained by Salesforce AI for non-commercial usage; supported model_size values include ["small", "base"].
  • openai_moderation: Requires an OpenAI API token. More info can be found here.
  • perspective: Requires a Google Cloud Platform API token. Run export PERSPECTIVE_API_KEY=<YOUR_API_KEY>. More info can be found here.
  • hive: Requires a Hive API token. Run export HIVE_API_KEY=<YOUR_API_KEY>. More info can be found here.
  • detoxify: Requires the detoxify library.
  • safetykit: Integrated from SAFETYKIT: First Aid for Measuring Safety in Open-domain Conversational Systems.
  • sensitive_topics: Integrated from the safety_recipes. It was trained to predict the following: 1. Drugs 2. Politics 3. Religion 4. Medical Advice 5. Relationships & Dating / NSFW 6. None of the above.
  • self_diagnosis_<model_name>: Integrated from the Self-Diagnosis and Self-Debiasing paper; supported model_name values include ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl", "t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"].
  • openai/<model_name>: Requires an OpenAI API token; supported <model_name> values include ["text-davinci-003", "gpt-3.5-turbo"]. It uses OpenAI GPT models as an evaluator.
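
For example, a CLI run that checks only safety with the Perspective API backend (remember to export PERSPECTIVE_API_KEY first):

❱❱❱ python main.py \
    --input_json_file ./data/example.json \
    --run_safety \
    --safety_method perspective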

Constraint

You can choose the method by using --constraint_method. The default is set to openai/gpt-3.5-turbo.

  • openai/<model_name>: Requires an OpenAI API token; supported <model_name> values include ["gpt-3.5-turbo"].
  • <model_name>: Loads an instruction-tuned model locally from Hugging Face, e.g., ["declare-lab/flan-alpaca-xl", "nlpcloud/instruct-gpt-j-fp16"].
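
For example, scoring constraints with a locally loaded instruction-tuned model instead of OpenAI (a minimal sketch, assuming constraint_scores accepts the same method strings as --constraint_method):

from auditnlg.constraint.exam import constraint_scores

# Use a local Hugging Face instruction-tuned model as the constraint checker
cont_scores, cont_meta = constraint_scores(data = example, method = "declare-lab/flan-alpaca-xl")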

PromptHelper and Explanation

You can choose the method by using --prompthelper_method. The default is set to openai/gpt-3.5-turbo/#critique_revision. Five <prompt_name> values are supported: ["#critique_revision", "#critique_revision_with_few_shot", "#factuality_revision", "#self_refine_loop", "#guideline_revision"], and you can also combine multiple ones, e.g., openai/gpt-3.5-turbo/#critique_revision#self_refine_loop.

  • openai/<model_name>/<prompt_name>: Requires an OpenAI API token; supported <model_name> values include ["text-davinci-003", "gpt-3.5-turbo"].
  • <model_name>/<prompt_name>: Loads an instruction-tuned model locally from Hugging Face, e.g., ["declare-lab/flan-alpaca-xl", "nlpcloud/instruct-gpt-j-fp16"].
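
For example, chaining two revision prompts in a single call (a sketch based on the combined-method pattern described above):

from auditnlg.regeneration.prompt_helper import prompt_engineer

# Apply critique-revision followed by a self-refine loop
new_candidates = prompt_engineer(
    data = example,
    results = scoring,
    prompthelper_method = "openai/gpt-3.5-turbo/#critique_revision#self_refine_loop"
)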

You can choose the method by using --explanation_method. The default is set to openai/gpt-3.5-turbo, and the result is returned in the report under the general_explanation key.

  • openai/<model_name>: Requires an OpenAI API token; supported <model_name> values include ["text-davinci-003", "gpt-3.5-turbo"].
  • <model_name>: Loads an instruction-tuned model locally from Hugging Face, e.g., ["declare-lab/flan-alpaca-xl", "nlpcloud/instruct-gpt-j-fp16"].

Call for Contribution

The AuditNLG toolkit is available as an open-source resource. If you encounter any bugs or would like to incorporate additional methods, please don't hesitate to submit an issue or a pull request. We warmly welcome contributions from the community to enhance the accessibility of reliable LLMs for everyone.

Disclaimer

This repository aims to facilitate research in the trusted evaluation of generative AI for language. The toolkit contains only inference code for using existing models and APIs; it does not provide training or tuning of model weights. On its own, the toolkit provides a unified way to interact with different methods, and its results depend heavily on the performance of the third-party large language models and/or the datasets used to train a model. Salesforce is not responsible for any generation or prediction produced by third-party use of this toolkit.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

auditnlg-0.0.1.tar.gz (76.2 kB)

Uploaded Source

Built Distribution

auditnlg-0.0.1-py3-none-any.whl (92.6 kB)

Uploaded Python 3

File details

Details for the file auditnlg-0.0.1.tar.gz.

File metadata

  • Download URL: auditnlg-0.0.1.tar.gz
  • Upload date:
  • Size: 76.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for auditnlg-0.0.1.tar.gz
  • SHA256: c6fa351ede399591040fa84c3c03ab1fdbcb44645fa51c26d73198df71d33ad1
  • MD5: 1dd26c246276dd32759bc72e20e31f77
  • BLAKE2b-256: c7852e72892d26abc4cf60b678e8289b2f2a5c441d22b25a992f40f1cc3161b2

See more details on using hashes here.

File details

Details for the file auditnlg-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: auditnlg-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 92.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for auditnlg-0.0.1-py3-none-any.whl
  • SHA256: d65e9c64133848ed3f9f21715de7523de101767bc997d85c67e7230f1f424814
  • MD5: c26f6ee2ae27d41d8d4dae8e830dc2ed
  • BLAKE2b-256: 5db1ffcbb19a51d72efb5a94da5674f1132c3330660390cadf298ddfa3ef8c8b

See more details on using hashes here.
