Factuality Detection in Generative AI

This repository contains the source code and plugin configuration for our paper Factuality Detection in Generative AI: A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios.

Factool is a tool-augmented framework for detecting factual errors in text generated by large language models (e.g., ChatGPT).

Factool currently supports four tasks:

  • knowledge-based QA: Factool detects factual errors in knowledge-based question answering.
  • code generation: Factool detects execution errors in code generation.
  • mathematical reasoning: Factool detects calculation errors in mathematical reasoning.
  • scientific literature review: Factool detects hallucinated scientific literature references.

Installation

conda create -n factool_env python=3.11
conda activate factool_env
pip uninstall factool  # remove any previously installed version first
pip install git+https://github.com/GAIR-NLP/factool
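To confirm the install succeeded, a quick import check works (a generic one-liner, not from the repo):

python -c "import factool; print('ok')"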

Quick Start

  1. Install the package.
  2. Create a file called keys.yaml and put your API keys (openai, serper, scraperapi) in it; a sketch of the layout follows below.
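A minimal keys.yaml sketch. The exact field names below are assumptions inferred from the services listed above; check the repository for the authoritative format:

# keys.yaml (field names are assumptions; verify against the repo's example)
openai_api_key: sk-...
serper_api_key: your-serper-key
scraperapi_key: your-scraperapi-key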

General Usage:

from factool import Factool

# Initialize a Factool instance with the specified foundation model; API keys are read from keys.yaml.
# foundation_model can be either "gpt-3.5-turbo" or "gpt-4".
factool_instance = Factool("gpt-3.5-turbo")


inputs = [
    {
        "prompt": "Introduce Graham Neubig",
        "response": "Graham Neubig is a professor at MIT",
        "category": "kbqa",
        "entry_point": "answer_question",
    },
]

response_list = factool_instance.run(inputs)

print(response_list)
'''
knowledge-based QA: 
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'category': 'kbqa', 
        'claims': [claim_11, claim_12, ..., claim_1n], 
        'queries': [[query_111, query_112], [query_121, query_122], ..., [query_1n1, query_1n2]], 
        'evidences': [[evidences_11], [evidences_12], ..., [evidences_1n]], 
        'claim_level_factuality': [{claim_11, reasoning_11, error_11, correction_11, factuality_11}, {claim_12, reasoning_12, error_12, correction_12, factuality_12}, ..., {claim_1n, reasoning_1n, error_1n, correction_1n, factuality_1n}], 
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'category': 'kbqa',
        'claims': [claim_21, claim_22, ..., claim_2n], 
        'queries': [[query_211, query_212], [query_221, query_222], ..., [query_2n1, query_2n2]], 
        'evidences': [[evidences_21], [evidences_22], ..., [evidences_2n]], 
        'claim_level_factuality': [{claim_21, reasoning_21, error_21, correction_21, factuality_21}, {claim_22, reasoning_22, error_22, correction_22, factuality_22}, ..., {claim_2n, reasoning_2n, error_2n, correction_2n, factuality_2n}],
        'response_level_factuality': factuality_2,
    },
    ...
]

code generation:
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'claim': claim_1,
        'category': 'code',
        'testcases_queries': [testcase_query_11, testcase_query_12, testcase_query_13], 
        'potential_solutions_queries': [potential_solution_query_11, potential_solution_query_12, potential_solution_query_13], 
        'exec_results': [[evidences_111, evidences_112, evidences_113, evidences_114], [evidences_121, evidences_122, evidences_123, evidences_124], [evidences_131, evidences_132, evidences_133, evidences_134]], 
        'claim_level_factuality': factuality_1,
        'response_level_factuality': factuality_1,
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'claim': claim_2,
        'category': 'code',
        'testcases_queries': [testcase_query_21, testcase_query_22, testcase_query_23], 
        'potential_solutions_queries': [potential_solution_query_21, potential_solution_query_22, potential_solution_query_23], 
        'exec_results': [[evidences_211, evidences_212, evidences_213, evidences_214], [evidences_221, evidences_222, evidences_223, evidences_224], [evidences_231, evidences_232, evidences_233, evidences_234]], 
        'claim_level_factuality': factuality_2,
        'response_level_factuality': factuality_2,
    },
    ...
]

mathematical reasoning: 
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'category': 'math', 
        'claims': [claim_11, claim_12, ..., claim_1n], 
        'queries': [query_11, query_12, ..., query_1n], 
        'execution_results': [exec_result_11, exec_result_12, ..., exec_result_1n],
        'claim_level_factuality': [factuality_11, factuality_12, ..., factuality_1n], 
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'category': 'math', 
        'claims': [claim_21, claim_22, ..., claim_2n], 
        'queries': [query_21, query_22, ..., query_2n], 
        'execution_results': [exec_result_21, exec_result_22, ..., exec_result_2n],
        'claim_level_factuality': [factuality_21, factuality_22, ..., factuality_2n], 
        'response_level_factuality': factuality_2
    },
    ...
]

scientific literature review:
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'category': 'scientific', 
        'claims': [claim_11, claim_12, ..., claim_1n], 
        'queries': [query_11, query_12, ..., query_1n], 
        'evidences': [evidences_11, evidences_12, ..., evidences_1n], 
        'claim_level_factuality': [{claim_11, evidence_11, error_11, factuality_11}, {claim_12, evidence_12, error_12, factuality_12}, ..., {claim_1n, evidence_1n, error_1n, factuality_1n}], 
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'category': 'scientific', 
        'claims': [claim_21, claim_22, ..., claim_2n], 
        'queries': [query_21, query_22, ..., query_2n],
        'evidences': [evidences_21, evidences_22, ..., evidences_2n], 
        'claim_level_factuality': [{claim_21, evidence_21, error_21, factuality_21}, {claim_22, evidence_22, error_22, factuality_22}, ..., {claim_2n, evidence_2n, error_2n, factuality_2n}], 
        'response_level_factuality': factuality_2
    },
    ...
]
'''
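As a sketch of how these results might be consumed downstream (a minimal example assuming the kbqa output structure illustrated above; not an official snippet from the repo):

# Print the response-level verdict and per-claim details for each kbqa result.
for result in response_list:
    print(f"Prompt: {result['prompt']}")
    print(f"Response-level factuality: {result['response_level_factuality']}")
    if result['category'] == 'kbqa':
        for claim_detail in result['claim_level_factuality']:
            print(claim_detail)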

Steps for setting up the FACTOOL ChatGPT plugin:

  1. Install the package (see Installation above).
  2. Clone the repo: git clone https://github.com/GAIR-NLP/factool.git
  3. cd ./factool/plugin_config
  4. Create your keys.yaml (as described in Quick Start).
  5. Run the API locally: uvicorn main:app --host 0.0.0.0 --port ${PORT:-5003}
  6. Open the plugin store on the ChatGPT website.
  7. Click 'Develop your own plugin', then enter localhost:5003 under 'domain'.
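To verify the local server is reachable before registering it, you can request the standard plugin manifest path (assuming the plugin follows the usual ChatGPT plugin convention of serving its manifest at .well-known/ai-plugin.json):

curl http://localhost:5003/.well-known/ai-plugin.json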

Experiments:

  1. Experimental results:
  • Exp I: ./results/knowledge_QA/RoSE/
  • Exp II: ./results/knowledge_QA, ./results/code, ./results/math, ./results/scientific
  • Exp III: ./results/chat

  2. Get the final results and statistics for fine-grained analysis:
  • Exp I: python ./results/knowledge_QA/RoSE/run_rose_claim_extraction.py
  • Exp II: bash ./results/evaluation.sh
  • Exp III: python ./results/chat/calc_stats.py

  3. Reimplement the experiments:
  • Exp II: bash run_experiments.sh
  • Exp III: bash run_chatbot.sh
