Factuality Detection in Generative AI

This repository contains the source code and plugin configuration for our paper Factuality Detection in Generative AI: A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios.

Factool is a tool-augmented framework for detecting factual errors in text generated by large language models (e.g., ChatGPT).

Factool currently supports four tasks:

  • knowledge-based QA: Factool detects factual errors in knowledge-based question answering.
  • code generation: Factool detects execution errors in generated code.
  • mathematical reasoning: Factool detects calculation errors in mathematical reasoning.
  • scientific literature review: Factool detects hallucinated (non-existent) scientific literature.

Installation

conda create -n factool_env python=3.11
conda activate factool_env
# remove any previously installed copy, then install the latest version from GitHub
pip uninstall factool
pip install git+https://github.com/GAIR-NLP/factool

Quick Start

  1. Install the package.
  2. Create a file called keys.yaml and put your API keys (openai, serper, scraperapi) in it.
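
The exact field names expected in keys.yaml are not shown on this page, so the names below are an assumption inferred from the three services listed above; check the repository for the authoritative schema. A minimal sketch might look like:

```yaml
# keys.yaml — API credentials (field names are illustrative, not confirmed by this page)
openai_api_key: "sk-..."
serper_api_key: "your-serper-key"
scraperapi_key: "your-scraperapi-key"
```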

General Usage:

from factool import Factool

# Initialize a Factool instance with the specified keys. foundation_model could be either "gpt-3.5-turbo" or "gpt-4"
factool_instance = Factool("gpt-3.5-turbo")


inputs = [
    {
        # the example response contains a deliberate factual error for Factool
        # to catch (Graham Neubig is a professor at CMU, not MIT)
        "prompt": "Introduce Graham Neubig",
        "response": "Graham Neubig is a professor at MIT",
        "category": "kbqa",
        "entry_point": "answer_question",
    },
]

response_list = factool_instance.run(inputs)

print(response_list)
'''
knowledge-based QA: 
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'category': 'kbqa', 
        'claims': [claim_11, claim_12, ..., claim_1n],
        'queries': [[query_111, query_112], [query_121, query_122], ..., [query_1n1, query_1n2]],
        'evidences': [[evidences_11], [evidences_12], ..., [evidences_1n]], 
        'claim_level_factuality': [{claim_11, reasoning_11, error_11, correction_11, factuality_11}, {claim_12, reasoning_12, error_12, correction_12, factuality_12}, ..., {claim_1n, reasoning_1n, error_1n, correction_1n, factuality_1n}], 
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'category': 'kbqa',
        'claims': [claim_21, claim_22, ..., claim_2n],
        'queries': [[query_211, query_212], [query_221, query_222], ..., [query_2n1, query_2n2]], 
        'evidences': [[evidences_21], [evidences_22], ..., [evidences_2n]], 
        'claim_level_factuality': [{claim_21, reasoning_21, error_21, correction_21, factuality_21}, {claim_22, reasoning_22, error_22, correction_22, factuality_22}, ..., {claim_2n, reasoning_2n, error_2n, correction_2n, factuality_2n}],
        'response_level_factuality': factuality_2,
    },
    ...
]

code generation:
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'claim': claim_1,
        'category': 'code',
        'testcases_queries': [testcase_query_11, testcase_query_12, testcase_query_13], 
        'potential_solutions_queries': [potential_solution_query_11, potential_solution_query_12, potential_solution_query_13], 
        'exec_results': [[evidences_111, evidences_112, evidences_113, evidences_114], [evidences_121, evidences_122, evidences_123, evidences_124], [evidences_131, evidences_132, evidences_133, evidences_134]], 
        'claim_level_factuality': factuality_1,
        'response_level_factuality': factuality_1,
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'claim': claim_2,
        'category': 'code',
        'testcases_queries': [testcase_query_21, testcase_query_22, testcase_query_23], 
        'potential_solutions_queries': [potential_solution_query_21, potential_solution_query_22, potential_solution_query_23], 
        'exec_results': [[evidences_211, evidences_212, evidences_213, evidences_214], [evidences_221, evidences_222, evidences_223, evidences_224], [evidences_231, evidences_232, evidences_233, evidences_234]], 
        'claim_level_factuality': factuality_2,
        'response_level_factuality': factuality_2,
    },
    ...
]

mathematical reasoning:
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'category': 'math', 
        'claims': [claim_11, claim_12, ..., claim_1n],
        'queries': [query_11, query_12, ..., query_1n], 
        'execution_results': [exec_result_11, exec_result_12, ..., exec_result_1n],
        'claim_level_factuality': [factuality_11, factuality_12, ..., factuality_1n], 
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'category': 'math', 
        'claims': [claim_21, claim_22, ..., claim_2n],
        'queries': [query_21, query_22, ..., query_2n], 
        'execution_results': [exec_result_21, exec_result_22, ..., exec_result_2n],
        'claim_level_factuality': [factuality_21, factuality_22, ..., factuality_2n], 
        'response_level_factuality': factuality_2
    },
    ...
]

scientific literature review:
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'category': 'scientific', 
        'claims': [claim_11, claim_12, ..., claim_1n],
        'queries': [query_11, query_12, ..., query_1n], 
        'evidences': [evidences_11, evidences_12, ..., evidences_1n], 
        'claim_level_factuality': [{claim_11, evidence_11, error_11, factuality_11}, {claim_12, evidence_12, error_12, factuality_12}, ..., {claim_1n, evidence_1n, error_1n, factuality_1n}], 
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'category': 'scientific', 
        'claims': [claim_21, claim_22, ..., claim_2n],
        'queries': [query_21, query_22, ..., query_2n],
        'evidences': [evidences_21, evidences_22, ..., evidences_2n], 
        'claim_level_factuality': [{claim_21, evidence_21, error_21, factuality_21}, {claim_22, evidence_22, error_22, factuality_22}, ..., {claim_2n, evidence_2n, error_2n, factuality_2n}], 
        'response_level_factuality': factuality_2
    },
    ...
]
'''
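
For the other task categories, the input dictionaries follow the same shape as the kbqa example above. The field values below (the prompt/response pairs and the entry_point function name) are hypothetical illustrations, not examples from the repository:

```python
# Illustrative input dictionaries for the "code" and "math" categories.
# Only the schema ("prompt" / "response" / "category" / "entry_point")
# mirrors the kbqa example above; the values are made up for demonstration.
code_input = {
    "prompt": 'def add_two(x):\n    """Return x plus 2."""\n',
    "response": "def add_two(x):\n    return x + 2\n",
    "category": "code",
    # name of the function to execute against generated test cases
    "entry_point": "add_two",
}

math_input = {
    "prompt": "What is 17 * 24?",
    "response": "17 * 24 = 408",
    "category": "math",
}

# These would be passed the same way as the kbqa example:
# response_list = factool_instance.run([code_input, math_input])
```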

Steps for setting up the FACTOOL ChatGPT plugin:

  1. Install the package (see Installation above).
  2. Clone the repo: git clone https://github.com/GAIR-NLP/factool.git
  3. cd ./factool/plugin_config
  4. Create your keys.yaml
  5. Run the API locally: uvicorn main:app --host 0.0.0.0 --port ${PORT:-5003}
  6. Open the plugin store on the ChatGPT website.
  7. Click 'Develop your own plugin', then enter localhost:5003 under 'domain'.

Experiments:

  1. Experimental results:
     • Exp I: ./results/knowledge_QA/RoSE/
     • Exp II: ./results/knowledge_QA, ./results/code, ./results/math, ./results/scientific
     • Exp III: ./results/chat
  2. Get the final results and statistics for fine-grained analysis:
     • Exp I: python ./results/knowledge_QA/RoSE/run_rose_claim_extraction.py
     • Exp II: bash ./results/evaluation.sh
     • Exp III: python ./results/chat/calc_stats.py
  3. Reimplement the experiments:
     • Exp II: bash run_experiments.sh
     • Exp III: bash run_chatbot.sh

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

factool-0.0.1.tar.gz (30.9 kB)

Built Distribution

factool-0.0.1-py2.py3-none-any.whl (39.8 kB)

File details

Details for the file factool-0.0.1.tar.gz.

File metadata

  • Download URL: factool-0.0.1.tar.gz
  • Size: 30.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.17

File hashes

Hashes for factool-0.0.1.tar.gz:

  Algorithm    Hash digest
  SHA256       988b962242eb8a3d435973d8d215bbb4f94b5b9befc906c2a143b43119181cde
  MD5          75db757f66b79cea8c85039d6712a3cf
  BLAKE2b-256  2b2fdec71ba016fd7974ee279e7da58e80af6e391097ca122c23e572ad681f2b

File details

Details for the file factool-0.0.1-py2.py3-none-any.whl.

File metadata

  • Download URL: factool-0.0.1-py2.py3-none-any.whl
  • Size: 39.8 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.17

File hashes

Hashes for factool-0.0.1-py2.py3-none-any.whl:

  Algorithm    Hash digest
  SHA256       2b9d48d8d53b4bad42020d6609956f72a48bd8d474cb2b30d5e0e1458dda8f0e
  MD5          a444527dd892d4f497283c4c734ed2e5
  BLAKE2b-256  edd633882f681222a38e5c39047361c973d41766b3e9575e165548de117069d9
