# Factuality Detection in Generative AI
This repository contains the source code and plugin configuration for our paper *Factuality Detection in Generative AI: A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios*.

Factool is a tool-augmented framework for detecting factual errors in texts generated by large language models (e.g., ChatGPT).
Factool currently supports four tasks:

- knowledge-based QA: detects factual errors in answers to knowledge-based questions.
- code generation: detects execution errors in generated code.
- mathematical reasoning: detects calculation errors in mathematical reasoning.
- scientific literature review: detects hallucinated scientific references.
## Installation

```shell
conda create -n factool_env python=3.11
conda activate factool_env
pip uninstall factool
pip install git+https://github.com/GAIR-NLP/factool
```
## Quick Start
- Install the package (see Installation above).
- Create a file called `keys.yaml` and put your API keys (openai, serper, scraperapi) in it.
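The exact key names expected in `keys.yaml` are best checked against the repository; a plausible layout, in which the key names and placeholder values are assumptions, might look like:

```yaml
openai_api_key: "sk-..."        # OpenAI API key
serper_api_key: "..."           # Serper search API key
scraper_api_key: "..."          # ScraperAPI key
```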
General Usage:
```python
from factool import Factool

# Initialize a Factool instance with the chosen foundation model;
# it can be either "gpt-3.5-turbo" or "gpt-4".
factool_instance = Factool("gpt-3.5-turbo")

inputs = [
    {
        "prompt": "Introduce Graham Neubig",
        "response": "Graham Neubig is a professor at MIT",
        "category": "kbqa",
        "entry_point": "answer_question",
    },
]
response_list = factool_instance.run(inputs)
print(response_list)
```
```
'''
knowledge-based QA:
response_list =
[
    {
        'prompt': prompt_1,
        'response': response_1,
        'category': 'kbqa',
        'claims': [claim_11, claim_12, ..., claim_1n],
        'queries': [[query_111, query_112], [query_121, query_122], ..., [query_1n1, query_1n2]],
        'evidences': [[evidences_11], [evidences_12], ..., [evidences_1n]],
        'claim_level_factuality': [{claim_11, reasoning_11, error_11, correction_11, factuality_11}, {claim_12, reasoning_12, error_12, correction_12, factuality_12}, ..., {claim_1n, reasoning_1n, error_1n, correction_1n, factuality_1n}],
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2,
        'response': response_2,
        'category': 'kbqa',
        'claims': [claim_21, claim_22, ..., claim_2n],
        'queries': [[query_211, query_212], [query_221, query_222], ..., [query_2n1, query_2n2]],
        'evidences': [[evidences_21], [evidences_22], ..., [evidences_2n]],
        'claim_level_factuality': [{claim_21, reasoning_21, error_21, correction_21, factuality_21}, {claim_22, reasoning_22, error_22, correction_22, factuality_22}, ..., {claim_2n, reasoning_2n, error_2n, correction_2n, factuality_2n}],
        'response_level_factuality': factuality_2,
    },
    ...
]

code generation:
response_list =
[
    {
        'prompt': prompt_1,
        'response': response_1,
        'claim': claim_1,
        'category': 'code',
        'testcases_queries': [testcase_query_11, testcase_query_12, testcase_query_13],
        'potential_solutions_queries': [potential_solution_query_11, potential_solution_query_12, potential_solution_query_13],
        'exec_results': [[evidences_111, evidences_112, evidences_113, evidences_114], [evidences_121, evidences_122, evidences_123, evidences_124], [evidences_131, evidences_132, evidences_133, evidences_134]],
        'claim_level_factuality': factuality_1,
        'response_level_factuality': factuality_1,
    },
    {
        'prompt': prompt_2,
        'response': response_2,
        'claim': claim_2,
        'category': 'code',
        'testcases_queries': [testcase_query_21, testcase_query_22, testcase_query_23],
        'potential_solutions_queries': [potential_solution_query_21, potential_solution_query_22, potential_solution_query_23],
        'exec_results': [[evidences_211, evidences_212, evidences_213, evidences_214], [evidences_221, evidences_222, evidences_223, evidences_224], [evidences_231, evidences_232, evidences_233, evidences_234]],
        'claim_level_factuality': factuality_2,
        'response_level_factuality': factuality_2,
    },
    ...
]

mathematical problem solving:
response_list =
[
    {
        'prompt': prompt_1,
        'response': response_1,
        'category': 'math',
        'claims': [claim_11, claim_12, ..., claim_1n],
        'queries': [query_11, query_12, ..., query_1n],
        'execution_results': [exec_result_11, exec_result_12, ..., exec_result_1n],
        'claim_level_factuality': [factuality_11, factuality_12, ..., factuality_1n],
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2,
        'response': response_2,
        'category': 'math',
        'claims': [claim_21, claim_22, ..., claim_2n],
        'queries': [query_21, query_22, ..., query_2n],
        'execution_results': [exec_result_21, exec_result_22, ..., exec_result_2n],
        'claim_level_factuality': [factuality_21, factuality_22, ..., factuality_2n],
        'response_level_factuality': factuality_2
    },
    ...
]

scientific literature review:
response_list =
[
    {
        'prompt': prompt_1,
        'response': response_1,
        'category': 'scientific',
        'claims': [claim_11, claim_12, ..., claim_1n],
        'queries': [query_11, query_12, ..., query_1n],
        'evidences': [evidences_11, evidences_12, ..., evidences_1n],
        'claim_level_factuality': [{claim_11, evidence_11, error_11, factuality_11}, {claim_12, evidence_12, error_12, factuality_12}, ..., {claim_1n, evidence_1n, error_1n, factuality_1n}],
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2,
        'response': response_2,
        'category': 'scientific',
        'claims': [claim_21, claim_22, ..., claim_2n],
        'queries': [query_21, query_22, ..., query_2n],
        'evidences': [evidences_21, evidences_22, ..., evidences_2n],
        'claim_level_factuality': [{claim_21, evidence_21, error_21, factuality_21}, {claim_22, evidence_22, error_22, factuality_22}, ..., {claim_2n, evidence_2n, error_2n, factuality_2n}],
        'response_level_factuality': factuality_2
    },
    ...
]
'''
```
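The schema above can be consumed programmatically. A minimal sketch, using a hand-built mock entry that mirrors the documented kbqa format (the field values here are illustrative, not real Factool output):

```python
# Mock kbqa entry mirroring the schema documented above; values are illustrative.
mock_response_list = [
    {
        "prompt": "Introduce Graham Neubig",
        "response": "Graham Neubig is a professor at MIT",
        "category": "kbqa",
        "claims": ["Graham Neubig is a professor at MIT"],
        "claim_level_factuality": [
            {
                "claim": "Graham Neubig is a professor at MIT",
                "reasoning": "Evidence indicates a different affiliation.",
                "error": "Wrong affiliation",
                "correction": "Graham Neubig is a professor at CMU",
                "factuality": False,
            }
        ],
        "response_level_factuality": False,
    }
]

def summarize(response_list):
    """Collect (claim, factuality) pairs and the overall verdict per entry."""
    summaries = []
    for entry in response_list:
        claims = [
            (c["claim"], c["factuality"])
            for c in entry["claim_level_factuality"]
        ]
        summaries.append(
            {"overall": entry["response_level_factuality"], "claims": claims}
        )
    return summaries

print(summarize(mock_response_list))
```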
## Setting up the FACTOOL ChatGPT Plugin

- Install the package (see Installation above).
- Clone the repo:

```shell
git clone https://github.com/GAIR-NLP/factool.git
cd ./factool/plugin_config
```

- Create your `keys.yaml`.
- Run the API locally:

```shell
uvicorn main:app --host 0.0.0.0 --port ${PORT:-5003}
```

- Enter the plugin store on the ChatGPT website.
- Click 'Develop your own plugin', then enter `localhost:5003` under 'domain'.
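For context, ChatGPT discovers a plugin through a manifest served at `/.well-known/ai-plugin.json`; the repo's `plugin_config` directory should already provide one, but a rough sketch of the standard manifest format (all field values here are placeholders, not the repo's actual configuration) looks like:

```json
{
  "schema_version": "v1",
  "name_for_human": "Factool",
  "name_for_model": "factool",
  "description_for_human": "Detect factual errors in generated text.",
  "description_for_model": "Check claims in model-generated text for factual errors.",
  "auth": { "type": "none" },
  "api": { "type": "openapi", "url": "http://localhost:5003/openapi.json" },
  "logo_url": "http://localhost:5003/logo.png",
  "contact_email": "example@example.com",
  "legal_info_url": "http://example.com/legal"
}
```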
## Experiments

- Experimental results:
  - Exp I: `./results/knowledge_QA/RoSE/`
  - Exp II: `./results/knowledge_QA`, `./results/code`, `./results/math`, `./results/scientific`
  - Exp III: `./results/chat`
- Get the final results and statistics for fine-grained analysis:
  - Exp I: `python ./results/knowledge_QA/RoSE/run_rose_claim_extraction.py`
  - Exp II: `bash ./results/evaluation.sh`
  - Exp III: `python ./results/chat/calc_stats.py`
- Reproduce the experiments:
  - Exp II: `bash run_experiments.sh`
  - Exp III: `bash run_chatbot.sh`