Factuality Detection for Generative AI
This repository contains the source code and plugin configuration for our paper Factool: Factuality Detection in Generative AI — A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios.
Factool is a tool-augmented framework for detecting factual errors in text generated by large language models (e.g., ChatGPT).
Factool currently supports four tasks:
- knowledge-based QA: Factool detects factual errors in knowledge-based QA.
- code generation: Factool detects execution errors in code generation.
- mathematical reasoning: Factool detects calculation errors in mathematical reasoning.
- scientific literature review: Factool detects hallucinated scientific literature.
Installation
conda create -n factool_env python=3.11
conda activate factool_env
pip uninstall factool
pip install git+https://github.com/GAIR-NLP/factool
Quick Start
- Install the package.
- Create a file called keys.yaml and put your API keys (openai, serper, scraperapi) in it.
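The exact layout of keys.yaml is not shown on this page; a minimal sketch, where the key names are assumptions based on the three services listed above:

```yaml
# keys.yaml — assumed layout; replace the placeholders with your real keys
openai_api_key: YOUR_OPENAI_KEY
serper_api_key: YOUR_SERPER_KEY
scraperapi_key: YOUR_SCRAPERAPI_KEY
```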
General Usage:
from factool import Factool
# Initialize a Factool instance with the specified foundation model,
# which can be either "gpt-3.5-turbo" or "gpt-4".
factool_instance = Factool("gpt-3.5-turbo")
inputs = [
    {
        "prompt": "Introduce Graham Neubig",
        "response": "Graham Neubig is a professor at MIT",
        "category": "kbqa",
        "entry_point": "answer_question",
    },
]
response_list = factool_instance.run(inputs)
print(response_list)
'''
knowledge-based QA:
response_list =
[
{
'prompt': prompt_1,
'response': response_1,
'category': 'kbqa',
'claims': [claim_11, claim_12, ..., claim_1n],
'queries': [[query_111, query_112], [query_121, query_122], ..., [query_1n1, query_1n2]],
'evidences': [[evidences_11], [evidences_12], ..., [evidences_1n]],
'claim_level_factuality': [{claim_11, reasoning_11, error_11, correction_11, factuality_11}, {claim_12, reasoning_12, error_12, correction_12, factuality_12}, ..., {claim_1n, reasoning_1n, error_1n, correction_1n, factuality_1n}],
'response_level_factuality': factuality_1
},
{
'prompt': prompt_2,
'response': response_2,
'category': 'kbqa',
'claims': [claim_21, claim_22, ..., claim_2n],
'queries': [[query_211, query_212], [query_221, query_222], ..., [query_2n1, query_2n2]],
'evidences': [[evidences_21], [evidences_22], ..., [evidences_2n]],
'claim_level_factuality': [{claim_21, reasoning_21, error_21, correction_21, factuality_21}, {claim_22, reasoning_22, error_22, correction_22, factuality_22}, ..., {claim_2n, reasoning_2n, error_2n, correction_2n, factuality_2n}],
'response_level_factuality': factuality_2,
},
...
]
code generation:
response_list =
[
{
'prompt': prompt_1,
'response': response_1,
'claim': claim_1,
'category': 'code',
'testcases_queries': [testcase_query_11, testcase_query_12, testcase_query_13],
'potential_solutions_queries': [potential_solution_query_11, potential_solution_query_12, potential_solution_query_13],
'exec_results': [[evidences_111, evidences_112, evidences_113, evidences_114], [evidences_121, evidences_122, evidences_123, evidences_124], [evidences_131, evidences_132, evidences_133, evidences_134]],
'claim_level_factuality': factuality_1,
'response_level_factuality': factuality_1,
},
{
'prompt': prompt_2,
'response': response_2,
'claim': claim_2,
'category': 'code',
'testcases_queries': [testcase_query_21, testcase_query_22, testcase_query_23],
'potential_solutions_queries': [potential_solution_query_21, potential_solution_query_22, potential_solution_query_23],
'exec_results': [[evidences_211, evidences_212, evidences_213, evidences_214], [evidences_221, evidences_222, evidences_223, evidences_224], [evidences_231, evidences_232, evidences_233, evidences_234]],
'claim_level_factuality': factuality_2,
'response_level_factuality': factuality_2,
},
...
]
mathematical reasoning:
response_list =
[
{
'prompt': prompt_1,
'response': response_1,
'category': 'math',
'claims': [claim_11, claim_12, ..., claim_1n],
'queries': [query_11, query_12, ..., query_1n],
'execution_results': [exec_result_11, exec_result_12, ..., exec_result_1n],
'claim_level_factuality': [factuality_11, factuality_12, ..., factuality_1n],
'response_level_factuality': factuality_1
},
{
'prompt': prompt_2,
'response': response_2,
'category': 'math',
'claims': [claim_21, claim_22, ..., claim_2n],
'queries': [query_21, query_22, ..., query_2n],
'execution_results': [exec_result_21, exec_result_22, ..., exec_result_2n],
'claim_level_factuality': [factuality_21, factuality_22, ..., factuality_2n],
'response_level_factuality': factuality_2
},
...
]
scientific literature review:
response_list =
[
{
'prompt': prompt_1,
'response': response_1,
'category': 'scientific',
'claims': [claim_11, claim_12, ..., claim_1n],
'queries': [query_11, query_12, ..., query_1n],
'evidences': [evidences_11, evidences_12, ..., evidences_1n],
'claim_level_factuality': [{claim_11, evidence_11, error_11, factuality_11}, {claim_12, evidence_12, error_12, factuality_12}, ..., {claim_1n, evidence_1n, error_1n, factuality_1n}],
'response_level_factuality': factuality_1
},
{
'prompt': prompt_2,
'response': response_2,
'category': 'scientific',
'claims': [claim_21, claim_22, ..., claim_2n],
'queries': [query_21, query_22, ..., query_2n],
'evidences': [evidences_21, evidences_22, ..., evidences_2n],
'claim_level_factuality': [{claim_21, evidence_21, error_21, factuality_21}, {claim_22, evidence_22, error_22, factuality_22}, ..., {claim_2n, evidence_2n, error_2n, factuality_2n}],
'response_level_factuality': factuality_2
},
...
]
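Every item in response_list exposes a response_level_factuality flag, and most tasks also return per-claim factuality entries, so results can be aggregated with plain Python. A minimal sketch over mocked kbqa-style output (the dicts and their values below are invented to mirror the schema above, not real Factool output):

```python
# Aggregate factuality flags over a (mocked) Factool response_list.
# Field names follow the kbqa schema shown above; the data is invented.
response_list = [
    {"category": "kbqa", "response_level_factuality": True,
     "claim_level_factuality": [{"factuality": True}, {"factuality": True}]},
    {"category": "kbqa", "response_level_factuality": False,
     "claim_level_factuality": [{"factuality": True}, {"factuality": False}]},
]

# Count responses judged factual overall.
factual_responses = sum(r["response_level_factuality"] for r in response_list)

# Flatten claim-level verdicts across all responses and count factual claims.
claims = [c for r in response_list for c in r["claim_level_factuality"]]
factual_claims = sum(c["factuality"] for c in claims)

print(f"{factual_responses}/{len(response_list)} responses factual")
print(f"{factual_claims}/{len(claims)} claims factual")
```

The same two-level aggregation works for the math and scientific categories; only the shape of the claim-level entries differs.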
Steps for setting up the FACTOOL ChatGPT plugin:
- Install the package (see Installation above).
- Clone the repo:
git clone https://github.com/GAIR-NLP/factool.git
cd ./factool/plugin_config
- Create your keys.yaml.
- Run the API locally:
uvicorn main:app --host 0.0.0.0 --port ${PORT:-5003}
- Open the plugin store on the ChatGPT website.
- Click 'Develop your own plugin', then enter localhost:5003 under 'domain'.
Experiments:
- Experimental results:
  - Exp I: ./results/knowledge_QA/RoSE/
  - Exp II: ./results/knowledge_QA, ./results/code, ./results/math, ./results/scientific
  - Exp III: ./results/chat
- Get the final results and statistics for fine-grained analysis:
  - Exp I: python ./results/knowledge_QA/RoSE/run_rose_claim_extraction.py
  - Exp II: bash ./results/evaluation.sh
  - Exp III: python ./results/chat/calc_stats.py
- Reimplement the experiments:
  - Exp II: bash run_experiments.sh
  - Exp III: bash run_chatbot.sh