FActScore is an automatic evaluation metric for factual precision in long-form text generation. It uses large language models and retrieval to break generations down into atomic facts and then measures the correctness of each fact with respect to a knowledge source (such as Wikipedia).
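The aggregation behind the metric can be illustrated with a toy sketch. A generation's factual precision is the fraction of its atomic facts that the knowledge source supports, and the score averages that over generations. This assumes the per-fact support labels already exist (the hypothetical boolean lists below stand in for the retrieval + LM judgment) and ignores any further adjustments the package may apply, so treat it as an illustration rather than the package's implementation.

```python
from statistics import mean
from typing import List


def factual_precision(supported: List[bool]) -> float:
    """Fraction of one generation's atomic facts that the knowledge source supports."""
    if not supported:
        raise ValueError("a generation needs at least one atomic fact")
    return sum(supported) / len(supported)


def fact_score(per_generation: List[List[bool]]) -> float:
    """Average factual precision over all generations (a simplified FActScore)."""
    return mean(factual_precision(labels) for labels in per_generation)


if __name__ == "__main__":
    # Hypothetical labels: one boolean per atomic fact, True = supported.
    labels = [
        [True, True, False, True],  # 3 of 4 facts supported -> 0.75
        [True, False],              # 1 of 2 facts supported -> 0.5
    ]
    print(fact_score(labels))  # 0.625
```

Averaging per generation (rather than pooling all facts) keeps one long generation from dominating the score.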

FActScore

This is the official release accompanying our preprint, "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation". FActScore is also available as a pip package.

Install

python3.7 -m virtualenv fs-venv
source fs-venv/bin/activate
pip install factscore
python -m spacy download en_core_web_lg
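A quick way to confirm that the steps above took effect in the active environment is a check like the one below. It is a hypothetical helper, not part of factscore, and it only verifies that the modules can be located (spaCy models such as en_core_web_lg install as importable packages), not that they run correctly.

```python
import importlib.util

# Modules the install steps should make importable.
REQUIRED = ("factscore", "spacy", "en_core_web_lg")


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be found on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All install steps look complete.")
```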

Download the data

python -m factscore.download_data

Alternatively, download it manually from this Google Drive link. Create a cache directory .cache/factscore and place the unzipped demos and enwiki-20230401.db in it.
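If you set up the cache by hand, a small helper along these lines can create the directory and report what is still missing. The helper name `prepare_cache` is hypothetical. The expected entries are just the two named above, and the check only tests that they exist, not that they are complete or uncorrupted.

```python
from pathlib import Path

# Entries the cache directory should contain after the manual download.
EXPECTED = ["demos", "enwiki-20230401.db"]


def prepare_cache(cache_dir=".cache/factscore"):
    """Create the cache directory if needed and return expected entries still missing."""
    path = Path(cache_dir)
    path.mkdir(parents=True, exist_ok=True)
    return [name for name in EXPECTED if not (path / name).exists()]


if __name__ == "__main__":
    missing = prepare_cache()
    print("Missing from cache:", missing or "nothing")
```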

Running the script with oracle atomic facts

python -m factscore.factscorer --data_path {data_path} --model_name {estimator_name} --cache_dir {cache_dir} --openai_key {openai_key}
  • data_path: a .jsonl file such as data/src-light/bio_ChatGPT_v0.2.jsonl, in the format we have been using so far. TODO: simplify the format and allow it to take arbitrary topics/generations.
  • model_name: one of retrieval+llama, retrieval+llama+npm, retrieval+ChatGPT, or retrieval+ChatGPT+npm.
  • cache_dir: .cache/factscore by default.
  • openai_key: a file containing your OpenAI API key; only needed when ChatGPT is used.
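For scripted runs, the command can be assembled programmatically from the flags listed above. This is a sketch under assumptions: `build_command` and `MODEL_NAMES` are hypothetical names, it uses `sys.executable` in place of `python`, and it assumes `--openai_key` may be omitted for the llama estimators, as the description of that flag suggests.

```python
import sys

# The four estimators accepted by --model_name.
MODEL_NAMES = {
    "retrieval+llama",
    "retrieval+llama+npm",
    "retrieval+ChatGPT",
    "retrieval+ChatGPT+npm",
}


def build_command(data_path, model_name, cache_dir=".cache/factscore", openai_key=None):
    """Assemble the factscore.factscorer command line from the documented flags."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model_name: {model_name!r}")
    if "ChatGPT" in model_name and openai_key is None:
        raise ValueError("ChatGPT estimators need an openai_key file")
    cmd = [sys.executable, "-m", "factscore.factscorer",
           "--data_path", data_path,
           "--model_name", model_name,
           "--cache_dir", cache_dir]
    if openai_key is not None:
        cmd += ["--openai_key", openai_key]
    return cmd


if __name__ == "__main__":
    cmd = build_command("data/src-light/bio_ChatGPT_v0.2.jsonl",
                        "retrieval+ChatGPT", openai_key="api.key")
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run.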

For example,

python -m factscore.factscorer \
    --data_path original_generation/v0.2/answers_mpt-7b_bio_test_addtional.jsonl \
    --model_name "retrieval+ChatGPT" \
    --cache_dir ".cache/factscore" \
    --openai_key "api.key"

By default it uses enwiki-20230401 as the knowledge source, downloading the database from our Google Drive. It also uses Inst-LLAMA, likewise downloaded from Google Drive. TODO: release only the diff from LLAMA 7B, and allow users to specify their own LM path if they want to use a different LM.

Using a custom knowledge source

You need a .jsonl file in which each line is a JSON object with title and text fields; text can be either a string or a list of strings (e.g., sections).
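A minimal sketch for producing such a file with the standard json module follows. The documents and the `write_knowledge_source` helper are hypothetical; the only requirement taken from the description above is one JSON object per line with title and text, where text is a string or a list of strings.

```python
import json

# Hypothetical documents: text may be a single string or a list of strings (e.g., sections).
documents = [
    {"title": "Example Topic", "text": "A single passage about the topic."},
    {"title": "Another Topic", "text": ["First section.", "Second section."]},
]


def write_knowledge_source(docs, path):
    """Write one JSON object per line with the title and text fields."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            text = doc["text"]
            is_list_of_str = isinstance(text, list) and all(isinstance(t, str) for t in text)
            if not (isinstance(text, str) or is_list_of_str):
                raise TypeError("text must be a string or a list of strings")
            f.write(json.dumps({"title": doc["title"], "text": text}) + "\n")
```

For example, `write_knowledge_source(documents, "my_knowledge_source.jsonl")` produces a two-line file ready to be registered.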

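from factscore.factscorer import FactScorer

fs = FactScorer()

# placeholders (hypothetical values): replace with your own
name_of_your_knowledge_source = "my_knowledge_source"
path_to_jsonl_file = "my_knowledge_source.jsonl"
path_to_output_db_file = "my_knowledge_source.db"

# this will create a database using your file
# for English Wikipedia (18GB), it takes ~8 hours
# once the DB file is created, you can reuse it by only specifying `db_path`
fs.register_knowledge_source(name_of_your_knowledge_source,
                             data_path=path_to_jsonl_file,
                             db_path=path_to_output_db_file)

# hypothetical inputs: a list of topics and the generations written about them
topics = ["Example Topic"]
generations = ["A generated text about Example Topic."]

# now, when you compute a score, specify which knowledge source to use
score = fs.get_score(topics, generations, knowledge_source=name_of_your_knowledge_source)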


Download files

Source Distribution

factscore-0.1.0.tar.gz (18.9 kB)

Built Distribution

factscore-0.1.0-py3-none-any.whl (21.2 kB, Python 3)
