SummaC: Summary Consistency Detection

This repository contains the code for the TACL 2021 paper: SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization.

We release: (1) the trained SummaC models, (2) the SummaC Benchmark and data loaders, and (3) training and evaluation scripts.

Trained SummaC Models

The two SummaC models, SummaC-ZS and SummaC-Conv, are implemented in model_summac.py:

  • SummaC-ZS does not require a model file, since it is zero-shot and not trained; a usage example is given at the bottom of model_summac.py.
  • SummaC-Conv requires a start_file containing the trained weights of the convolution layer. The default start_file used to compute the reported results is available in this repository (summac_conv_vitc_sent_perc_e.bin).
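To make the difference between the two models concrete, here is a minimal sketch of how each one could reduce an NLI pair matrix (document sentences × summary sentences) to a single consistency score. It is a simplified illustration, not the code in model_summac.py: the NLI probabilities are written by hand instead of coming from a model such as "vitc", the per-pair score is assumed to be P(entailment) minus P(contradiction), the histogram uses equal-width bins, and `weights`/`bias` are hypothetical stand-ins for the trained convolution weights stored in the start_file.

```python
from typing import List, Sequence, Tuple

# probs[i][j] = (P(entailment), P(neutral), P(contradiction)) for document sentence i
# and summary sentence j. Hand-written here; in SummaC they come from an NLI model.
NLITriple = Tuple[float, float, float]


def pair_matrix(probs: Sequence[Sequence[NLITriple]], use_contradiction: bool = True) -> List[List[float]]:
    """Collapse each NLI triple into one score: P(e) - P(c), or just P(e)."""
    return [[e - c if use_contradiction else e for (e, _n, c) in row] for row in probs]


def summac_zs(matrix: List[List[float]]) -> float:
    """Zero-shot reduction: best-supporting document sentence per summary sentence, then mean."""
    n_summary = len(matrix[0])
    per_sentence = [max(row[j] for row in matrix) for j in range(n_summary)]
    return sum(per_sentence) / n_summary


def histogram(column: Sequence[float], n_bins: int, lo: float = -1.0, hi: float = 1.0) -> List[int]:
    """Count the scores of one summary sentence in n_bins equal-width bins over [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in column:
        idx = min(max(int((x - lo) / width), 0), n_bins - 1)  # clamp edge values into range
        counts[idx] += 1
    return counts


def summac_conv(matrix: List[List[float]], weights: Sequence[float], bias: float, n_bins: int) -> float:
    """Conv-style reduction: histogram each summary sentence's column, apply a learned
    linear map over the bins (a 1-D convolution whose kernel spans every bin), then mean.
    `weights` and `bias` are hypothetical stand-ins for the trained start_file parameters."""
    n_summary = len(matrix[0])
    scores = []
    for j in range(n_summary):
        hist = histogram([row[j] for row in matrix], n_bins)
        scores.append(sum(w * h for w, h in zip(weights, hist)) + bias)
    return sum(scores) / n_summary


# Two document sentences x two summary sentences (made-up probabilities).
probs = [
    [(0.9, 0.05, 0.05), (0.1, 0.2, 0.7)],
    [(0.2, 0.7, 0.1), (0.7, 0.2, 0.1)],
]
matrix = pair_matrix(probs)
print("SummaC-ZS style score:", round(summac_zs(matrix), 3))
print("SummaC-Conv style score:", round(summac_conv(matrix, [-1.0, -0.5, 0.25, 0.5], 0.1, 4), 3))
```

The zero-shot reduction keeps only the best-supporting document sentence per summary sentence, so a single unsupported summary sentence pulls the average down; the Conv variant instead looks at the whole distribution of scores in each column, which is what its learned weights operate on.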

Example use

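from model_summac import SummaCZS

# Zero-shot model: the document and summary are split into sentences (granularity)
# and sentence pairs are scored with the "vitc" NLI model.
model = SummaCZS(granularity="sentence", model_name="vitc")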

document = """Scientists are studying Mars to learn about the Red Planet and find landing sites for future missions.
One possible site, known as Arcadia Planitia, is covered instrange sinuous features.
The shapes could be signs that the area is actually made of glaciers, which are large masses of slow-moving ice.
Arcadia Planitia is in Mars' northern lowlands."""

summary1 = "There are strange shape patterns on Arcadia Planitia. The shapes could indicate the area might be made of glaciers. This makes Arcadia Planitia ideal for future missions."
summary2 = "There are strange shape patterns on Arcadia Planitia. The shapes could indicate the area might be made of glaciers."

score1 = model.score([document], [summary1])
print("Summary Score 1 consistency: %.3f" % (score1["scores"][0])) # Prints: 0.587

score2 = model.score([document], [summary2])
print("Summary Score 2 consistency: %.3f" % (score2["scores"][0])) # Prints: 0.877

To load all the necessary files: (1) clone this repository, and (2) add the repository to your Python path: export PYTHONPATH="${PYTHONPATH}:/path/to/summac/"
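If exporting PYTHONPATH is inconvenient (for example inside a notebook), the same effect can be obtained for a single Python process by extending sys.path at runtime. This is a minimal sketch; /path/to/summac/ is the same placeholder as in the export command and must point at your own clone.

```python
import sys

# Placeholder path, as in the export command above; replace with your clone's location.
SUMMAC_REPO = "/path/to/summac/"

# Equivalent to the PYTHONPATH export, but only for the current Python process.
if SUMMAC_REPO not in sys.path:
    sys.path.append(SUMMAC_REPO)

# With the real repository at that path, `from model_summac import SummaCZS` now resolves.
```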

SummaC Benchmark

The SummaC Benchmark consists of 6 summary consistency datasets, each standardized to a binary classification task. The datasets are characterized by the following statistics:

  • % Positive: the percentage of positive (consistent) summaries.
  • IAA: the inter-annotator agreement (Fleiss' Kappa).
  • Source: the dataset used for the source documents (CNN/DM or XSum).
  • # Summarizers: the number of summarizers (extractive and abstractive) included in the dataset.
  • # Sublabel: the number of labels in the typology used to label summary errors.
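For readers less familiar with the first two statistics, here is a minimal sketch of how % Positive and Fleiss' Kappa can be computed from raw annotations. The annotation layout (one row of per-category rater counts per summary, with the same number of raters for every summary) is an assumption for illustration; the benchmark's own files may store labels differently.

```python
from typing import List, Sequence


def percent_positive(labels: Sequence[int]) -> float:
    """Percentage of summaries labeled consistent (label 1)."""
    return 100.0 * sum(1 for y in labels if y == 1) / len(labels)


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number n of annotators.

    counts[i][j] is the number of annotators who put item i in category j
    (for a binary consistency label: j = 0 inconsistent, j = 1 consistent).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    # Observed agreement: fraction of agreeing rater pairs per item, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the overall proportion of ratings in each category.
    n_categories = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Four summaries, three annotators each, binary labels.
print(fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]]))  # perfect agreement
print(fleiss_kappa([[2, 1], [1, 2], [2, 1], [1, 2]]))  # agreement below chance
print(percent_positive([1, 0, 1, 1]))
```

Kappa is 1 for perfect agreement, 0 for agreement at chance level, and negative when annotators agree less often than chance would predict.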

The data loaders for the benchmark are included in utils_summac_benchmark.py. Because the benchmark builds on previously published datasets, several of them must be downloaded manually. For each of the 6 datasets, the download link and instructions are given as a comment in the file. Once all the files have been downloaded, the benchmark can be loaded and standardized by running:

from utils_summac_benchmark import SummaCBenchmark
# cut="val" loads the validation split of the benchmark.
benchmark_validation = SummaCBenchmark(benchmark_folder="/path/to/summac_benchmark/", cut="val")

Note: we plan to streamline this process by automatically downloading the necessary files when they are not present; if you would like to help, please let us know. If you encounter an issue during the manual download process, please contact us.
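Because every benchmark dataset is a binary classification task, a continuous consistency score (such as the output of SummaCZS) must be turned into a label with a threshold. Here is a sketch of one reasonable protocol: tune the threshold on the validation cut, then report balanced accuracy on held-out data. The plain score and label lists are hypothetical inputs, not the SummaCBenchmark data format, and this is not necessarily the exact evaluation code used for the paper.

```python
from typing import Sequence, Tuple


def balanced_accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Mean of the true-positive and true-negative rates when predicting
    consistent (1) for score >= threshold. Labels are assumed to be 0/1."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("labels must contain both classes")
    return 0.5 * (tp / n_pos + tn / n_neg)


def tune_threshold(val_scores: Sequence[float], val_labels: Sequence[int]) -> Tuple[float, float]:
    """Try every observed validation score as a threshold and keep the one with the
    highest balanced accuracy (ties keep the smallest threshold)."""
    best_t, best_ba = float("nan"), -1.0
    for t in sorted(set(val_scores)):
        ba = balanced_accuracy(val_scores, val_labels, t)
        if ba > best_ba:
            best_t, best_ba = t, ba
    return best_t, best_ba


# Made-up scores for illustration only.
threshold, val_ba = tune_threshold([0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1])
test_ba = balanced_accuracy([0.2, 0.7, 0.5, 0.8], [0, 1, 1, 0], threshold)
print(threshold, val_ba, test_ba)
```

Balanced accuracy is used here rather than plain accuracy because the share of consistent summaries (% Positive) differs across datasets, and a threshold that labels everything as consistent should not look good on a skewed dataset.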

Cite the work

If you make use of the code, models, or algorithm, please cite our paper. BibTeX to come.

Contributing

If you'd like to contribute, or have questions or suggestions, you can contact us at phillab@berkeley.edu. All contributions are welcome, for example making the benchmark easier to download or improving model performance on the benchmark.


Download files

Source Distribution

summac-0.0.1.tar.gz (28.3 kB)

Built Distributions

summac-0.0.1-py3.10.egg (2.8 kB)

summac-0.0.1-py3.8.egg (54.2 kB)

summac-0.0.1-py3-none-any.whl (30.4 kB)

File details

Details for the file summac-0.0.1.tar.gz.

File metadata

  • Download URL: summac-0.0.1.tar.gz
  • Upload date:
  • Size: 28.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for summac-0.0.1.tar.gz
  • SHA256: 6d6d2b0ac46277eda75c098794110c2fb67369502375a42910b1a859ae08e40f
  • MD5: 201a764c8b454ed90f68cab0b9433ecc
  • BLAKE2b-256: 2bab0f77f48ef44cd78ea4148ce143a2a742192eca7fa6f750163d04343eb90a

File details

Details for the file summac-0.0.1-py3.10.egg.

File metadata

  • Download URL: summac-0.0.1-py3.10.egg
  • Upload date:
  • Size: 2.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for summac-0.0.1-py3.10.egg
  • SHA256: 8b664ed461dc0480fc32ee4e0253d03cd2373df30e065b464d571059f851c7ac
  • MD5: 1f72c697db4a9a86a0d087928f512d73
  • BLAKE2b-256: b6720877444f2d0babc1dded073a01490af11a8381c9a3183bede9175b4ae8db

File details

Details for the file summac-0.0.1-py3.8.egg.

File metadata

  • Download URL: summac-0.0.1-py3.8.egg
  • Upload date:
  • Size: 54.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for summac-0.0.1-py3.8.egg
  • SHA256: 20e73699c968dcfb64db80bf22e1d7fb17d10ddf4da50554f0397af9e5dcd578
  • MD5: a2271b8c57d750924a168ffbb47db16a
  • BLAKE2b-256: a22ca716ad17cf3da3c65aacad49912198ff807fc6972316c22f9ddd555d0fac

File details

Details for the file summac-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: summac-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 30.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for summac-0.0.1-py3-none-any.whl
  • SHA256: 39aa8b1e4fc63f643fc28facb6628336b6d89b1868b15affe36483c46c7b63f9
  • MD5: 676a2a65bcc3962f66ff95953d8adccf
  • BLAKE2b-256: eab4c2052d4f23749158c084617fa03a00b3694ea68b15f5a1354089b50d7878
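To check a downloaded file against the SHA256 digests listed above, here is a minimal sketch using Python's standard hashlib module. The file name in the commented example assumes the wheel was saved to the current directory. pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """True if the file's SHA256 digest matches the published one."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Example (assumes the wheel was downloaded to the current directory):
# verify("summac-0.0.1-py3-none-any.whl",
#        "39aa8b1e4fc63f643fc28facb6628336b6d89b1868b15affe36483c46c7b63f9")
```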
