FactSumm: Factual Consistency Scorer for Abstractive Summarization

FactSumm is a toolkit that scores factual consistency for abstractive summarization

Without any fine-tuning, you can apply a variety of pre-trained downstream-task models to both the source article and the generated abstractive summary

For example, by extracting fact triples from both the source article and the generated summary, we can check whether the summary correctly reflects the facts stated in the source (see the image above)

As you can guess, this PoC-ish project uses a lot of pre-trained modules that require super-duper computing resources

So don't blame me, just take it as a concept project 👀


Installation

FactSumm requires Java to be installed in your environment in order to use Stanford OpenIE. With Java and Python 3 available, you can install FactSumm from the source repository:

git clone https://github.com/huffon/factsumm
cd factsumm
pip install .

Usage

>>> from factsumm import FactSumm
>>> factsumm = FactSumm()
>>> article = "Lionel Andrés Messi (born 24 June 1987) is an Argentine professional footballer who plays as a forward and captains both Spanish club Barcelona and the Argentina national team. Often considered as the best player in the world and widely regarded as one of the greatest players of all time, Messi has won a record six Ballon d'Or awards, a record six European Golden Shoes, and in 2020 was named to the Ballon d'Or Dream Team."
>>> summary = "Lionel Andrés Messi (born 24 Aug 1997) is an Spanish professional footballer who plays as a forward and captains both Spanish club Barcelona and the Spanish national team."
>>> factsumm(article, summary, verbose=True)
SOURCE Entities
1: [('Lionel Andrés Messi', 'PERSON'), ('24 June 1987', 'DATE'), ('Argentine', 'NORP'), ('Spanish', 'NORP'), ('Barcelona',
'GPE'), ('Argentina', 'GPE')]
2: [('one', 'CARDINAL'), ('Messi', 'PERSON'), ('six', 'CARDINAL'), ('European Golden Shoes', 'WORK_OF_ART'), ('2020', 'DATE'),
("the Ballon d'Or Dream Team", 'ORG')]

SUMMARY Entities
1: [('Lionel Andrés Messi', 'PERSON'), ('24 Aug 1997', 'DATE'), ('Spanish', 'NORP'), ('Barcelona', 'ORG')]

SOURCE Facts
('Lionel Andrés Messi', 'per:origin', 'Argentine')
('Spanish', 'per:date_of_birth', '24 June 1987')
('Spanish', 'org:top_members/employees', 'Lionel Andrés Messi')
('Spanish', 'org:members', 'Barcelona')
('Lionel Andrés Messi', 'per:employee_of', 'Barcelona')
('Lionel Andrés Messi', 'per:date_of_birth', '24 June 1987')
('Barcelona', 'org:top_members/employees', 'Lionel Andrés Messi')

SUMMARY Facts
('Lionel Andrés Messi', 'per:origin', 'Spanish')
('Lionel Andrés Messi', 'per:date_of_birth', '24 Aug 1997')
('Spanish', 'per:date_of_birth', '24 Aug 1997')
('Spanish', 'org:top_members/employees', 'Lionel Andrés Messi')
('Spanish', 'org:members', 'Barcelona')
('Lionel Andrés Messi', 'per:employee_of', 'Barcelona')
('Barcelona', 'org:top_members/employees', 'Lionel Andrés Messi')

COMMON Facts
('Spanish', 'org:top_members/employees', 'Lionel Andrés Messi')
('Spanish', 'org:members', 'Barcelona')
('Lionel Andrés Messi', 'per:employee_of', 'Barcelona')
('Barcelona', 'org:top_members/employees', 'Lionel Andrés Messi')

DIFF Facts
('Lionel Andrés Messi', 'per:origin', 'Spanish')
('Lionel Andrés Messi', 'per:date_of_birth', '24 Aug 1997')
('Spanish', 'per:date_of_birth', '24 Aug 1997')

Fact Score: 0.5714285714285714

Answers based on SOURCE (Questions are generated from Summary)
[Q] Who is the captain of the Spanish national team?    [Pred] <unanswerable>
[Q] When was Lionel Andrés Messi born?  [Pred] 24 June 1987
[Q] Lionel Andrés Messi is a professional footballer of what nationality?       [Pred] Argentine
[Q] Lionel Messi is a captain of which Spanish club?    [Pred] Barcelona

Answers based on SUMMARY (Questions are generated from Summary)
[Q] Who is the captain of the Spanish national team?    [Pred] Lionel Andrés Messi
[Q] When was Lionel Andrés Messi born?  [Pred] 24 Aug 1997
[Q] Lionel Andrés Messi is a professional footballer of what nationality?       [Pred] Spanish
[Q] Lionel Messi is a captain of which Spanish club?    [Pred] Barcelona

QAGS Score: 0.3333333333333333

SOURCE Triples
('Messi', 'is', 'Argentine')
('Messi', 'is', 'professional')

SUMMARY Triples
('Messi', 'is', 'Spanish')
('Messi', 'is', 'professional')

Triple Score: 0.5

Avg. ROUGE-1: 0.4415584415584415
Avg. ROUGE-2: 0.3287671232876712
Avg. ROUGE-L: 0.4415584415584415

Sub-modules

Below are the various ways to score the level of factual consistency with unsupervised methods


Triple-based Module ( closed-scheme )

>>> from factsumm import FactSumm
>>> factsumm = FactSumm()
>>> factsumm.extract_facts(article, summary, verbose=True)
SOURCE Entities
1: [('Lionel Andrés Messi', 'PERSON'), ('24 June 1987', 'DATE'), ('Argentine', 'NORP'), ('Spanish', 'NORP'), ('Barcelona',
'GPE'), ('Argentina', 'GPE')]
2: [('one', 'CARDINAL'), ('Messi', 'PERSON'), ('six', 'CARDINAL'), ('European Golden Shoes', 'WORK_OF_ART'), ('2020', 'DATE'),
("the Ballon d'Or Dream Team", 'ORG')]

SUMMARY Entities
1: [('Lionel Andrés Messi', 'PERSON'), ('24 Aug 1997', 'DATE'), ('Spanish', 'NORP'), ('Barcelona', 'ORG')]

SOURCE Facts
('Lionel Andrés Messi', 'per:origin', 'Argentine')
('Spanish', 'per:date_of_birth', '24 June 1987')
('Spanish', 'org:top_members/employees', 'Lionel Andrés Messi')
('Spanish', 'org:members', 'Barcelona')
('Lionel Andrés Messi', 'per:employee_of', 'Barcelona')
('Lionel Andrés Messi', 'per:date_of_birth', '24 June 1987')
('Barcelona', 'org:top_members/employees', 'Lionel Andrés Messi')

SUMMARY Facts
('Lionel Andrés Messi', 'per:origin', 'Spanish')
('Lionel Andrés Messi', 'per:date_of_birth', '24 Aug 1997')
('Spanish', 'per:date_of_birth', '24 Aug 1997')
('Spanish', 'org:top_members/employees', 'Lionel Andrés Messi')
('Spanish', 'org:members', 'Barcelona')
('Lionel Andrés Messi', 'per:employee_of', 'Barcelona')
('Barcelona', 'org:top_members/employees', 'Lionel Andrés Messi')

COMMON Facts
('Spanish', 'org:top_members/employees', 'Lionel Andrés Messi')
('Spanish', 'org:members', 'Barcelona')
('Lionel Andrés Messi', 'per:employee_of', 'Barcelona')
('Barcelona', 'org:top_members/employees', 'Lionel Andrés Messi')

DIFF Facts
('Lionel Andrés Messi', 'per:origin', 'Spanish')
('Lionel Andrés Messi', 'per:date_of_birth', '24 Aug 1997')
('Spanish', 'per:date_of_birth', '24 Aug 1997')

Fact Score: 0.5714285714285714

The triple-based module counts the overlap of fact triples between the generated summary and the source document.
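
As a rough illustration of that overlap count, here is a minimal sketch. It assumes the score is the number of summary facts that also appear in the source, divided by the total number of summary facts. That assumption is consistent with the output above (4 common facts out of 7 summary facts), but it is not taken from the library's code, and `fact_overlap_score` is a hypothetical helper name.

```python
# Facts copied from the verbose output above, as (subject, relation, object) tuples.
source_facts = {
    ("Lionel Andrés Messi", "per:origin", "Argentine"),
    ("Spanish", "per:date_of_birth", "24 June 1987"),
    ("Spanish", "org:top_members/employees", "Lionel Andrés Messi"),
    ("Spanish", "org:members", "Barcelona"),
    ("Lionel Andrés Messi", "per:employee_of", "Barcelona"),
    ("Lionel Andrés Messi", "per:date_of_birth", "24 June 1987"),
    ("Barcelona", "org:top_members/employees", "Lionel Andrés Messi"),
}
summary_facts = {
    ("Lionel Andrés Messi", "per:origin", "Spanish"),
    ("Lionel Andrés Messi", "per:date_of_birth", "24 Aug 1997"),
    ("Spanish", "per:date_of_birth", "24 Aug 1997"),
    ("Spanish", "org:top_members/employees", "Lionel Andrés Messi"),
    ("Spanish", "org:members", "Barcelona"),
    ("Lionel Andrés Messi", "per:employee_of", "Barcelona"),
    ("Barcelona", "org:top_members/employees", "Lionel Andrés Messi"),
}


def fact_overlap_score(source, summary):
    """Hypothetical helper: share of summary facts that also occur in the source."""
    if not summary:
        return 0.0
    common = source & summary  # exact match on all three elements of the triple
    return len(common) / len(summary)


score = fact_overlap_score(source_facts, summary_facts)
diff_facts = summary_facts - source_facts  # facts found only in the summary
print(score)
print(sorted(diff_facts))
```

Under this assumption, the three facts found only in the summary are exactly the ones listed under DIFF Facts.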


QA-based Module

If you ask the same questions against the summary and against the source document, you should get similar answers when the summary is factually consistent with the source document

>>> from factsumm import FactSumm
>>> factsumm = FactSumm()
>>> factsumm.extract_qas(article, summary, verbose=True)
Answers based on SOURCE (Questions are generated from Summary)
[Q] Who is the captain of the Spanish national team?    [Pred] <unanswerable>
[Q] When was Lionel Andrés Messi born?  [Pred] 24 June 1987
[Q] Lionel Andrés Messi is a professional footballer of what nationality?       [Pred] Argentine
[Q] Lionel Messi is a captain of which Spanish club?    [Pred] Barcelona

Answers based on SUMMARY (Questions are generated from Summary)
[Q] Who is the captain of the Spanish national team?    [Pred] Lionel Andrés Messi
[Q] When was Lionel Andrés Messi born?  [Pred] 24 Aug 1997
[Q] Lionel Andrés Messi is a professional footballer of what nationality?       [Pred] Spanish
[Q] Lionel Messi is a captain of which Spanish club?    [Pred] Barcelona

QAGS Score: 0.3333333333333333
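
One way to turn the two answer lists into a single number is to average a token-level F1 between each pair of answers. The sketch below assumes that scheme. It happens to give roughly 0.333 for the answers printed above, but whether FactSumm computes its score this way is an assumption, and `token_f1` is a hypothetical helper.

```python
from collections import Counter


def token_f1(pred, truth):
    """Hypothetical helper: SQuAD-style token F1 between two answer strings."""
    pred_tokens = pred.lower().split()
    truth_tokens = truth.lower().split()
    # Multiset intersection counts each shared token at most as often as it occurs in both.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Answers copied from the output above, in question order.
source_answers = ["<unanswerable>", "24 June 1987", "Argentine", "Barcelona"]
summary_answers = ["Lionel Andrés Messi", "24 Aug 1997", "Spanish", "Barcelona"]

pair_scores = [token_f1(s, t) for s, t in zip(source_answers, summary_answers)]
qa_score = sum(pair_scores) / len(pair_scores)
print(pair_scores)
print(qa_score)
```

Only the last question is answered identically from both texts. The birth-date answers share the single token "24", which is where the partial credit comes from.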

OpenIE-based Module ( open-scheme )

>>> from factsumm import FactSumm
>>> factsumm = FactSumm()
>>> factsumm.extract_triples(article, summary, verbose=True)
SOURCE Triples
('Messi', 'is', 'Argentine')
('Messi', 'is', 'professional')

SUMMARY Triples
('Messi', 'is', 'Spanish')
('Messi', 'is', 'professional')

Triple Score: 0.5

Stanford OpenIE can extract relationships from raw strings. Note, however, that unlike the Triple-based Module, it uses an open scheme rather than a closed scheme, so relations are free-form text instead of labels from a fixed set.

For example, from "Obama was born in Hawaii", OpenIE extracts (Obama, born in, Hawaii). However, from "Hawaii is the birthplace of Obama", it extracts (Hawaii, is the birthplace of, Obama). Common sense says the two sentences express the same fact, but OpenIE cannot recognize the triples as equivalent because an open scheme does not normalize relations.

So the score for this module may be unstable.
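
The sketch below illustrates why exact matching on open-scheme triples is brittle. It assumes the score is the share of summary triples that also appear in the source, which matches the 0.5 printed above but is not confirmed from the library's code. `triple_overlap` is a hypothetical helper, and the Obama triples are written as three-element tuples for illustration.

```python
def triple_overlap(source_triples, summary_triples):
    """Hypothetical helper: share of summary triples found verbatim in the source."""
    if not summary_triples:
        return 0.0
    return len(source_triples & summary_triples) / len(summary_triples)


# Triples copied from the output above.
source_triples = {("Messi", "is", "Argentine"), ("Messi", "is", "professional")}
summary_triples = {("Messi", "is", "Spanish"), ("Messi", "is", "professional")}
messi_score = triple_overlap(source_triples, summary_triples)

# Two paraphrases of the same fact produce different open-scheme triples,
# so an exact-match comparison treats them as unrelated.
born_in = {("Obama", "born in", "Hawaii")}
birthplace_of = {("Hawaii", "is the birthplace of", "Obama")}
paraphrase_score = triple_overlap(born_in, birthplace_of)

print(messi_score)
print(paraphrase_score)
```

A closed scheme would map both paraphrases onto one relation label, which is why the Triple-based Module does not have this particular problem.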


ROUGE-based Module

>>> from factsumm import FactSumm
>>> factsumm = FactSumm()
>>> factsumm.calculate_rouge(article, summary)
Avg. ROUGE-1: 0.4415584415584415
Avg. ROUGE-2: 0.3287671232876712
Avg. ROUGE-L: 0.4415584415584415

ROUGE is a simple but effective word-level overlap score
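
To make "word-level overlap" concrete, here is a minimal ROUGE-1 F1 sketch on whitespace tokens. It is a simplified illustration, not the implementation FactSumm uses, and it is not expected to reproduce the numbers above, since real ROUGE implementations differ in tokenization, stemming, and averaging. `rouge1_f1` is a hypothetical helper.

```python
from collections import Counter


def rouge1_f1(reference, candidate):
    """Hypothetical helper: unigram-overlap F1 between two strings."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


toy_score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(toy_score)
```

In the toy example, five of the six words overlap on each side, so precision, recall, and F1 are all 5/6. Note that a summary can score well here while still being factually wrong, which is why the other modules exist.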


Citation

If you apply this library to any project, please cite:

@misc{factsumm,
  author       = {Heo, Hoon},
  title        = {FactSumm: Factual Consistency Scorer for Abstractive Summarization},
  howpublished = {\url{https://github.com/Huffon/factsumm}},
  year         = {2021},
}
