CTC: A Unified Framework for Evaluating Natural Language Generation

CTC Score

This repo contains the code for the automatic evaluation metric described in the paper
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Mingkai Deng*, Bowen Tan* (equal contribution), Zhengzhong Liu, Eric P. Xing, Zhiting Hu
EMNLP 2021

Getting Started

  • Previous work on NLG evaluation has typically focused on a single task and developed individual evaluation metrics based on specific intuitions.
  • In this paper, we propose a unifying perspective based on the nature of information change in NLG tasks, including compression (e.g., summarization), transduction (e.g., text rewriting), and creation (e.g., dialog).
  • A common concept underlying the three broad categories is information alignment, which we define as the extent to which the information in one generation component is grounded in another.
  • We adopt contextualized language models to measure information alignment.

(Note: We provide the user API below. Code to reproduce paper results can be found in the train/ folder.)

Installation

The easiest way to try our framework is on Colab.

If you want to install it on your machine, just follow these steps:

  • Python version >= 3.6

Install with pip from PyPI:

pip install ctc_score

Or install with pip from the repository:

git clone https://github.com/tanyuqian/ctc-gen-eval.git
cd ctc-gen-eval/
pip install -e .

Usage

We provide a command line interface (CLI) for CTC score as well as a Python module.

Command Line Interface (CLI)

The CLI is used as follows:

ctc_score 
    --task style_transfer/summarization/dialog 
    --align the_alignment_model_to_use 
    --aspect the_aspect_to_evaluate 
    --hypo a_file_with_all_hypothesized_texts_to_evaluate (line-by-line) 
    --remove_stopwords add_this_argument_to_remove_stopwords_in_aligning 
    --scores_save_path the_path_to_save_example-wise_scores 
    
    # for task=style_transfer
    --input_sent a_file_with_all_input_sentences (line-by-line)
    
    # for task=summarization
    --doc a_file_with_all_documents (line-by-line) 
    --refs a_file_with_all_references (line-by-line)
    (if each document has more than one reference, divide them by "|||")
    
    # for task=dialog
    --fact a_file_with_all_facts (line-by-line) 
    --dialog_history a_file_with_all_dialog_histories (line-by-line)

Example:

ctc_score --task summarization \
          --align D-cnndm \
          --doc example/docs.txt \
          --refs example/refs.txt \
          --hypo example/hypos.txt \
          --aspect relevance \
          --scores_save_path scores.txt
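
The input files are plain text with one example per line, so they can be prepared with a few lines of Python. The sketch below is an illustration only: write_summarization_inputs is a hypothetical helper, not part of ctc_score. It assumes that embedded newlines may be collapsed into spaces and that multiple references for one document are joined with "|||", as described above.

```python
from pathlib import Path


def one_line(text):
    """Collapse all whitespace (including newlines) so the text fits on one line."""
    return " ".join(text.split())


def write_summarization_inputs(examples, out_dir):
    """Write docs.txt, refs.txt and hypos.txt with one example per line.

    `examples` is a list of dicts with keys "doc", "refs" (a list of
    reference summaries) and "hypo". Hypothetical helper, not part of ctc_score.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    docs = [one_line(ex["doc"]) for ex in examples]
    # All references for one document share a line, divided by "|||".
    refs = ["|||".join(one_line(r) for r in ex["refs"]) for ex in examples]
    hypos = [one_line(ex["hypo"]) for ex in examples]

    for name, lines in (("docs.txt", docs), ("refs.txt", refs), ("hypos.txt", hypos)):
        (out_dir / name).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Calling write_summarization_inputs(examples, "example") would produce files matching the paths used in the command above.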

We provide these information alignment models (options of --align):

  • E-bert: Embedding alignment model with BERT embeddings.
  • E-roberta: Embedding alignment model with RoBERTa embeddings.
  • E-roberta-mnli: Embedding alignment model with RoBERTa-MNLI embeddings.
  • D-topical_chat or R-topical_chat: Discriminative (D) or Regression (R) alignment model trained with TopicalChat dialogs.
  • D-persona_chat or R-persona_chat: Discriminative (D) or Regression (R) alignment model trained with PersonaChat dialogs.
  • D-cnndm or R-cnndm: Discriminative (D) or Regression (R) alignment model trained with CNN/DailyMail documents.
  • D-xsum or R-xsum: Discriminative (D) or Regression (R) alignment model trained with XSUM documents.
  • D-yelp or R-yelp: Discriminative (D) or Regression (R) alignment model trained with the Yelp dataset.

More details of these models can be found in our paper.
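
Since the option names follow a fixed pattern (a one-letter prefix for the model type, then an embedding or dataset name), a value for --align can be checked before starting a long run. The following is a rough sketch: describe_align and the constants are hypothetical names, not part of ctc_score, and the accepted values are simply copied from the list above.

```python
# Options copied from the list above; these names are illustrative only.
EMBEDDING_MODELS = {"E-bert", "E-roberta", "E-roberta-mnli"}
TRAINED_DATASETS = {"topical_chat", "persona_chat", "cnndm", "xsum", "yelp"}
KINDS = {"E": "embedding", "D": "discriminative", "R": "regression"}


def describe_align(align):
    """Return (kind, name) for an --align value, or raise ValueError if it is not listed."""
    # Split on the first hyphen only, so "E-roberta-mnli" keeps "roberta-mnli" whole.
    prefix, _, name = align.partition("-")
    if align in EMBEDDING_MODELS or (prefix in ("D", "R") and name in TRAINED_DATASETS):
        return KINDS[prefix], name
    raise ValueError(f"unknown alignment model: {align!r}")


print(describe_align("D-cnndm"))
print(describe_align("E-roberta-mnli"))
```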

Python

We provide three scorers: StyleTransferScorer, SummarizationScorer, and DialogScorer. They can be used as in the example below (see demo.py for more examples):

from ctc_score import DialogScorer

# Topical-Chat
dialog_history = "so, i'm reading the latest film from studio ghibli is out the tale of princess kaguya. dunno if you're familiar with them, but studio ghibli has made a lot of great animated films, like spirited away, and princess mononoke \n i don't think i have heard of them. i have heard that one of the directors recently passed away, and his last film was nominated for an academy award \n yeah, sadly, disney ( which owns the american rights to the films ) doesn't tend to promote them very much. i think they're worried they 'll cut into their \" home grown \" market. anyway, dunno if you even like animated movies, but they're worth checking out. \n i don't watch them very often. apparently there was a showing of the recent film in a park in d.c. that's one u.s. city i haven't been to \n sadly, i haven't been to dc either, although i've always wanted to visit there. apparently there's a lot of interesting going down this summer. they're having a crab feast at the navy - marine corps stadium. they 'll have 100 gallons of crab soup! can you imagine that much soup? \n\n"
hypo = "i recently met a girl who lives in that area, and she said the nightlife is worth visiting for. it sounds like many of the events feature jazz music. do you listen to jazz very often?"
fact = "from left, emma baker, daniel saperstein and taylor mulitz of flasher will perform this summer's final fort reno concert. ( jared soares for the washington post ) monday, july 30 25th birthday celebration at national postal museum : celebrate 25 years of this institution devoted to the long history of the u.s. postal service with daytime festivities that include cupcakes, birthday postcards, a photo booth and a special scavenger hunt with prizes. 11 a.m. to 2 p.m. free. tuesday, july 31 \" the color purple \" at kennedy center : the tony award - winning musical revival, based on the pulitzer prize - winning alice walker novel of the same name, features jazz, ragtime, gospel and blues with a story about an african american woman named celie surviving poverty in the south during the 1930s. through aug. 26. $ 69-$149. ask a harry potter scholar at southeast neighborhood library : come to this talk from tolanda henderson, a librarian from george washington university, who has used the j.k. rowling book series as a text in academia. commune with other muggles who prove that it's not just kids and young adults who obsess about the boy who lived. 7 p.m. free. wednesday, aug. 1 rico nasty at the fillmore silver spring : two summers ago, rico nasty was a teenage loudmouth from the maryland suburbs, generating buzz on youtube for spitting surly, rainbow - tinted rhymes. now, after signing a deal with atlantic records, the 21-year - old singer is on her way to becoming one of the brightest voices in rap music.\n"

scorer = DialogScorer(align='D-topical_chat')

score = scorer.score(fact=fact, dialog_history=dialog_history, hypo=hypo, aspect='engagingness')
print(score)
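
To score several examples from Python, one option is to call scorer.score once per example and collect the results. The sketch below is not part of ctc_score: score_examples is a hypothetical helper, and it assumes only what the example above shows, namely that a scorer exposes score(..., aspect=...) with keyword arguments. It also assumes that the returned score is a number.

```python
from statistics import mean


def score_examples(scorer, examples, aspect):
    """Score each example and return (per-example scores, mean score).

    `examples` is a list of dicts whose keys are the keyword arguments of
    `scorer.score` (e.g. "fact", "dialog_history" and "hypo" for DialogScorer).
    """
    if not examples:
        raise ValueError("examples must not be empty")
    # One call per example, forwarding the dict entries as keyword arguments.
    scores = [scorer.score(aspect=aspect, **example) for example in examples]
    return scores, mean(scores)


# Usage with the objects defined above (not executed here):
# scores, average = score_examples(
#     scorer,
#     [{"fact": fact, "dialog_history": dialog_history, "hypo": hypo}],
#     aspect='engagingness',
# )
```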

Loading Issue (rare case)

If automatic model loading fails repeatedly (e.g. with an Unpickling Error), we highly recommend downloading the models manually. Although we have updated the downloading code, some factors are outside our control (e.g. Google Drive changing its logic, or Internet connection problems). The following steps are recommended to resolve the issue:

  • Go to config.py. You'll find DR_MODEL_LINKS. The first-level key (e.g. D-topical_chat) is the dataset_name, and the second-level key (e.g. fact_to_response) is the model_name.
  • Download the models via these links. Rename each model to model_name.ckpt.
  • Place each model in the ~/.cache/ctc_score_models/{dataset_name}/ folder. For example, the model fact_to_response.ckpt for the topical_chat dataset should be placed in the ~/.cache/ctc_score_models/D-topical_chat/ folder.
  • Run demo.py to see if the problem is solved.
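
The renaming and placement steps above can also be scripted. This is a minimal sketch under stated assumptions: place_checkpoint is a hypothetical helper, not part of ctc_score, and the cache location is taken from the steps above. The path to the downloaded file is whatever your browser or download tool produced.

```python
import shutil
from pathlib import Path

# Cache location named in the steps above.
CACHE_ROOT = Path.home() / ".cache" / "ctc_score_models"


def place_checkpoint(downloaded_file, dataset_name, model_name, cache_root=CACHE_ROOT):
    """Copy a manually downloaded model to {cache_root}/{dataset_name}/{model_name}.ckpt."""
    target_dir = Path(cache_root) / dataset_name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{model_name}.ckpt"
    shutil.copyfile(downloaded_file, target)
    return target


# Example (hypothetical download path, not executed here):
# place_checkpoint("downloads/some_file", "D-topical_chat", "fact_to_response")
```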

If you previously installed the package from PyPI, please run pip install ctc-score --upgrade to update the version of ctc_score installed on your machine.
