A BERT-based DBSCAN summarizer package

Project description

BERT-based Extractive Summarizer

How it works

It first uses a pretrained BERT model to embed each sentence, then runs a clustering algorithm (DBSCAN) on those sentence embeddings.
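The embed-then-cluster idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: the toy 2-D vectors stand in for BERT sentence embeddings, and keeping the sentence nearest each cluster's mean is an assumed selection rule (the description above does not say how sentences are picked from each cluster). DBSCAN is written out in plain Python so each step is visible; in practice sklearn.cluster.DBSCAN(eps=..., min_samples=...) does the same job.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def dbscan(points: List[Vector], eps: float, min_samples: int) -> List[int]:
    """Classic DBSCAN: return a cluster id per point, or -1 for noise.

    A point is a core point when at least `min_samples` points (itself
    included) lie within distance `eps` of it.
    """
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is also core: keep expanding
        cluster += 1
    return labels  # every entry is an int by now


def pick_representatives(sentences: List[str], embeddings: List[Vector],
                         labels: List[int]) -> List[str]:
    """Keep, in document order, the sentence nearest each cluster's mean (assumed rule)."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        if label != -1:  # noise sentences are dropped
            clusters.setdefault(label, []).append(idx)
    chosen = []
    for members in clusters.values():
        dim = len(embeddings[members[0]])
        centroid = [sum(embeddings[m][d] for m in members) / len(members)
                    for d in range(dim)]
        chosen.append(min(members, key=lambda m: math.dist(embeddings[m], centroid)))
    return [sentences[i] for i in sorted(chosen)]


# Toy 2-D vectors standing in for BERT sentence embeddings (assumption)
sentences = ["A1", "A2", "A3", "B1", "B2", "B3", "Outlier"]
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (10.0, 0.0)]
labels = dbscan(embeddings, eps=0.5, min_samples=2)
summary = pick_representatives(sentences, embeddings, labels)
print(labels)   # [0, 0, 0, 1, 1, 1, -1]
print(summary)  # ['A1', 'B1']
```

Unlike K-Means, nothing here fixes the number of clusters up front: it falls out of eps and min_samples, and an isolated sentence is labelled as noise rather than forced into a cluster.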

Previous work

Derek Miller proposed using BERT together with a clustering algorithm (K-Means) for extractive summarization (https://arxiv.org/pdf/1906.04165.pdf). His approach has some limitations, and this package aims to address some of them. One relevant difference: K-Means needs the number of clusters fixed in advance, whereas DBSCAN derives clusters from the density of the sentence embeddings and can leave outlier sentences unassigned.

This tool uses the HuggingFace PyTorch transformers library to run extractive summarization. It also applies coreference resolution through the neuralcoref library (https://github.com/huggingface/neuralcoref) to resolve words in summaries that need more context. The greediness of neuralcoref (its greedyness parameter) can be tuned in the CoreferenceHandler class.

Install packages

pip install pandas
pip install spacy
pip install transformers
pip install neuralcoref
pip install gensim
pip install scipy
pip install scikit-learn
pip install matplotlib
pip install torch
pip install seaborn
python -m spacy download en_core_web_md

Large example using a CNN article

from summarizer import Summarizer

body = '''
(CNN) -- Can a movie actually convince you to support torture? Can a movie really persuade you that "fracking" -- a process used to drill for natural gas -- is a danger to the environment? Can a movie truly cause you to view certain minority groups in a negative light?

Some scoff at the notion that movies do anything more than entertain. They are wrong. Sure, it's unlikely that one movie alone will change your views on issues of magnitude. But a movie (or TV show) can begin your "education" or "miseducation" on a topic. And for those already agreeing with the film's thesis, it can further entrench your views.

Anyone who doubts the potential influence that movies can have on public opinion need to look no further than two films that are causing an uproar even before they have opened nationwide. They present hot button issues that manage to fire up people from the left and right.

The first, "Zero Dark Thirty," is about the pursuit and killing of Osama bin Laden, which features scenes of torture. The second, "Promised Land," stars Matt Damon and explores how the use of fracking to drill for natural gas can pose health and environmental dangers.

Critics of "Zero Dark Thirty" fear that audiences will accept as true the film's story line that torture was effective in eliciting information to locate bin Laden. They are rightfully concerned that the film will sway some to become more receptive or even supportive of the idea of torturing prisoners.

Peter Bergen: Did torture really net bin Laden?

Opposition to the film escalated last week as three senior U.S. senators -- John McCain, Carl Levin and Dianne Feinstein -- sent a letter to the film's distributor, Sony Pictures, characterizing the film's use of torture as "grossly inaccurate and misleading." The senators bluntly informed Sony Pictures that it has "an obligation to state that the role of torture in the hunt for Osama bin Laden is not based on the facts, but rather part of the film's fictional narrative."

The hostility toward "Promised Land" shows us that it's not just politicians who complain about movie messages. Big business -- namely, the gas industry -- is aggressively objecting to the allegation in "Promised Land" that fracking poses environmental and health risks.

How concerned is the gas industry?

It has set up a rapid response team to counter publicity for the film by using two Washington-based groups that lobby for gas and oil companies: the Independent Petroleum Association of America and Energy in Depth. These groups have scrutinized appearances by the films stars on talk shows, questioned who the financiers of the film are, published parts of the script and mocked the film on social media.

Energy in Depth went as far as to "fact check" a recent appearance by the film's co-star and co-writer, John Krasinski, on "Late Night With David Letterman." Within hours of Krasinski's appearance, Energy in Depth posted a blog on its website pointing out what it perceived as factual errors made by Krasinski about fracking.

Regardless of whether "Zero Dark Thirty" and "Promised Land" intended to promote any message, people who watch them will be "educated" in some way on torture and fracking -- even if very subtly.

This is the same reason that minority groups continue to object to being represented in a negative light in movies and TV. They understand that accurate representations matter because studies have shown that biases can form based on stereotypes or inaccurate representations. (Being of Italian and Arab descent, I'm acutely aware of this issue as my respective heritages have been represented by a parade of mobsters and terrorists.)

What's Hollywood's role in all of this? The same as it has always been -- to make money.

In fact, there's no doubt that the studios behind these movies are overjoyed at the controversy that has erupted and the resulting free press. Indeed, the response of Sony Pictures to the uproar over "Zero Dark Thirty" tells you about what they really hope we will all do: "We encourage people to see the film before characterizing it."

So go ahead, enjoy these films and ones like them that are based on actual events or current hot issues. But while you are watching them, be aware you might be getting more than the price of ticket. You might also be getting a (mis)education.

The opinions expressed in this commentary are solely those of Dean Obeidallah.

'''

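# Build the summarizer with its default settings
model = Summarizer()
# Summarize, skipping sentences shorter than min_length (65 here)
result = model(body, min_length=65)
# Join the selected sentences into one string (works whether a string or a list is returned)
output_str = result if isinstance(result, str) else ' '.join(result)
print(output_str)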
Parameters

  • ratio: sets the minimum number of points (MinPts) needed to identify a core point. (default 0.2)
  • min_length: the minimum length for a sentence to be accepted. (default 25)
  • max_length: the maximum length for a sentence to be accepted. (default 500)
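How these three parameters might act on the input can be sketched as below. Every helper here is hypothetical and the details are assumptions: lengths are counted in characters, the bounds are inclusive, sentences are split with a naive regular expression rather than spaCy, and ratio is read as a fraction of the sentence count that becomes DBSCAN's minimum number of points.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def filter_by_length(sentences: List[str], min_length: int = 25,
                     max_length: int = 500) -> List[str]:
    # Hypothetical filter: keep sentences whose character count lies in [min_length, max_length]
    return [s for s in sentences if min_length <= len(s) <= max_length]


def min_samples_from_ratio(n_sentences: int, ratio: float = 0.2) -> int:
    # Hypothetical mapping of `ratio` onto DBSCAN's minimum points; never below 1
    return max(1, round(ratio * n_sentences))


text = ("Short one. This sentence is comfortably longer than twenty-five characters. "
        "Tiny! Another sentence that clears the minimum length easily?")
sentences = split_sentences(text)
kept = filter_by_length(sentences)
print(len(sentences), len(kept))            # 4 2
print(min_samples_from_ratio(len(sentences)))  # 1
```

Tying the minimum number of points to a ratio would let the same setting scale with document length instead of being a fixed count.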

Future work

Choosing a suitable epsilon (the DBSCAN neighborhood radius) could be automated.
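One way this could be done is the sorted k-distance plot suggested in the original DBSCAN paper: compute each point's distance to its k-th nearest neighbour, sort those distances, and take epsilon at the knee where the curve bends sharply upward. The sketch below locates the knee geometrically (the point farthest from the straight line joining the curve's endpoints). This is one heuristic among several and not something the package implements; the toy 2-D points stand in for sentence embeddings, and k = 2 is an arbitrary choice (k is usually tied to the minimum number of points).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sorted_k_distances(points: List[Vector], k: int) -> List[float]:
    """Distance from every point to its k-th nearest neighbour, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)


def knee_epsilon(k_dists: List[float]) -> float:
    """Return the k-distance farthest from the chord joining the first and
    last values of the sorted curve (a simple geometric knee rule)."""
    n = len(k_dists)
    if n < 3:
        return k_dists[-1]
    x0, y0 = 0.0, k_dists[0]
    x1, y1 = float(n - 1), k_dists[-1]
    chord = math.hypot(x1 - x0, y1 - y0)

    def distance_to_chord(i: int) -> float:
        # Perpendicular distance from (i, k_dists[i]) to the chord
        return abs((y1 - y0) * i - (x1 - x0) * k_dists[i] + x1 * y0 - y1 * x0) / chord

    return k_dists[max(range(n), key=distance_to_chord)]


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (10.0, 0.0)]
k_dists = sorted_k_distances(points, k=2)
eps = knee_epsilon(k_dists)
print(round(eps, 4))  # 0.1414
```

The jump from about 0.14 to about 7.07 in the sorted distances is what the knee rule picks up: every point inside a dense group sits below it, while the isolated point sits above it.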

Download files

Source distribution: summarizer-yshi0914-0.0.1.tar.gz (11.9 kB)
Built distribution: summarizer_yshi0914-0.0.1-py3-none-any.whl (14.1 kB, Python 3)
