A package for targeted summarization

Project description

TextReducer - A Tool for Summarization and Information Extraction

TextReducer is a tool for summarization and information extraction powered by the SentenceTransformer library. Unlike many techniques for extractive summaries, TextReducer has the option for a "target" around which the summary will be focused. This target can be any text prompt, meaning that a user can specify the type of information that they would like to find or summarize, and ignore everything else.

Another key benefit of TextReducer is that, rather than extracting sentences to build the summary, it carves away at the original text, removing unnecessary sentences. This leads to more fluent summaries and preserves grammatical features such as coreference that are often lost in traditional extractive summarization.

For instance, in the sentences "In his free time, John enjoyed playing golf and traveling with his family. He was married with two children, and lived in a suburban area with his wife and kids.", it is important that the two sentences stay linked together. Otherwise, the antecedent of the word "He" in the second sentence is lost. TextReducer is much better at preserving such related sentences, and is thus a valuable tool for fast but fluent summarization of large texts.

The class has several methods:

  • reduce(huge_text, target, num_sents=5) - takes a large text and a target text, and returns a summary made of the parts of the input that are most similar to the target. It first encodes the sentences of the input text and the target text using the SentenceTransformer embedding model, then calculates the cosine similarity between each sentence embedding and the target embedding. Finally, it finds the top num_sents most similar sentences and returns them as a string.
  • summarize(text, num_sents=5) - takes a text and returns a summary made of the sentences that are most similar to the overall meaning of the text. It first encodes the input text using the SentenceTransformer embedding model, then calculates the cosine similarity between the embedding of each sentence and the embedding of the text as a whole. Finally, it finds the top num_sents most similar sentences and returns them as a string.
  • reduce_pdf(pdf_path, target, num_sents=5) - takes the path to a PDF file, a target text, and the number of sentences to return. It reads the PDF, extracts its text, calls the summarize method, and returns the summary.
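The ranking step described for reduce and summarize can be sketched as follows. This is a minimal illustration, not TextReducer's actual implementation: the names split_sentences, toy_embed, cosine, reduce_sketch, and summarize_sketch are hypothetical, and a simple bag-of-words vector stands in for the SentenceTransformer embedding so that the sketch runs without downloading a model. Returning the selected sentences in their original order is also an assumption made here for readability.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_embed(text):
    # Stand-in for a sentence-embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reduce_sketch(text, target, num_sents=5):
    # Score every sentence against the target, then keep the best num_sents.
    sentences = split_sentences(text)
    target_vec = toy_embed(target)
    scores = [cosine(toy_embed(s), target_vec) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(top[:num_sents])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


def summarize_sketch(text, num_sents=5):
    # Untargeted variant: the text as a whole plays the role of the target.
    return reduce_sketch(text, text, num_sents)


if __name__ == "__main__":
    doc = (
        "John plays golf on weekends. The weather was rainy in Paris. "
        "He travels with his family. Golf clubs are expensive."
    )
    print(reduce_sketch(doc, "golf", num_sents=2))
```

To move closer to the behavior described above, toy_embed and cosine would be replaced by a real sentence-embedding model and a dense-vector cosine similarity; the selection logic stays the same.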

Installation

To install TextReducer, use pip install targeted_sum

Applications

  • Summarization - Does a good job summarizing large texts while staying lightweight and maintaining coherence.
  • Information Extraction - Good for finding sections of text related to any given text prompt.
  • Question Answering - Good for finding sections of text related to any given question. When paired with a QA model, this can be a very powerful tool.
  • GPT3/ChatGPT Prompting - Good for reducing the length of text prompts for GPT3 or ChatGPT. This saves time and money, and improves responses by removing unnecessary content from prompts.

Demo

A Google Colab demo can be found here

Download files

Source Distribution

targeted_sum-1.0.9.tar.gz (4.7 kB view details)

Uploaded Source

Built Distribution

targeted_sum-1.0.9-py3-none-any.whl (5.1 kB view details)

Uploaded Python 3

File details

Details for the file targeted_sum-1.0.9.tar.gz.

File metadata

  • Download URL: targeted_sum-1.0.9.tar.gz
  • Upload date:
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for targeted_sum-1.0.9.tar.gz
Algorithm Hash digest
SHA256 7a97683914e9ad3b7a934e83a12b0a2ab08fbbb0006d8873b22634a1a2e581eb
MD5 d180991abbb25dac9dff59d4c3c41100
BLAKE2b-256 46b8c1d9a18e22024c3316285a59c2e6f4f807056fe068dc1c92db1a981f28e0

File details

Details for the file targeted_sum-1.0.9-py3-none-any.whl.

File metadata

  • Download URL: targeted_sum-1.0.9-py3-none-any.whl
  • Upload date:
  • Size: 5.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for targeted_sum-1.0.9-py3-none-any.whl
Algorithm Hash digest
SHA256 0e15840ff8f28a440ad26a8ea1974ec033d0f7507a78db6683b07fd95613e955
MD5 a857bc4b7e7b3694719a5e929af1c81a
BLAKE2b-256 838a74be896ced96404bbe6ff1561d9083eacbfc84fa5ff0238119c10735f6c5
