
Identifies Claims from Text

Project description


Prompt Tuning for Claim Summarization

This Python package generates short summaries of the claims in a piece of content from a small set of in-prompt examples (few-shot prompting). The approach relies on prompt tuning with pre-existing models such as Gemini and needs no fine-tuning phase.

Approach

Prompt tuning is the primary method for this project. Instead of fine-tuning the language model (which typically requires a large dataset), we build a prompt that shows the model several examples of "Content", each followed by its expected "Summary of Claims". This method is particularly useful when working with small datasets.

  1. Data: The input data consists of "Content" (such as a conversation between a user and an agent) and "Reasons" (which are comma-separated summaries of the issues or claims).

  2. Few-shot prompting: For each query, we randomly select a subset of examples (e.g., 7) from the dataset to use as reference examples. The prompt generator builds a natural-language prompt from these examples and asks the model to summarize the new content.

  3. Evaluation: We evaluate the model's performance using:

    • ROUGE Scores: Measure the n-gram overlap between the generated summary and the actual summary (Reason).
    • Cosine Similarity: Measures the similarity between the TF-IDF vectors of the generated summary and the actual summary (Reason).
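The prompt construction in step 2 can be sketched as follows. This is a minimal illustration, not the package's actual code: the function name `build_prompt`, the instruction wording, the dictionary keys, and the sample conversations are all assumptions made up for the example.

```python
import random

def build_prompt(examples, query, k=7, seed=None):
    """Build a few-shot prompt from k randomly chosen (Content, Reasons) examples.

    Hypothetical helper: the name, arguments, and prompt wording are assumptions.
    """
    rng = random.Random(seed)
    # Sample without replacement; cap k at the dataset size.
    chosen = rng.sample(examples, min(k, len(examples)))
    parts = ["Summarize the claims in the content, following the examples."]
    for ex in chosen:
        parts.append(f"Content: {ex['Content']}\nSummary of Claims: {ex['Reasons']}")
    # The new content goes last, with its summary left open for the model to fill in.
    parts.append(f"Content: {query}\nSummary of Claims:")
    return "\n\n".join(parts)

# Made-up data, for illustration only.
examples = [
    {"Content": "User: My order arrived broken. Agent: Sorry to hear that.",
     "Reasons": "damaged item"},
    {"Content": "User: I was charged twice. Agent: Let me check.",
     "Reasons": "duplicate charge"},
    {"Content": "User: The parcel never came. Agent: I will trace it.",
     "Reasons": "missing delivery"},
]
prompt = build_prompt(
    examples, "User: The app keeps crashing. Agent: Which version?", k=2, seed=0
)
print(prompt)
```

Because the examples are re-sampled for every query, two runs on the same content may produce different prompts; passing a fixed `seed` makes a run reproducible.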

How to Use

Installation

  1. Clone the repository or download the .zip file.
  2. Ensure you have the required dependencies installed. You can install them using the following:
    pip3 install setuptools

Usage

This package exposes two main functions to the user:

1. Performance Evaluation (perfomance_on_data)

This function evaluates the model's performance across the entire dataset by generating summaries and calculating ROUGE and Cosine Similarity metrics.

from your_package_name import perfomance_on_data

# Evaluate the performance on the dataset
perfomance_on_data()

Output:

  • The function will print out the generated summaries, the actual summaries, ROUGE scores, and Cosine Similarity scores for each example.
  • It will also print the average ROUGE and Cosine Similarity scores across all examples.
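To make the two metrics concrete, here is a dependency-free sketch of a ROUGE-1 F1 score and a TF-IDF cosine similarity for one generated/actual pair, plus the averaging step. It is an approximation under stated assumptions: the package itself lists `rouge` and `sklearn` for these computations, whose tokenization and weighting may differ, so the numbers will not necessarily match. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase alphanumeric tokens; the real libraries may tokenize differently.
    return re.findall(r"[a-z0-9]+", text.lower())

def rouge1_f1(generated, reference):
    """Unigram-overlap F1 (a simplified stand-in for ROUGE-1)."""
    gen, ref = Counter(tokenize(generated)), Counter(tokenize(reference))
    overlap = sum((gen & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def tfidf_cosine(generated, reference):
    """Cosine similarity of TF-IDF vectors fitted on just these two texts."""
    docs = [Counter(tokenize(generated)), Counter(tokenize(reference))]
    vocab = set(docs[0]) | set(docs[1])
    n = len(docs)
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + sum(t in d for d in docs))) + 1 for t in vocab}
    vecs = [{t: count * idf[t] for t, count in d.items()} for d in docs]
    dot = sum(w * vecs[1].get(t, 0.0) for t, w in vecs[0].items())
    norms = [math.sqrt(sum(w * w for w in v.values())) for v in vecs]
    if norms[0] == 0 or norms[1] == 0:
        return 0.0
    return dot / (norms[0] * norms[1])

# Made-up (generated, actual) pairs, for illustration only.
pairs = [
    ("damaged item", "damaged item on arrival"),
    ("duplicate charge", "duplicate charge"),
]
avg_rouge = sum(rouge1_f1(g, r) for g, r in pairs) / len(pairs)
avg_cosine = sum(tfidf_cosine(g, r) for g, r in pairs) / len(pairs)
print(f"average ROUGE-1 F1: {avg_rouge:.3f}, average cosine: {avg_cosine:.3f}")
```

Both scores range from 0 (no shared terms) to 1 (identical token content), so the averages are directly comparable across runs on the same dataset.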

2. Generate Summary on Query (genrate_on_query)

This function lets the user enter a query (i.e., new content) and returns a generated summary of its claims. No model is trained; the summary comes from the pre-trained model, guided by the example-based prompt.

from your_package_name import genrate_on_query

# Generate summary for a user-provided query
genrate_on_query()
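The flow behind a single query might look roughly like the sketch below. The source does not show the internals of `genrate_on_query`, so everything here is an assumption: the helper name `summarize_query`, the `(content, reasons)` tuple layout, and the idea of passing the model in as a plain callable. The stand-in model returns a fixed string so the sketch runs offline.

```python
def summarize_query(query, examples, model):
    """Hypothetical flow: format examples and query into a prompt, then call the model.

    `model` is any callable that takes a prompt string and returns text.
    """
    shots = "\n\n".join(
        f"Content: {content}\nSummary of Claims: {reasons}"
        for content, reasons in examples
    )
    prompt = f"{shots}\n\nContent: {query}\nSummary of Claims:"
    # Trim whitespace that models often emit around the answer.
    return model(prompt).strip()

def fake_model(prompt):
    # Stand-in for a real LLM call (assumption); always returns the same text.
    return " app crash \n"

# Made-up example data, for illustration only.
examples = [("User: I was charged twice. Agent: Let me check.", "duplicate charge")]
summary = summarize_query(
    "User: The app keeps crashing. Agent: Which version?", examples, fake_model
)
print(summary)
```

Keeping the model behind a callable makes it easy to swap a real client for a stub when testing the prompt logic.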

Dependencies

  • transformers: For utilizing pre-trained language models.
  • sklearn (installed from PyPI as scikit-learn): For TF-IDF vectorization and cosine similarity.
  • rouge: For calculating ROUGE scores.

This package offers a lightweight and flexible way to generate summaries using few-shot prompting and can be integrated into any workflow that requires natural language summarization.
