
Identifies Claims from Text

Prompt Tuning for Claim Summarization

This Python package generates short summaries of the claims in a piece of content, guided by a small set of few-shot examples. It uses prompt tuning with pre-existing models such as Gemini, so no fine-tuning phase is needed.

Approach

Prompt tuning is the primary method in this project. Instead of fine-tuning the language model, which requires large datasets, we build prompts that show the model several examples of "Content" followed by the expected "Summary of Claims". This works well when only a small dataset is available.

  1. Data: The input data consists of "Content" (such as a conversation between a user and an agent) and "Reasons" (which are comma-separated summaries of the issues or claims).

  2. Few-shot prompting: For each query, we randomly select a subset of examples (e.g., 7) from the dataset to use as reference examples. The prompt generator builds a natural language prompt from these examples and asks the model to summarize the new content.

  3. Evaluation: We evaluate the model's performance using:

    • ROUGE Scores: Measure the n-gram overlap between the generated summary and the actual summary (Reason).
    • Cosine Similarity: Measures the similarity between the TF-IDF vectors of the generated summary and the actual summary (Reason).
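
The example selection and prompt construction in step 2 could be sketched as follows. This is a minimal illustration, not the package's own code: the function name build_prompt, its parameters, the instruction wording, and the sample data are all assumptions made for the example.

```python
import random


def build_prompt(examples, query, k=7, seed=None):
    """Build a prompt from randomly sampled (content, reasons) pairs.

    Hypothetical helper: `examples` is a list of (content, reasons)
    tuples and `k` is the number of reference examples to sample.
    """
    rng = random.Random(seed)
    # Never ask for more examples than the dataset holds.
    shots = rng.sample(examples, min(k, len(examples)))
    parts = ["Summarize the claims in the content, as in the examples."]
    for content, reasons in shots:
        parts.append(f"Content: {content}\nSummary of Claims: {reasons}")
    # The new content goes last, with the summary left open for the model.
    parts.append(f"Content: {query}\nSummary of Claims:")
    return "\n\n".join(parts)


# Made-up sample data, for illustration only.
dataset = [
    ("User: My parcel arrived broken. Agent: Sorry to hear that.", "damaged item"),
    ("User: I was charged twice. Agent: Let me check.", "duplicate charge"),
    ("User: The app logs me out. Agent: Try updating.", "login issue"),
]
prompt = build_prompt(dataset, "User: My refund never arrived.", k=2, seed=0)
print(prompt)
```

The resulting string would then be sent to the language model. Sampling fresh examples for every query keeps the prompt short while still exposing the model to the dataset's summary style.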

How to Use

Installation

  1. Clone the repository or download the .zip file.
  2. Ensure you have the required dependencies installed. You can install them using the following:
    pip3 install setuptools

Usage

This package exposes two main functions to the user:

1. Performance Evaluation (perfomance_on_data)

This function evaluates the model's performance across the entire dataset by generating summaries and calculating ROUGE and Cosine Similarity metrics.

from your_package_name import perfomance_on_data

# Evaluate the performance on the dataset
perfomance_on_data()

Output:

  • The function will print out the generated summaries, the actual summaries, ROUGE scores, and Cosine Similarity scores for each example.
  • It will also print the average ROUGE and Cosine Similarity scores across all examples.
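
To make the reported metrics concrete, here is a standard-library sketch of a ROUGE-1 F1 score and a TF-IDF cosine similarity for one pair of summaries. The package itself relies on the rouge and sklearn libraries, so this is only an approximation of what they compute: the function names, the whitespace tokenization, and the smoothed IDF formula are assumptions chosen for the example.

```python
import math
from collections import Counter


def rouge1_f1(generated, reference):
    """ROUGE-1 F1: unigram overlap between two whitespace-tokenized strings."""
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((gen & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def tfidf_cosine(a, b):
    """Cosine similarity of TF-IDF vectors built from just the two texts."""
    docs = [Counter(a.lower().split()), Counter(b.lower().split())]
    vocab = set(docs[0]) | set(docs[1])
    # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, with n = 2 documents.
    idf = {t: math.log(3 / (1 + sum(t in d for d in docs))) + 1 for t in vocab}
    va, vb = ({t: d[t] * idf[t] for t in d} for d in docs)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    norm = math.sqrt(sum(w * w for w in va.values())) * math.sqrt(
        sum(w * w for w in vb.values())
    )
    return dot / norm if norm else 0.0


# Made-up (generated, actual) pairs, for illustration only.
pairs = [("late delivery refund", "late delivery"), ("login issue", "login issue")]
rouge_scores = [rouge1_f1(g, r) for g, r in pairs]
cosine_scores = [tfidf_cosine(g, r) for g, r in pairs]
print("average ROUGE-1 F1:", sum(rouge_scores) / len(rouge_scores))
print("average cosine similarity:", sum(cosine_scores) / len(cosine_scores))
```

Both scores range from 0 (no shared words) to 1 (identical wording), so the averages give a rough sense of how closely the generated summaries track the reference reasons.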

2. Generate Summary on Query (genrate_on_query)

This function lets the user enter a query (i.e., new content) and returns a generated summary of its claims. No model training is involved: the summary comes from the pre-trained model, prompted with examples from the dataset.

from your_package_name import genrate_on_query

# Generate summary for a user-provided query
genrate_on_query()

Dependencies

  • transformers: For utilizing pre-trained language models.
  • sklearn (distributed on PyPI as scikit-learn): For cosine similarity and TF-IDF vectorization.
  • rouge: For calculating ROUGE scores.

This package offers a lightweight and flexible way to generate summaries from a handful of in-prompt examples, and it can be integrated into any workflow that needs natural language summarization.
