Identifies Claims from Text
Prompt Tuning for Claim Summarization
This Python package generates short summaries of the claims in a piece of content, guided by a handful of few-shot examples. The approach leverages prompt tuning with pre-existing models like Gemini, so no fine-tuning phase is needed.
Approach
We use prompt tuning as the primary method for this project. Instead of fine-tuning the language model (which requires large datasets), we build prompts that show the model several examples of "Content", each followed by its expected "Summary of Claims". This method is particularly useful when working with small datasets.
- Data: The input data consists of "Content" (such as a conversation between a user and an agent) and "Reasons" (which are comma-separated summaries of the issues or claims).
- Few-shot prompting: For each query, we randomly select a subset of examples (e.g., 7) from the dataset to use as reference examples. The prompt generator constructs a natural language prompt from these examples, asking the model to summarize the new content.
- Evaluation: We evaluate the model's performance using:
  - ROUGE Scores: Measure the overlap between the generated summary and the actual summary (Reason).
  - Cosine Similarity: Measures the similarity between the TF-IDF vectors of the generated summary and the actual reason.
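The prompt construction described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the function name `build_few_shot_prompt`, the dict layout with "Content" and "Reasons" keys, the instruction wording, and the sample conversations are all assumptions made for the example.

```python
import random


def build_few_shot_prompt(examples, new_content, k=7, seed=None):
    """Build a prompt from k randomly sampled (content, reasons) pairs.

    `examples` is a list of dicts with "Content" and "Reasons" keys
    (hypothetical layout; the package's real data format may differ).
    """
    rng = random.Random(seed)
    # Never ask for more examples than the dataset holds.
    shots = rng.sample(examples, min(k, len(examples)))
    parts = ["Summarize the claims in the content as a comma-separated list."]
    for shot in shots:
        parts.append(
            f"Content: {shot['Content']}\nSummary of Claims: {shot['Reasons']}"
        )
    # The final block leaves the summary empty for the model to complete.
    parts.append(f"Content: {new_content}\nSummary of Claims:")
    return "\n\n".join(parts)


# Made-up sample data, for illustration only.
dataset = [
    {"Content": "User: My order arrived broken. Agent: Sorry to hear that.",
     "Reasons": "damaged item"},
    {"Content": "User: I was charged twice. Agent: Let me check.",
     "Reasons": "duplicate charge"},
    {"Content": "User: The parcel never came. Agent: I will trace it.",
     "Reasons": "missing delivery"},
]
prompt = build_few_shot_prompt(dataset, "User: The app logs me out.", k=2, seed=0)
print(prompt)
```

The resulting string would then be sent to the chosen model; sampling a fresh subset per query keeps the prompt short even as the dataset grows.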
How to Use
Installation
- Clone the repository or download the .zip file.
- Ensure you have the required dependencies installed. You can install them using the following:
pip3 install setuptools
Usage
This package exposes two main functions to the user:
1. Performance Evaluation (perfomance_on_data)
This function evaluates the model's performance across the entire dataset by generating summaries and calculating ROUGE and Cosine Similarity metrics.
from your_package_name import perfomance_on_data
# Evaluate the performance on the dataset
perfomance_on_data()
Output:
- The function will print out the generated summaries, the actual summaries, ROUGE scores, and Cosine Similarity scores for each example.
- It will also print the average ROUGE and Cosine Similarity scores across all examples.
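To make the two metrics concrete, here is a simplified sketch that uses only the standard library. It is a stand-in, not what the package runs: the package lists `rouge` and `sklearn` for these calculations, and their tokenization and scoring details differ, so the numbers will not match exactly. The function names and the sample pairs are assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, treat commas as separators, split on whitespace.
    return text.lower().replace(",", " ").split()


def rouge1_f1(generated, reference):
    """Unigram-overlap F1, a simplified form of ROUGE-1."""
    gen, ref = Counter(tokenize(generated)), Counter(tokenize(reference))
    overlap = sum((gen & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def tfidf_cosine(generated, reference):
    """Cosine similarity of TF-IDF vectors fitted on just the two texts."""
    docs = [Counter(tokenize(generated)), Counter(tokenize(reference))]
    vocab = set(docs[0]) | set(docs[1])
    # Smoothed IDF over the two documents: ln(3 / (1 + doc_freq)) + 1
    idf = {t: math.log(3 / (1 + sum(t in d for d in docs))) + 1 for t in vocab}
    vecs = [{t: d[t] * idf[t] for t in d} for d in docs]
    dot = sum(vecs[0][t] * vecs[1].get(t, 0.0) for t in vecs[0])
    norms = [math.sqrt(sum(v * v for v in vec.values())) for vec in vecs]
    return dot / (norms[0] * norms[1]) if all(norms) else 0.0


# Made-up (generated, actual) pairs, for illustration only.
pairs = [
    ("damaged item, late delivery", "damaged item, late delivery"),
    ("refund not received", "duplicate charge"),
]
rouge_scores = [rouge1_f1(g, r) for g, r in pairs]
cosine_scores = [tfidf_cosine(g, r) for g, r in pairs]
print("average ROUGE-1 F1:", sum(rouge_scores) / len(rouge_scores))
print("average cosine:", sum(cosine_scores) / len(cosine_scores))
```

Both scores range from 0 (no shared terms) to 1 (identical wording), which is why a paraphrased but correct summary can still score low on either metric.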
2. Generate Summary on Query (genrate_on_query)
This function allows the user to input a query (i.e., new content) and receive a generated summary of claims from the pre-existing model, prompted with examples from the dataset.
from your_package_name import genrate_on_query
# Generate summary for a user-provided query
genrate_on_query()
Dependencies
- transformers: For utilizing pre-trained language models.
- sklearn: For cosine similarity and vectorization.
- rouge: For calculating ROUGE scores.
This package offers a lightweight and flexible way to generate summaries using few-shot prompting and can be integrated into any workflow requiring natural language summarization.
File details
Details for the file claim_identification_prompt_tuning-0.1.1.tar.gz.
File metadata
- Download URL: claim_identification_prompt_tuning-0.1.1.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7131dd3ae76066ca3805bf6cd6f0ebb434eb1d80caf8114d4bd33a72b231f5e7 |
| MD5 | e3cb6b6b5d779af32ca532569861de01 |
| BLAKE2b-256 | 9c78dee6de5f22757262939fc2220bb8bd006bebf7894592e1b7c02e6070b956 |
File details
Details for the file claim_identification_prompt_tuning-0.1.1-py3-none-any.whl.
File metadata
- Download URL: claim_identification_prompt_tuning-0.1.1-py3-none-any.whl
- Upload date:
- Size: 5.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 544ad8a82e1af373f3d2bd9d95caa819b2241c696f31bce12c19926634c3414f |
| MD5 | 809848bbf38d3b26a09a340791f8671f |
| BLAKE2b-256 | db6a95feb74dd5bd93b54016c9b3c749cda62ae16fc520c0e3282e2cdc5f5535 |