Identifies Claims from Text

Project description

Prompt Tuning for Claim Summarization

This Python package generates short summaries of the claims in a piece of content, guided by a small set of example pairs supplied in the prompt. The approach, which the project calls prompt tuning, relies on prompting pre-existing models such as Gemini, so no fine-tuning phase is needed.

Approach

We use prompt tuning as the primary method for this project. Instead of fine-tuning the language model (which requires large datasets), we prompt it with several examples of "Content", each followed by the expected "Summary of Claims". This method is particularly useful when working with small datasets.
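The prompt format described above could be assembled roughly as follows. This is a minimal sketch, not the package's actual code: the function name `build_prompt`, the instruction sentence, and the sample conversations are all made up for illustration. Only the "Content" / "Summary of Claims" labels and the idea of a handful of randomly chosen reference examples (seven by default here) come from the project description.

```python
import random


def build_prompt(examples, query, k=7, seed=None):
    """Build a prompt from (content, reasons) pairs plus a new query.

    `build_prompt` is a hypothetical helper. `k` reference examples are
    sampled at random and placed before the new content.
    """
    rng = random.Random(seed)
    shots = rng.sample(examples, min(k, len(examples)))
    # The instruction wording is an assumption, not the package's text.
    parts = ["Summarize the claims in each piece of content."]
    for content, reasons in shots:
        parts.append(f"Content: {content}\nSummary of Claims: {reasons}")
    # The last block leaves the summary empty for the model to complete.
    parts.append(f"Content: {query}\nSummary of Claims:")
    return "\n\n".join(parts)


# Invented sample data, for illustration only.
examples = [
    ("User: My order arrived broken. Agent: Sorry to hear that.", "damaged item"),
    ("User: I was charged twice. Agent: Let me check.", "duplicate charge"),
    ("User: The parcel never came. Agent: I will trace it.", "missing delivery"),
]
prompt = build_prompt(examples, "User: I want my money back.", k=2, seed=0)
print(prompt)
```

The resulting string would then be sent to whichever model is in use; that call is left out because the project description does not show it.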

Data: The input data consists of "Content" (such as a conversation between a user and an agent) and "Reasons" (which are comma-separated summaries of the issues or claims).
Few-shot prompting (the project calls this zero-shot learning): For each query, we randomly select a subset of examples (e.g., 7) from the dataset to use as references. The prompt generator builds a natural-language prompt from these examples and asks the model to summarize the new content.
Evaluation: We evaluate the model's performance using:
- ROUGE scores: measure the overlap between the generated summary and the actual summary (the "Reasons" field).
- Cosine similarity: measures the similarity between the TF-IDF vectors of the generated summary and the actual summary.
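To make the two metrics concrete, here is a small standard-library sketch. It is a simplification under stated assumptions, not the package's implementation: `rouge1_f1` computes only unigram-overlap F1 (a reduced form of ROUGE-1), and `tfidf_cosine` builds TF-IDF vectors from just the two texts being compared, using a smoothed IDF. The project itself lists `rouge` and `sklearn` for these steps, and their exact tokenization and weighting may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(generated, reference):
    """Unigram-overlap F1 between two texts (a simplified ROUGE-1)."""
    gen, ref = Counter(tokenize(generated)), Counter(tokenize(reference))
    overlap = sum((gen & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def tfidf_cosine(generated, reference):
    """Cosine similarity of TF-IDF vectors built from the two texts only."""
    docs = [Counter(tokenize(generated)), Counter(tokenize(reference))]
    vocab = set(docs[0]) | set(docs[1])
    n = len(docs)
    # Smoothed IDF, so terms present in both texts keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + sum(t in d for d in docs))) + 1
           for t in vocab}
    vecs = [{t: d[t] * idf[t] for t in d} for d in docs]
    dot = sum(w * vecs[1].get(t, 0.0) for t, w in vecs[0].items())
    norms = [math.sqrt(sum(w * w for w in v.values())) for v in vecs]
    if norms[0] == 0 or norms[1] == 0:
        return 0.0
    return dot / (norms[0] * norms[1])


# Invented example strings, for illustration only.
print(rouge1_f1("damaged item, late delivery", "damaged item"))
print(tfidf_cosine("damaged item, late delivery", "damaged item"))
```

Both functions return 0.0 when the texts share no tokens and a value close to 1.0 when the texts are identical.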

How to Use

Installation

Clone the repository or download the .zip file.
Make sure the required dependencies are installed. The project gives the following command:
```
pip3 install setuptools
```

Usage

This package exposes two main functions to the user:

1. Performance Evaluation (`perfomance_on_data`)

This function evaluates the model's performance across the entire dataset by generating summaries and calculating ROUGE and Cosine Similarity metrics.

```
from your_package_name import perfomance_on_data

# Evaluate the performance on the dataset
perfomance_on_data()
```

Output:

The function prints the generated summary, the actual summary, and the ROUGE and cosine similarity scores for each example.
It then prints the average ROUGE and cosine similarity scores across all examples.
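The per-example scoring and averaging described here can be pictured with the sketch below. It is only an illustration of the loop's shape: `evaluate`, the `generate` callable, and the `metrics` dictionary are hypothetical names, and the stand-in model and exact-match metric exist solely so the example runs on its own.

```python
from statistics import mean


def evaluate(dataset, generate, metrics):
    """Score generated summaries against references and average the scores.

    dataset:  list of (content, reference_summary) pairs
    generate: callable mapping content to a generated summary (hypothetical)
    metrics:  dict of name -> callable(generated, reference) returning a float
    """
    rows = []
    for content, reference in dataset:
        generated = generate(content)
        scores = {name: fn(generated, reference) for name, fn in metrics.items()}
        rows.append({"generated": generated, "reference": reference, **scores})
    # Average each metric over all examples; empty input gives no averages.
    averages = {name: mean(r[name] for r in rows) for name in metrics} if rows else {}
    return rows, averages


# Stand-ins for the real model and metrics, for illustration only.
dataset = [("late parcel", "late parcel"), ("broken item", "refund request")]
rows, averages = evaluate(
    dataset,
    generate=lambda content: content,
    metrics={"exact_match": lambda g, r: float(g == r)},
)
for row in rows:
    print(row)
print("Averages:", averages)
```

Passing the metrics in as callables keeps the loop independent of any particular ROUGE or TF-IDF library.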

2. Generate Summary on Query (`genrate_on_query`)

This function lets the user enter a query (i.e., new content) and returns a generated summary of its claims. No model is trained; the summary comes from prompting the pre-existing model with reference examples.

```
from your_package_name import genrate_on_query

# Generate summary for a user-provided query
genrate_on_query()
```

Dependencies

- transformers: for using pre-trained language models.
- sklearn (distributed on PyPI as scikit-learn): for TF-IDF vectorization and cosine similarity.
- rouge: for calculating ROUGE scores.
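One way to check whether these dependencies are available before calling the package is sketched below. The helper `missing_modules` is a hypothetical name, and the import names `transformers`, `sklearn`, and `rouge` are taken from the list above on the assumption that each library is imported under that name.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Import names assumed from the dependency list above.
required = ["transformers", "sklearn", "rouge"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```

`importlib.util.find_spec` only looks the module up; it does not import it, so the check stays fast even for large libraries.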

This package offers a lightweight and flexible way to generate summaries by prompting a pre-existing model with a few examples, and it can be integrated into any workflow that requires natural language summarization.

Project details

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history

This version: 0.1.1 (Sep 4, 2024)

Download files

Source Distribution

IdentificationOfClaims-0.1.1.tar.gz (4.5 kB view details)

Uploaded Sep 4, 2024 Source

Built Distribution

IdentificationOfClaims-0.1.1-py3-none-any.whl (5.3 kB view details)

Uploaded Sep 4, 2024 Python 3

File details

Details for the file IdentificationOfClaims-0.1.1.tar.gz.

File metadata

Download URL: IdentificationOfClaims-0.1.1.tar.gz
Upload date: Sep 4, 2024
Size: 4.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for IdentificationOfClaims-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`db7549dc4dd52cbc5bd573a5345a6ec756bc0995928b1fe435c4618af4a77948`
MD5	`48edcd9960c3dfe833f28c12df9c5028`
BLAKE2b-256	`88d0034d80de6865f32204baba88750ef56585bc3e28c0ad5868ce95636c993a`
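A downloaded file can be compared against the SHA256 digest listed above with a few lines of standard-library Python. This is a generic sketch: the helper names `sha256_of_file` and `matches_digest` are made up for illustration, and the local file path in the commented usage line is an assumption about where the archive was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the archive sits in the current directory:
# matches_digest(
#     "IdentificationOfClaims-0.1.1.tar.gz",
#     "db7549dc4dd52cbc5bd573a5345a6ec756bc0995928b1fe435c4618af4a77948",
# )
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for a 4.5 kB archive but makes the helper reusable.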

File details

Details for the file IdentificationOfClaims-0.1.1-py3-none-any.whl.

File metadata

Download URL: IdentificationOfClaims-0.1.1-py3-none-any.whl
Upload date: Sep 4, 2024
Size: 5.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for IdentificationOfClaims-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d40ac089f2da7bd47f84e04beb0937b9abb07d7fae5617c6f47b833fb5306329`
MD5	`cc29a6f24798dde741456e114e938c21`
BLAKE2b-256	`93251b461386196c2001714a0e61faab11808f0fc16d06cb9afae0de78b1b9f1`
