Project description
Final Year Project on EDU Segmentation:
The goal is to improve EDU segmentation performance using Segbot. As Segbot has an encoder-decoder architecture, we can replace its bidirectional GRU encoder with generative pre-trained models such as BART and T5. The new model is evaluated on the RST dataset in a few-shot setting (e.g. 100 training examples) rather than on the full dataset.
Segbot:
http://138.197.118.157:8000/segbot/
https://www.ijcai.org/proceedings/2018/0579.pdf
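The few-shot setting described above can be sketched as training on a small fixed sample of the RST training data instead of the whole corpus. In this illustrative snippet, `full_train_set` is placeholder data standing in for the loaded RST examples, not the real dataset loader:

```python
import random

# Illustrative sketch (not part of the package) of the few-shot setting:
# instead of training on the full RST training set, draw a small fixed
# sample of e.g. 100 examples. `full_train_set` is placeholder data.
full_train_set = [f"example_{i}" for i in range(1000)]

random.seed(42)  # fix the seed so the few-shot split is reproducible
few_shot_train_set = random.sample(full_train_set, k=100)

print(len(few_shot_train_set))  # 100
```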
Installation
To use the EDUSegmentation module, follow these steps:
- Import the `download` module to download all models:

  ```python
  from edu_segmentation import download
  download.download_models()
  ```
- Import the `edu_segmentation` module and its related classes:

  ```python
  from edu_segmentation.main import EDUSegmentation, ModelFactory, BERTUncasedModel, BERTCasedModel, BARTModel
  ```
Usage
The edu_segmentation module provides an easy-to-use interface to perform EDU segmentation using different strategies and models. Follow these steps to use it:
- Create a segmentation strategy:

  You can choose between the default segmentation strategy and a conjunction-based segmentation strategy.

  - Conjunction-based segmentation strategy: after the text has been EDU-segmented, any conjunction at the start or end of a segment is isolated as its own segment.
  - Default segmentation strategy: no post-processing occurs after the text has been EDU-segmented.

  ```python
  from edu_segmentation import DefaultSegmentation, ConjunctionSegmentation
  ```
- Create a model using the `ModelFactory`. Choose from the BERT Uncased, BERT Cased, or BART models:

  ```python
  model_type = "bert_uncased"  # or "bert_cased", "bart"
  model = ModelFactory.create_model(model_type)
  ```
- Create an instance of `EDUSegmentation` using the chosen model:

  ```python
  edu_segmenter = EDUSegmentation(model)
  ```
- Segment the text using the chosen strategy:

  ```python
  text = "Your input text here."
  granularity = "conjunction_words"  # or "default"
  conjunctions = ["and", "but", "however"]  # customise conjunctions if needed
  device = 'cpu'  # choose your device, e.g. 'cuda:0'
  segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
  ```
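To make the conjunction-based strategy from step 1 concrete, here is an illustrative sketch of the post-processing it describes. The function name and details are hypothetical, not the package's actual implementation:

```python
# Hypothetical sketch of conjunction-based post-processing: if an EDU segment
# starts or ends with a conjunction, that conjunction is split off as its own
# segment. This only illustrates the behaviour described in the docs.
def isolate_conjunctions(segments, conjunctions=("and", "but", "however")):
    result = []
    for seg in segments:
        words = seg.split()
        if len(words) > 1 and words[0].lower() in conjunctions:
            # conjunction at the start of the segment
            result.extend([words[0], " ".join(words[1:])])
        elif len(words) > 1 and words[-1].lower() in conjunctions:
            # conjunction at the end of the segment
            result.extend([" ".join(words[:-1]), words[-1]])
        else:
            result.append(seg)
    return result

print(isolate_conjunctions(["the food is good,", "but the service is bad."]))
# ['the food is good,', 'but', 'the service is bad.']
```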
Example
Here's a simple example demonstrating how to use the edu_segmentation module:
```python
from edu_segmentation.main import EDUSegmentation, ModelFactory

# Create a BERT Uncased model
model = ModelFactory.create_model("bert_uncased")

# Create an instance of EDUSegmentation using the model
edu_segmenter = EDUSegmentation(model)

# Segment the text using the conjunction-based segmentation strategy
text = "The food is good, but the service is bad."
granularity = "conjunction_words"
conjunctions = ["and", "but", "however"]
device = 'cpu'
segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
print(segmented_output)
```