Project description
Final Year Project on EDU Segmentation:
This project aims to improve EDU (elementary discourse unit) segmentation performance using Segbot. Because Segbot has an encoder-decoder architecture, its bidirectional GRU encoder can be replaced with pretrained generative models such as BART and T5. The new model is evaluated on the RST dataset in a few-shot setting (e.g., 100 training examples) rather than by training on the full dataset.
Segbot:
http://138.197.118.157:8000/segbot/
https://www.ijcai.org/proceedings/2018/0579.pdf
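The few-shot setting mentioned above amounts to training on a small, fixed subset of the data. The following is a minimal sketch of drawing such a subset reproducibly; the function name `sample_few_shot`, the seed, and the placeholder data are assumptions for illustration and are not part of this package.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_few_shot(examples: Sequence[T], k: int = 100, seed: int = 0) -> List[T]:
    """Return k examples drawn without replacement, reproducibly for a given seed."""
    pool = list(examples)
    # If fewer than k examples exist, fall back to using all of them.
    if k >= len(pool):
        return pool
    return random.Random(seed).sample(pool, k)


# Placeholder data standing in for annotated training sentences.
full_training_set = [f"sentence {i}" for i in range(1000)]
subset = sample_few_shot(full_training_set, k=100)
print(len(subset))  # 100
```

Fixing the seed keeps the subset identical across runs, which makes few-shot results comparable between models.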
Installation
To use the EDUSegmentation module, follow these steps:
- Import the download module to download all models:
from edu_segmentation.download import download_models
download_models()
- Import the edu_segmentation module and its related classes:
from edu_segmentation.main import EDUSegmentation, ModelFactory, BERTUncasedModel, BERTCasedModel, BARTModel
Usage
The edu_segmentation module provides an easy-to-use interface to perform EDU segmentation using different strategies and models. Follow these steps to use it:
- Create a segmentation strategy:
You can choose between the default segmentation strategy and a conjunction-based segmentation strategy.
Conjunction-based segmentation strategy: After the text has been EDU-segmented, any conjunction at the start or end of a segment is split off into its own segment.
Default segmentation strategy: No post-processing occurs after the text has been EDU-segmented.
from edu_segmentation.main import DefaultSegmentation, ConjunctionSegmentation
- Create a model using the ModelFactory.
Choose from BERT Uncased, BERT Cased, or BART models.
model_type = "bert_uncased" # or "bert_cased", "bart"
model = ModelFactory.create_model(model_type)
- Create an instance of EDUSegmentation using the chosen model:
edu_segmenter = EDUSegmentation(model)
- Segment the text using the chosen strategy:
text = "Your input text here."
granularity = "conjunction_words" # or "default"
conjunctions = ["and", "but", "however"] # Customize conjunctions if needed
device = 'cpu' # Choose your device, e.g., 'cuda:0'
segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
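To make the conjunction-based strategy described in the steps above more concrete, here is a minimal standalone sketch of that kind of post-processing. It assumes the segments are plain strings; the function `isolate_conjunctions` is hypothetical, and the package's actual implementation and output format may differ.

```python
from typing import List, Sequence


def isolate_conjunctions(segments: Sequence[str], conjunctions: Sequence[str]) -> List[str]:
    """Split a leading or trailing conjunction off each segment (illustrative only)."""
    conj = {c.lower() for c in conjunctions}
    result: List[str] = []
    for segment in segments:
        words = segment.split()
        head: List[str] = []
        tail: List[str] = []
        # Peel one conjunction off the front, ignoring case and attached punctuation.
        if len(words) > 1 and words[0].lower().strip(",.;:") in conj:
            head = [words.pop(0)]
        # Peel one conjunction off the end in the same way.
        if len(words) > 1 and words[-1].lower().strip(",.;:") in conj:
            tail = [words.pop()]
        result.extend(head)
        if words:
            result.append(" ".join(words))
        result.extend(tail)
    return result


segments = ["The food is good,", "but the service is bad."]
print(isolate_conjunctions(segments, ["and", "but", "however"]))
# ['The food is good,', 'but', 'the service is bad.']
```

A segment that consists of a single word is left untouched, so a lone conjunction is never split into an empty remainder.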
Example
Here's a simple example demonstrating how to use the edu_segmentation module:
from edu_segmentation.download import download_models
from edu_segmentation.main import ModelFactory, EDUSegmentation
download_models()
# Create a BART model
model = ModelFactory.create_model("bart") # or "bert_cased" or "bert_uncased"
# Create an instance of EDUSegmentation using the model
edu_segmenter = EDUSegmentation(model)
# Segment the text using the conjunction-based segmentation strategy
text = "The food is good, but the service is bad."
granularity = "conjunction_words" # or "default"
conjunctions = ["and", "but", "however"] # customise as needed
device = 'cpu' # or 'cuda:0'
segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
print(segmented_output)
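When several texts need to be segmented, the same call can be wrapped in a small helper. The sketch below relies only on the run(text, granularity, conjunctions, device) call shown above; `pick_device`, `segment_many`, and `StubSegmenter` are hypothetical names introduced for illustration, and the stub exists only so the sketch runs without downloading models.

```python
from typing import Any, Iterable, List, Optional, Sequence, Tuple


def pick_device() -> str:
    """Return 'cuda:0' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def segment_many(
    segmenter: Any,
    texts: Iterable[str],
    granularity: str = "default",
    conjunctions: Optional[Sequence[str]] = None,
    device: Optional[str] = None,
) -> List[Tuple[str, Any]]:
    """Run the segmenter on each non-empty text and pair every input with its output."""
    conj = list(conjunctions) if conjunctions is not None else ["and", "but", "however"]
    dev = device or pick_device()
    results: List[Tuple[str, Any]] = []
    for text in texts:
        # Skip blank inputs rather than passing them to the model.
        if not text.strip():
            continue
        results.append((text, segmenter.run(text, granularity, conj, dev)))
    return results


class StubSegmenter:
    """Hypothetical stand-in for EDUSegmentation; it simply splits on commas."""

    def run(self, text: str, granularity: str, conjunctions: Sequence[str], device: str) -> List[str]:
        return [part.strip() for part in text.split(",") if part.strip()]


pairs = segment_many(StubSegmenter(), ["The food is good, but the service is bad.", "  "], device="cpu")
print(len(pairs))  # 1
```

In real use, an EDUSegmentation instance created as in the example above would take the place of StubSegmenter.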