Project description

Final Year Project on EDU Segmentation:

This project aims to improve EDU (elementary discourse unit) segmentation performance, building on Segbot. Because Segbot has an encoder-decoder architecture, its bidirectional GRU encoder can be replaced with a pretrained generative model such as BART or T5. The new model is evaluated on the RST dataset in a few-shot setting (e.g. training on 100 examples) instead of on the full dataset.

Segbot:
http://138.197.118.157:8000/segbot/
https://www.ijcai.org/proceedings/2018/0579.pdf
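
The few-shot setting described above trains on a small fixed subset rather than the full corpus. A minimal sketch of drawing such a subset reproducibly, assuming the training data is already loaded as a Python list. The function name `sample_few_shot` and the toy data are illustrative and are not part of Segbot or of this package:

```python
import random


def sample_few_shot(examples, k=100, seed=0):
    """Return k training examples drawn without replacement, reproducibly."""
    if k > len(examples):
        raise ValueError(f"requested {k} examples but only {len(examples)} are available")
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return rng.sample(examples, k)


# Toy stand-in for a segmentation corpus: (sentence, boundary indices) pairs.
full_train = [(f"sentence {i}", [i % 5]) for i in range(1000)]
few_shot_train = sample_few_shot(full_train, k=100, seed=42)
print(len(few_shot_train))
```

Fixing the seed keeps the 100-example subset identical across runs, which makes few-shot results comparable between encoders.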


Installation

To use the edu_segmentation package, follow these steps:

  1. Install the package from PyPI:
pip install edu_segmentation
  2. Import the download module and download all models:
from edu_segmentation.download import download_models
download_models()
  3. Import the edu_segmentation module and its related classes:
from edu_segmentation.main import EDUSegmentation, ModelFactory, BERTUncasedModel, BERTCasedModel, BARTModel

Usage

The edu_segmentation module provides an easy-to-use interface to perform EDU segmentation using different strategies and models. Follow these steps to use it:

  1. Create a segmentation strategy:

    You can choose between the default segmentation strategy and a conjunction-based segmentation strategy.

    Conjunction-based segmentation strategy: After the text has been EDU-segmented, any conjunction at the start or end of a segment is split off into its own segment.

    Default segmentation strategy: No post-processing occurs after the text has been EDU-segmented.

from edu_segmentation.main import DefaultSegmentation, ConjunctionSegmentation
  2. Create a model using the ModelFactory:

    Choose from the BERT Uncased, BERT Cased, or BART models.
model_type = "bert_uncased"  # or "bert_cased", "bart"
model = ModelFactory.create_model(model_type)
  3. Create an instance of EDUSegmentation using the chosen model:
edu_segmenter = EDUSegmentation(model)
  4. Segment the text using the chosen strategy:
text = "Your input text here."
granularity = "conjunction_words"  # or "default"
conjunctions = ["and", "but", "however"]  # Customize conjunctions if needed
device = 'cpu'  # Choose your device, e.g., 'cuda:0'

segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
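
The conjunction-based post-processing described in step 1 can be illustrated without loading any model. The sketch below is not the package's implementation. It assumes whitespace tokenisation, case-insensitive matching, and that punctuation attached to a word is ignored when comparing. The function name `isolate_conjunctions` and the hand-written input segments are hypothetical stand-ins for real model output:

```python
def isolate_conjunctions(segments, conjunctions):
    """Split a leading or trailing conjunction off each segment.

    Illustrative only: words are split on whitespace, matching is
    case-insensitive, and attached punctuation is ignored when comparing.
    """
    conj = {c.lower() for c in conjunctions}
    result = []
    for segment in segments:
        words = segment.split()
        leading, trailing = [], []
        # Only split when something is left over, so a segment that is
        # already a lone conjunction stays as it is.
        if len(words) > 1 and words[0].strip(",.;:").lower() in conj:
            leading = [words.pop(0)]
        if len(words) > 1 and words[-1].strip(",.;:").lower() in conj:
            trailing = [words.pop()]
        result.extend(leading)
        result.append(" ".join(words))
        result.extend(trailing)
    return result


# Hand-written segments standing in for the output of an EDU segmenter.
segments = ["The food is good,", "but the service is bad."]
print(isolate_conjunctions(segments, ["and", "but", "however"]))
# ['The food is good,', 'but', 'the service is bad.']
```

With the default strategy, the same input would be returned unchanged, since no post-processing is applied.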

Example

Here's a simple example demonstrating how to use the edu_segmentation module:

from edu_segmentation.download import download_models
from edu_segmentation.main import ModelFactory, EDUSegmentation

download_models()

# Create a BART model
model = ModelFactory.create_model("bart")  # or "bert_cased" or "bert_uncased"

# Create an instance of EDUSegmentation using the model
edu_segmenter = EDUSegmentation(model)

# Segment the text using the conjunction-based segmentation strategy
text = "The food is good, but the service is bad."
granularity = "conjunction_words" # or default
conjunctions = ["and", "but", "however"] # customise as needed
device = 'cpu' # or cuda

segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
print(segmented_output)
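
The example above selects the model and the strategy through plain strings, so a typo would only surface once the models are being loaded. A small optional guard can check the strings first. This is a sketch under assumptions: the allowed values are copied from the comments in this document, the package may already perform its own validation, and `check_options` is a hypothetical helper, not part of edu_segmentation:

```python
# Option strings as listed in the comments of the usage steps above.
SUPPORTED_MODELS = {"bert_uncased", "bert_cased", "bart"}
SUPPORTED_GRANULARITIES = {"default", "conjunction_words"}


def check_options(model_type, granularity):
    """Raise ValueError early if an option string is not one of the listed values."""
    if model_type not in SUPPORTED_MODELS:
        raise ValueError(
            f"unknown model_type {model_type!r}; expected one of {sorted(SUPPORTED_MODELS)}"
        )
    if granularity not in SUPPORTED_GRANULARITIES:
        raise ValueError(
            f"unknown granularity {granularity!r}; expected one of {sorted(SUPPORTED_GRANULARITIES)}"
        )
    return model_type, granularity


model_type, granularity = check_options("bart", "conjunction_words")
print(model_type, granularity)
```

The validated values can then be passed to `ModelFactory.create_model` and `edu_segmenter.run` as in the example.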
