Collection of discourse segmenters (with pre-trained models for German)

Project Description

A collection of various discourse segmenters (with pre-trained models for German texts).

Description

This Python module currently comprises three discourse segmenters: edseg, bparseg, and mateseg.

edseg
is a rule-based system that uses shallow discourse-oriented parsing to determine the boundaries of elementary discourse units. The rules are hard-coded in the submodule’s file and are only applicable to German input.
bparseg
is an ML-based segmentation module that operates on syntactic constituency trees (output from BitPar) and uses a pre-trained linear SVM model to decide whether a syntactic constituent initiates a discourse segment. This model was trained on the German PCC corpus, but you can also train your own classifier for any language using your own training data (cf. discourse_segmenter --help for further instructions on how to do that).
mateseg
is another ML-based segmentation module that operates on dependency trees (output from MateParser) and decides whether a sub-structure of the dependency graph initiates a discourse segment or not using a pre-trained linear SVM model. Again, this model was trained on the German PCC corpus.
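
The decision step shared by bparseg and mateseg can be pictured with a toy linear classifier. The following is a minimal sketch, not the package's code: the feature names, weights, and bias are invented for illustration, whereas the real models are trained on the PCC corpus and use features defined inside the package.

```python
# Toy linear decision function; all feature names and weights are hypothetical.
WEIGHTS = {
    "is_clause": 1.5,
    "starts_with_conjunction": 0.8,
    "is_short": -1.2,
}
BIAS = -1.0


def decision_score(features):
    """Return w . x + b for a sparse feature dict (missing features count as 0)."""
    return BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())


def initiates_segment(features):
    """A tree node starts a new discourse segment when the linear score is positive."""
    return decision_score(features) > 0.0


clause = {"is_clause": 1.0, "starts_with_conjunction": 1.0}
fragment = {"is_clause": 0.0, "is_short": 1.0}
print(initiates_segment(clause))
print(initiates_segment(fragment))
```

A linear SVM makes the same kind of decision at prediction time: a weighted sum of features plus a bias, thresholded at zero. Only the weights differ, since they come from training rather than being written by hand.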

Installation

To install this package from the PyPI index, run:

pip install dsegmenter

Alternatively, you can install it directly from the source repository by executing:

git clone git@github.com:discourse-lab/DiscourseSegmenter.git
pip install -r DiscourseSegmenter/requirements.txt DiscourseSegmenter/ --user

Usage

After installation, you can import the module in your Python scripts, e.g.:

from dsegmenter.bparseg import BparSegmenter

segmenter = BparSegmenter()

Alternatively, you can use the bundled front-end script discourse_segmenter to process your parsed input data, e.g.:

discourse_segmenter bparseg segment DiscourseSegmenter/examples/bpar/maz-8727.exb.bpar

or

discourse_segmenter mateseg segment DiscourseSegmenter/examples/conll/maz-8727.parsed.conll

Note that this script requires two mandatory arguments: the type of the segmenter to use (bparseg or mateseg in the above cases) and the operation to perform (which might be specific to each segmenter).
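
If you want to call the front-end script from Python rather than from a shell, the argument order described above can be wrapped in a small helper. This is only a sketch: build_command and run_segmenter are hypothetical names, not part of the package. It also assumes that the script writes its result to standard output and that you are on Python 3.5 or later (for subprocess.run).

```python
import subprocess


def build_command(segmenter, operation, input_file):
    """Assemble the argument list: script name, segmenter type, operation, input file."""
    return ["discourse_segmenter", segmenter, operation, input_file]


def run_segmenter(segmenter, operation, input_file):
    """Run the front-end script and return whatever it writes to standard output.

    Assumption: the segmentation is printed to stdout. check=True raises
    CalledProcessError if the script exits with a non-zero status.
    """
    result = subprocess.run(
        build_command(segmenter, operation, input_file),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


cmd = build_command("mateseg", "segment", "DiscourseSegmenter/examples/conll/maz-8727.parsed.conll")
print(" ".join(cmd))
```

Calling run_segmenter only works once the package is installed and discourse_segmenter is on your PATH. build_command can be used on its own to inspect the command line first.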

Evaluation

Intrinsic evaluation scores of the machine learning models on the predicted vectors will be printed when training and evaluating a segmentation model.

Extrinsic evaluation scores on the predicted segmentation trees can be calculated with the evaluation script:

evaluation {FOLDER:TRUE} {FOLDER:PRED}

Note that the script internally calls the DKPro Agreement library, which requires Java 8.
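
Before running the evaluation, it can help to check which Java version is installed. The helper below is an illustrative sketch and not part of the package: java_major_version is a hypothetical name, and it only parses the version string that `java -version` prints, covering both the legacy "1.8" numbering and the newer "11" style.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` style output.

    Handles both the legacy scheme ("1.8.0_181" means Java 8) and the
    newer one ("11.0.2" means Java 11). Returns None if nothing matches.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    if first == 1 and second is not None:
        return int(second)
    return first


print(java_major_version('java version "1.8.0_181"'))
print(java_major_version('openjdk version "11.0.2" 2019-01-15'))
```

Keep in mind that `java -version` writes to standard error, not standard output, so capture stderr when feeding its output to this function.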

Release History

0.2.12 (this version)
0.2.1
0.2.0
0.0.1.dev1

Download Files

File Name | Python Version | File Type | Upload Date
dsegmenter-0.2.12-py2.py3-none-any.whl (3.9 MB) | 2.7 | Wheel | Jan 25, 2017
dsegmenter-0.2.12.tar.gz (3.8 MB) | - | Source | Jan 25, 2017