A collection of various discourse segmenters (with pre-trained models for German texts).

Description

This Python module currently comprises two discourse segmenters: edseg and bparseg.

edseg
is a rule-based system that uses shallow discourse-oriented parsing to determine boundaries of elementary discourse units in text. The rules are hard-coded in the submodule’s file and are only applicable to German input.
bparseg
is an ML-based segmentation module that operates on syntactic constituency trees (output from BitPar) and uses a pre-trained linear SVM model to decide whether a syntactic constituent initiates a discourse segment. This model was trained on the German PCC corpus, but you can also train your own classifier for any language using your own training data (cf. discourse_segmenter --help for further instructions on how to do that).

Because the current model is distributed as a serialized file, it is likely to become incompatible with future releases of `numpy`. We will therefore probably remove the model files from future versions of this package, ship the source data instead, and train the model during installation.
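
To make the decision step of bparseg more concrete, here is a minimal sketch of how a linear model classifies a single constituent: it computes a weighted sum of the constituent's features plus a bias and checks the sign. The feature names, weights, and the helper `starts_segment` are hypothetical and chosen for illustration only; they are not taken from the pre-trained model or from the package's code.

```python
# Hypothetical feature names and weights, for illustration only; they are
# NOT the features or weights of the model shipped with the package.
WEIGHTS = {
    "is_clause": 1.2,
    "starts_with_conjunction": 0.8,
    "is_noun_phrase": -1.5,
}
BIAS = -0.5


def starts_segment(features, weights=WEIGHTS, bias=BIAS):
    """Return True if the linear score w . x + b is positive."""
    score = bias + sum(
        weights.get(name, 0.0) * value for name, value in features.items()
    )
    return score > 0


# Two made-up constituents described by their feature values.
clause = {"is_clause": 1.0, "starts_with_conjunction": 1.0}
noun_phrase = {"is_noun_phrase": 1.0}

print(starts_segment(clause))
print(starts_segment(noun_phrase))
```

In a trained linear SVM, the weights and the bias are learned from annotated data; only the sign test at the end stays the same.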

Installation

To install this package from the PyPI index, run

pip install dsegmenter

Alternatively, you can install it directly from the source repository by executing:

git clone git@github.com:WladimirSidorenko/DiscourseSegmenter.git
pip install -r DiscourseSegmenter/requirements.txt DiscourseSegmenter/ --user

Usage

After installation, you can import the module in your Python scripts, e.g.:

from dsegmenter.bparseg import BparSegmenter

segmenter = BparSegmenter()

Alternatively, you can use the bundled front-end script discourse_segmenter to process your parsed input data, e.g.:

discourse_segmenter bparseg segment DiscourseSegmenter/examples/bpar/maz-8727.exb.bpar

Note that this script requires two mandatory arguments: the type of the segmenter to use (bparseg in the above case) and the operation to perform (segment in the above case). The available operations are specific to each segmenter.
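
If you want to call the front-end script from Python code, the argument order described above can be captured in a small wrapper. This is only a sketch: the helpers `build_command` and `run_segmenter` are hypothetical and not part of the package, and the wrapper assumes that the script is on your PATH, writes its result to standard output, and that you run the wrapper with Python 3.7 or later.

```python
import subprocess


def build_command(segmenter, operation, *args):
    """Assemble the argument list for the discourse_segmenter script.

    The segmenter type comes first, then the operation, then any further
    arguments such as input files.
    """
    if not segmenter or not operation:
        raise ValueError("both a segmenter type and an operation are required")
    return ["discourse_segmenter", segmenter, operation, *args]


def run_segmenter(segmenter, operation, *args):
    """Run the script and return whatever it wrote to standard output.

    Assumption: the script prints its result to stdout. A non-zero exit
    status raises subprocess.CalledProcessError.
    """
    cmd = build_command(segmenter, operation, *args)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Rebuild the command line from the example above without executing it.
cmd = build_command(
    "bparseg", "segment", "DiscourseSegmenter/examples/bpar/maz-8727.exb.bpar"
)
print(" ".join(cmd))
```

Keeping command construction separate from execution lets you inspect or log the command before running it.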

Release History

0.0.1.dev1

Download Files

File Name                                    Size     Version   File Type     Upload Date
dsegmenter-0.0.1.dev1.linux-x86_64.tar.gz    2.2 MB   any       Dumb Binary   Dec 28, 2015
dsegmenter-0.0.1.dev1.tar.gz                 2.2 MB   -         Source        Dec 28, 2015
