English word segmentation.

Project description

Features

  • Pure-Python

  • The segmentation algorithm uses divide and conquer, so there is no maximum length limit on the input text.

  • The segmentation algorithm uses dynamic programming to achieve polynomial time complexity.

  • Uses the Google Trillion Corpus to score candidate segmentations.

  • Developed on Python 2.7

  • Tested on CPython 2.6, 2.7, 3.4.
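
The dynamic programming and corpus scoring named above can be illustrated with a minimal sketch. This is not the package's actual implementation: the toy word counts, the unknown-word penalty, and the names `TOY_COUNTS`, `word_score`, and `segment_sketch` are all hypothetical, chosen only to show how a bottom-up table over string prefixes yields a polynomial-time segmenter.

```python
import math

# Hypothetical toy unigram counts; a real scorer would load corpus frequencies.
TOY_COUNTS = {
    "this": 500, "is": 900, "a": 1000, "test": 200,
    "university": 120, "of": 950, "washington": 80,
}
TOTAL = sum(TOY_COUNTS.values())
MAX_WORD_LEN = max(len(w) for w in TOY_COUNTS)


def word_score(word):
    """Log-probability of a word; unseen strings get a length-based penalty."""
    count = TOY_COUNTS.get(word)
    if count is not None:
        return math.log(float(count) / TOTAL)
    # Assumed smoothing: the longer an unseen string, the heavier the penalty.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment_sketch(text):
    """Return the highest-scoring split of text, found by bottom-up DP."""
    n = len(text)
    best = [0.0] + [float("-inf")] * n  # best[i]: best score for text[:i]
    back = [0] * (n + 1)                # back[i]: start index of the last word
    for end in range(1, n + 1):
        # Only consider candidate last words up to MAX_WORD_LEN characters.
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score = best[start] + word_score(text[start:end])
            if score > best[end]:
                best[end], back[end] = score, start
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

Each prefix is scored once against at most `MAX_WORD_LEN` candidate last words, so the running time grows linearly with the input length for a fixed maximum word length. Because the table is filled iteratively, the sketch is also not bounded by Python's recursion limit.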

Quickstart

Installing wordsegmentation is simple with pip:

$ pip install wordsegmentation

Tutorial

In your own Python programs, you’ll mostly want to use segment to divide a phrase into a list of its parts:

>>> from wordsegmentation import WordSegment
>>> ws = WordSegment(use_google_corpus=True)

>>> ws.segment('universityofwashington')
['university', 'of', 'washington']
>>> ws.segment('thisisatest')
['this', 'is', 'a', 'test']


Download files

Source Distribution

wordsegmentation-0.3.4.tar.gz (4.9 MB)

Built Distribution

wordsegmentation-0.3.4-py2.py3-none-any.whl (8.2 MB), for Python 2 and Python 3
