English word segmentation.
Project description
Features
Pure-Python
Divide-and-conquer segmentation algorithm, so there is no maximum length limit on the input text.
Dynamic programming keeps the segmentation algorithm within polynomial time complexity.
Candidate segmentations are scored using unigram counts from the Google Web Trillion Word Corpus.
Developed on Python 2.7
Tested on CPython 2.6, 2.7, 3.4.
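The features above can be illustrated with a minimal sketch of dynamic-programming word segmentation scored by unigram counts. This is not the package's actual implementation: the tiny `COUNTS` table, the `score` penalty for unseen words, and the recursive `segment` helper are all illustrative stand-ins for corpus-backed scoring.

```python
import math
from functools import lru_cache

# Toy unigram counts standing in for the Google Web Trillion Word Corpus.
COUNTS = {"this": 500, "is": 400, "a": 900, "test": 300,
          "university": 50, "of": 800, "washington": 40}
TOTAL = sum(COUNTS.values())

def score(word):
    """Log-probability of a word; unseen words get a length-based penalty."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return math.log(1.0 / (TOTAL * 10 ** len(word)))

@lru_cache(maxsize=None)
def segment(text):
    """Return the highest-scoring segmentation of `text` as a tuple of words."""
    if not text:
        return ()
    candidates = []
    for i in range(1, len(text) + 1):
        head, rest = text[:i], segment(text[i:])
        total = score(head) + sum(score(w) for w in rest)
        candidates.append((total, (head,) + rest))
    return max(candidates)[1]

print(list(segment("thisisatest")))  # ['this', 'is', 'a', 'test']
```

Memoizing `segment` with `lru_cache` is what turns the exponential search over split points into a polynomial-time dynamic program: each suffix of the input is solved once and reused.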
Quickstart
Installing WordSegment is simple with pip:
$ pip install wordsegmentation
Tutorial
In your own Python programs, you’ll mostly want to use segment to divide a phrase into a list of its parts:
>>> from wordsegmentation import WordSegment
>>> ws = WordSegment(use_google_corpus=True)
>>> ws.segment('universityofwashington')
['university', 'of', 'washington']
>>> ws.segment('thisisatest')
['this', 'is', 'a', 'test']
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
wordsegmentation-0.3.4.tar.gz (4.9 MB)
Built Distribution
Hashes for wordsegmentation-0.3.4-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d7c409c5528d5eaa6e0f296050e8cb7d43c5e460275496a41acd7d293cee0979
MD5 | bb63ee2d162087bf929e3c65dc2f370c
BLAKE2b-256 | 8c4261db9985247c761a01441bab8208a575a2f30a03f0c5ad3bc77b4a769034