Concatenated-word segmentation Python library written in Rust

About The Project

A fast concatenated-word segmentation library written in Rust, inspired by wordninja and wordsegment. The Python bindings use PyO3 to call into the Rust implementation.
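
Libraries in the wordsegment family typically treat segmentation as a search for the most probable split of an input string $s$ under an n-gram language model. The following is a hedged sketch of that general objective. It is not a statement of pywordsegment's exact scoring, because its smoothing and unknown-word handling are not documented here:

$$
\hat{w}_{1:k} = \operatorname*{arg\,max}_{w_1 w_2 \cdots w_k = s} \; \sum_{i=1}^{k} \log P(w_i \mid w_{i-1})
$$

Here $w_1 w_2 \cdots w_k = s$ means the words concatenate back to $s$, and $w_0$ is taken to be a start-of-text marker. With unigram probabilities only, $P(w_i \mid w_{i-1})$ reduces to $P(w_i)$. The optimum can then be found by dynamic programming over prefixes, where $s_{i:j}$ is the substring from position $i$ up to but not including $j$:

$$
\mathrm{best}(0) = 0, \qquad \mathrm{best}(j) = \max_{0 \le i < j} \Big[ \mathrm{best}(i) + \log P(s_{i:j}) \Big]
$$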

Installation

pip3 install pywordsegment

Usage

import pywordsegment

# The internal UNIGRAMS and BIGRAMS corpora are lazily initialized
# once per module, so creating multiple WordSegmenter instances does
# not create new copies of the dictionaries.

# Split a concatenated string into its constituent words
pywordsegment.WordSegmenter.segment(
    text="theusashops",
)
# ["the", "usa", "shops"]


# Check whether substring appears as a whole segment of text,
# not merely as part of a longer segment.
pywordsegment.WordSegmenter.exist_as_segment(
    substring="inter",
    text="internationalairport",
)
# False

pywordsegment.WordSegmenter.exist_as_segment(
    substring="inter",
    text="intermilan",
)
# True

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - gal@intsights.com

Project Link: https://github.com/intsights/pywordsegment

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

All wheels are for pywordsegment 0.4.1 and are 9.6 MB each.

pywordsegment-0.4.1-cp311-none-win_amd64.whl: CPython 3.11, Windows x86-64
pywordsegment-0.4.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl: CPython 3.11, manylinux (glibc 2.17+) x86-64
pywordsegment-0.4.1-cp311-cp311-macosx_10_7_x86_64.whl: CPython 3.11, macOS 10.7+ x86-64

pywordsegment-0.4.1-cp310-none-win_amd64.whl: CPython 3.10, Windows x86-64
pywordsegment-0.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl: CPython 3.10, manylinux (glibc 2.17+) x86-64
pywordsegment-0.4.1-cp310-cp310-macosx_10_7_x86_64.whl: CPython 3.10, macOS 10.7+ x86-64

pywordsegment-0.4.1-cp39-none-win_amd64.whl: CPython 3.9, Windows x86-64
pywordsegment-0.4.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl: CPython 3.9, manylinux (glibc 2.17+) x86-64
pywordsegment-0.4.1-cp39-cp39-macosx_10_7_x86_64.whl: CPython 3.9, macOS 10.7+ x86-64

pywordsegment-0.4.1-cp38-none-win_amd64.whl: CPython 3.8, Windows x86-64
pywordsegment-0.4.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl: CPython 3.8, manylinux (glibc 2.17+) x86-64
pywordsegment-0.4.1-cp38-cp38-macosx_10_7_x86_64.whl: CPython 3.8, macOS 10.7+ x86-64

pywordsegment-0.4.1-cp37-none-win_amd64.whl: CPython 3.7, Windows x86-64
pywordsegment-0.4.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl: CPython 3.7m, manylinux (glibc 2.17+) x86-64
pywordsegment-0.4.1-cp37-cp37m-macosx_10_7_x86_64.whl: CPython 3.7m, macOS 10.7+ x86-64