Concatenated-word segmentation Python library written in Rust




About The Project

A fast concatenated-word segmentation library written in Rust, inspired by wordninja and wordsegment. The Python bindings use pyo3 to call into the Rust package.


Installation

pip3 install pywordsegment

Usage

import pywordsegment

# The internal UNIGRAMS and BIGRAMS corpora are lazily initialized
# once for the whole module. Creating multiple WordSegmenter instances
# does not create new dictionaries.

# Segment a concatenated string into its component words
pywordsegment.WordSegmenter.segment(
    text="theusashops",
)
# ["the", "usa", "shops"]


# Check whether substring appears as a whole segment of the segmented
# text, not merely as part of a longer word.
pywordsegment.WordSegmenter.exist_as_segment(
    substring="inter",
    text="internationalairport",
)
# False

pywordsegment.WordSegmenter.exist_as_segment(
    substring="inter",
    text="intermilan",
)
# True
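
The calls above go through the compiled Rust extension, whose internals this README does not show. As a rough illustration of the general idea behind unigram-based segmentation (the family of approaches used by wordninja and wordsegment), here is a pure-Python sketch over a tiny made-up vocabulary. The names `TOY_UNIGRAMS`, `toy_segment`, and `toy_exist_as_segment` are hypothetical and are not part of pywordsegment; the counts are invented, and the real library may score candidates differently (for example, by also using bigrams).

```python
import math

# Hypothetical toy unigram counts; the real library ships its own corpora.
TOY_UNIGRAMS = {
    "the": 500, "usa": 50, "shops": 20, "shop": 30, "us": 100, "a": 400,
    "inter": 10, "milan": 10, "international": 40, "airport": 30,
}
TOTAL = sum(TOY_UNIGRAMS.values())
MAX_WORD_LEN = max(len(word) for word in TOY_UNIGRAMS)


def word_cost(word):
    """Negative log-probability of a word; unknown words get a heavy penalty."""
    count = TOY_UNIGRAMS.get(word)
    if count is None:
        # Assumed penalty: grows with length so unknown chunks are a last resort.
        return 10.0 * len(word) + 10.0
    return -math.log(count / TOTAL)


def toy_segment(text):
    """Split text into the lowest-cost word sequence via dynamic programming."""
    text = text.lower()
    n = len(text)
    # best[i] = (cost of the best split of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            cost = best[start][0] + word_cost(text[start:end])
            if cost < best[end][0]:
                best[end] = (cost, start)
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


def toy_exist_as_segment(substring, text):
    """True only if substring is one of the whole segments of text."""
    return substring in toy_segment(text)


print(toy_segment("theusashops"))                             # ['the', 'usa', 'shops']
print(toy_exist_as_segment("inter", "internationalairport"))  # False
print(toy_exist_as_segment("inter", "intermilan"))            # True
```

With this toy vocabulary, "internationalairport" splits into "international" and "airport", so "inter" is not a segment of its own, whereas "intermilan" splits into "inter" and "milan".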

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - gal@intsights.com

Project Link: https://github.com/intsights/pywordsegment

