Concatenated-word segmentation Python library written in Rust
About The Project
A fast concatenated-word segmentation library written in Rust, inspired by wordninja and wordsegment. The Python binding uses PyO3 to call into the Rust implementation.
Installation
pip3 install pywordsegment
Usage
import pywordsegment
# The internal UNIGRAMS and BIGRAMS corpora are lazily initialized
# once for the whole module, so creating multiple WordSegmenter
# instances does not build new dictionaries.
# Segment a concatenated string into its words
pywordsegment.WordSegmenter.segment(
text="theusashops",
)
# ["the", "usa", "shops"]
# Check whether substring appears as a whole segment of text
# once text has been segmented.
pywordsegment.WordSegmenter.exist_as_segment(
substring="inter",
text="internationalairport",
)
# False
pywordsegment.WordSegmenter.exist_as_segment(
substring="inter",
text="intermilan",
)
# True
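To make the behavior of the two calls above easier to picture, here is a minimal pure-Python sketch of unigram-based segmentation. It is not the library's implementation: the Rust code uses its own UNIGRAMS and BIGRAMS corpora, while this sketch assumes a tiny hypothetical vocabulary (TOY_COUNTS) and hypothetical helper names (toy_segment, toy_exist_as_segment), and ignores bigrams entirely.

```python
import math

# Hypothetical toy unigram counts, for illustration only.
# The real library ships its own, much larger corpora.
TOY_COUNTS = {
    "the": 500, "usa": 50, "shops": 20, "shop": 30, "us": 100,
    "inter": 10, "milan": 10, "international": 40, "airport": 30,
}
TOTAL = sum(TOY_COUNTS.values())
MAX_WORD_LEN = max(len(word) for word in TOY_COUNTS)


def word_score(word):
    """Log-probability of a word; unknown words get a length-scaled penalty."""
    count = TOY_COUNTS.get(word)
    if count is None:
        # Assumed penalty: the longer the unknown word, the less likely it is.
        return -math.log(TOTAL) - len(word) * math.log(10)
    return math.log(count / TOTAL)


def toy_segment(text):
    """Return the highest-scoring split of text under the toy unigram model."""
    text = text.lower()
    # best[i] holds (score, words) for the best segmentation of text[:i].
    best = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score, words = best[start]
            word = text[start:end]
            candidates.append((score + word_score(word), words + [word]))
        best.append(max(candidates, key=lambda candidate: candidate[0]))
    return best[-1][1]


def toy_exist_as_segment(substring, text):
    """True if substring is one of the whole segments of text."""
    return substring in toy_segment(text)


print(toy_segment("theusashops"))
print(toy_exist_as_segment("inter", "internationalairport"))
print(toy_exist_as_segment("inter", "intermilan"))
```

With this toy vocabulary, "inter" is a segment of "intermilan" but not of "internationalairport", because the best split of the latter keeps "international" as one word. A plain substring test would return True in both cases, which is the distinction exist_as_segment is meant to capture.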
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Gal Ben David - gal@intsights.com
Project Link: https://github.com/intsights/pywordsegment