Concatenated-word segmentation Python library written in Rust
About The Project
A fast concatenated-word segmentation library written in Rust, inspired by wordninja and wordsegment. The Python binding uses PyO3 to call into the Rust package.
Installation
pip3 install pywordsegment
Usage
import pywordsegment
# The internal UNIGRAMS and BIGRAMS corpora are lazily initialized
# once for the whole module. Creating multiple WordSegmenter instances
# does not create new dictionaries.

# Segment a concatenated string into its component words.
pywordsegment.WordSegmenter.segment(
    text="theusashops",
)
# ["the", "usa", "shops"]

# Check whether the substring appears as a whole segment inside text,
# rather than merely as part of a longer word.
pywordsegment.WordSegmenter.exist_as_segment(
    substring="inter",
    text="internationalairport",
)
# False

pywordsegment.WordSegmenter.exist_as_segment(
    substring="inter",
    text="intermilan",
)
# True
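To make the difference between a plain substring match and a whole-segment match concrete, here is a minimal pure-Python sketch of unigram-based segmentation in the spirit of wordninja and wordsegment. It is not the library's Rust implementation: the vocabulary, the counts, the unknown-word penalty, and the names `toy_segment` and `toy_exists_as_segment` are all hypothetical and exist only for illustration. The real library uses much larger unigram and bigram corpora and may treat multi-word substrings differently.

```python
import math

# Hypothetical toy vocabulary with made-up counts (illustration only).
TOY_COUNTS = {
    "the": 500, "usa": 50, "shops": 20, "shop": 30, "us": 80, "a": 300,
    "inter": 10, "milan": 10, "international": 40, "airport": 30,
}
TOTAL = sum(TOY_COUNTS.values())
MAX_WORD_LEN = max(len(word) for word in TOY_COUNTS)


def word_cost(word):
    """Negative log-probability of a word; unknown words get a heavy penalty."""
    count = TOY_COUNTS.get(word)
    if count is None:
        # Assumed penalty: 20 per character, so known words are always preferred.
        return 20.0 * len(word)
    return -math.log(count / TOTAL)


def toy_segment(text):
    """Split text into the lowest-cost word sequence via dynamic programming."""
    text = text.lower()
    # best[i] = (cost of the best split of text[:i], start index of its last word)
    best = [(0.0, 0)]
    for end in range(1, len(text) + 1):
        candidates = [
            (best[start][0] + word_cost(text[start:end]), start)
            for start in range(max(0, end - MAX_WORD_LEN), end)
        ]
        best.append(min(candidates))

    # Walk the back-pointers from the end of the string to recover the words.
    words = []
    end = len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


def toy_exists_as_segment(substring, text):
    """True only if substring is one of the words produced by toy_segment."""
    return substring.lower() in toy_segment(text)


print(toy_segment("theusashops"))
print("inter" in "internationalairport")                       # plain substring test
print(toy_exists_as_segment("inter", "internationalairport"))  # whole-segment test
print(toy_exists_as_segment("inter", "intermilan"))
```

With this toy vocabulary, "inter" is a substring of both inputs but is a whole segment only in "intermilan", because "internationalairport" is cheaper to split as "international" plus "airport".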
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Gal Ben David - gal@intsights.com
Project Link: https://github.com/intsights/pywordsegment