Concatenated-word segmentation Python library written in Rust

About The Project

A fast concatenated-word segmentation library written in Rust, inspired by wordninja and wordsegment. The Python binding uses PyO3 to call into the Rust implementation.
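
To illustrate the general idea behind this kind of segmentation, here is a minimal pure-Python sketch of unigram-cost dynamic programming, the approach used by libraries such as wordninja. It is not the library's Rust implementation: the tiny UNIGRAM_COUNTS table, the unknown-word penalty, and the function names are assumptions made up for this example.

```python
import math

# Hypothetical toy counts; the real library ships its own, much larger corpus.
UNIGRAM_COUNTS = {
    "the": 500,
    "usa": 100,
    "shops": 20,
    "us": 80,
    "a": 400,
    "shop": 30,
}
TOTAL = sum(UNIGRAM_COUNTS.values())
MAX_WORD_LEN = max(len(word) for word in UNIGRAM_COUNTS)


def word_cost(word):
    """Negative log probability of a word; unknown words get a heavy penalty."""
    count = UNIGRAM_COUNTS.get(word)
    if count is None:
        # Assumed penalty: grows with length so unknown chunks are a last resort.
        return 20.0 * len(word)
    return -math.log(count / TOTAL)


def segment(text):
    """Split text into the word sequence with the lowest total cost."""
    text = text.lower()
    n = len(text)
    # best[i] = (lowest cost of segmenting text[:i], start index of the last word)
    best = [(0.0, 0)] + [(math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            cost = best[start][0] + word_cost(text[start:end])
            if cost < best[end][0]:
                best[end] = (cost, start)

    # Walk the back-pointers from the end of the string to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("theusashops"))
```

With this toy table, "the usa shops" costs less than "the us a shops", so the three-word split wins. A real corpus makes that choice from actual word frequencies.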

Installation

pip3 install pywordsegment

Usage

import pywordsegment

# The internal UNIGRAMS and BIGRAMS corpora are lazily initialized
# once for the whole module. Creating multiple WordSegmenter instances
# does not create new dictionaries.

# Segment a concatenated string into its words
pywordsegment.WordSegmenter.segment(
    text="theusashops",
)
# ["the", "usa", "shops"]


# Check whether the substring appears as a whole segment
# of the segmented text.
pywordsegment.WordSegmenter.exist_as_segment(
    substring="inter",
    text="internationalairport",
)
# False

pywordsegment.WordSegmenter.exist_as_segment(
    substring="inter",
    text="intermilan",
)
# True
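
The two results above are consistent with a simple reading of exist_as_segment: segment the text, then test whether the substring is one of the resulting words. The sketch below illustrates that reading together with the lazy, once-per-module dictionary setup mentioned in the comments. It is an assumption for illustration only, not the library's code: the toy word counts, load_unigrams, and both functions are hypothetical.

```python
import functools
import math


@functools.lru_cache(maxsize=None)
def load_unigrams():
    """Build the toy cost table on first use; later calls reuse the same object."""
    # Hypothetical counts chosen for this example only.
    counts = {
        "inter": 40,
        "milan": 30,
        "international": 60,
        "airport": 50,
        "national": 45,
        "air": 70,
        "port": 35,
    }
    total = sum(counts.values())
    return {word: -math.log(count / total) for word, count in counts.items()}


def segment(text):
    """Lowest-cost split of text into known words (unknown chunks are penalized)."""
    costs = load_unigrams()
    max_len = max(len(word) for word in costs)
    text = text.lower()
    n = len(text)
    # best[i] = (lowest cost of segmenting text[:i], start index of the last word)
    best = [(0.0, 0)] + [(math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            cost = best[start][0] + costs.get(piece, 20.0 * len(piece))
            if cost < best[end][0]:
                best[end] = (cost, start)

    # Follow the back-pointers to rebuild the word list.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


def exist_as_segment(substring, text):
    """True if substring is one whole word in the segmentation of text."""
    return substring in segment(text)


print(exist_as_segment("inter", "internationalairport"))
print(exist_as_segment("inter", "intermilan"))
```

In this sketch "internationalairport" splits into "international" and "airport", so "inter" is only a prefix of a longer word and the check fails, while "intermilan" splits into "inter" and "milan", so it succeeds. Caching the table with functools.lru_cache means every caller shares one dictionary, which mirrors the once-per-module behavior described above.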

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - gal@intsights.com

Project Link: https://github.com/intsights/pywordsegment
