


seg-text

Code style: black. License: MIT.

Segment multilingual text into sentences

Currently supports Python 3.8 only, because of its dependency on the vtext package.

Pre-install fasttext whl for Windows

seg-text depends on fastlid, which in turn depends on fasttext. Installing fasttext requires a C++ compiler.

On Windows without a C++ compiler, prebuilt whl packages can be downloaded from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and installed (for example, for Python 3.8 amd64) as follows:

pip install fasttext-0.9.2-cp38-cp38-win_amd64.whl
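
The whl file name encodes the interpreter and platform it was built for, so the downloaded file has to match your Python. The following is a minimal sketch, using only the standard library, of how the expected file name can be derived. The helper `wheel_filename` is hypothetical (it is not part of seg-text or fasttext) and assumes a CPython build whose ABI tag equals its Python tag, as in the cp38-cp38 file above.

```python
import sys
import sysconfig


def wheel_filename(dist, version, py=None, platform=None):
    """Build the expected wheel file name for a CPython interpreter.

    Hypothetical helper for illustration only. Assumes the ABI tag equals
    the Python tag, as in fasttext-0.9.2-cp38-cp38-win_amd64.whl.
    """
    # Fall back to the running interpreter when no values are given
    major, minor = py if py is not None else sys.version_info[:2]
    platform = platform if platform is not None else sysconfig.get_platform()
    py_tag = f"cp{major}{minor}"
    # Wheel platform tags use underscores instead of '-' and '.'
    plat_tag = platform.replace("-", "_").replace(".", "_")
    return f"{dist}-{version}-{py_tag}-{py_tag}-{plat_tag}.whl"


# File name for Python 3.8 on 64-bit Windows
print(wheel_filename("fasttext", "0.9.2", py=(3, 8), platform="win-amd64"))
# fasttext-0.9.2-cp38-cp38-win_amd64.whl

# File name matching the interpreter that runs this script
print(wheel_filename("fasttext", "0.9.2"))
```

If the second name differs from the file you downloaded, pip will most likely refuse to install it on that interpreter.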

Install seg-text

pip install seg-text
# or pip install git+https://github.com/ffreemt/seg-text
# or poetry add git+https://github.com/ffreemt/seg-text
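
After installing, it can help to confirm that the dependency chain is actually importable before calling seg_text. This is a rough sketch, not part of seg-text: `missing_modules` is a hypothetical helper, and the module names `fasttext` and `fastlid` are assumed to match the package names mentioned above.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed module names: fasttext and fastlid follow the package names in the
# dependency chain; seg_text is the name used in `from seg_text import seg_text`.
required = ["fasttext", "fastlid", "seg_text"]

# The README states Python 3.8 only, so warn on any other version
if sys.version_info[:2] != (3, 8):
    print("warning: seg-text is documented for Python 3.8 only")

missing = missing_modules(required)
print("missing:", missing if missing else "none")
```

An empty result only shows that the modules can be located; it does not guarantee that a compiled extension such as fasttext loads correctly.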

Use seg-text

from seg_text import seg_text

print(seg_text(" text 1\n test 2. Test 3"))
# ["text 1", "test 2.", "Test 3"]

text = """ “元宇宙”,英文為“Metaverse”。該詞出自1992年;的科幻小說《雪崩》。 """
print(seg_text(text))
# ["“元宇宙”,英文為“Metaverse”。", "該詞出自1992年;的科幻小說《雪崩》。"]

# [;:] is a regular expression (a character class) matching either ; or :
# ;: (without the brackets) would match only the two-character sequence ;:

print(seg_text(text, extra="[;:]"))
# ["“元宇宙”,英文為“Metaverse”。", "該詞出自1992年;", "的科幻小說《雪崩》。"]
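
The difference between `[;:]` and `;:` can be checked with Python's standard `re` module alone. The sketch below does not call seg_text and makes no claim about how seg_text applies `extra` internally; it only illustrates the two patterns on a made-up sample string.

```python
import re

sample = "a;b:c;:d"

# Character class: matches a single ';' or a single ':'
print(re.findall(r"[;:]", sample))
# [';', ':', ';', ':']

# Literal sequence: matches only ';' immediately followed by ':'
print(re.findall(r";:", sample))
# [';:']
```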

Refer to seg_text.py for more details.


