seg-text

pytest | python | Code style: black | License: MIT | PyPI version

Segment multilingual text into sentences

Pre-install PyICU/pycld2/polyglot

For Linux and friends

Install libicu first. For example, on Ubuntu:

apt install libicu-dev pkg-config
poetry add pyicu==2.8 pycld2 polyglot
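
After installing the prerequisites, one way to confirm that they are visible to the interpreter is to look up their import names without actually importing them. This is a minimal sketch, not part of seg-text: `missing_modules` is a hypothetical helper, and it relies on PyICU being imported as `icu` while pycld2 and polyglot use their package names.

```python
from importlib.util import find_spec

# Import names of the prerequisites: PyICU is imported as "icu".
REQUIRED = ("icu", "pycld2", "polyglot")


def missing_modules(names=REQUIRED):
    """Return the names in `names` that cannot be found by the import system."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the modules can be located; it does not prove that the compiled extensions load correctly.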

For Windows

seg-text depends on polyglot, which in turn depends on pyicu and pycld2. Both are difficult, if not impossible, to install on Windows with pip or poetry.

However, prebuilt whl packages can be downloaded from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and installed as follows (this example is for Python 3.8, amd64):

pip install PyICU-2.8.1-cp38-cp38-win_amd64.whl pycld2-0.41-cp38-cp38-win_amd64.whl
pip install git+https://github.com/aboSamoor/polyglot@master
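
The wheel filenames have to match the local interpreter (`cp38` and `win_amd64` in the example above). The following sketch prints the tags to look for. `cpython_tag` and `pointer_bits` are hypothetical helper names, and mapping 64/32 bits to `win_amd64`/`win32` assumes an x86 Windows machine.

```python
import struct
import sys


def cpython_tag() -> str:
    """Return the CPython tag used in wheel filenames, e.g. 'cp38' for Python 3.8."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def pointer_bits() -> int:
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    # Assumption: x86 Windows, where 64-bit wheels are tagged win_amd64.
    platform_tag = "win_amd64" if pointer_bits() == 64 else "win32"
    print(cpython_tag(), platform_tag)
```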

Refer to windows-pytest.yml and ubuntu-pytest.yml in .github/workflows for more details.

Install seg-text

pip install seg-text
# or pip install git+https://github.com/ffreemt/seg-text
# or poetry add git+https://github.com/ffreemt/seg-text

Use seg-text

from seg_text import seg_text

print(seg_text(" text 1\n test 2. Test 3"))
# ["text 1", "test 2.", "Test 3"]

text = """ “元宇宙”,英文為“Metaverse”。該詞出自1992年;的科幻小說《雪崩》。 """
print(seg_text(text))
# ["“元宇宙”,英文為“Metaverse”。", "該詞出自1992年;的科幻小說《雪崩》。"]

# [;:] is a regular expression (a character class) that matches either ; or :
# Without the brackets, ;: would match only the two-character sequence ;:

print(seg_text(text, extra="[;:]"))
# ["“元宇宙”,英文為“Metaverse”。", "該詞出自1992年;", "的科幻小說《雪崩》。"]
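
The difference between `[;:]` and `;:` can be checked with Python's standard `re` module alone. This sketch illustrates only the regular-expression semantics; it makes no claim about how seg_text applies the `extra` pattern internally, and the sample string is made up.

```python
import re

sample = "1992;a:b;:c"

# Character class: each ; or : is matched on its own.
either = re.findall(r"[;:]", sample)
print(either)
# [';', ':', ';', ':']

# Plain sequence: only the two characters ;: together are matched.
together = re.findall(r";:", sample)
print(together)
# [';:']
```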

Refer to seg_text.py for more details.

