seg-text
Segment multilingual text into sentences
Pre-install PyICU/pycld2/polyglot
For Linux and friends
Install libicu, for example on Ubuntu:
apt install libicu-dev pkg-config
poetry add pyicu==2.8 pycld2 polyglot
For Windows
seg-text depends on polyglot, which in turn depends on pyicu and pycld2. pyicu and pycld2 are difficult, if not impossible, to install on Windows using pip or poetry.
However, prebuilt whl packages can be downloaded from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and installed (for example, for Python 3.8 amd64) as follows:
pip install PyICU-2.8.1-cp38-cp38-win_amd64.whl pycld2-0.41-cp38-cp38-win_amd64.whl
pip install git+https://github.com/aboSamoor/polyglot@master
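The wheel file names above encode the interpreter version (cp38) and the platform (win_amd64), so the files must match the local Python. The following is a minimal sketch, using only the standard library, that prints the corresponding tags for the running interpreter. The variable names are illustrative, and it assumes a CPython interpreter (the cp prefix), which is what the wheels in the example target.

```python
import sys
import sysconfig

# Interpreter tag as used in wheel file names, e.g. "cp38" for CPython 3.8.
# Assumption: the interpreter is CPython.
python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"

# Platform tag, e.g. "win_amd64" on 64-bit Windows.
# Wheel file names use underscores where sysconfig reports "-" or ".".
platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")

print(f"Look for wheels tagged {python_tag} and {platform_tag}")
```

Run it with the same Python that will be used for pip install, then pick the PyICU and pycld2 files whose names contain both tags.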
Refer to windows-pytest.yml and ubuntu-pytest.yml in .github/workflows for more details.
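Before installing seg-text, it can help to confirm that the three prerequisites are importable. The sketch below is not part of seg-text; missing_modules is a hypothetical helper written for this check. It relies on the fact that PyICU is imported under the module name icu, while pycld2 and polyglot are imported under their package names.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


# PyICU installs the module "icu"; the other two use their package names.
PREREQUISITES = ["icu", "pycld2", "polyglot"]

missing = missing_modules(PREREQUISITES)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All prerequisites are importable.")
```

The check only looks the modules up and does not import them, so it will not detect a broken binary build; a plain import icu is the stricter test.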
Install seg-text
pip install seg-text
# or pip install git+https://github.com/ffreemt/seg-text
# or poetry add git+https://github.com/ffreemt/seg-text
Use seg-text
from seg_text import seg_text
print(seg_text(" text 1\n test 2. Test 3"))
# ["text 1", "test 2.", "Test 3"]
text = """ “元宇宙”,英文為“Metaverse”。該詞出自1992年;的科幻小說《雪崩》。 """
print(seg_text(text))
# ["“元宇宙”,英文為“Metaverse”。", "該詞出自1992年;的科幻小說《雪崩》。"]
# [;:] is a regular expression (a character class) matching either ; or :
# ;: (without the brackets) would match only the two-character sequence ;:
print(seg_text(text, extra="[;:]"))
# ["“元宇宙”,英文為“Metaverse”。", "該詞出自1992年;", "的科幻小說《雪崩》。"]
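The difference between the two patterns mentioned in the comments can be checked with the standard re module alone. This is a small illustration of the regular expression semantics only; it does not show how seg_text applies the extra argument internally, and the sample string is made up for the demonstration.

```python
import re

# Made-up sample containing single separators and the two-character pair ";:".
sample = "a;b:c;:d"

# Character class: matches each ';' or ':' on its own.
class_matches = re.findall(r"[;:]", sample)
print(class_matches)
# [';', ':', ';', ':']

# Literal sequence: matches only ';' immediately followed by ':'.
sequence_matches = re.findall(r";:", sample)
print(sequence_matches)
# [';:']
```

With the bracketed form every occurrence of either character is a candidate break point, which is why it is the form used in the example above.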
Refer to seg_text.py for more details.