seg-text
Segment multilingual text into sentences
Pre-install PyICU/pycld2/polyglot
For Linux and friends
Install libicu, for example on Ubuntu:
apt install libicu-dev pkg-config
poetry add pyicu==2.8 pycld2 polyglot
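Before installing seg-text, it may help to confirm that the three dependencies actually import. The sketch below is a hypothetical helper and is not part of seg-text. It assumes the import names `icu` (for PyICU), `pycld2` and `polyglot`. It imports each module rather than only locating it. That way, a PyICU build that cannot load libicu should also be reported, since such a failure typically surfaces as an ImportError.

```python
import importlib
from typing import Dict, Iterable

# Assumed import names: PyICU is imported as "icu"; pycld2 and polyglot
# are imported under their distribution names.
DEPENDENCIES = ("icu", "pycld2", "polyglot")


def check_dependencies(names: Iterable[str] = DEPENDENCIES) -> Dict[str, bool]:
    """Return {module_name: True/False} depending on whether it imports."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except (ImportError, OSError):
            # ImportError covers both a missing package and a compiled
            # extension that fails to load its shared library (e.g. libicu).
            status[name] = False
    return status


if __name__ == "__main__":
    for module, ok in check_dependencies().items():
        print(f"{module:10} {'ok' if ok else 'MISSING'}")
```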
For Windows
seg-text depends on polyglot, which in turn depends on pyicu and pycld2. pyicu and pycld2 are difficult, if not impossible, to install on Windows using pip or poetry.
However, prebuilt whl packages can be downloaded from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and installed as follows (for example, for Python 3.8 on 64-bit Windows):
pip install PyICU-2.8.1-cp38-cp38-win_amd64.whl pycld2-0.41-cp38-cp38-win_amd64.whl
pip install git+https://github.com/aboSamoor/polyglot@master
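Each wheel file name states which interpreter and platform it fits. For example, `cp38-cp38-win_amd64` means CPython 3.8 on 64-bit Windows. The hypothetical helper below builds that suffix from the running interpreter, to help pick matching files for a different setup. It is a simplified sketch and assumes a standard CPython build. It ignores ABI flags such as the `m` suffix used by CPython 3.7 and earlier. pip itself follows the `packaging.tags` rules, which remain authoritative.

```python
import sys
import sysconfig
from typing import Tuple


def wheel_tag_suffix(version: Tuple[int, int], platform: str) -> str:
    """Build the '<python>-<abi>-<platform>' part of a CPython wheel name.

    Simplification (assumption): python and ABI tags are both 'cpXY',
    which holds for ordinary CPython 3.8+ builds.
    """
    major, minor = version
    py_tag = f"cp{major}{minor}"
    # Wheel platform tags replace '-' and '.' with '_', e.g. win-amd64 -> win_amd64.
    plat_tag = platform.replace("-", "_").replace(".", "_")
    return f"{py_tag}-{py_tag}-{plat_tag}"


def current_wheel_tag_suffix() -> str:
    """Suffix for the interpreter running this code."""
    return wheel_tag_suffix(tuple(sys.version_info[:2]), sysconfig.get_platform())
```

On 64-bit Windows with CPython 3.8, `sysconfig.get_platform()` returns `win-amd64`, so the helper yields `cp38-cp38-win_amd64`, the suffix used in the commands above.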
Refer to windows-pytest.yml and ubuntu-pytest.yml in .github/workflows for more details.
Install seg-text
pip install seg-text
# or pip install git+https://github.com/ffreemt/seg-text
# or poetry add git+https://github.com/ffreemt/seg-text
Use seg-text
from seg_text import seg_text
print(seg_text(" text 1\n test 2. Test 3"))
# ["text 1", "test 2.", "Test 3"]
text = """ “元宇宙”,英文為“Metaverse”。該詞出自1992年;的科幻小說《雪崩》。 """
print(seg_text(text))
# ["“元宇宙”,英文為“Metaverse”。", "該詞出自1992年;的科幻小說《雪崩》。"]
# [;:] is a regex character class that matches either ; or :
# without the brackets, ;: would only match the two-character sequence ";:"
print(seg_text(text, extra="[;:]"))
# ["“元宇宙”,英文為“Metaverse”。", "該詞出自1992年;", "的科幻小說《雪崩》。"]
Refer to seg_text.py for more details.
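The standalone sketch below shows why `[;:]` and `;:` behave differently as an `extra` pattern. It splits a string right after each regex match and keeps the delimiter with the preceding piece, just as the output above keeps `;` attached to `1992年;`. It only illustrates the regex semantics and is not seg_text's implementation. `split_after` is a hypothetical name.

```python
import re
from typing import List


def split_after(text: str, pattern: str) -> List[str]:
    """Split text right after every (non-empty) match of pattern.

    Illustration only; not how seg_text itself is implemented.
    """
    pieces, start = [], 0
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            continue  # ignore zero-width matches
        pieces.append(text[start:m.end()])  # keep the delimiter on the left
        start = m.end()
    if start < len(text):
        pieces.append(text[start:])
    return pieces


print(split_after("a;b:c", "[;:]"))  # ['a;', 'b:', 'c']   either ; or : splits
print(split_after("a;b:c", ";:"))    # ['a;b:c']           no ";:" sequence present
print(split_after("a;:b", ";:"))     # ['a;:', 'b']        only the pair splits
```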
File details
Details for the file seg_text-0.1.1.tar.gz
File metadata
- Download URL: seg_text-0.1.1.tar.gz
- Upload date:
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.8.5 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 136215079bc9d3dde5ae2a48fa7ff49e7d3aca3ec0bf515a09b5b637b9ac2755
MD5 | 727dba32365d0d67e8351e9affd7117d
BLAKE2b-256 | 876f5fcbd1806e6c7913004e31aeaf8b369fec91ae7e9520033e3217ce9e140a
File details
Details for the file seg_text-0.1.1-py3-none-any.whl
File metadata
- Download URL: seg_text-0.1.1-py3-none-any.whl
- Upload date:
- Size: 5.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.8.5 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | fd0789c9e5e5ac6a7fac386a406642e02db20f4167723d3d1b3c1fa1eb30228f
MD5 | 6454178ab80ad0322efbac70391a2ade
BLAKE2b-256 | fba26ec3c47e76fd2bb0de018ca81bc3ffbadb33bc862926b22e9f661de5e861
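The digests listed above can be used to check a downloaded file. Below is a minimal sketch that uses only `hashlib`. It assumes that PyPI's `BLAKE2b-256` means BLAKE2b with a 32-byte digest. `file_digests` and `digests_of_chunks` are hypothetical helper names.

```python
import hashlib
from typing import Dict, Iterable


def digests_of_chunks(chunks: Iterable[bytes]) -> Dict[str, str]:
    """Compute the three digests shown in the file-details tables."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 256-bit BLAKE2b (assumed)
    }
    for chunk in chunks:
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def file_digests(path: str, chunk_size: int = 65536) -> Dict[str, str]:
    """Hash a file in fixed-size chunks so large files need little memory."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps reading until f.read() returns b"".
        return digests_of_chunks(iter(lambda: f.read(chunk_size), b""))
```

For example, after downloading the wheel, `file_digests("seg_text-0.1.1-py3-none-any.whl")["SHA256"]` should equal the SHA256 value listed for that file. Any other result suggests a corrupted or different file.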