seg-text

pytest | python | Code style: black | License: MIT | PyPI version

Segment multilingual text into sentences

Pre-install PyICU/pycld2/polyglot

For Linux and friends

Install libicu first. For example, on Ubuntu:

apt install libicu-dev pkg-config
poetry add pyicu==2.8 pycld2 polyglot

For Windows

seg-text depends on polyglot, which in turn depends on pyicu and pycld2. pyicu and pycld2 are difficult, if not impossible, to install on Windows using pip or poetry.

However, prebuilt whl packages can be downloaded from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and installed (for example, for Python 3.8 amd64) as follows:

pip install PyICU-2.8.1-cp38-cp38-win_amd64.whl pycld2-0.41-cp38-cp38-win_amd64.whl
pip install git+https://github.com/aboSamoor/polyglot@master
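
The wheel filenames above encode the interpreter version (cp38) and the platform (win_amd64). A minimal sketch for finding the tags that match the running interpreter, assuming CPython 3.8 or later (where the ABI tag repeats the Python tag); the helper name wheel_tags is hypothetical and not part of seg-text:

```python
import sys
import sysconfig


def wheel_tags():
    """Return (python_tag, platform_tag) for the running interpreter.

    Assumes CPython; other implementations use a different prefix than "cp".
    """
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # sysconfig reports e.g. "win-amd64"; wheel filenames use underscores.
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


python_tag, platform_tag = wheel_tags()
print(f"Look for wheels ending in -{python_tag}-{python_tag}-{platform_tag}.whl")
```

Run it with the same interpreter that will use seg-text (for instance, inside the poetry or virtualenv environment), since the tags depend on that interpreter.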

Refer to windows-pytest.yml and ubuntu-pytest.yml in .github/workflows for more details.
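
Before installing seg-text, it can help to confirm that the three prerequisites are visible to the interpreter. The sketch below uses only the standard library; the helper name check_dependencies is hypothetical, and the module names are the import names (PyICU is imported as icu), not the PyPI names:

```python
from importlib.util import find_spec

# Import names, not PyPI names: PyICU is imported as "icu".
REQUIRED_MODULES = ("icu", "pycld2", "polyglot")


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: find_spec(name) is not None for name in modules}


status = check_dependencies()
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

This only checks that each module can be located; it does not import it, so a broken native build could still fail at import time.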

Install seg-text

pip install seg-text
# or pip install git+https://github.com/ffreemt/seg-text
# or poetry add git+https://github.com/ffreemt/seg-text

Use seg-text

from seg_text import seg_text

print(seg_text(" text 1\n test 2. Test 3"))
# ["text 1", "test 2.", "Test 3"]

text = """ “元宇宙”,英文為“Metaverse”。該詞出自1992年;的科幻小說《雪崩》。 """
print(seg_text(text))
# ["“元宇宙”,英文為“Metaverse”。", "該詞出自1992年;的科幻小說《雪崩》。"]

# [;:] is a regular expression (a character class) that matches either ; or :
# without the brackets, ;: would match only the two-character sequence ;:

print(seg_text(text, extra="[;:]"))
# ["“元宇宙”,英文為“Metaverse”。", "該詞出自1992年;", "的科幻小說《雪崩》。"]
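
The difference between the character class and the bare sequence can be checked with the standard re module alone. This is a sketch of regex semantics only; it does not show how seg_text applies the extra argument internally, and the sample string is made up for illustration:

```python
import re

sample = "a;b:c;:d"

# Character class: splits at every ; or : individually.
by_class = re.split(r"[;:]", sample)

# Literal sequence: splits only where ; is immediately followed by :
by_sequence = re.split(r";:", sample)

print(by_class)     # ['a', 'b', 'c', '', 'd']
print(by_sequence)  # ['a;b:c', 'd']
```

The empty string in the first result comes from the adjacent ;: pair, where each character counts as its own delimiter.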

Refer to seg_text.py for more details.


