A convenience package for NLP work that bundles several existing open projects, such as summy and text processing utilities.

Project description

Abstract

This project is a convenience package for NLP work. It bundles several existing open projects, such as summy and text processing utilities. One of its main functions is sentence tokenization for Japanese.

The open resources we use are:

Examples

Sentence tokenization example:

>>> from util_ds.nlp.sentence_token import sentenceToken
>>> sentence = "ドイツ連邦共和国(ドイツれんぽうきょうわこく、独: Bundesrepublik Deutschland)、通称ドイツ(独: Deutschland)は、中央ヨーロッパ西部に位置する連邦共和制国家。首都および最大の都市(英語版)はベルリン[1]。南がスイスとオーストリア、北にデンマーク、西をフランスとオランダとベルギーとルクセンブルク、東はポーランドとチェコとそれぞれ国境を接する。"
>>> sentences = sentenceToken("japanese", sentence)
>>> sentences
["ドイツ連邦共和国(ドイツれんぽうきょうわこく、独: Bundesrepublik Deutschland)、通称ドイツ(独: Deutschland)は、中央ヨーロッパ西部に位置する連邦共和制国家。首都および最大の都市(英語版)はベルリン[1]。", "南がスイスとオーストリア、北にデンマーク、西をフランスとオランダとベルギーとルクセンブルク、東はポーランドとチェコとそれぞれ国境を接する。"]

Note: For simple input, the function above can largely be replaced by the following regex-based function.

>>> import re
>>> # Simplified version; more complicated cases are not handled here.
>>> def sentenceToken(language, text):
...     # `language` is kept for interface compatibility but is unused here.
...     # Break after a sentence-ending mark (ASCII or full-width) unless a
...     # closing quote or parenthesis follows it.
...     pattern = r'([。!?！？])([^」』)）])'
...     sentences = re.sub(pattern, r"\1\n\2", text).split("\n")
...     sentences = [s.strip() for s in sentences]
...     return [s for s in sentences if s]
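
To illustrate how the regex fallback treats closing quotes, here is a minimal standalone sketch. The name `split_sentences` and the sample text are hypothetical and chosen only for illustration; this is not the `util_ds` implementation, and the full-width marks in the pattern are an assumption about the intended input.

```python
import re

# Assumed pattern: a sentence-ending mark (ASCII or full-width) followed by
# any character that is not a closing quote or closing parenthesis.
_BOUNDARY = re.compile(r'([。!?！？])([^」』)）])')


def split_sentences(text):
    """Split text after sentence-ending marks, keeping closing quotes attached."""
    # Insert a newline between the mark and the character that follows it.
    marked = _BOUNDARY.sub(r"\1\n\2", text)
    # Drop surrounding whitespace and empty fragments.
    return [s.strip() for s in marked.split("\n") if s.strip()]


sample = "彼は「行こう。」と言った。今日は晴れです!明日は?"
for sentence in split_sentences(sample):
    print(sentence)
# Prints three lines:
# 彼は「行こう。」と言った。
# 今日は晴れです!
# 明日は?
```

The 。 inside the quotation is not treated as a boundary because it is followed by 」. Consecutive marks such as "!?" are split apart by this pattern, which is one of the more complicated cases the simple version does not handle.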


Download files

Source Distribution

util_ds-0.5.3.tar.gz (38.2 kB)

Uploaded Source

Built Distribution

util_ds-0.5.3-py3-none-any.whl (60.8 kB)

Uploaded Python 3

File details

Details for the file util_ds-0.5.3.tar.gz.

File metadata

  • Download URL: util_ds-0.5.3.tar.gz
  • Upload date:
  • Size: 38.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for util_ds-0.5.3.tar.gz
Algorithm    Hash digest
SHA256       994ea5b9b3b835e90a97ce94aa29eff7525a7956d35f4bfe09a13d2506adfe94
MD5          6c3ef5b30e42bf5e5e9ea40341fb4a03
BLAKE2b-256  c38ccc04f189fa51a1d6bc4d37ddeb01cc8a48788b34b458e2a79e2f7e276ee1
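
One way to check a downloaded file against the SHA256 digest listed above is sketched below. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the sketch assumes the archive has already been saved locally under its published file name.

```python
import hashlib

# SHA256 digest listed above for util_ds-0.5.3.tar.gz.
EXPECTED_SHA256 = "994ea5b9b3b835e90a97ce94aa29eff7525a7956d35f4bfe09a13d2506adfe94"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a local file's digest with the expected hex digest."""
    return sha256_of_file(path) == expected.lower()


# Example (assumes the file is in the current directory):
# print(matches_published_hash("util_ds-0.5.3.tar.gz"))
```

Reading in chunks keeps memory use constant regardless of file size, although for an archive of this size reading it in one call would work as well.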

File details

Details for the file util_ds-0.5.3-py3-none-any.whl.

File metadata

  • Download URL: util_ds-0.5.3-py3-none-any.whl
  • Upload date:
  • Size: 60.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for util_ds-0.5.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       ef6f7034af5e0e387741817ab97af6fb22c885f6c9be85ed80ec28c4d213b117
MD5          3a58ec039dde8652c7be150d52941651
BLAKE2b-256  ac4cb28bb76d73de76dedae860b08c7a7c621fda6567234ba2648e216569fb81
