Project description
Abstract
This project is a convenience package for NLP work. It bundles several existing open-source projects, such as sumy, together with text-processing utilities. One of its main functions is sentence tokenization for Japanese.
The open-source resources we use are:
- rouge [Apache-2.0 license]: https://github.com/pltrdy/rouge
- sumy [Apache-2.0 license]: https://github.com/miso-belica/sumy
- nltk [Apache-2.0 license]: https://pypi.org/project/nltk/
- jieba [MIT License (MIT)]: https://pypi.org/project/jieba/
- MeCab [BSD License (BSD)]: https://pypi.org/project/mecab-python3/
- SudachiPy [Apache-2.0 license]: https://pypi.org/project/SudachiPy/
Examples
Sentence tokenization example:
>>> from util_ds.nlp.sentence_token import sentenceToken
>>> sentence = "ドイツ連邦共和国(ドイツれんぽうきょうわこく、独: Bundesrepublik Deutschland)、通称ドイツ(独: Deutschland)は、中央ヨーロッパ西部に位置する連邦共和制国家。首都および最大の都市(英語版)はベルリン[1]。南がスイスとオーストリア、北にデンマーク、西をフランスとオランダとベルギーとルクセンブルク、東はポーランドとチェコとそれぞれ国境を接する。"
>>> sentences = sentenceToken("japanese", sentence)
>>> sentences
['ドイツ連邦共和国(ドイツれんぽうきょうわこく、独: Bundesrepublik Deutschland)、通称ドイツ(独: Deutschland)は、中央ヨーロッパ西部に位置する連邦共和制国家。首都および最大の都市(英語版)はベルリン[1]。', '南がスイスとオーストリア、北にデンマーク、西をフランスとオランダとベルギーとルクセンブルク、東はポーランドとチェコとそれぞれ国境を接する。']
Note: The function above can largely be replaced by the following simplified function.
>>> import re
>>> # Simplified: more complicated cases are not handled here.
>>> def sentenceToken(language, text):
...     # Break after a sentence-ending mark unless a closing bracket follows.
...     # Both ASCII and full-width forms of !, ? and ) are matched.
...     pattern = r'([。!?！？])([^」』)）])'
...     sentences = re.sub(pattern, r"\1\n\2", text).split("\n")
...     sentences = [s.strip() for s in sentences]
...     return [s for s in sentences if s]
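To make the behaviour of the regex in the note above concrete, here is a minimal standalone sketch. It is not part of util_ds: the names `split_sentences` and `_BOUNDARY` are hypothetical, and matching both ASCII and full-width punctuation is an assumption made for illustration. It shows that a sentence-ending mark directly followed by a closing bracket does not start a new sentence.

```python
import re
from typing import List

# Hypothetical helper, not part of util_ds.
# Group 1: a sentence-ending mark (ASCII or full-width, an assumption).
# Group 2: the next character, provided it is not a closing bracket.
_BOUNDARY = re.compile(r"([。!?！？])([^」』)）])")


def split_sentences(text: str) -> List[str]:
    """Split text after sentence-ending marks, keeping closing brackets attached."""
    # Insert a newline between the mark and the following character.
    marked = _BOUNDARY.sub(r"\1\n\2", text)
    # Drop surrounding whitespace and empty fragments.
    return [part.strip() for part in marked.split("\n") if part.strip()]


if __name__ == "__main__":
    sample = "今日は晴れです。「散歩に行こう!」と彼は言った。どこへ行く?"
    for sentence in split_sentences(sample):
        print(sentence)
```

With this sample, the "!" inside the quotation is followed by "」", so the quotation stays in one piece with the rest of its sentence. Because the pattern consumes the character after the mark, two consecutive marks (for example "?!") may be split between them; this is one of the more complicated cases the simplified approach does not cover.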
Download files
File details
Details for the file util_ds-0.5.3.tar.gz.
File metadata
- Download URL: util_ds-0.5.3.tar.gz
- Upload date:
- Size: 38.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 994ea5b9b3b835e90a97ce94aa29eff7525a7956d35f4bfe09a13d2506adfe94 |
| MD5 | 6c3ef5b30e42bf5e5e9ea40341fb4a03 |
| BLAKE2b-256 | c38ccc04f189fa51a1d6bc4d37ddeb01cc8a48788b34b458e2a79e2f7e276ee1 |
File details
Details for the file util_ds-0.5.3-py3-none-any.whl.
File metadata
- Download URL: util_ds-0.5.3-py3-none-any.whl
- Upload date:
- Size: 60.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ef6f7034af5e0e387741817ab97af6fb22c885f6c9be85ed80ec28c4d213b117 |
| MD5 | 3a58ec039dde8652c7be150d52941651 |
| BLAKE2b-256 | ac4cb28bb76d73de76dedae860b08c7a7c621fda6567234ba2648e216569fb81 |