A simple iterator for using a set of Chinese tokenizers

A Collection of Chinese Tokenizers

Simple wrappers around, and a collection of, several Chinese tokenizers.

Features

  • TODO

Usage

from tokenizers_collection.config import tokenizer_registry

# Run every registered tokenizer over the same input file
for name, tokenizer in tokenizer_registry:
    print("Tokenizer: {}".format(name))
    tokenizer('input_file.txt', 'output_file.txt')
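The registry yields (name, tokenizer) pairs, where each tokenizer is a callable taking an input path and an output path. If you only need one tokenizer, you can filter by name. A minimal sketch with a stand-in registry (the whitespace tokenizer and its name below are illustrative, not part of the package):

```python
import os
import tempfile

def whitespace_tokenizer(input_file, output_file):
    # Stand-in tokenizer: writes space-separated tokens, one line per input line.
    with open(input_file, encoding="utf-8") as src, \
         open(output_file, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(" ".join(line.split()) + "\n")

# Illustrative registry with the same (name, callable) shape as tokenizer_registry.
mock_registry = [("whitespace", whitespace_tokenizer)]

chosen = "whitespace"
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "input_file.txt")
    dst_path = os.path.join(tmp, "output_file.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("hello   world\n")
    for name, tokenizer in mock_registry:
        if name == chosen:
            tokenizer(src_path, dst_path)
            break
    with open(dst_path, encoding="utf-8") as f:
        print(f.read().strip())  # hello world
```

The same filter-by-name loop works against the real `tokenizer_registry` once the package is installed.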

Installation

pip install tokenizers_collection

Updating License Files and Downloading Models

Some of the bundled tokenizers need their license file updated (e.g. pynlpir) or require downloading model files (e.g. pyltp), so a one-time setup step is needed after installation. All of these operations are wrapped in a single helper function; just run a command like the following:

python -m tokenizers_collection.helper

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2018-08-28)

  • First release on PyPI.
