
A simple iterator over a set of Chinese tokenizers

Project description

A collection of Chinese tokenizers


A simple wrapper around, and collection of, several Chinese tokenizers.

Features

  • TODO

Usage

from tokenizers_collection.config import tokenizer_registry

# Run every registered tokenizer on the same input file.
for name, tokenizer in tokenizer_registry:
    print("Tokenizer: {}".format(name))
    tokenizer('input_file.txt', 'output_file.txt')
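
The loop above suggests that `tokenizer_registry` yields (name, callable) pairs, where each callable takes an input path and an output path. The following is a minimal stand-in sketch of that pattern, not the package's actual implementation: `SimpleRegistry`, `char_tokenizer`, `whitespace_tokenizer`, and `run_all` are hypothetical names invented for illustration, and the toy tokenizers do no real Chinese word segmentation.

```python
import os
import tempfile


class SimpleRegistry:
    """Hypothetical stand-in for tokenizer_registry: holds (name, callable) pairs."""

    def __init__(self):
        self._tokenizers = []

    def register(self, name, func):
        self._tokenizers.append((name, func))

    def __iter__(self):
        # Iterating yields (name, callable) pairs, as in the usage loop.
        return iter(self._tokenizers)


def char_tokenizer(input_file, output_file):
    """Toy tokenizer: emits every non-space character as its own token."""
    with open(input_file, encoding="utf-8") as src, \
            open(output_file, "w", encoding="utf-8") as dst:
        for line in src:
            tokens = [ch for ch in line.strip() if not ch.isspace()]
            dst.write(" ".join(tokens) + "\n")


def whitespace_tokenizer(input_file, output_file):
    """Toy tokenizer: splits each line on existing whitespace."""
    with open(input_file, encoding="utf-8") as src, \
            open(output_file, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(" ".join(line.split()) + "\n")


registry = SimpleRegistry()
registry.register("char", char_tokenizer)
registry.register("whitespace", whitespace_tokenizer)


def run_all(registry, input_file, output_dir):
    """Run every registered tokenizer, writing one output file per tokenizer."""
    outputs = {}
    for name, tokenizer in registry:
        output_file = os.path.join(output_dir, "{}.txt".format(name))
        tokenizer(input_file, output_file)
        outputs[name] = output_file
    return outputs
```

Writing to one output file per tokenizer is a design choice of this sketch: if each tokenizer writes to the path it is given, reusing a single `output_file.txt` for every tokenizer would leave only the last tokenizer's result on disk.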

Installation

pip install tokenizers_collection

Updating license files and downloading models

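Some of the tokenizers need an updated license file (for example, pynlpir) or need model files to be downloaded (for example, pyltp), so a specific command must be run after installation. All of these steps have been wrapped in a single function; simply run a command like the following: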

python -m tokenizers_collection.helper
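
For scripted setups, the same command can be launched from Python. This is a minimal sketch that assumes only what is stated above, namely that `python -m tokenizers_collection.helper` performs the post-install steps; `helper_command` and `run_helper` are hypothetical wrapper names, not part of the package.

```python
import importlib.util
import subprocess
import sys


def helper_command():
    """Build the equivalent of `python -m tokenizers_collection.helper`."""
    # sys.executable keeps the call inside the current interpreter/virtualenv.
    return [sys.executable, "-m", "tokenizers_collection.helper"]


def run_helper():
    """Run the post-install helper, failing early if the package is missing."""
    if importlib.util.find_spec("tokenizers_collection") is None:
        raise RuntimeError(
            "tokenizers_collection is not installed; "
            "run `pip install tokenizers_collection` first"
        )
    # check=True raises CalledProcessError if the helper exits with an error.
    subprocess.run(helper_command(), check=True)
```

Calling `run_helper()` once after `pip install tokenizers_collection` would then correspond to the manual command.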

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2018-08-28)

  • First release on PyPI.

Files for tokenizers-collection, version 0.1.2

  • tokenizers_collection-0.1.2-py2.py3-none-any.whl (7.7 kB): Wheel, Python version py2.py3
  • tokenizers_collection-0.1.2.tar.gz (11.2 kB): Source
