MicroTokenizer

A micro tokenizer for Chinese

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language

Project description

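Micro Chinese Tokenizer

A micro Chinese word segmenter that currently provides seven segmentation algorithms:

- DAG-based segmentation: a prefix dictionary is stored in a trie and used to build a DAG (directed acyclic graph) of candidate words, and the segmentation is then chosen according to word frequency (probability).
- Hidden Markov Model (HMM) segmentation.
- A hybrid model that merges the results of the DAG and HMM models, following the principle of maximizing segmentation granularity.
- Forward maximum matching.
- Backward maximum matching.
- Bidirectional maximum matching.
- CRF (Conditional Random Field)-based segmentation.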
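The first algorithm in the list above can be sketched as follows. This is a minimal illustration modeled on the well-known Jieba approach (a trie-backed prefix dictionary, a DAG of candidate words, and dynamic programming over log word probabilities). The names `Trie`, `build_dag` and `max_prob_cut`, the `$freq` marker key, and the fallback frequency of 1 for unknown characters are all assumptions; MicroTokenizer's own code may differ.

```python
import math
from typing import Dict, List

END = "$freq"  # hypothetical end-of-word marker; real keys are single characters, so no collision


class Trie:
    """Minimal prefix tree: each node is a dict of child characters."""

    def __init__(self) -> None:
        self.root: Dict = {}

    def insert(self, word: str, freq: int) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = freq  # mark the node as a complete word


def build_dag(text: str, trie: Trie) -> Dict[int, List[int]]:
    """Map each start index to the end indices of dictionary words beginning there."""
    dag: Dict[int, List[int]] = {}
    for start in range(len(text)):
        ends: List[int] = []
        node = trie.root
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:  # no dictionary word has this prefix
                break
            if END in node:
                ends.append(end)
        dag[start] = ends or [start]  # unknown character: fall back to a 1-char edge
    return dag


def max_prob_cut(text: str, freqs: Dict[str, int]) -> List[str]:
    """Pick the DAG path with the highest product of word probabilities.

    Assumes freqs is non-empty and every frequency is positive.
    """
    trie = Trie()
    for word, freq in freqs.items():
        trie.insert(word, freq)
    log_total = math.log(sum(freqs.values()))
    dag = build_dag(text, trie)
    n = len(text)
    # route[i] = (best log-probability of text[i:], end index of the first word)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freqs.get(text[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    tokens, i = [], 0
    while i < n:  # follow the stored best first-word choices left to right
        j = route[i][1]
        tokens.append(text[i:j + 1])
        i = j + 1
    return tokens
```

Working right to left lets each `route[i]` reuse the already computed best score for everything after the word that starts at `i`, so every DAG edge is examined only once.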
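For the HMM segmenter, a common formulation tags each character as B, M, E or S (begin, middle or end of a multi-character word, or a single-character word) and decodes the most likely tag sequence with the Viterbi algorithm. The sketch below assumes that formulation; the probability tables are hand-picked toy values rather than a trained model, and `viterbi_segment` is a hypothetical name.

```python
import math
from typing import Dict, List

STATES = "BMES"  # Begin / Middle / End of a multi-char word, Single-char word

# Toy, hand-picked probabilities for illustration only (NOT trained values).
START = {"B": 0.6, "S": 0.4}
TRANS = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.4, "E": 0.6},
    "E": {"B": 0.5, "S": 0.5},
    "S": {"B": 0.5, "S": 0.5},
}
EMIT = {
    "B": {"北": 0.5, "我": 0.01, "爱": 0.01},
    "M": {},
    "E": {"京": 0.5, "爱": 0.01},
    "S": {"我": 0.5, "爱": 0.5, "北": 0.01, "京": 0.01},
}


def _log(p: float, floor: float = 1e-12) -> float:
    # Unseen events get a tiny floor probability instead of log(0).
    return math.log(max(p, floor))


def viterbi_segment(text: str, start=START, trans=TRANS, emit=EMIT) -> List[str]:
    if not text:
        return []
    # score[t][s]: best log-probability of any tag path ending in state s at position t
    score = [{s: _log(start.get(s, 0.0)) + _log(emit[s].get(text[0], 0.0)) for s in STATES}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(text)):
        score.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda p: score[t - 1][p] + _log(trans[p].get(s, 0.0)))
            score[t][s] = (score[t - 1][prev] + _log(trans[prev].get(s, 0.0))
                           + _log(emit[s].get(text[t], 0.0)))
            back[t][s] = prev
    # A word must end at the last character, so only E or S are valid final states.
    tags = [max("ES", key=lambda s: score[-1][s])]
    for t in range(len(text) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    tags.reverse()
    words, buf = [], ""
    for ch, tag in zip(text, tags):
        buf += ch
        if tag in "ES":  # a word closes on E or S
            words.append(buf)
            buf = ""
    if buf:
        words.append(buf)
    return words
```

With these toy tables, `viterbi_segment("我爱北京")` tags the characters S, S, B, E and returns `["我", "爱", "北京"]`.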
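The three maximum-matching variants in the list above are simple enough to sketch together. This assumes a plain set of words as the dictionary; the function names are hypothetical, and the tie-breaking rule in `bidirectional_max_match` (fewer words, then fewer single-character words, then the backward result) is one common heuristic, not necessarily the one MicroTokenizer uses.

```python
from typing import List, Set


def _max_len(vocab: Set[str]) -> int:
    # Longest dictionary word, never less than 1 so the scan always advances.
    return max(1, max((len(w) for w in vocab), default=1))


def forward_max_match(text: str, vocab: Set[str]) -> List[str]:
    """Left to right: at each position take the longest dictionary word (else one char)."""
    max_len, tokens, i = _max_len(vocab), [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text: str, vocab: Set[str]) -> List[str]:
    """Right to left: the same greedy rule, anchored at the end of the text."""
    max_len, tokens, j = _max_len(vocab), [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def _singles(tokens: List[str]) -> int:
    return sum(len(t) == 1 for t in tokens)


def bidirectional_max_match(text: str, vocab: Set[str]) -> List[str]:
    """Run both directions and keep the result judged better by a simple heuristic."""
    fwd = forward_max_match(text, vocab)
    bwd = backward_max_match(text, vocab)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd  # fewer words wins
    return fwd if _singles(fwd) < _singles(bwd) else bwd  # ties go to backward
```

On 研究生命起源 ("researching the origin of life") with the dictionary {研究, 研究生, 生命, 起源}, forward matching greedily takes 研究生 ("graduate student") and yields 研究生 / 命 / 起源, while backward matching yields 研究 / 生命 / 起源; the bidirectional rule picks the latter because it has fewer single-character words.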

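Features

- Education-oriented: graph structures can be exported as GraphML files to help learners follow what the algorithms are doing.
- Good segmentation performance: it uses algorithms similar to those of Jieba (结巴分词), so it segments well.
- Extensible: it uses the same dictionary file format as Jieba, so custom dictionaries are easy to add.
- Highly customizable: tools and scripts are provided so users can train their own segmentation models instead of relying on the built-in ones.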
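To illustrate the kind of file a GraphML export produces, here is a sketch that writes a word DAG (a map from start index to the end indices of candidate words) using only the standard library. The function name, the boundary-node layout and the `word` edge attribute are assumptions; the project's real export format may differ.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def dag_to_graphml(text: str, dag: Dict[int, List[int]]) -> str:
    """Serialize a word DAG as GraphML.

    Nodes are the character boundaries 0..len(text); an edge n{i} -> n{j+1}
    means text[i:j+1] is a candidate word, stored as the edge's "word" data.
    """
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # "for" is a Python keyword, so these attributes go in an explicit dict.
    ET.SubElement(root, "key", attrib={"id": "word", "for": "edge",
                                       "attr.name": "word", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for i in range(len(text) + 1):
        ET.SubElement(graph, "node", id=f"n{i}")
    for start, ends in sorted(dag.items()):
        for end in ends:
            edge = ET.SubElement(graph, "edge", source=f"n{start}", target=f"n{end + 1}")
            ET.SubElement(edge, "data", key="word").text = text[start:end + 1]
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a `.graphml` file lets graph tools such as yEd or Gephi display every candidate segmentation path.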
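Jieba-style dictionary files put one entry per line: the word, then an optional frequency, then an optional part-of-speech tag, separated by spaces. A minimal loader, assuming that layout, might look like this. The name `load_jieba_style_dict` is hypothetical, and defaulting a missing frequency to 1 is an assumption of this sketch (Jieba itself computes a suggested frequency in that case).

```python
from typing import Dict, Iterable


def load_jieba_style_dict(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'word [freq] [tag]' lines into a word -> frequency map."""
    freqs: Dict[str, int] = {}
    for raw in lines:
        parts = raw.strip().split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        # The frequency column is optional; a non-numeric second field is a POS tag.
        # Defaulting to 1 is an assumption, not Jieba's behavior.
        freq = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 1
        freqs[word] = freq
    return freqs
```

The resulting map has the same shape as the `freqs` argument a frequency-based DAG segmenter would consume, so custom entries can simply be merged into the built-in dictionary.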

See the repository for more: https://github.com/howl-anderson/MicroTokenizer

Release history

- 0.21.3: Oct 18, 2024
- 0.21.2: Sep 28, 2021
- 0.21.1: Sep 28, 2021
- 0.21.0: Sep 28, 2021
- 0.20.4: Aug 31, 2021
- 0.20.3: Aug 31, 2021
- 0.20.2: Jul 16, 2021
- 0.20.1: Jul 16, 2021
- 0.20.0: Jul 16, 2021
- 0.19.2: Jul 8, 2020
- 0.19.1: Jul 8, 2020
- 0.19.0: Dec 13, 2018
- 0.18.0: Oct 16, 2018
- 0.17.4: Sep 25, 2018
- 0.17.3: Sep 25, 2018
- 0.17.2: Sep 25, 2018
- 0.17.1: Sep 25, 2018
- 0.17.0: Sep 23, 2018
- 0.16.0: Sep 23, 2018
- 0.15.2: Sep 20, 2018
- 0.15.1: Sep 20, 2018
- 0.15.0: Sep 20, 2018
- 0.14.1: Sep 7, 2018
- 0.14.0: Sep 7, 2018
- 0.13.1: Sep 3, 2018
- 0.13.0: Sep 3, 2018
- 0.12.1: Sep 2, 2018
- 0.11.1: Sep 1, 2018
- 0.11.0: Sep 1, 2018
- 0.10.0: Sep 1, 2018
- 0.9.0: Sep 1, 2018
- 0.8.0: Aug 28, 2018 (this version)
- 0.7.11: Aug 19, 2018
- 0.7.10: Aug 19, 2018
- 0.7.9: Aug 19, 2018
- 0.7.8: Aug 19, 2018
- 0.7.6: Aug 19, 2018
- 0.1.7.6: Aug 19, 2018
- 0.1.7.5: Aug 19, 2018
- 0.1.7.4: Aug 18, 2018
- 0.1.7.3: Aug 18, 2018
- 0.1.7.2: Aug 18, 2018
- 0.1.7.1: Aug 18, 2018
- 0.1.7: Aug 16, 2018
- 0.1.6.3: Aug 14, 2018
- 0.1.6.2: Aug 14, 2018
- 0.1.6.1: Aug 14, 2018
- 0.1.6: Aug 13, 2018
- 0.1.5.1: Aug 12, 2018
- 0.1.5: Aug 12, 2018
- 0.1.4: Aug 12, 2018
- 0.1.2: Aug 6, 2018
- 0.1.1: Jul 18, 2018
- 0.1.0: Jul 13, 2018

Download files

Download the file for your platform.

Source Distribution

MicroTokenizer-0.8.0.tar.gz (12.1 MB)

Uploaded Aug 28, 2018 Source

Built Distribution

MicroTokenizer-0.8.0-py2.py3-none-any.whl (24.4 MB)

Uploaded Aug 28, 2018 (Python 2, Python 3)

File details

Details for the file MicroTokenizer-0.8.0.tar.gz.

File metadata

Download URL: MicroTokenizer-0.8.0.tar.gz
Upload date: Aug 28, 2018
Size: 12.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.10.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.6.5

File hashes

Hashes for MicroTokenizer-0.8.0.tar.gz
Algorithm	Hash digest
SHA256	`2bf89601152749e0549a3dfde257c9d38a01426d5693e2931c782275dfa4c2dc`
MD5	`3f386e0cfb1e32ad6745b814db092797`
BLAKE2b-256	`47e18cb89d1c40fac72a077510e772390275b85ddf04b78f39daddd9fc4a53ef`
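The published digests can be used to check a downloaded file before installing it. Here is a minimal sketch with Python's standard `hashlib`; the helper names `sha256_of` and `verify` are hypothetical, and `EXPECTED` is the SHA256 digest listed above for the source distribution.

```python
import hashlib

# SHA256 published above for MicroTokenizer-0.8.0.tar.gz
EXPECTED = "2bf89601152749e0549a3dfde257c9d38a01426d5693e2931c782275dfa4c2dc"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives are not loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected.lower()
```

pip can also enforce this itself: put `--hash=sha256:<digest>` after the requirement in a requirements file and install with `--require-hashes`.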

File details

Details for the file MicroTokenizer-0.8.0-py2.py3-none-any.whl.

File metadata

Download URL: MicroTokenizer-0.8.0-py2.py3-none-any.whl
Upload date: Aug 28, 2018
Size: 24.4 MB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.10.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.6.5

File hashes

Hashes for MicroTokenizer-0.8.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`894873eb6b8b634f837e249396504edbe495d5aa1e1eacdb75710684bfd73f5e`
MD5	`52be8bf43ce1cf9c2ee629557f262ba1`
BLAKE2b-256	`c2cf4ae6b6e57927f56839f59c64db5c4d6fad0820e00c43e0bc794ba7c3adfd`