rust-participle
Project description
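Faster than jieba's pure-Python implementation. The GIL is released during segmentation, so the library is suitable for multithreaded processing.
Installation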
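Because the GIL is released during segmentation, several texts can in principle be segmented in parallel on ordinary Python threads. The following is a minimal sketch of that idea, not part of the package: `cut_many` is a hypothetical helper that accepts any callable mapping a string to a list of tokens, and the demo uses `str.split` as a stand-in so that it runs without nazrin installed. It also assumes that one `Nazrin` instance can safely be shared across threads, which the description above does not state.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def cut_many(
    cut: Callable[[str], list[str]],
    texts: Iterable[str],
    max_workers: int = 4,
) -> list[list[str]]:
    """Segment each text on a worker thread, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(cut, texts))


if __name__ == "__main__":
    # Stand-in segmenter (whitespace split) so the sketch is self-contained.
    # With the package installed, pass `Nazrin().cut` instead (assumption:
    # the instance may be shared between threads).
    print(cut_many(str.split, ["a b", "c d e"]))
```

Whether threads give a real speed-up depends on how much of each call runs outside the GIL, so it is worth measuring on your own data.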
pip install nazrin
Usage
from nazrin import Nazrin
nazrin = Nazrin()
print(nazrin.cut('能找到想找的东西程度的能力'))
# ['能', '找到', '想', '找', '的', '东西', '程度', '的', '能力']
print(nazrin.tag('能找到想找的东西程度的能力'))
# [('能', 'v'), ('找到', 'v'), ('想', 'v'), ('找', 'v'), ('的', 'uj'), ('东西', 'ns'), ('程度', 'n'), ('的', 'uj'), ('能力', 'n')]
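The `(word, tag)` pairs returned by `tag` can be post-processed with ordinary list operations. The sketch below filters the output shown above by tag prefix; `words_with_tag_prefix` is a hypothetical helper, and it assumes the tags follow jieba's conventions, where tags beginning with `n` mark nouns.

```python
def words_with_tag_prefix(tagged, prefix):
    """Return the words whose part-of-speech tag starts with `prefix`."""
    return [word for word, pos in tagged if pos.startswith(prefix)]


# The tagged output copied from the usage example above.
tagged = [
    ('能', 'v'), ('找到', 'v'), ('想', 'v'), ('找', 'v'), ('的', 'uj'),
    ('东西', 'ns'), ('程度', 'n'), ('的', 'uj'), ('能力', 'n'),
]

nouns = words_with_tag_prefix(tagged, 'n')
print(nouns)  # ['东西', '程度', '能力']
```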
Overview of all methods
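from typing import Literal

class Nazrin:
    def __init__(self) -> None: ...
    def add_word(
        self, word: str, freq: int | None = None, tag: str | None = None
    ) -> int:
        """
        Description:
            Add a word to the dictionary.
        Parameters:
            * ``word``: the word to add
            * ``freq``: word frequency; defaults to a computed value
            * ``tag``: part-of-speech tag; defaults to None
        """
        ...
    def load_userdict(self, path: str) -> None:
        """
        Description:
            Load a user dictionary.
        Parameters:
            * ``path``: path to the dictionary file
        """
        ...
    def suggest_freq(self, word: str) -> None:
        """
        Description:
            Suggest a word frequency that forces the characters of a word
            to be joined or split.
        Parameters:
            * ``word``: the word
        """
        ...
    def cut(self, text: str, hmm: bool = True) -> list[str]:
        """
        Description:
            Segment a sentence containing Chinese characters into
            separate words (accurate mode).
        Parameters:
            * ``text``: the text
            * ``hmm``: whether to use the hidden Markov model. Defaults to True.
        """
        ...
    def cut_all(self, text: str) -> list[str]:
        """
        Description:
            Segment a sentence containing Chinese characters into
            separate words (full mode).
        Parameters:
            * ``text``: the text
        """
        ...
    def cut_for_search(self, text: str, hmm: bool = True) -> list[str]:
        """
        Description:
            Segment a sentence containing Chinese characters into
            separate words (search-engine mode).
        Parameters:
            * ``text``: the text
            * ``hmm``: whether to use the hidden Markov model. Defaults to True.
        """
        ...
    def tag(self, text: str, hmm: bool = True) -> list[tuple[str, str]]:
        """
        Description:
            Tag each word in the text with its part of speech.
        Parameters:
            * ``text``: the text
            * ``hmm``: whether to use the hidden Markov model. Defaults to True.
        """
        ...
    def tokenize(
        self,
        text: str,
        mode: Literal["search", "default"] = "default",
        hmm: bool = True,
    ) -> list[str]:
        """
        Description:
            Tokenize the text.
        Parameters:
            * ``text``: the text
            * ``mode``: the mode. Defaults to "default".
            * ``hmm``: whether to use the hidden Markov model. Defaults to True.
        """
        ...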
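To see how the three segmentation modes listed above differ on the same input, their results can be collected side by side. This is a rough sketch under stated assumptions: `compare_modes`, the `Segmenter` protocol, and `FakeSegmenter` are hypothetical names introduced here, and only the `cut`, `cut_all`, and `cut_for_search` signatures come from the stub above. The fake segmenter exists only so the example runs without nazrin installed; it does not imitate real segmentation.

```python
from typing import Protocol


class Segmenter(Protocol):
    """The subset of the documented Nazrin interface used below."""

    def cut(self, text: str, hmm: bool = True) -> list[str]: ...
    def cut_all(self, text: str) -> list[str]: ...
    def cut_for_search(self, text: str, hmm: bool = True) -> list[str]: ...


def compare_modes(
    segmenter: Segmenter, text: str, hmm: bool = True
) -> dict[str, list[str]]:
    """Run the three documented modes on one text and return all results."""
    return {
        "accurate": segmenter.cut(text, hmm),
        "full": segmenter.cut_all(text),
        "search": segmenter.cut_for_search(text, hmm),
    }


class FakeSegmenter:
    """Hypothetical stand-in, not real segmentation logic."""

    def cut(self, text: str, hmm: bool = True) -> list[str]:
        return text.split()  # whitespace split

    def cut_all(self, text: str) -> list[str]:
        return [ch for ch in text if not ch.isspace()]  # one token per character

    def cut_for_search(self, text: str, hmm: bool = True) -> list[str]:
        return text.split()


if __name__ == "__main__":
    # With the package installed, pass `Nazrin()` instead of the fake.
    for mode, tokens in compare_modes(FakeSegmenter(), "ab cd").items():
        print(mode, tokens)
```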
Performance comparison
In [1]: import jieba
In [2]: jieba.initialize()
Building prefix dict from the default dictionary ...
Loading model from cache jieba.cache
Loading model cost 0.647 seconds.
Prefix dict has been built successfully.
In [3]: from nazrin import Nazrin
In [4]: nazrin = Nazrin()
In [5]: with open("./docs/performance-test.txt", "r", encoding="utf-8") as f:
   ...:     data = f.read()
   ...:
In [6]: %timeit list(jieba.cut(data))
3.77 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [7]: %timeit nazrin.cut(data)
283 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
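The session above relies on IPython's `%timeit`. Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The helper names `best_seconds_per_call` and `speedup` are hypothetical, and the helper reports the best of several repeats rather than the mean that `%timeit` prints, so the numbers are not directly interchangeable. The last lines only redo the arithmetic on the two means reported above.

```python
import timeit
from typing import Callable


def best_seconds_per_call(
    func: Callable[[str], object], arg: str, number: int = 100, repeat: int = 5
) -> float:
    """Time func(arg) and return the best observed seconds per call."""
    timer = timeit.Timer(lambda: func(arg))
    # Each repeat runs `number` calls; keep the fastest repeat.
    return min(timer.repeat(repeat=repeat, number=number)) / number


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Ratio of the two means from the session above: 3.77 ms vs 283 µs.
    print(f"{speedup(3.77e-3, 283e-6):.1f}x")  # 13.3x
```

With both packages installed, `best_seconds_per_call(nazrin.cut, data)` and `best_seconds_per_call(lambda s: list(jieba.cut(s)), data)` would give the two inputs to `speedup`.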
Acknowledgements
naidesu
Source Distribution
nazrin-0.3.0.tar.gz (2.3 MB)