cppjieba Python bindings
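Python bindings for a fork of yanyiwu/cppjieba, the C++ version of the "Jieba" (结巴) Chinese word segmenter, with performance close to that of the C++ library.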
Install
$ pip install cppjieba
# or
$ git clone https://github.com/hscspring/cppjieba.git
$ cd cppjieba
$ pip install .
# or
$ python setup.py install
Usage
from cppjieba import Jieba
jb = Jieba()
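# Custom paths can be specified:
# dict_path: path to the segmentation dictionary
# hmm_path: path to the HMM model
# user_dict_path: path to the user-defined dictionary
# idf_path: path to the IDF dictionary
# stop_words_path: path to the stop-word list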
# Example: jb = Jieba(user_dict_path="path/to/your_dict")
s = "他来到了网易杭研大厦"
result = jb.cut(s)
print("/".join(result))
# 他/来到/了/网易/杭研/大厦
result = jb.cut(s, hmm=False)
print("/".join(result))
# 他/来到/了/网易/杭/研/大厦
s = "我来到北京清华大学"
result = jb.cut_all(s)
print(result)
# ['我', '来到', '北京', '清华', '清华大学', '华大', '大学']
s = "小明硕士毕业于中国科学院计算所,后在日本京都大学深造"
result = jb.cut_for_search(s)
print(result)
# ['小明', '硕士', '毕业', '于', '中国', '科学', '学院', '科学院', '中国科学院', '计算', '计算所', ',', '后', '在', '日本', '京都', '大学', '日本京都大学', '深造']
s = "我是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,当上CEO,走上人生巅峰。"
result = jb.pseg(s)
print(result)
# [('我', 'r'), ('是', 'v'), ('拖拉机', 'n'), ('学院', 'n'), ('手扶拖拉机', 'n'), ('专业', 'n'), ('的', 'uj'), ('。', 'x'), ('不用', 'v'), ('多久', 'm'), (',', 'x'), ('我', 'r'), ('就', 'd'), ('会', 'v'), ('升职', 'v'), ('加薪', 'nr'), (',', 'x'), ('当上', 't'), ('CEO', 'eng'), (',', 'x'), ('走上', 'v'), ('人生', 'n'), ('巅峰', 'n'), ('。', 'x')]
result = jb.extract(s)
print(result)
# [('CEO', 11.739204307083542), ('升职', 10.8561552143), ('加薪', 10.642581114), ('手扶拖拉机', 10.0088573539), ('巅峰', 9.49395840471)]
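Full mode (`cut_all`) returns every dictionary word found in the sentence, so overlapping words such as 清华, 清华大学, 华大 and 大学 all appear. The following is only a minimal sketch of that idea, assuming a hypothetical toy dictionary and a brute-force scan over substrings. `cut_all_sketch` is not part of cppjieba, and unlike a real segmenter it simply skips characters that start no dictionary word.

```python
from typing import List, Set


def cut_all_sketch(sentence: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Hypothetical full-mode illustration: list every dictionary word in `sentence`.

    Overlapping matches are all kept. Characters that begin no dictionary
    word are skipped, which a real segmenter would not do.
    """
    words = []
    for start in range(len(sentence)):
        # Try every candidate of up to max_len characters beginning at `start`.
        for end in range(start + 1, min(start + max_len, len(sentence)) + 1):
            candidate = sentence[start:end]
            if candidate in dictionary:
                words.append(candidate)
    return words


# Toy dictionary chosen for this example only (not cppjieba's dictionary).
toy_dict = {"我", "来到", "北京", "清华", "清华大学", "华大", "大学"}
print(cut_all_sketch("我来到北京清华大学", toy_dict))
# ['我', '来到', '北京', '清华', '清华大学', '华大', '大学']
```

The `max_len` cap keeps the scan from testing arbitrarily long substrings; it must be at least as long as the longest dictionary word (4 here, for 清华大学).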
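Search mode (`cut_for_search`) keeps each word of the precise segmentation and, judging from the output above, also emits shorter dictionary words contained in longer ones (中国, 科学, 学院 and 科学院 before 中国科学院). A minimal sketch of that expansion follows, modeled on the 2-gram/3-gram approach used by the Python `jieba` package's `cut_for_search`; cppjieba's internals may differ. It assumes the precise-mode tokens are already available, and `search_mode_sketch` and the toy dictionary are hypothetical.

```python
from typing import List, Set


def search_mode_sketch(tokens: List[str], dictionary: Set[str]) -> List[str]:
    """Hypothetical search-mode illustration: add dictionary sub-words of long tokens."""
    result = []
    for token in tokens:
        # Tokens longer than 2 characters contribute their dictionary 2-grams.
        if len(token) > 2:
            for i in range(len(token) - 1):
                if token[i:i + 2] in dictionary:
                    result.append(token[i:i + 2])
        # Tokens longer than 3 characters also contribute their dictionary 3-grams.
        if len(token) > 3:
            for i in range(len(token) - 2):
                if token[i:i + 3] in dictionary:
                    result.append(token[i:i + 3])
        # The precise-mode token itself is always kept.
        result.append(token)
    return result


# Assumed precise-mode tokens and a toy dictionary, for illustration only.
tokens = ["小明", "硕士", "毕业", "于", "中国科学院", "计算所", ",",
          "后", "在", "日本京都大学", "深造"]
toy_dict = {"中国", "科学", "学院", "科学院", "计算", "日本", "京都", "大学"}
print(search_mode_sketch(tokens, toy_dict))
# ['小明', '硕士', '毕业', '于', '中国', '科学', '学院', '科学院', '中国科学院', '计算', '计算所', ',', '后', '在', '日本', '京都', '大学', '日本京都大学', '深造']
```

Emitting the sub-words before the full token is what produces the ordering seen in the `cut_for_search` output earlier in this README.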
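`extract` returns (keyword, weight) pairs sorted by weight. The sketch below shows one plausible TF-IDF scheme under these assumptions: a word's weight is its occurrence count multiplied by its IDF (as loaded from `idf_path`), single-character words and stop words (from `stop_words_path`) are skipped, and words missing from the IDF table fall back to the average IDF. `extract_sketch` and all IDF values in it are hypothetical, not cppjieba's data or code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def extract_sketch(
    tokens: Iterable[str],
    idf: Dict[str, float],
    stop_words: Set[str],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Hypothetical TF-IDF ranking where TF is the raw occurrence count."""
    # Count candidate keywords, skipping stop words and single characters.
    counts = Counter(t for t in tokens if len(t) > 1 and t not in stop_words)
    # Words absent from the IDF table fall back to the average IDF.
    default_idf = sum(idf.values()) / len(idf) if idf else 0.0
    scored = [(word, n * idf.get(word, default_idf)) for word, n in counts.items()]
    # Highest weight first; ties broken by the word itself for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Made-up tokens, IDF values and stop words, for illustration only.
tokens = ["我", "是", "拖拉机", "学院", "拖拉机", "专业", "不用", "升职"]
idf = {"拖拉机": 8.0, "学院": 6.0, "专业": 1.0}
stop_words = {"不用"}
print(extract_sketch(tokens, idf, stop_words, top_k=3))
# [('拖拉机', 16.0), ('学院', 6.0), ('升职', 5.0)]
```

Here 拖拉机 occurs twice, so its weight doubles, and 升职 (absent from the toy IDF table) receives the average IDF of 5.0.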
Download files
Source Distribution
cppjieba-0.1.1.tar.gz (5.0 MB)
Built Distribution
cppjieba-0.1.1-cp37-cp37m-macosx_10_14_x86_64.whl (5.2 MB)
File details
Details for the file cppjieba-0.1.1.tar.gz
File metadata
- Download URL: cppjieba-0.1.1.tar.gz
- Size: 5.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9af08d587416abb3bbb78ba55c106ebb65cda39e6a2cbd3dca0531c71f6e7c71
MD5 | 2e74bb19de7866a4e396e0d24aa2664b
BLAKE2b-256 | 808513675cc872b4e90d2839a92cde0d8748285a9284120473fdf0f9a06417cc
File details
Details for the file cppjieba-0.1.1-cp37-cp37m-macosx_10_14_x86_64.whl
File metadata
- Download URL: cppjieba-0.1.1-cp37-cp37m-macosx_10_14_x86_64.whl
- Size: 5.2 MB
- Tags: CPython 3.7m, macOS 10.14+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | c8e3333b625e794c307e69e48ff5ce77326d6f31413106afb1255164b8aff92f
MD5 | 1a4a08b910b4715d1489bde18f32a576
BLAKE2b-256 | 8dd80e57fd5e42a2d4e6dc6fc0bda5c08dea3a9455090f399ba5922d76b699c6
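The SHA256 digests listed in the file details can be used to check a downloaded file. A minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `matches_published` are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream the file so large archives do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was downloaded into the current directory, `matches_published("cppjieba-0.1.1.tar.gz", "9af08d587416abb3bbb78ba55c106ebb65cda39e6a2cbd3dca0531c71f6e7c71")` should return `True` for an intact copy.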