A library that gets text embeddings
Project description
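What is textembedding
GitHub: pull requests are welcome. If you find a bug or have a feature request, please open an issue.
textembedding is an algorithm package I often need when computing text similarity. You can use it to load a pre-trained Word2vec model and to get word vectors and sentence vectors, which helps with the text processing that comes before applying deep learning.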
Dependencies and installation
- jieba
- numpy
- gensim
pip3 install textembedding
Using textembedding
Load a word2vec model
import textembedding as tb
# modelpath: path to the pre-trained Word2vec model file
model = tb.load_word2vect(modelpath)
vect_dim = model.vector_size  # dimension of each word vector
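The README does not say which file format `load_word2vect` accepts. To illustrate what a loaded model holds, here is a minimal stdlib-only sketch that parses the plain-text word2vec format (a header line `<vocab_size> <vector_size>`, then one `word v1 v2 ...` line per word). The helper name `load_word2vec_text` is hypothetical and is not part of textembedding; for real model files, gensim's `KeyedVectors.load_word2vec_format` handles both the text and binary variants.

```python
import io
from typing import Dict, List, TextIO, Tuple


def load_word2vec_text(stream: TextIO) -> Tuple[Dict[str, List[float]], int]:
    """Parse word2vec *text* format into a {word: vector} dict.

    Hypothetical helper for illustration only; not the textembedding API.
    Returns the dict and the vector dimension (compare model.vector_size).
    """
    header = stream.readline().split()
    vector_size = int(header[1])  # header is "<vocab_size> <vector_size>"
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.split()
        if len(parts) != vector_size + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors, vector_size


# A tiny in-memory "model": 2 words, 3 dimensions.
sample = io.StringIO("2 3\n中国 0.1 0.2 0.3\n北京 0.1 0.25 0.28\n")
vectors, vect_dim = load_word2vec_text(sample)
print(vect_dim)             # 3
print(vectors["中国"])      # [0.1, 0.2, 0.3]
print(vectors.get("上海"))  # None: out-of-vocabulary word
```

How textembedding itself treats out-of-vocabulary words is not documented; returning `None` here is only a choice made for the sketch.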
Get a word vector
model: the word-vector model loaded by load_word2vect
word: the word whose vector you want
word_vect = tb.get_word_embedding(model, word='中国')  # '中国' means "China"
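Get a sentence vector
model: the word-vector model loaded by load_word2vect
sentence: the sentence whose vector you want
stop_words_path: path to a user-defined stop-words file; the file uses the same format as jieba's stop-words file.
# The sentence means: "What we lack is not opportunity, but the courage to reset ourselves to zero when an opportunity arrives."
sent_vect = tb.get_sentence_embedding(model, sentence='我们缺少的不是机会,而是在机会面前将自己重新归零的勇气。', stop_words_path='')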
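The README does not document how `get_sentence_embedding` combines word vectors. A common baseline, and only an assumption here, is to segment the sentence, drop stop words, and average the vectors of the remaining in-vocabulary words. The sketch below does that in plain Python. Segmentation (which textembedding presumably delegates to jieba) is replaced by an already-tokenized list so the example runs without extra packages, and the stop-words file is read as one word per line, as in jieba's stop-words files. `load_stop_words` and `average_sentence_vector` are hypothetical names.

```python
import io
from typing import Dict, Iterable, List, Optional, Set, TextIO


def load_stop_words(stream: TextIO) -> Set[str]:
    """Read one stop word per line, ignoring blank lines (jieba-style file)."""
    return {line.strip() for line in stream if line.strip()}


def average_sentence_vector(
    tokens: Iterable[str],
    vectors: Dict[str, List[float]],
    stop_words: Set[str],
) -> Optional[List[float]]:
    """Mean of the vectors of tokens that are in the vocabulary and not stop words.

    Returns None when no token survives filtering.
    Hypothetical helper; not the textembedding implementation.
    """
    kept = [vectors[t] for t in tokens if t not in stop_words and t in vectors]
    if not kept:
        return None
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy 2-dimensional vectors and a pre-segmented sentence
# (in practice a segmenter such as jieba.lcut would produce the tokens).
vectors = {"我们": [1.0, 0.0], "机会": [0.0, 1.0], "勇气": [1.0, 1.0]}
stop_words = load_stop_words(io.StringIO("的\n是\n,\n"))
tokens = ["我们", "缺少", "的", "不是", "机会", ",", "而是", "勇气"]
print(average_sentence_vector(tokens, vectors, stop_words))
```

Averaging ignores word order and weights every kept word equally; that simplicity is why it is a frequent baseline, but the library may weight words differently.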
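Get vector similarity
query_vec: the vector to look up
vec_list: the vector library to search in
metirc_type: the similarity metric; currently only cosine similarity is supported.
Returns the similarities sorted from largest to smallest as [(item01, item02), ..., (itemn1, itemn2)], where the first element of each pair is the similarity and the second is the index of the vector in vec_list.
**Similarity between two strings is also supported, using the TF-IDF algorithm.** If both query_vec and vec_list are strings, a TF-IDF similarity query is performed instead, which is fast.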
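The return format described above, (similarity, index) pairs sorted from most to least similar, can be reproduced with a short cosine-similarity search. This is a minimal pure-Python sketch of that behaviour, not textembedding's own code; `rank_by_cosine` is a hypothetical name, and giving zero vectors a similarity of 0.0 is an arbitrary choice made here.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_by_cosine(
    query_vec: Sequence[float], vec_list: List[Sequence[float]]
) -> List[Tuple[float, int]]:
    """Return [(similarity, index), ...] sorted by similarity, highest first."""
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vec_list)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


query = [1.0, 0.0]
library = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
for sim, idx in rank_by_cosine(query, library):
    print(f"index {idx}: {sim:.3f}")
# index 1: 1.000
# index 2: 0.707
# index 0: 0.000
```

This brute-force scan is linear in the size of the vector library, which is fine for small libraries; large ones usually call for an approximate nearest-neighbour index.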
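# query_vect, vect1 ... vectn: vectors, e.g. from get_word_embedding or get_sentence_embedding
similarity = tb.get_vector_similarity(query_vect, vec_list=[vect1, vect2, vect3, vectn])
# Both arguments are strings ("static variable" vs. "dynamic variable"), so TF-IDF similarity is used
similarity = tb.get_vector_similarity("静态变量", vec_list="动态变量")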
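The README does not specify how the TF-IDF string comparison tokenizes text or computes IDF. The sketch below is one assumed version: it splits each string into single characters, weights each character by term frequency times a smoothed IDF computed over just the two strings, and compares the two weight vectors with cosine similarity. The smoothed formula idf = ln((1 + N) / (1 + df)) + 1 is the default in scikit-learn's `TfidfVectorizer`; with only two documents, an unsmoothed IDF would give every shared term a weight of zero. `tfidf_vectors` and `tfidf_similarity` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF weights per document, using smoothed IDF: ln((1+N)/(1+df)) + 1."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def tfidf_similarity(a: str, b: str) -> float:
    """Cosine similarity of character-level TF-IDF vectors for two strings.

    Character tokens are an assumption; a word segmenter such as jieba
    could be used to produce the token lists instead.
    """
    va, vb = tfidf_vectors([list(a), list(b)])
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


print(round(tfidf_similarity("静态变量", "动态变量"), 3))  # 0.603
```

Here the three shared characters (态, 变, 量) get weight 1, while the unshared 静 and 动 get the higher weight ln(1.5) + 1, so the differing character pulls the score well below 1.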
Download files
File details
Details for the file textembedding-1.0.4.tar.gz.
File metadata
- Download URL: textembedding-1.0.4.tar.gz
- Upload date:
- Size: 14.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.22.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19bc61ffc7251740104ca7c062c4c2dddadc9d6f99906b2c1e7ccf5c149ff9ef |
| MD5 | 38416d85351aea57f8138804edc4af55 |
| BLAKE2b-256 | 52c7054e0ecb0e03f942b74e576214afb5cf70cf78071f0ff34a7d588b8e26b0 |
File details
Details for the file textembedding-1.0.4-py3-none-any.whl.
File metadata
- Download URL: textembedding-1.0.4-py3-none-any.whl
- Upload date:
- Size: 16.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.22.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d8759b00542e3b94f984034ba587ec83916da0cbd07044e5acc67e82ffc738e |
| MD5 | f0b1f03f7b0fc509b0cb215d033ffcf3 |
| BLAKE2b-256 | 922157e3fe2e98cb87a20ab03f8b0dd1f439df36eb01e34b2c55747a8d080952 |