A library for getting text embeddings

Project description

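What is textembedding

GitHub: pull requests are welcome. If you find a bug or have a feature request, please open an issue.

textembedding is a package of algorithms I often need when computing text similarity. You can use it to load a pretrained Word2vec model and to get vectors for words and sentences, which helps with text preprocessing before applying deep learning.

Dependencies and installation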

jieba
numpy
gensim
pip3 install textembedding

Using textembedding

Loading a word2vec model

import textembedding as tb
model = tb.load_word2vect(modelpath)
vect_dim = model.vector_size
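
For orientation, word2vec models are often distributed in a plain-text format: a header line with the vocabulary size and the vector dimension, followed by one word and its components per line. The sketch below parses that format with the standard library only. It is not how load_word2vect is implemented (its loader is not shown here); parse_word2vec_text and the toy data are hypothetical names used purely for illustration.

```python
from typing import Dict, List, Tuple


def parse_word2vec_text(lines: List[str]) -> Tuple[int, Dict[str, List[float]]]:
    """Parse word2vec text-format lines into (vector_size, {word: vector})."""
    header = lines[0].split()
    vocab_size, vector_size = int(header[0]), int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in lines[1:]:
        parts = line.rstrip().split(" ")
        if len(parts) != vector_size + 1:
            continue  # skip blank or malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != vocab_size:
        raise ValueError("header vocabulary size does not match the rows read")
    return vector_size, vectors


# Toy model with two 3-dimensional vectors (made-up values).
toy_lines = [
    "2 3",
    "中国 0.1 0.2 0.3",
    "北京 0.4 0.5 0.6",
]
vector_size, vectors = parse_word2vec_text(toy_lines)
print(vector_size)      # 3
print(vectors["中国"])   # [0.1, 0.2, 0.3]
```

The returned vector_size plays the same role as model.vector_size above: it is the length of every word vector.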

Getting a word vector

model: the word-vector model loaded by load_word2vect

word: the word whose vector you want

word_vect = tb.get_word_embedding(model,word='中国')

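Getting a sentence vector

model: the word-vector model loaded by load_word2vect

sentence: the sentence whose vector you want

stop_words_path: path to a user-defined stop words file, in the same format as jieba's stop words file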

sent_vect = tb.get_sentence_embedding(model,sentence='我们缺少的不是机会,而是在机会面前将自己重新归零的勇气。',stop_words_path='')
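
One common way to turn word vectors into a sentence vector is to drop stop words and average what remains. The sketch below shows that idea with the standard library only. Whether get_sentence_embedding uses exactly this pooling is an assumption, and the real package presumably tokenizes with jieba (a listed dependency), whereas here the tokens are supplied already split. average_sentence_vector and the toy vectors are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def average_sentence_vector(
    tokens: Iterable[str],
    vectors: Dict[str, List[float]],
    vector_size: int,
    stop_words: Set[str],
) -> List[float]:
    """Mean of the vectors of in-vocabulary, non-stop-word tokens."""
    total = [0.0] * vector_size
    count = 0
    for token in tokens:
        if token in stop_words or token not in vectors:
            continue  # ignore stop words and out-of-vocabulary tokens
        for i, value in enumerate(vectors[token]):
            total[i] += value
        count += 1
    if count == 0:
        return total  # all-zero vector when nothing usable remains
    return [value / count for value in total]


# Made-up 2-dimensional vectors for two words.
toy_vectors = {"机会": [1.0, 0.0], "勇气": [0.0, 1.0]}
sent_vect = average_sentence_vector(
    ["我们", "缺少", "的", "机会", "勇气"], toy_vectors, 2, stop_words={"的"}
)
print(sent_vect)  # [0.5, 0.5]
```

Returning a zero vector for an empty result is a design choice of this sketch; it keeps the output shape fixed so downstream similarity code does not need a special case.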

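Getting vector similarity

query_vec: the query vector

vec_list: the vector library to search

metirc_type: the similarity metric; only cosine similarity is supported at present

Returns the similarities sorted from largest to smallest as [(item01,item02),...,(itemn1,itemn2)], where item1 is the similarity and item2 is the index into the vector library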
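
The return format described above can be sketched as follows: score every vector in the library against the query with cosine similarity, then sort the (similarity, index) pairs in descending order. This is a minimal standard-library illustration, not the package's code; cosine and rank_by_cosine are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(
    query: Sequence[float], vec_list: Sequence[Sequence[float]]
) -> List[Tuple[float, int]]:
    """Return (similarity, index) pairs sorted from most to least similar."""
    scored = [(cosine(query, vec), index) for index, vec in enumerate(vec_list)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


ranking = rank_by_cosine([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]])
print([index for _, index in ranking])  # [1, 2, 0]
```

Index 1 ranks first because cosine similarity ignores magnitude: [2.0, 0.0] points in the same direction as the query.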

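Similarity between two strings is also supported, using the TF-IDF algorithm. If both query_vec and vec_list are strings, the TF-IDF similarity query is triggered, which is fast.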

similarity = tb.get_vector_similarity(query_vect,vec_list=[vect1, vect2, vect3, vectn])
similarity = tb.get_vector_similarity("静态变量",vec_list="动态变量")
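
To give a feel for what a TF-IDF comparison of two strings involves, here is a minimal standard-library sketch. The package's actual tokenization and weighting are not documented here, so everything below is an assumption: tokens are single characters, the two strings form the whole corpus, and the IDF uses the smoothed form log((1 + N) / (1 + df)) + 1. tfidf_similarity is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of character-level TF-IDF vectors of two strings."""
    docs: List[Counter] = [Counter(text_a), Counter(text_b)]
    n_docs = len(docs)
    idf: Dict[str, float] = {}
    for term in set(docs[0]) | set(docs[1]):
        df = sum(1 for doc in docs if term in doc)  # document frequency
        idf[term] = math.log((1 + n_docs) / (1 + df)) + 1.0
    # Weight each term count by its IDF.
    weights = [{term: doc[term] * idf[term] for term in doc} for doc in docs]
    dot = sum(w * weights[1].get(term, 0.0) for term, w in weights[0].items())
    norm_a = math.sqrt(sum(w * w for w in weights[0].values()))
    norm_b = math.sqrt(sum(w * w for w in weights[1].values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty string is similar to nothing
    return dot / (norm_a * norm_b)


similarity = tfidf_similarity("静态变量", "动态变量")
print(similarity)
```

The two example strings share three of their four characters, so the score falls strictly between 0 and 1; identical strings score 1 and strings with no characters in common score 0.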


Download files

Source Distribution

textembedding-1.0.4.tar.gz (14.3 kB)

Uploaded Source

Built Distribution

textembedding-1.0.4-py3-none-any.whl (16.9 kB)

Uploaded Python 3

File details

Details for the file textembedding-1.0.4.tar.gz.

File metadata

  • Download URL: textembedding-1.0.4.tar.gz
  • Size: 14.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.22.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.7

File hashes

Hashes for textembedding-1.0.4.tar.gz
Algorithm Hash digest
SHA256 19bc61ffc7251740104ca7c062c4c2dddadc9d6f99906b2c1e7ccf5c149ff9ef
MD5 38416d85351aea57f8138804edc4af55
BLAKE2b-256 52c7054e0ecb0e03f942b74e576214afb5cf70cf78071f0ff34a7d588b8e26b0

File details

Details for the file textembedding-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: textembedding-1.0.4-py3-none-any.whl
  • Size: 16.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.22.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.7

File hashes

Hashes for textembedding-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 8d8759b00542e3b94f984034ba587ec83916da0cbd07044e5acc67e82ffc738e
MD5 f0b1f03f7b0fc509b0cb215d033ffcf3
BLAKE2b-256 922157e3fe2e98cb87a20ab03f8b0dd1f439df36eb01e34b2c55747a8d080952
