
MindSpore text-to-vector tool for encoding text

Project description

ms2vec: MindSpore Text to Vector

Ported from shibing624's text2vec library.

Text2vec: Text to Vector, Get Sentence Embeddings. Text vectorization represents text (words, sentences, and paragraphs) as vector matrices.

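text2vec implements several text representation and text similarity models, including Word2Vec, RankBM25, BERT, Sentence-BERT, and CoSENT, and compares their performance on text semantic matching (similarity computation) tasks.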

Features

Text vector representation models

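  • Word2Vec: word vector lookup backed by the large-scale, high-quality Chinese word embeddings open-sourced by Tencent AI Lab (the lightweight 8-million-word version; file name: light_Tencent_AILab_ChineseEmbedding.bin, password: tawe). This project implements a word2vec sentence representation by averaging the word vectors of a sentence.
  • SBERT (Sentence-BERT): a sentence representation model that balances performance and efficiency. Training is supervised, using BERT with a softmax classification function; at prediction time for text matching, cosine similarity is computed directly on the sentence vectors. This project reproduces Sentence-BERT prediction on MindSpore.
  • CoSENT (Cosine Sentence): CoSENT proposes a ranking loss function that brings training closer to prediction; the model converges faster and performs better than Sentence-BERT. This project implements CoSENT prediction on MindSpore.
  • BGE (BAAI general embedding): this project implements BGE prediction on MindSpore.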
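
To make the averaged word-vector representation and the cosine comparison concrete, here is a minimal pure-Python sketch. It does not use the ms2vec API: the toy vectors and the names TOY_VECTORS, sentence_vector, and cosine_similarity are hypothetical and chosen only for illustration.

```python
import math

# Toy 3-dimensional word vectors; hypothetical values, not real embeddings.
TOY_VECTORS = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.8, 0.6, 0.0],
    "runs": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}


def sentence_vector(tokens, vectors, dim=3):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) iterates over the vectors column by column.
    return [sum(col) / len(known) for col in zip(*known)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


v1 = sentence_vector(["cat", "runs"], TOY_VECTORS)
v2 = sentence_vector(["dog", "runs"], TOY_VECTORS)
v3 = sentence_vector(["car"], TOY_VECTORS)
sim_close = cosine_similarity(v1, v2)  # sentences sharing a word and a similar noun
sim_far = cosine_similarity(v1, v3)    # sentences with no overlap in this toy space
```

In this toy space the first two sentences score far higher than the unrelated pair. With real embeddings the same two steps apply, only with higher-dimensional vectors.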

For details on the text vector representation methods, see the wiki page "Text vector representation methods" (文本向量表示方法).
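
The ranking-loss idea behind CoSENT can be illustrated with a short pure-Python sketch. It assumes the commonly described form of the loss, log(1 + Σ exp(λ(cos_neg − cos_pos))), summed over every pair in which one sample has a higher label than the other. The function name cosent_loss and the default scale of 20 are assumptions for illustration, and this is not code from this package, which covers prediction only.

```python
import math


def cosent_loss(cos_sims, labels, scale=20.0):
    """Ranking loss over sentence pairs (sketch of the commonly described CoSENT form).

    cos_sims[i] is the cosine similarity predicted for sentence pair i,
    labels[i] is its gold similarity label (higher means more similar).
    """
    # Start with 0.0 because the "1" inside log(1 + ...) equals exp(0).
    terms = [0.0]
    for sim_hi, lab_hi in zip(cos_sims, labels):
        for sim_lo, lab_lo in zip(cos_sims, labels):
            if lab_hi > lab_lo:
                # Penalise cases where the lower-labelled pair scores higher.
                terms.append(scale * (sim_lo - sim_hi))
    # Numerically stable log-sum-exp.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


well_ordered = cosent_loss([0.9, 0.1], [1, 0])  # similar pair scores higher
mis_ordered = cosent_loss([0.1, 0.9], [1, 0])   # similar pair scores lower
```

The loss is close to zero when the predicted cosines are ordered the same way as the labels and grows when they are not. Because it constrains the cosine values directly, the training signal matches the cosine comparison used at prediction time.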


Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

ms2vec-0.0.2-py3-none-any.whl (25.3 kB)

Uploaded Python 3

File details

Details for the file ms2vec-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: ms2vec-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 25.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for ms2vec-0.0.2-py3-none-any.whl
  • SHA256: 82b3ef6e8f51d11b83a3eb308c93efac1e2cdd3434aa27b3c42cffb33f16f2ed
  • MD5: 9c9f529c3412660a9554aca958c3e3d9
  • BLAKE2b-256: b574e5932c62673fc71b30a2dde0d3d1ab3094acb80b91b8e8a448df8c8c856e
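
A downloaded wheel can be checked against the SHA256 digest listed above with Python's standard hashlib module. The sketch below is one possible way to do it; the helper names sha256_of_file and matches_expected are hypothetical, and the file is assumed to be in the current directory when the check is run.

```python
import hashlib

# SHA256 digest published for ms2vec-0.0.2-py3-none-any.whl (copied from the list above).
EXPECTED_SHA256 = "82b3ef6e8f51d11b83a3eb308c93efac1e2cdd3434aa27b3c42cffb33f16f2ed"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_expected("ms2vec-0.0.2-py3-none-any.whl") should return True for an unmodified download. pip can enforce the same check at install time through its hash-checking mode (--require-hashes).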

