MindSpore Text to Vector Tool: encode text as vectors
Project description
ms2vec: MindSpore Text to Vector
Ported from shibing624's text2vec library.
Text2vec: Text to Vector, Get Sentence Embeddings. Text vectorization represents text (words, sentences, and paragraphs) as vectors or vector matrices.
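text2vec implements several text representation and text similarity models, including Word2Vec, RankBM25, BERT, Sentence-BERT, and CoSENT, and compares their performance on text semantic matching (similarity computation) tasks.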
Features
Text vector representation models
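- Word2Vec: word-vector lookup based on Tencent AI Lab's open-source, large-scale, high-quality Chinese word embeddings (lightweight version with 8 million Chinese words; file name: light_Tencent_AILab_ChineseEmbedding.bin, password: tawe). This project builds a word2vec sentence representation by averaging the word vectors of a sentence.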
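- SBERT (Sentence-BERT): a sentence representation model that balances accuracy and efficiency. During training, BERT is fine-tuned with supervision together with a softmax classifier; at prediction time, two texts are matched by taking the cosine similarity of their sentence vectors directly. This project reproduces Sentence-BERT inference in MindSpore.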
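- CoSENT (Cosine Sentence): CoSENT introduces a ranking loss function that makes training more consistent with prediction, so the model converges faster and performs better than Sentence-BERT. This project implements CoSENT inference in MindSpore.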
- BGE (BAAI General Embedding): this project implements BGE inference in MindSpore.
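The Word2Vec item builds a sentence vector by averaging word vectors, and the SBERT item matches two texts by the cosine similarity of their sentence vectors. The minimal pure-Python sketch below shows both steps. It is not the ms2vec API: `sentence_vector`, `cosine_similarity`, and the toy `toy_vectors` dictionary are hypothetical stand-ins for vectors loaded from light_Tencent_AILab_ChineseEmbedding.bin, and the input is assumed to be already segmented into words.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sentence_vector(tokens: Sequence[str], word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens; return a zero vector if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; defined here as 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors; real embeddings come from the Tencent AI Lab file.
toy_vectors = {
    "change": [1.0, 0.0, 0.0],
    "card": [0.0, 1.0, 0.0],
    "replace": [0.9, 0.1, 0.0],
}
s1 = sentence_vector(["change", "card"], toy_vectors, dim=3)  # [0.5, 0.5, 0.0]
s2 = sentence_vector(["replace", "card"], toy_vectors, dim=3)
print(cosine_similarity(s1, s2))  # close to 1.0: near-synonymous sentences
```

Averaging ignores word order, which is why the sentence-level models (SBERT, CoSENT, BGE) usually match semantics better; the cosine step, however, is the same for every model once sentence vectors exist.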
For details on the text vector representation methods, see the wiki: Text Vector Representation Methods
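The CoSENT item mentions a ranking loss. This project only implements inference, so the exact training loss is not stated here; the sketch below assumes the commonly cited CoSENT form, log(1 + Σ exp(λ(cos_j - cos_i))), summed over every two pairs (i, j) in a batch whose gold label of pair i is higher than that of pair j, with scale λ = 20. `cosent_loss` and `scale` are hypothetical names, and real training would compute this on framework tensors rather than Python floats.

```python
import math
from typing import Sequence


def cosent_loss(cos_scores: Sequence[float], labels: Sequence[float], scale: float = 20.0) -> float:
    """CoSENT-style ranking loss for one batch of sentence pairs.

    cos_scores[k] is the predicted cosine similarity of pair k and labels[k] its gold
    similarity. Every (i, j) with labels[i] > labels[j] contributes exp(scale * (cos_j - cos_i)),
    which grows when the model scores the less similar pair j above pair i.
    """
    # Exponents inside log(1 + sum(exp(...))); the leading 0.0 encodes the "1" as exp(0).
    exponents = [0.0]
    for cos_i, y_i in zip(cos_scores, labels):
        for cos_j, y_j in zip(cos_scores, labels):
            if y_i > y_j:
                exponents.append(scale * (cos_j - cos_i))
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(exponents)
    return peak + math.log(sum(math.exp(e - peak) for e in exponents))


good = cosent_loss([0.9, 0.1], [1, 0])  # similar pair scored higher: loss near 0
bad = cosent_loss([0.1, 0.9], [1, 0])   # ranking inverted: loss near 16
```

Only the relative order of the cosine scores within a batch matters, which mirrors how the model is used at prediction time, where texts are compared by cosine similarity alone.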
Download files
Source Distributions
No source distribution files are available for this release.
Built Distribution
ms2vec-0.0.2-py3-none-any.whl (25.3 kB)
File details
Details for the file ms2vec-0.0.2-py3-none-any.whl
File metadata
- Download URL: ms2vec-0.0.2-py3-none-any.whl
- Upload date:
- Size: 25.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | 82b3ef6e8f51d11b83a3eb308c93efac1e2cdd3434aa27b3c42cffb33f16f2ed
MD5 | 9c9f529c3412660a9554aca958c3e3d9
BLAKE2b-256 | b574e5932c62673fc71b30a2dde0d3d1ab3094acb80b91b8e8a448df8c8c856e
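The SHA256 digest above can be used to check a downloaded wheel before installing it. A minimal sketch using Python's standard `hashlib`, assuming the wheel was saved under its published file name; `sha256_of_chunks`, `read_chunks`, and `wheel_is_intact` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator

# SHA256 digest published above for ms2vec-0.0.2-py3-none-any.whl
EXPECTED_SHA256 = "82b3ef6e8f51d11b83a3eb308c93efac1e2cdd3434aa27b3c42cffb33f16f2ed"


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a stream of byte chunks incrementally and return the hex digest."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def read_chunks(path: Path, chunk_size: int = 1 << 16) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks so the whole file is never held in memory."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def wheel_is_intact(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of_chunks(read_chunks(path)) == expected.strip().lower()
```

For example, `wheel_is_intact(Path("ms2vec-0.0.2-py3-none-any.whl"))` returns True only for an unmodified download. pip can enforce the same check during installation in hash-checking mode (`--require-hashes`, with a `--hash=sha256:...` entry in a requirements file).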