an elegant bert4vector

bert4vector

Vector computation, storage, retrieval, and similarity calculation

Documentation | Bert4torch | Examples | Source code

1. Installation

  • Install the stable release
pip install bert4vector
  • Install the latest version from source
pip install git+https://github.com/Tongjilibo/bert4vector

2. Quick start

  • Vector computation
from bert4vector.core import BertSimilarity
model = BertSimilarity('/data/pretrain_ckpt/simbert/sushen@simbert_chinese_tiny')
sentences = ['喜欢打篮球的男生喜欢什么样的女生', '西安下雪了?是不是很冷啊?', '第一次去见女朋友父母该如何表现?', '小蝌蚪找妈妈怎么样', '给我推荐一款红色的车', '我喜欢北京']
vecs = model.encode(sentences, convert_to_numpy=True, normalize_embeddings=False)
print(vecs.shape)
# (6, 312)
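
The call above passes normalize_embeddings=False, so the vectors come back unscaled. As a rough illustration of what normalization would change, here is a minimal pure-Python sketch, assuming the flag refers to L2 normalization (the source does not say which norm is used). The helper l2_normalize and the toy vectors are hypothetical and not part of bert4vector.

```python
import math

def l2_normalize(vectors):
    """Scale each vector to unit L2 norm; zero vectors are returned unchanged."""
    normalized = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        normalized.append([x / norm for x in vec] if norm > 0 else list(vec))
    return normalized

# Toy 2-dimensional vectors standing in for real sentence embeddings.
toy_vecs = [[3.0, 4.0], [0.0, 2.0]]
unit_vecs = l2_normalize(toy_vecs)
print(unit_vecs)
```

After normalization every vector has length 1, so a plain dot product between two vectors equals their cosine similarity.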
  • Similarity calculation
from bert4vector.core import BertSimilarity
text2vec = BertSimilarity('/data/pretrain_ckpt/simbert/sushen@simbert_chinese_tiny')
sent1 = ['你好', '天气不错']
sent2 = ['你好啊', '天气很好']
similarity = text2vec.similarity(sent1, sent2)
print(similarity)
# [[0.9075422  0.42991278]
#  [0.19584633 0.72635853]]
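
The printed result is a 2 x 2 matrix: entry [i][j] scores sent1[i] against sent2[j]. The sketch below reproduces that pairwise layout in pure Python, assuming the score is cosine similarity (the source does not name the metric). The functions cosine and similarity_matrix and the toy vectors are illustrative only and are not the library's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(vecs_a, vecs_b):
    """Return a len(vecs_a) x len(vecs_b) matrix of pairwise scores."""
    return [[cosine(u, v) for v in vecs_b] for u in vecs_a]

# Toy vectors standing in for the embeddings of sent1 and sent2.
toy_a = [[1.0, 0.0], [0.0, 1.0]]
toy_b = [[1.0, 1.0], [0.0, 2.0]]
sim = similarity_matrix(toy_a, toy_b)
for row in sim:
    print(row)
```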
  • Vector storage and retrieval
from bert4vector.core import BertSimilarity
model = BertSimilarity('/data/pretrain_ckpt/simbert/sushen@simbert_chinese_tiny')
model.add_corpus(['你好', '我选你', '天气不错', '人很好看'])
print(model.search('你好'))
# {'你好': [{'corpus_id': 0, 'score': 0.9999, 'text': '你好'},
#           {'corpus_id': 3, 'score': 0.5694, 'text': '人很好看'}]} 
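
To make the shape of the search result concrete, the following pure-Python sketch ranks a small corpus against a query and returns the same kind of structure (query text mapped to a list of hits with corpus_id, score, and text). It assumes cosine similarity and a top-k cutoff of 2; the function toy_search, its topk parameter, and the hand-made vectors are hypothetical and say nothing about how bert4vector indexes or scores internally.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def toy_search(query_vecs, corpus, topk=2):
    """Rank corpus entries for each query and keep the topk best hits.

    query_vecs: {query_text: vector}
    corpus:     {corpus_id: (text, vector)}
    """
    results = {}
    for query, q_vec in query_vecs.items():
        hits = [
            {'corpus_id': cid, 'score': cosine(q_vec, vec), 'text': text}
            for cid, (text, vec) in corpus.items()
        ]
        hits.sort(key=lambda hit: hit['score'], reverse=True)
        results[query] = hits[:topk]
    return results

# Hand-made 2-dimensional vectors; real embeddings would come from the model.
toy_corpus = {
    0: ('hello', [1.0, 0.0]),
    1: ('I pick you', [0.0, 1.0]),
    2: ('nice weather', [-1.0, 0.0]),
    3: ('good-looking', [0.8, 0.6]),
}
found = toy_search({'hello': [1.0, 0.0]}, toy_corpus)
print(found)
```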
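  • API deployment
from bert4vector.pipelines import SimilaritySever
server = SimilaritySever('/data/pretrain_ckpt/embedding/BAAI--bge-base-zh-v1.5')
port = 8001  # example value, not from the original; use any free port
server.run(port=port)
# For how to call the API, see './examples/api.py'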

3. Supported sentence-embedding weights

| Model category | Model name | Weight source | Weight link | Notes (if any) |
| simbert | simbert | 追一科技 (Zhuiyi Technology) | Tongjilibo/simbert-chinese-base, Tongjilibo/simbert-chinese-small, Tongjilibo/simbert-chinese-tiny | |
| simbert | simbert_v2/roformer-sim | 追一科技 (Zhuiyi Technology) | junnyu/roformer_chinese_sim_char_base, junnyu/roformer_chinese_sim_char_ft_base, junnyu/roformer_chinese_sim_char_small, junnyu/roformer_chinese_sim_char_ft_small | roformer_chinese_sim_char_base, roformer_chinese_sim_char_ft_base, roformer_chinese_sim_char_small, roformer_chinese_sim_char_ft_small |
| embedding | text2vec-base-chinese | shibing624 | shibing624/text2vec-base-chinese | text2vec-base-chinese |
| embedding | m3e | moka-ai | moka-ai/m3e-base | m3e-base |
| embedding | bge | BAAI | BAAI/bge-large-en-v1.5, BAAI/bge-large-zh-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-base-zh-v1.5, BAAI/bge-small-en-v1.5, BAAI/bge-small-zh-v1.5 | bge-large-en-v1.5, bge-large-zh-v1.5, bge-base-en-v1.5, bge-base-zh-v1.5, bge-small-en-v1.5, bge-small-zh-v1.5 |
| embedding | gte | thenlper | thenlper/gte-large-zh, thenlper/gte-base-zh | gte-base-zh, gte-large-zh |

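*Notes:

  1. Weights shown in highlighted format (e.g. Tongjilibo/simbert-chinese-small) can be downloaded directly over the network
  2. To speed up downloads, use a mirror site in mainland China
    • HF_ENDPOINT=https://hf-mirror.com python your_script.py
    • Run export HF_ENDPOINT=https://hf-mirror.com, then run your Python code
    • Or set it at the top of your Python code as follows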
    import os
    os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"
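
A small variation on the in-code setting, offered as a sketch rather than a recommendation from the project: os.environ.setdefault writes the mirror URL only when HF_ENDPOINT is not already defined, so a value exported in the shell still takes precedence. The variable name endpoint is illustrative. The setting generally needs to run before any Hugging Face library is imported, since such libraries typically read the variable at import time.

```python
import os

# setdefault only writes the value when HF_ENDPOINT is not already set,
# so an endpoint exported in the shell is left untouched.
os.environ.setdefault('HF_ENDPOINT', 'https://hf-mirror.com')

endpoint = os.environ['HF_ENDPOINT']
print(endpoint)
```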
    

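4. Version history

| Release date | bert4vector | Release notes |
| 20240928 | 0.0.5 | Minor changes; reset is available in the API |
| 20240710 | 0.0.4 | Added longest-common-subsequence literal recall; some features work without torch installed |
| 20240628 | 0.0.3 | Added several literal recall methods; added API deployment |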

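5. Update history

  • 20240928: Minor changes; reset is available in the API
  • 20240710: Added longest-common-subsequence literal recall; some features work without torch installed
  • 20240628: Added several literal recall methods; added API deployment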

6. Reference
