an elegant bert4vector
bert4vector
Vector computation, storage, retrieval, and similarity scoring (compatible with sentence_transformers)
Documentation | Bert4torch | Examples | Source code
1. Installation
- Install the stable release
pip install bert4vector
- Install the latest version
pip install git+https://github.com/Tongjilibo/bert4vector
2. Quick start
- Vector computation
from bert4vector.core import BertSimilarity
model = BertSimilarity('/data/pretrain_ckpt/Tongjilibo/simbert-chinese-tiny')
sentences = ['喜欢打篮球的男生喜欢什么样的女生', '西安下雪了?是不是很冷啊?', '第一次去见女朋友父母该如何表现?', '小蝌蚪找妈妈怎么样', '给我推荐一款红色的车', '我喜欢北京']
vecs = model.encode(sentences, convert_to_numpy=True, normalize_embeddings=False)
print(vecs.shape)
# (6, 312)
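The call above passes normalize_embeddings=False, so the returned vectors keep their raw lengths. The sketch below illustrates what L2 normalization does to a single vector, assuming the flag behaves as it does in sentence_transformers (each vector is scaled to unit length). The helper `l2_normalize` and the toy values are hypothetical and not part of bert4vector.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (hypothetical helper, not a bert4vector API)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave an all-zero vector unchanged
    return [x / norm for x in vec]

raw = [3.0, 4.0]           # toy stand-in for one row of `vecs`
unit = l2_normalize(raw)
print(unit)                # [0.6, 0.8]
```

With unit-length vectors, a plain dot product equals the cosine similarity, which is one common reason to normalize before storing embeddings.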
- Similarity computation
from bert4vector.core import BertSimilarity
text2vec = BertSimilarity('/data/pretrain_ckpt/Tongjilibo/simbert-chinese-tiny')
sent1 = ['你好', '天气不错']
sent2 = ['你好啊', '天气很好']
similarity = text2vec.similarity(sent1, sent2)
print(similarity)
# [[0.9075422 0.42991278]
# [0.19584633 0.72635853]]
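The 2×2 result reads as a pairwise matrix: entry [i][j] compares sent1[i] with sent2[j]. A minimal sketch of that layout follows, assuming the scores are cosine similarities (the source does not state the metric). The functions `cosine` and `pairwise_cosine` and the toy 2-dimensional embeddings are hypothetical illustrations, not bert4vector code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_cosine(rows, cols):
    """Return a len(rows) x len(cols) matrix of cosine similarities."""
    return [[cosine(r, c) for c in cols] for r in rows]

# Toy embeddings standing in for the encoded sent1 and sent2.
emb1 = [[1.0, 0.0], [0.0, 1.0]]
emb2 = [[1.0, 0.0], [1.0, 1.0]]

matrix = pairwise_cosine(emb1, emb2)
for row in matrix:
    print([round(score, 4) for score in row])
```

Each row corresponds to one sentence from the first list and each column to one sentence from the second list.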
- Vector storage and retrieval
from bert4vector.core import BertSimilarity
model = BertSimilarity('/data/pretrain_ckpt/Tongjilibo/simbert-chinese-tiny')
model.add_corpus(['你好', '我选你', '天气不错', '人很好看'])
print(model.search('你好'))
# {'你好': [{'corpus_id': 0, 'score': 0.9999, 'text': '你好'},
# {'corpus_id': 3, 'score': 0.5694, 'text': '人很好看'}]}
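To make the shape of the search result concrete, here is a brute-force sketch that ranks a small corpus against a query vector and returns hits with the same three keys shown above. It assumes cosine scoring and uses hand-made 2-dimensional vectors instead of model output; the function `search`, its `topn` parameter, and the vectors are hypothetical, and the library's actual index and metric may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def search(query_vec, corpus, topn=2):
    """Score every corpus entry against the query and keep the best `topn`."""
    hits = [
        {'corpus_id': cid, 'score': cosine(query_vec, vec), 'text': text}
        for cid, (text, vec) in corpus.items()
    ]
    hits.sort(key=lambda hit: hit['score'], reverse=True)
    return hits[:topn]

# corpus_id -> (text, toy vector); the vectors are invented for illustration.
corpus = {
    0: ('你好', [1.0, 0.0]),
    1: ('我选你', [0.0, 1.0]),
    2: ('天气不错', [-1.0, 0.0]),
    3: ('人很好看', [1.0, 1.0]),
}

hits = search([1.0, 0.0], corpus, topn=2)
for hit in hits:
    print(hit['corpus_id'], hit['text'], round(hit['score'], 4))
```

A linear scan like this is fine for a handful of sentences; larger corpora usually call for an approximate nearest-neighbor index.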
- API deployment
from bert4vector.pipelines import SimilaritySever
server = SimilaritySever('/data/pretrain_ckpt/embedding/BAAI--bge-base-zh-v1.5')
port = 8000  # example value only; use any free port
server.run(port=port)
# For how to call the API, see './examples/api.py'
3. Supported sentence-embedding weights (in addition to the weights below, any weights supported by sentence_transformers also work)
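*Notes:
- Besides the models above, any model supported by sentence_transformers also works.
- Names shown in highlighted format (e.g. Tongjilibo/simbert-chinese-small) can be downloaded directly from the network.
- To speed up downloads through a mirror site in mainland China, use one of the following:
  - Prefix the command: HF_ENDPOINT=https://hf-mirror.com python your_script.py
  - Run export HF_ENDPOINT=https://hf-mirror.com, then run the Python code
  - Set it at the top of the Python code:
    import os
    os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"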
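For the in-code option, ordering matters: to my understanding, huggingface_hub reads HF_ENDPOINT when it is imported, so the variable should be set before any model-loading import. The sketch below is one way to do that without overriding an endpoint the user has already chosen; the use of `setdefault` is a suggestion here, not something the project prescribes.

```python
import os

# Use the mirror only if no endpoint has been configured already.
os.environ.setdefault('HF_ENDPOINT', 'https://hf-mirror.com')

# Import model-loading libraries only after this point, for example:
# from bert4vector.core import BertSimilarity

print(os.environ['HF_ENDPOINT'])
```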
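4. Version history
| Date | bert4vector | Release notes |
|---|---|---|
| 20251013 | 0.0.7.post2 | Removed the hard dependency on torch |
| 20251009 | 0.0.7 | Added OpenaiSimilarityRequest and OpenaiSimilarityAiohttp for accessing remote models that expose an OpenAI-format API |
| 20250601 | 0.0.6 | add_corpus gains a corpus_property argument; added the delete_corpus method; any sentence_transformers model is supported |
| 20240928 | 0.0.5 | Minor changes; the API now supports reset |
| 20240710 | 0.0.4 | Added longest-common-subsequence literal recall; some features work without torch installed |
| 20240628 | 0.0.3 | Added several literal recall methods; added API deployment |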
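5. Update history:
- 20240928: Minor changes; the API now supports reset
- 20240710: Added longest-common-subsequence literal recall; some features work without torch installed
- 20240628: Added several literal recall methods; added API deployment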
6. Reference
Source Distribution
File details
Details for the file bert4vector-0.0.7.post2.tar.gz.
File metadata
- Download URL: bert4vector-0.0.7.post2.tar.gz
- Upload date:
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 704f29fe184fab455f3e2482b8841eafa08f68bcd13f4c62a7e08e116f8ca48d |
| MD5 | 0749fac3aacd230365f8e22d2df7fbd1 |
| BLAKE2b-256 | ea6e1b1aa087c0403a2741097eeeb1fd5f9d99f65181339198675ddd9275aec1 |