an elegant bert4vector
bert4vector
Vector computation, storage, retrieval, and similarity scoring (compatible with sentence_transformers)
Documentation | Bert4torch | Examples | Source code
1. Installation
- Install the stable release
pip install bert4vector
- Install the latest version from GitHub
pip install git+https://github.com/Tongjilibo/bert4vector
2. Quick start
- Computing embeddings
from bert4vector.core import BertSimilarity
model = BertSimilarity('/data/pretrain_ckpt/Tongjilibo/simbert-chinese-tiny')
sentences = ['喜欢打篮球的男生喜欢什么样的女生', '西安下雪了?是不是很冷啊?', '第一次去见女朋友父母该如何表现?', '小蝌蚪找妈妈怎么样', '给我推荐一款红色的车', '我喜欢北京']
vecs = model.encode(sentences, convert_to_numpy=True, normalize_embeddings=False)
print(vecs.shape)
# (6, 312)
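With normalize_embeddings=False the returned vectors keep their raw lengths, so a plain dot product between two rows is not a cosine similarity. The sketch below shows the usual normalize-then-dot-product step in pure Python. The three-dimensional vectors are hypothetical stand-ins for rows of vecs, and the helper names are illustrative; whether bert4vector scores similarity exactly this way is an assumption.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine_similarity(a, b):
    """Cosine similarity is the dot product of the two unit-length vectors."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))

# Hypothetical 3-dimensional stand-ins for rows of `vecs`
# (the real vectors in the example above have 312 dimensions).
vec_a = [3.0, 4.0, 0.0]
vec_b = [6.0, 8.0, 0.0]  # same direction as vec_a, twice the length
vec_c = [0.0, 0.0, 5.0]  # orthogonal to vec_a

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
print(round(sim_ab, 6), round(sim_ac, 6))
```

Vectors pointing in the same direction score close to 1 regardless of length, and orthogonal vectors score 0.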
- Computing similarity
from bert4vector.core import BertSimilarity
text2vec = BertSimilarity('/data/pretrain_ckpt/Tongjilibo/simbert-chinese-tiny')
sent1 = ['你好', '天气不错']
sent2 = ['你好啊', '天气很好']
similarity = text2vec.similarity(sent1, sent2)
print(similarity)
# [[0.9075422 0.42991278]
# [0.19584633 0.72635853]]
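The printed result looks like a pairwise score matrix. Assuming that entry [i][j] compares sent1[i] with sent2[j] (the output is consistent with this, but the source does not state it), the best match for each sentence in sent1 can be picked out as sketched below. The scores are copied from the output above as a plain nested list, so the snippet runs without the model.

```python
sent1 = ['你好', '天气不错']
sent2 = ['你好啊', '天气很好']

# Scores copied from the printed output above.
similarity = [
    [0.9075422, 0.42991278],
    [0.19584633, 0.72635853],
]

# For each row (a sentence from sent1), find the column with the highest score.
best_matches = {}
for i, row in enumerate(similarity):
    j = max(range(len(row)), key=row.__getitem__)
    best_matches[sent1[i]] = (sent2[j], row[j])

for query, (match, score) in best_matches.items():
    print(query, '->', match, score)
```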
- Storing and searching vectors
from bert4vector.core import BertSimilarity
model = BertSimilarity('/data/pretrain_ckpt/Tongjilibo/simbert-chinese-tiny')
model.add_corpus(['你好', '我选你', '天气不错', '人很好看'])
print(model.search('你好'))
# {'你好': [{'corpus_id': 0, 'score': 0.9999, 'text': '你好'},
# {'corpus_id': 3, 'score': 0.5694, 'text': '人很好看'}]}
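The search result shown above is a dict that maps each query to a list of hits, each with corpus_id, score, and text. A common follow-up step is to drop weak hits. The sketch below does that on a literal copy of the printed result; filter_hits and the 0.8 threshold are hypothetical choices for illustration, not part of bert4vector.

```python
# Literal copy of the search output printed above.
results = {
    '你好': [
        {'corpus_id': 0, 'score': 0.9999, 'text': '你好'},
        {'corpus_id': 3, 'score': 0.5694, 'text': '人很好看'},
    ]
}

def filter_hits(results, min_score):
    """Keep only hits whose score is at least min_score, per query."""
    return {
        query: [hit for hit in hits if hit['score'] >= min_score]
        for query, hits in results.items()
    }

strong = filter_hits(results, min_score=0.8)
print(strong)
```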
- API deployment
from bert4vector.pipelines import SimilaritySever
server = SimilaritySever('/data/pretrain_ckpt/embedding/BAAI--bge-base-zh-v1.5')
port = 8000  # example value (an assumption); use any free port
server.run(port=port)
# For how to call the API, see './examples/api.py'
3. Supported sentence-embedding weights (besides the weights listed below, any weights supported by sentence_transformers also work)
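Notes:
- Besides the models above, any model supported by sentence_transformers is also supported.
- Names in highlighted format (e.g. Tongjilibo/simbert-chinese-small) can be downloaded directly over the network.
- To speed up downloads through a mirror site in mainland China, use one of the following:
  - HF_ENDPOINT=https://hf-mirror.com python your_script.py
  - Run export HF_ENDPOINT=https://hf-mirror.com, then run the Python code
  - Set it at the top of the Python code:
    import os
    os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"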
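A variant of the in-code setting that respects a value already exported in the shell is sketched here. use_hf_mirror is a hypothetical helper name; only os.environ.setdefault is standard. It assumes, as is usual for Hugging Face libraries, that the variable is read when those libraries are first imported, so it should run before importing bert4vector.

```python
import os

def use_hf_mirror(endpoint="https://hf-mirror.com"):
    """Set HF_ENDPOINT unless the environment already defines it.

    Returns the endpoint that is in effect afterwards.
    """
    return os.environ.setdefault("HF_ENDPOINT", endpoint)

active_endpoint = use_hf_mirror()
print(active_endpoint)

# Import bert4vector (or other Hugging Face based libraries) only after this point.
```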
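4. Version history
| Release date | bert4vector | Release notes |
|---|---|---|
| 20251009 | 0.0.7 | Added OpenaiSimilarityRequest and OpenaiSimilarityAiohttp for accessing remote models that expose an OpenAI-format API |
| 20250601 | 0.0.6 | add_corpus gained a corpus_property argument; added the delete_corpus method; any sentence_transformers model is now supported |
| 20240928 | 0.0.5 | Minor changes; reset is now available through the API |
| 20240710 | 0.0.4 | Added longest-common-subsequence literal recall; some features work without torch installed |
| 20240628 | 0.0.3 | Added several literal (string-matching) recall methods; added API deployment |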
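5. Update history:
- 20240928: Minor changes; reset is now available through the API
- 20240710: Added longest-common-subsequence literal recall; some features work without torch installed
- 20240628: Added several literal (string-matching) recall methods; added API deployment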