**DeepSE**: **Sentence Embeddings** based on Deep Neural Networks, designed for **PRODUCTION** environments!
DeepSE
DeepSE: Sentence Embedding for production environments!
Table of contents
- Installation
- Implemented models
- 2.1 BERT and RoBERTa
- 2.2 SimCSE
Installation
Clone the repository:
git clone https://github.com/luozhouyang/deepse.git
Or install from PyPI:
pip install -U deepse
Implemented models
The following models are currently supported:
- Original BERT and RoBERTa
- SimCSE
  - Unsupervised SimCSE
  - Supervised SimCSE
  - Supervised SimCSE (with hard negative)
BERT and RoBERTa
TODO: add documentation
SimCSE
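The training data format differs slightly between variants, but in every case it is a plain text file in which each line is one training sample in JSON format.
For Unsupervised SimCSE, each sample must contain a `sequence` field. For example: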
{"sequence": "I really dislike natural language processing"}
{"sequence": "I am very interested in natural language processing"}
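A minimal sketch of producing such a file from a list of sentences, using only the Python standard library. The helper name `write_unsupervised_samples` and the output file name are hypothetical and not part of DeepSE. Passing `ensure_ascii=False` keeps non-ASCII text (for example Chinese) readable in the output file.

```python
import json
import os
import tempfile


def write_unsupervised_samples(sentences, path):
    """Write one JSON object per line, each with a `sequence` field.

    Hypothetical helper for illustration; not part of the DeepSE package.
    Returns the number of samples written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            sentence = sentence.strip()
            if not sentence:
                continue  # skip blank sentences
            f.write(json.dumps({"sequence": sentence}, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "unsup_simcse.jsonl")
    n = write_unsupervised_samples(
        ["我很讨厌自然语言处理", "我对自然语言处理很感兴趣"], out
    )
    print(f"wrote {n} samples to {out}")
```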
For Supervised SimCSE, each sample must contain the `sequence` and `positive_sequence` fields. For example:
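{"sequence": "I really dislike natural language processing", "positive_sequence": "I don't like natural language processing"}
{"sequence": "I am very interested in natural language processing", "positive_sequence": "I want to learn about natural language processing"}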
For Supervised SimCSE with hard negative, each sample must contain the `sequence`, `positive_sequence` and `negative_sequence` fields. If the `positive_sequence` field is empty, `sequence` is automatically used as its own `positive_sequence`. For example:
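{"sequence": "I really dislike natural language processing", "positive_sequence": "I don't like natural language processing", "negative_sequence": "I want to learn about natural language processing"}
{"sequence": "I am very interested in natural language processing", "positive_sequence": "I want to learn about natural language processing", "negative_sequence": "I really dislike natural language processing"}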
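The field requirements for the three variants can be checked before training with a small validator. This is only a sketch: the function `parse_sample` and the mode names `unsup`, `sup` and `hardneg` are hypothetical, and DeepSE's own data loader may behave differently. The empty-`positive_sequence` fallback is applied only to the hard-negative variant, since that is the only case described here.

```python
import json

# Required fields per variant. The mode names are illustrative assumptions.
REQUIRED_FIELDS = {
    "unsup": ("sequence",),
    "sup": ("sequence", "positive_sequence"),
    "hardneg": ("sequence", "positive_sequence", "negative_sequence"),
}


def parse_sample(line, mode):
    """Parse one JSON line and keep only the fields required for `mode`."""
    sample = json.loads(line)
    # Fallback: an empty positive_sequence is replaced by the sequence itself.
    if mode == "hardneg" and not sample.get("positive_sequence"):
        sample["positive_sequence"] = sample.get("sequence")
    missing = [k for k in REQUIRED_FIELDS[mode] if not sample.get(k)]
    if missing:
        raise ValueError(f"missing or empty fields for {mode}: {missing}")
    return {k: sample[k] for k in REQUIRED_FIELDS[mode]}


if __name__ == "__main__":
    line = '{"sequence": "a", "positive_sequence": "", "negative_sequence": "c"}'
    print(parse_sample(line, "hardneg"))
```

Running the validator over every line of a training file before launching a long job surfaces malformed samples early instead of partway through training.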
Then start training with the following commands:
export PRETRAINED_MODEL_PATH=/path/to/your/pretrained/bert/dir
nohup python run_simcse.py >> log/run_simcse.log 2>&1 &
tail -f log/run_simcse.log
Parameters can be changed directly in run_simcse.py. The model is saved in both Checkpoint and SavedModel formats; the latter can be deployed directly to production with tensorflow/serving.
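Once the SavedModel is loaded by tensorflow/serving, it can be queried over the server's REST API. The sketch below only builds the request and does not send it. The input names `input_ids`, `segment_ids` and `attention_mask`, the model name, and the token ids are assumptions for illustration; the real input signature of the exported model can be inspected with `saved_model_cli show --dir <export_dir> --all`. Port 8501 is the default REST port of the tensorflow/serving image.

```python
import json
import urllib.request


def build_predict_request(host, model_name, input_ids, segment_ids,
                          attention_mask, port=8501):
    """Build (but do not send) a TensorFlow Serving REST predict request.

    The three input names are assumed, not taken from DeepSE; adjust them
    to match the signature of the exported SavedModel.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    payload = {
        "instances": [
            {
                "input_ids": input_ids,
                "segment_ids": segment_ids,
                "attention_mask": attention_mask,
            }
        ]
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Illustrative token ids only; real ids come from the model's tokenizer.
    req = build_predict_request("localhost", "simcse", [101, 102], [0, 0], [1, 1])
    print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` against a running server returns a JSON body whose `predictions` key holds the model outputs for each instance.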