
**DeepSE**: **Sentence Embeddings** based on Deep Neural Networks, designed for **PRODUCTION** environments!

DeepSE

DeepSE: Sentence Embeddings for Production Environments

Contents

  1. Installation
  2. Implemented models

Installation

Clone the repository:

git clone https://github.com/luozhouyang/deepse.git

Or install from PyPI:

pip install -U deepse

Implemented models

The following models are currently supported:

  • Vanilla BERT and RoBERTa
  • SimCSE
    • Unsupervised SimCSE
    • Supervised SimCSE
    • Supervised SimCSE (with hard negatives)

BERT and RoBERTa

TODO: add documentation

SimCSE

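The training data format differs slightly between variants, but in every case it is a plain text file in which each line is one training sample in JSON format.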

For Unsupervised SimCSE, every sample must contain a sequence field. For example:

{"sequence": "我很讨厌自然语言处理"}
{"sequence": "我对自然语言处理很感兴趣"}

For Supervised SimCSE, every sample must contain the sequence and positive_sequence fields. For example:

{"sequence": "我很讨厌自然语言处理", "positive_sequence": "我不喜欢自然语言处理"}
{"sequence": "我对自然语言处理很感兴趣", "positive_sequence": "我想了解自然语言处理"}

For Supervised SimCSE with hard negatives, every sample must contain the sequence, positive_sequence, and negative_sequence fields. If positive_sequence is empty, sequence is automatically used as its own positive_sequence. For example:

{"sequence": "我很讨厌自然语言处理", "positive_sequence": "我不喜欢自然语言处理", "negative_sequence": "我想了解自然语言处理"}
{"sequence": "我对自然语言处理很感兴趣", "positive_sequence": "我想了解自然语言处理", "negative_sequence": "我很讨厌自然语言处理"}
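
The three sample formats above can be checked before training with a small loader. The following is a minimal sketch and not part of DeepSE: the variant labels (unsup, sup, hardneg) and the function names are hypothetical, and the fallback for an empty positive_sequence only mirrors the behaviour described above.

```python
import json

# Required fields per SimCSE variant, taken from the descriptions above.
# The keys are hypothetical labels for illustration, not DeepSE identifiers.
REQUIRED_FIELDS = {
    "unsup": ["sequence"],
    "sup": ["sequence", "positive_sequence"],
    "hardneg": ["sequence", "positive_sequence", "negative_sequence"],
}


def parse_sample(line, variant):
    """Parse one JSON line and check that it has the fields the variant needs."""
    sample = json.loads(line)
    if variant == "hardneg" and not sample.get("positive_sequence"):
        # Assumed fallback: an empty positive_sequence is replaced by sequence.
        sample["positive_sequence"] = sample.get("sequence", "")
    missing = [f for f in REQUIRED_FIELDS[variant] if not sample.get(f)]
    if missing:
        raise ValueError(f"missing fields for {variant}: {missing}")
    return sample


def load_samples(path, variant):
    """Read a JSON-lines training file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [parse_sample(line, variant) for line in f if line.strip()]
```

Calling load_samples on a training file with the matching variant label raises a ValueError on the first malformed line, which is cheaper than discovering the problem partway through a training run.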

Then train with the following commands:

export PRETRAINED_MODEL_PATH=/path/to/your/pretrained/bert/dir 
nohup python run_simcse.py >> log/run_simcse.log 2>&1 &
tail -f log/run_simcse.log

Parameters can be changed directly in run_simcse.py.

The model is saved in both Checkpoint and SavedModel formats; the latter can be deployed to production directly with tensorflow/serving.
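
Once a SavedModel is served by tensorflow/serving, it can be queried over the REST predict endpoint. The sketch below only builds the request and makes several assumptions: the host and port (localhost:8501), the model name (simcse), and the input names and token ids (input_ids, attention_mask) are all hypothetical. The real input signature of an exported model can be inspected with saved_model_cli show --dir /path/to/export --all.

```python
import json
import urllib.request


def build_predict_request(host, model_name, inputs):
    """Build a TensorFlow Serving REST predict request without sending it."""
    url = f"http://{host}/v1/models/{model_name}:predict"
    # Columnar request format: a single "inputs" object keyed by input name.
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model name, input names, and token ids, for illustration only.
request = build_predict_request(
    "localhost:8501",
    "simcse",
    {"input_ids": [[101, 2769, 102]], "attention_mask": [[1, 1, 1]]},
)

# To query a running server, uncomment:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```

Tokenization has to happen on the client side in this setup, using the same vocabulary as the pretrained model that was fine-tuned.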
