
**DeepSE**: **Sentence Embeddings** based on Deep Neural Networks, designed for **PRODUCTION** environments!


DeepSE


DeepSE: Sentence Embeddings for Production Environments

Contents

  1. Installation
  2. Implemented Models

Installation

Clone the repository:

git clone https://github.com/luozhouyang/deepse.git

Or install from PyPI:

pip install -U deepse

Implemented Models

The following models are currently supported:

  • Vanilla BERT and RoBERTa
  • SimCSE
    • Unsupervised SimCSE
    • Supervised SimCSE
    • Supervised SimCSE (with hard negative)

BERT and RoBERTa

TODO: add documentation

SimCSE

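The training data format differs slightly between the variants, but in every case it is a plain text file in which each line is one training example encoded as a JSON object.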
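This one-object-per-line layout is the JSON Lines convention. As a minimal sketch using only the Python standard library (read_jsonl is a hypothetical helper, not part of DeepSE's API), such a file can be loaded and sanity-checked before training:

```python
import json
from typing import Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict]:
    """Yield one training example (a dict) per non-blank line of a JSON Lines file.

    Hypothetical helper for inspecting training data; not a DeepSE function.
    """
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                # Report the offending line number so broken files are easy to fix
                raise ValueError(f"{path}:{lineno}: invalid JSON: {e}") from e
```

Blank lines are skipped, and a malformed line raises an error that names its line number, which makes a broken training file quicker to locate.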

For Unsupervised SimCSE, each example must contain a sequence field. For example:

{"sequence": "我很讨厌自然语言处理"}
{"sequence": "我对自然语言处理很感兴趣"}
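To turn a list of raw sentences into an Unsupervised SimCSE training file, something like the following sketch could be used. The helper write_unsup_simcse is hypothetical and not part of DeepSE; passing ensure_ascii=False to json.dumps keeps Chinese text readable in the file instead of escaping it as \uXXXX sequences.

```python
import json
from typing import Iterable


def write_unsup_simcse(sentences: Iterable[str], path: str) -> int:
    """Write one {"sequence": ...} object per line and return how many were written.

    Hypothetical helper, not a DeepSE function.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for s in sentences:
            s = s.strip()
            if not s:
                continue  # skip empty or whitespace-only sentences
            # ensure_ascii=False writes non-ASCII characters as-is (UTF-8)
            f.write(json.dumps({"sequence": s}, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Each output line has exactly the shape shown in the examples above.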

For Supervised SimCSE, each example must contain the sequence and positive_sequence fields. For example:

{"sequence": "我很讨厌自然语言处理", "positive_sequence": "我不喜欢自然语言处理"}
{"sequence": "我对自然语言处理很感兴趣", "positive_sequence": "我想了解自然语言处理"}
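A quick check before training can catch examples with missing fields. The sketch below is an illustration under assumptions, not DeepSE's own validation: the variant labels "unsup" and "sup" and the function missing_fields are hypothetical, and a field whose value is empty, whitespace-only, or not a string is treated as missing.

```python
from typing import Dict, List

# Required fields per variant, following the formats described above.
# The keys "unsup" and "sup" are hypothetical labels, not DeepSE options.
REQUIRED_FIELDS = {
    "unsup": ["sequence"],
    "sup": ["sequence", "positive_sequence"],
}


def missing_fields(example: Dict, variant: str) -> List[str]:
    """Return the required fields that are absent or not a non-empty string."""
    return [
        name
        for name in REQUIRED_FIELDS[variant]
        if not isinstance(example.get(name), str) or not example[name].strip()
    ]
```

An empty list means the example has every field its variant needs; extra fields are ignored by this check.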

For Supervised SimCSE with hard negatives, each example must contain the sequence, positive_sequence and negative_sequence fields. If positive_sequence is empty, sequence is automatically used as its own positive_sequence. For example:

{"sequence": "我很讨厌自然语言处理", "positive_sequence": "我不喜欢自然语言处理", "negative_sequence": "我想了解自然语言处理"}
{"sequence": "我对自然语言处理很感兴趣", "positive_sequence": "我想了解自然语言处理", "negative_sequence": "我很讨厌自然语言处理"}
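The positive_sequence fallback rule can be expressed as a small normalization step, which may help when inspecting or cleaning data beforehand (DeepSE applies the rule itself). This is a hypothetical helper, not a DeepSE function, and it makes a few assumptions: a missing positive_sequence key is treated like an empty one, surrounding whitespace is stripped, and any other keys are dropped.

```python
from typing import Dict


def normalize_hard_negative(example: Dict) -> Dict:
    """Return a hard-negative example with positive_sequence filled in.

    Hypothetical helper: an empty (or, by assumption, missing)
    positive_sequence falls back to sequence itself.
    """
    seq = (example.get("sequence") or "").strip()
    neg = (example.get("negative_sequence") or "").strip()
    if not seq or not neg:
        raise ValueError("sequence and negative_sequence must be non-empty")
    pos = (example.get("positive_sequence") or "").strip()
    return {
        "sequence": seq,
        "positive_sequence": pos or seq,  # fall back to the sequence itself
        "negative_sequence": neg,
    }
```

Examples that lack a usable sequence or negative_sequence raise ValueError rather than being silently repaired.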

Then start training with the following commands:

export PRETRAINED_MODEL_PATH=/path/to/your/pretrained/bert/dir
mkdir -p log
nohup python run_simcse.py >> log/run_simcse.log 2>&1 &
tail -f log/run_simcse.log

Training parameters can be modified directly in run_simcse.py.

The trained model is saved in both Checkpoint and SavedModel formats; the SavedModel can be deployed directly to production with tensorflow/serving.
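Once the SavedModel is loaded by TensorFlow Serving, clients can query it through the REST predict endpoint (port 8501 by default). The sketch below uses only the Python standard library. The model name "deepse" and the input feature names are assumptions: they depend on how the server was started and how the model was exported, and the actual signature can be inspected with saved_model_cli show --dir /path/to/saved_model --all. Cosine similarity is shown as one common way to compare the returned sentence embeddings.

```python
import json
import math
import urllib.request
from typing import Dict, List, Sequence, Tuple


def build_predict_request(
    instances: List[Dict], model_name: str, host: str = "localhost", port: int = 8501
) -> Tuple[str, bytes]:
    """Build the URL and JSON body for TensorFlow Serving's REST predict API."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return url, body


def predict(instances: List[Dict], model_name: str, **kwargs) -> List:
    """POST the request and return the "predictions" field of the response."""
    url, body = build_predict_request(instances, model_name, **kwargs)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["predictions"]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical usage: feature names such as "input_ids" must match the exported
# signature, and token ids must come from the pretrained model's own tokenizer.
# emb = predict([{"input_ids": [...]}, {"input_ids": [...]}], "deepse")
# print(cosine_similarity(emb[0], emb[1]))
```

If the serving signature has a single output, predictions holds one embedding per instance; with several named outputs, each prediction is instead a dict keyed by output name.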



Download files


Source Distribution

deepse-0.0.4.tar.gz (14.4 kB)

Uploaded Source

Built Distribution

deepse-0.0.4-py3-none-any.whl (15.6 kB)

Uploaded Python 3

File details

Details for the file deepse-0.0.4.tar.gz.

File metadata

  • Download URL: deepse-0.0.4.tar.gz
  • Upload date:
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for deepse-0.0.4.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 176a96b380c84d5c0e9610c92af3fce792ad75cfa99f18f986294edba23abc52 |
| MD5 | 2604eee3274898bb05d5763dec765a8c |
| BLAKE2b-256 | 1353e789949e1fce1ea97cca8d3aefe8cd9312bc466ac84ada0f176b1b46198f |


File details

Details for the file deepse-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: deepse-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 15.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for deepse-0.0.4-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 1436b18bee5c45beb27834f3b76876a06386f83579579704dcb680f5393c9106 |
| MD5 | 3aaa178ad017c61799b747e942d9d77d |
| BLAKE2b-256 | 21081155ee9d117b67a1d61f5dba876d6f1c92568004aaf30d8720b8f2f7d9be |

