Sentence-BERT for spaCy

This package wraps sentence-transformers (also known as sentence-BERT) directly in spaCy. You can substitute the vectors provided in any spaCy model with vectors that have been tuned specifically for semantic similarity.

The models below are recommended for sentence-similarity analysis, as their STS benchmark scores indicate. Keep in mind that sentence-transformers models are configured with a maximum sequence length of 128 tokens, so for longer texts it may be more suitable to work with other models (e.g. Universal Sentence Encoder); a possible workaround for long texts is sketched below.
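
If you do want to apply these models to longer texts, one workaround (not part of this package's API) is to split the text into sentences and average the per-sentence vectors. A minimal sketch, assuming the package is installed and using the spaCy 2 syntax used elsewhere in this README:

import numpy as np
import spacy
import spacy_sentence_bert

nlp = spacy_sentence_bert.load_model('en_bert_base_nli_stsb_mean_tokens')

# hypothetical long input, likely beyond the 128-token limit
long_text = ' '.join(['The quick brown fox jumps over the lazy dog.'] * 40)

# split into sentences with a plain sentencizer (spaCy 2 API;
# in spaCy 3 use splitter.add_pipe('sentencizer') instead)
splitter = spacy.blank('en')
splitter.add_pipe(splitter.create_pipe('sentencizer'))

# embed each sentence separately, then average the vectors
sentences = [sent.text for sent in splitter(long_text).sents]
doc_vector = np.mean([nlp(s).vector for s in sentences], axis=0)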

Install

To install this package, you can run one of the following:

  • pip install spacy_sentence_bert
  • pip install git+https://github.com/MartinoMensio/spacy-sentence-bert.git

Usage

With this package installed, you can load one of the listed models directly:

import spacy_sentence_bert
nlp = spacy_sentence_bert.load_model('en_bert_base_nli_cls_token')

Or, if a specific model package has been installed (e.g. pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/en_bert_base_nli_cls_token-0.1.0/en_bert_base_nli_cls_token-0.1.0.tar.gz), you can load it directly with spaCy:

import spacy
nlp = spacy.load('en_bert_base_nli_cls_token')

You can also add the sentence-BERT vectors to an existing spaCy pipeline:

import spacy
import spacy_sentence_bert
nlp_base = spacy.load('en')  # spaCy 2 model shortcut; any base pipeline works
nlp = spacy_sentence_bert.create_from(nlp_base, 'en_bert_base_nli_cls_token')
nlp.pipe_names  # the sentence-BERT stage now appears among the pipeline components
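
Once a model is loaded, the tuned vectors drive spaCy's standard similarity API. For example:

import spacy_sentence_bert
nlp = spacy_sentence_bert.load_model('en_bert_base_nli_stsb_mean_tokens')
doc_1 = nlp('Hi there, how are you?')
doc_2 = nlp('Hello there, how are you doing today?')
print(doc_1.similarity(doc_2))  # similarity of the sentence-BERT vectors; higher = more similar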

The full list of models is available at https://docs.google.com/spreadsheets/d/14QplCdTCDwEmTqrn1LH4yrbKvdogK4oQvYO1K1aPR5M/edit#gid=0

sentence-BERT name | spaCy model name | dimensions | language | STS benchmark
bert-base-nli-mean-tokens | en_bert_base_nli_mean_tokens | 768 | en | 77.12
bert-base-nli-max-tokens | en_bert_base_nli_max_tokens | 768 | en | 77.21
bert-base-nli-cls-token | en_bert_base_nli_cls_token | 768 | en | 76.30
bert-large-nli-mean-tokens | en_bert_large_nli_mean_tokens | 1024 | en | 79.19
bert-large-nli-max-tokens | en_bert_large_nli_max_tokens | 1024 | en | 78.41
bert-large-nli-cls-token | en_bert_large_nli_cls_token | 1024 | en | 78.29
roberta-base-nli-mean-tokens | en_roberta_base_nli_mean_tokens | 768 | en | 77.49
roberta-large-nli-mean-tokens | en_roberta_large_nli_mean_tokens | 1024 | en | 78.69
distilbert-base-nli-mean-tokens | en_distilbert_base_nli_mean_tokens | 768 | en | 76.97
bert-base-nli-stsb-mean-tokens | en_bert_base_nli_stsb_mean_tokens | 768 | en | 85.14
bert-large-nli-stsb-mean-tokens | en_bert_large_nli_stsb_mean_tokens | 1024 | en | 85.29
roberta-base-nli-stsb-mean-tokens | en_roberta_base_nli_stsb_mean_tokens | 768 | en | 85.40
roberta-large-nli-stsb-mean-tokens | en_roberta_large_nli_stsb_mean_tokens | 1024 | en | 86.31
distilbert-base-nli-stsb-mean-tokens | en_distilbert_base_nli_stsb_mean_tokens | 768 | en | 84.38
distiluse-base-multilingual-cased | xx_distiluse_base_multilingual_cased | 512 | Arabic, Chinese, Dutch, English, French, German, Italian, Korean, Polish, Portuguese, Russian, Spanish, Turkish | 80.10
xlm-r-base-en-ko-nli-ststb | xx_xlm_r_base_en_ko_nli_ststb | 768 | en, ko | 81.47
xlm-r-large-en-ko-nli-ststb | xx_xlm_r_large_en_ko_nli_ststb | 1024 | en, ko | 84.05
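
The multilingual models embed all of their supported languages into a shared vector space, so sentences in different languages can be compared directly. A minimal sketch, assuming the xx_distiluse_base_multilingual_cased model from the table above:

import spacy_sentence_bert
nlp = spacy_sentence_bert.load_model('xx_distiluse_base_multilingual_cased')
doc_en = nlp('How is the weather today?')
doc_de = nlp('Wie ist das Wetter heute?')
print(doc_en.similarity(doc_de))  # high similarity despite the different languages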

When first used, the models are downloaded to the folder defined by the TORCH_HOME environment variable (default: ~/.cache/torch).
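
To store the downloads elsewhere, set TORCH_HOME before loading a model. A minimal sketch with a hypothetical cache directory:

import os
os.environ['TORCH_HOME'] = '/data/model_cache'  # hypothetical path; set before the first download
import spacy_sentence_bert
nlp = spacy_sentence_bert.load_model('en_bert_base_nli_stsb_mean_tokens')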
