SpaCy models for using sentence-BERT

Project description

Sentence-BERT for spaCy

This package wraps sentence-transformers (also known as sentence-BERT) directly in spaCy. You can substitute the vectors provided in any spaCy model with vectors that have been tuned specifically for semantic similarity.

The models below are recommended for sentence-similarity analysis, as their STS benchmark scores indicate. Keep in mind that the sentence-transformers models are configured with a maximum sequence length of 128 tokens, so for longer texts other models (e.g. Universal Sentence Encoder) may be more suitable.

Install

To install this package, you can run one of the following:

  • pip install spacy_sentence_bert
  • pip install git+https://github.com/MartinoMensio/spacy-sentence-bert.git

Usage

With this package installed, you can load one of the models by its spaCy model name:

import spacy_sentence_bert
nlp = spacy_sentence_bert.load_model('en_bert_base_nli_cls_token')

Or, if a specific model has been installed as a standalone package (e.g. pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/en_bert_base_nli_cls_token-0.1.0/en_bert_base_nli_cls_token-0.1.0.tar.gz), you can load it directly with spaCy:

import spacy
nlp = spacy.load('en_bert_base_nli_cls_token')

You can also add the sentence-BERT vectors to an existing spaCy pipeline:

import spacy
import spacy_sentence_bert
nlp_base = spacy.load('en')
nlp = spacy_sentence_bert.create_from(nlp_base, 'en_bert_base_nli_cls_token')
nlp.pipe_names  # inspect the pipeline components
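
Once a model is loaded, the document vectors come from sentence-BERT, so spaCy's built-in similarity uses them. A minimal sketch (the example sentences and printed values are illustrative only):

import spacy_sentence_bert

# load one of the listed models (downloaded on first use)
nlp = spacy_sentence_bert.load_model('en_bert_base_nli_cls_token')

doc_1 = nlp('Hi there, how are you?')
doc_2 = nlp('Hello there, how are you doing today?')

# similarity computed from the document vectors
print(doc_1.similarity(doc_2))

# vector dimension matches the table below (768 for this model)
print(doc_1.vector.shape)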

The full list of models is available at https://docs.google.com/spreadsheets/d/14QplCdTCDwEmTqrn1LH4yrbKvdogK4oQvYO1K1aPR5M/edit#gid=0

sentence-BERT name | spaCy model name | dimensions | language | STS benchmark
bert-base-nli-mean-tokens | en_bert_base_nli_mean_tokens | 768 | en | 77.12
bert-base-nli-max-tokens | en_bert_base_nli_max_tokens | 768 | en | 77.21
bert-base-nli-cls-token | en_bert_base_nli_cls_token | 768 | en | 76.30
bert-large-nli-mean-tokens | en_bert_large_nli_mean_tokens | 1024 | en | 79.19
bert-large-nli-max-tokens | en_bert_large_nli_max_tokens | 1024 | en | 78.41
bert-large-nli-cls-token | en_bert_large_nli_cls_token | 1024 | en | 78.29
roberta-base-nli-mean-tokens | en_roberta_base_nli_mean_tokens | 768 | en | 77.49
roberta-large-nli-mean-tokens | en_roberta_large_nli_mean_tokens | 1024 | en | 78.69
distilbert-base-nli-mean-tokens | en_distilbert_base_nli_mean_tokens | 768 | en | 76.97
bert-base-nli-stsb-mean-tokens | en_bert_base_nli_stsb_mean_tokens | 768 | en | 85.14
bert-large-nli-stsb-mean-tokens | en_bert_large_nli_stsb_mean_tokens | 1024 | en | 85.29
roberta-base-nli-stsb-mean-tokens | en_roberta_base_nli_stsb_mean_tokens | 768 | en | 85.40
roberta-large-nli-stsb-mean-tokens | en_roberta_large_nli_stsb_mean_tokens | 1024 | en | 86.31
distilbert-base-nli-stsb-mean-tokens | en_distilbert_base_nli_stsb_mean_tokens | 768 | en | 84.38
distiluse-base-multilingual-cased | xx_distiluse_base_multilingual_cased | 512 | Arabic, Chinese, Dutch, English, French, German, Italian, Korean, Polish, Portuguese, Russian, Spanish, Turkish | 80.10
xlm-r-base-en-ko-nli-ststb | xx_xlm_r_base_en_ko_nli_ststb | 768 | en, ko | 81.47
xlm-r-large-en-ko-nli-ststb | xx_xlm_r_large_en_ko_nli_ststb | 1024 | en, ko | 84.05
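
For the multilingual model in the table, sentences in different languages share one vector space, so cross-lingual comparisons are possible. A small sketch (the example sentences are made up):

import spacy_sentence_bert

# multilingual distiluse model (512-dimensional vectors, see the table above)
nlp = spacy_sentence_bert.load_model('xx_distiluse_base_multilingual_cased')

doc_en = nlp('The weather is nice today.')
doc_de = nlp('Das Wetter ist heute schön.')

# similarity across languages via the shared vectors
print(doc_en.similarity(doc_de))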

When first used, the models are downloaded to the folder defined by the TORCH_HOME environment variable (default ~/.cache/torch).
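
For example (a sketch; the cache path is purely illustrative), the download location can be redirected by setting TORCH_HOME before loading a model:

import os

# point the torch cache at a custom directory (illustrative path)
os.environ['TORCH_HOME'] = '/data/cache/torch'

import spacy_sentence_bert
nlp = spacy_sentence_bert.load_model('en_bert_base_nli_cls_token')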


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spacy_sentence_bert-0.0.2.tar.gz (4.5 kB)

File details

Details for the file spacy_sentence_bert-0.0.2.tar.gz.

File metadata

  • Download URL: spacy_sentence_bert-0.0.2.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.7.3

File hashes

Hashes for spacy_sentence_bert-0.0.2.tar.gz
Algorithm | Hash digest
SHA256 | 4af6fce10fe25fb6b59bf988d727290dc116b8a841a5a0ce93e492dbd53bf1f2
MD5 | a10d92b15e6152fda0af69ef380e8323
BLAKE2b-256 | 68c43025f360cf55762a6d5bd59589c15287ed0e778757c30a4f146fc2b3a6a9
