Spacy - Universal Sentence Encoder
SpaCy models for using the Universal Sentence Encoder from TensorFlow Hub.
Motivation
For the motivation to use different models for different tasks, see https://blog.floydhub.com/when-the-best-nlp-model-is-not-the-best-choice/. The Universal Sentence Encoder (USE) is trained on tasks that make it well suited to identifying sentence similarity. Source: the Google AI blog, https://ai.googleblog.com/2018/05/advances-in-semantic-textual-similarity.html
Install
You can install this library from:
- PyPI:
pip install spacy-universal-sentence-encoder
- github:
pip install git+https://github.com/MartinoMensio/spacy-universal-sentence-encoder-tfhub
Or you can install the following pre-packaged models with pip:
| model name | source | pip package |
|---|---|---|
| en_use_md | https://tfhub.dev/google/universal-sentence-encoder | pip install https://github.com/MartinoMensio/spacy-universal-sentence-encoder-tfhub/releases/download/en_use_md-0.2.1/en_use_md-0.2.1.tar.gz#en_use_md-0.2.1 |
| en_use_lg | https://tfhub.dev/google/universal-sentence-encoder-large | pip install https://github.com/MartinoMensio/spacy-universal-sentence-encoder-tfhub/releases/download/en_use_lg-0.2.1/en_use_lg-0.2.1.tar.gz#en_use_lg-0.2.1 |
| xx_use_md | https://tfhub.dev/google/universal-sentence-encoder-multilingual | pip install https://github.com/MartinoMensio/spacy-universal-sentence-encoder-tfhub/releases/download/xx_use_md-0.2.1/xx_use_md-0.2.1.tar.gz#xx_use_md-0.2.1 |
| xx_use_lg | https://tfhub.dev/google/universal-sentence-encoder-multilingual-large | pip install https://github.com/MartinoMensio/spacy-universal-sentence-encoder-tfhub/releases/download/xx_use_lg-0.2.1/xx_use_lg-0.2.1.tar.gz#xx_use_lg-0.2.1 |
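After installing one of the packaged models, a quick sanity check is to load it and inspect a document vector. A minimal sketch, assuming en_use_md was installed as above (the USE produces 512-dimensional embeddings):

import spacy
# load the packaged model installed via pip
nlp = spacy.load('en_use_md')
doc = nlp('Testing the installation.')
# USE embeddings are 512-dimensional
print(doc.vector.shape)  # expected: (512,)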
Usage
First you have to load a model.
If you installed one of the model packages (see the table above), you can use the usual spaCy API to load it:
import spacy
nlp = spacy.load('en_use_md')
Otherwise you need to load the model in the following way (the first time it is run, it downloads the model):
import spacy_universal_sentence_encoder
nlp = spacy_universal_sentence_encoder.load_model('xx_use_lg')
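The model files are downloaded and cached by tensorflow_hub. If you want to control where they are stored, one option is to set the TFHUB_CACHE_DIR environment variable before loading; this is a sketch relying on tensorflow_hub's standard caching behaviour, not on an API of this package, and the path is illustrative:

import os
# must be set before the TFHub model is resolved/downloaded
os.environ['TFHUB_CACHE_DIR'] = '/path/to/tfhub_cache'  # illustrative path
import spacy_universal_sentence_encoder
nlp = spacy_universal_sentence_encoder.load_model('xx_use_lg')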
Then you can use the models:
# get two documents
doc_1 = nlp('Hi there, how are you?')
doc_2 = nlp('Hello there, how are you doing today?')
# get the vector of the Doc, Span or Token
print(doc_1.vector.shape)
print(doc_1[3].vector.shape)
print(doc_1[2:4].vector.shape)
# or use the similarity method that is based on the vectors, on Doc, Span or Token
print(doc_1.similarity(doc_2[0:7]))
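For example, the document vectors can be used to rank candidate sentences by semantic similarity to a query. A minimal sketch using the API shown above (the sentences are illustrative):

import spacy
nlp = spacy.load('en_use_md')
query = nlp('How is the weather today?')
candidates = [
    nlp('What a beautiful sunny day!'),
    nlp('I need to buy some groceries.'),
    nlp('Is it going to rain this afternoon?'),
]
# Doc.similarity computes the cosine similarity of the vectors
for doc in sorted(candidates, key=query.similarity, reverse=True):
    print(round(query.similarity(doc), 3), doc.text)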
You can also use the model on an already available language pipeline (e.g. to keep your components, or to get better parsing than the base spaCy model used here):
import spacy
# this is your nlp object that can be anything
nlp = spacy.load('en_core_web_sm')
# import this library to register the `overwrite_vectors` pipeline factory
import spacy_universal_sentence_encoder
# get the pipe component
overwrite_vectors = nlp.create_pipe('overwrite_vectors')
# add the pipeline stage to your nlp object
nlp.add_pipe(overwrite_vectors)
# use the vector with the default `en_use_md` model
doc = nlp('Hi')
# or use a different model
other_model_url = 'https://tfhub.dev/google/universal-sentence-encoder-multilingual/3'
# by setting the extension `tfhub_model_url` on the doc
doc._.tfhub_model_url = other_model_url
# or by adding a pipeline component that sets on every document
def set_tfhub_model_url(doc):
    doc._.tfhub_model_url = other_model_url
    return doc
# add this pipeline component before `overwrite_vectors`, which reads that extension
nlp.add_pipe(set_tfhub_model_url, before='overwrite_vectors')
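Putting it together: the pipeline keeps the en_core_web_sm components (tagger, parser, NER), while the vectors come from the Universal Sentence Encoder. A sketch assuming the snippet above has been run:

doc = nlp('The quick brown fox jumps over the lazy dog.')
# linguistic annotations still come from en_core_web_sm
print([(token.text, token.pos_) for token in doc])
# the vector now comes from the Universal Sentence Encoder
print(doc.vector.shape)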