Transformer Embeddings
This library simplifies and streamlines the use of encoder transformer models supported by HuggingFace's transformers library (from the model hub or local files) to generate embeddings for string inputs, similar to what sentence-transformers does.

Please note that starting with v4, we have dropped support for Python 3.7. If you need to use this library with Python 3.7, the latest compatible release is version 3.1.0.
Why use this over HuggingFace's transformers or sentence-transformers?
Under the hood, we take care of:

- Using any model on the HF model hub, with sensible defaults for inference.
- Setting the PyTorch model to eval mode.
- Using no_grad() for the forward pass.
- Batching, and returning output in the format produced by HF transformers.
- Padding / truncating to model defaults.
- Moving tensors to and from GPUs, if available.
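To make the boilerplate concrete, here is a minimal sketch of the inference loop the library handles for you (eval mode, no_grad, batching). A toy nn.Linear stands in for a real HF encoder, so the shapes and names here are illustrative, not the library's internals:

```python
import torch
from torch import nn

# Hypothetical stand-in for a HF encoder model.
model = nn.Linear(4, 2)
inputs = torch.randn(10, 4)   # stand-in for tokenized, padded inputs
batch_size = 4

model.eval()                  # disable dropout / batch-norm updates
chunks = []
with torch.no_grad():         # skip autograd bookkeeping at inference
    for start in range(0, inputs.shape[0], batch_size):
        chunks.append(model(inputs[start:start + batch_size]))
outputs = torch.cat(chunks)
print(outputs.shape)  # torch.Size([10, 2])
```

With TransformerEmbeddings, all of this is replaced by a single encode() call.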
Installation
You can install Transformer Embeddings via pip from PyPI:

```shell
$ pip install transformer-embeddings
```
Usage
```python
from transformer_embeddings import TransformerEmbeddings

transformer = TransformerEmbeddings("model_name")
```
If you have a previously instantiated model and / or tokenizer, you can pass them in:

```python
transformer = TransformerEmbeddings(model=model, tokenizer=tokenizer)
```

```python
transformer = TransformerEmbeddings(model_name="model_name", model=model)
```

or

```python
transformer = TransformerEmbeddings(model_name="model_name", tokenizer=tokenizer)
```

Note: model_name should be included if only one of model or tokenizer is passed in.
Embeddings
To get output embeddings:
```python
embeddings = transformer.encode(
    [
        "Lorem ipsum dolor sit amet",
        "consectetur adipiscing elit",
        "sed do eiusmod tempor incididunt",
        "ut labore et dolore magna aliqua.",
    ]
)
embeddings.output
```
Pooled Output
To get pooled outputs:
```python
from transformer_embeddings import TransformerEmbeddings, mean_pooling

transformer = TransformerEmbeddings(
    "model_name", return_output=False, pooling_fn=mean_pooling
)
embeddings = transformer.encode(
    [
        "Lorem ipsum dolor sit amet",
        "consectetur adipiscing elit",
        "sed do eiusmod tempor incididunt",
        "ut labore et dolore magna aliqua.",
    ]
)
embeddings.pooled
```
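For intuition, mean pooling is typically implemented as an attention-mask-weighted average over token embeddings, so padding positions do not dilute the result. The sketch below shows that common pattern with hand-made tensors; the actual signature of this library's mean_pooling may differ:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Average token embeddings, ignoring padded positions.
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)       # avoid divide-by-zero
    return summed / counts

# Two real tokens plus one padding token.
embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]])
mask = torch.tensor([[1, 1, 0]])
print(mean_pool(embeddings, mask))  # tensor([[2., 3.]])
```

Only the unmasked tokens contribute: (1+3)/2 = 2 and (2+4)/2 = 3.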
Exporting the Model
Once you are done training and testing the model, it can be exported into a single tarball:

```python
from transformer_embeddings import TransformerEmbeddings

transformer = TransformerEmbeddings("model_name")
transformer.export(additional_files=["/path/to/other/files/to/include/in/tarball.pickle"])
```
This tarball can also be uploaded to S3, which requires installing the S3 extras (pip install transformer-embeddings[s3]) and then passing an s3_path:

```python
from transformer_embeddings import TransformerEmbeddings

transformer = TransformerEmbeddings("model_name")
transformer.export(
    additional_files=["/path/to/other/files/to/include/in/tarball.pickle"],
    s3_path="s3://bucket/models/model-name/date-version/",
)
```
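The exported archive is a regular tarball, so it can be inspected and unpacked with standard tooling. A hypothetical round trip using only the standard library (the file names here are made up for illustration):

```python
import tarfile
import tempfile
from pathlib import Path

# Bundle an extra file into a .tar.gz (as export() does for model
# files), then list and unpack it with the stdlib tarfile module.
with tempfile.TemporaryDirectory() as tmpdir:
    tmp = Path(tmpdir)
    (tmp / "extra.pickle").write_bytes(b"payload")

    archive = tmp / "model.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(tmp / "extra.pickle", arcname="extra.pickle")

    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getnames()
        tar.extractall(tmp / "unpacked")
    recovered = (tmp / "unpacked" / "extra.pickle").read_bytes()

print(members)    # ['extra.pickle']
print(recovered)  # b'payload'
```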
Contributing
Contributions are very welcome. To learn more, see the Contributor Guide.
License
Distributed under the terms of the Apache 2.0 license, Transformer Embeddings is free and open source software.
Issues
If you encounter any problems, please file an issue along with a detailed description.
Credits
This project was partly generated from @cjolowicz's Hypermodern Python Cookiecutter template.