
Transformer Embeddings


This library simplifies and streamlines the usage of encoder transformer models supported by HuggingFace's transformers library (model hub or local) to generate embeddings for string inputs, similar to the way sentence-transformers does.

Please note that starting with v4, we have dropped support for Python 3.7. If you need to use this library with Python 3.7, the latest compatible release is version 3.1.0.

Why use this over HuggingFace's transformers or sentence-transformers?

Under the hood, we take care of:

  1. Supporting any model on the HF model hub, with sensible defaults for inference.
  2. Setting the PyTorch model to eval mode.
  3. Using no_grad() for the forward pass.
  4. Batching inputs, and returning output in the format produced by HF transformers.
  5. Padding / truncating to model defaults.
  6. Moving data to and from GPUs, if available.
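Taken together, these steps correspond roughly to the boilerplate you would otherwise write yourself against transformers directly. A hedged sketch (the function and parameter names here are illustrative, not this library's API; imports are kept inside the function so nothing runs until you call it):

```python
def embed_texts(model_name, texts, batch_size=32):
    """Roughly the boilerplate this library replaces: eval mode, no_grad,
    batching, padding/truncation to model defaults, and GPU handling."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to(device)
    model.eval()  # step 2: inference mode

    outputs = []
    with torch.no_grad():  # step 3: no gradient tracking
        for start in range(0, len(texts), batch_size):  # step 4: batching
            batch = tokenizer(
                texts[start:start + batch_size],
                padding=True,        # step 5: pad to the longest in the batch
                truncation=True,     # ...and truncate to the model maximum
                return_tensors="pt",
            ).to(device)             # step 6: move inputs to the GPU
            outputs.append(model(**batch).last_hidden_state.cpu())
    return torch.cat(outputs)
```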

Installation

You can install Transformer Embeddings via pip from PyPI:

$ pip install transformer-embeddings

Usage

from transformer_embeddings import TransformerEmbeddings

transformer = TransformerEmbeddings("model_name")

If you have a previously instantiated model and / or tokenizer, you can pass them in.

transformer = TransformerEmbeddings(model=model, tokenizer=tokenizer)
transformer = TransformerEmbeddings(model_name="model_name", model=model)

or

transformer = TransformerEmbeddings(model_name="model_name", tokenizer=tokenizer)

Note: The model_name should be included if only one of model or tokenizer is passed in.
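For example, to reuse objects you have already loaded with transformers (a sketch; AutoModel and AutoTokenizer are the standard HF loaders, while the wrapping function name is ours):

```python
def build_embedder(model_name):
    """Load a model and tokenizer yourself, then hand both to the library."""
    from transformers import AutoModel, AutoTokenizer
    from transformer_embeddings import TransformerEmbeddings

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    # Both objects are supplied, so model_name is not required here.
    return TransformerEmbeddings(model=model, tokenizer=tokenizer)
```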

Embeddings

To get output embeddings:

embeddings = transformer.encode(["Lorem ipsum dolor sit amet",
                                 "consectetur adipiscing elit",
                                 "sed do eiusmod tempor incididunt",
                                 "ut labore et dolore magna aliqua."])
embeddings.output

Pooled Output

To get pooled outputs:

from transformer_embeddings import TransformerEmbeddings, mean_pooling

transformer = TransformerEmbeddings("model_name", return_output=False, pooling_fn=mean_pooling)

embeddings = transformer.encode(["Lorem ipsum dolor sit amet",
                                 "consectetur adipiscing elit",
                                 "sed do eiusmod tempor incididunt",
                                 "ut labore et dolore magna aliqua."])

embeddings.pooled
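Mean pooling averages the token embeddings for each input, weighting by the attention mask so padding tokens are ignored. A minimal sketch of that idea in plain PyTorch (an illustration of the technique, not this library's exact implementation):

```python
import torch

def masked_mean_pooling(token_embeddings, attention_mask):
    """Average token embeddings over the sequence, ignoring padding.

    token_embeddings: (batch, seq_len, hidden) float tensor
    attention_mask:   (batch, seq_len) 0/1 tensor
    """
    mask = attention_mask.unsqueeze(-1).type_as(token_embeddings)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts
```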

Exporting the Model

Once you are done training and testing, the model can be exported into a single tarball:

from transformer_embeddings import TransformerEmbeddings

transformer = TransformerEmbeddings("model_name")
transformer.export(additional_files=["/path/to/other/files/to/include/in/tarball.pickle"])

This tarball can also be uploaded to S3, which requires installing the S3 extras (pip install transformer-embeddings[s3]). Then use:

from transformer_embeddings import TransformerEmbeddings

transformer = TransformerEmbeddings("model_name")
transformer.export(
    additional_files=["/path/to/other/files/to/include/in/tarball.pickle"],
    s3_path="s3://bucket/models/model-name/date-version/",
)

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the Apache 2.0 license, Transformer Embeddings is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was partly generated from @cjolowicz's Hypermodern Python Cookiecutter template.
