
Microservice for NLP tasks using gRPC

Project description

NLP Microservice

The goal of this project is to provide a gRPC server for resource-heavy NLP tasks, for instance computing vectors/embeddings for words or sentences. By using protobuf internally, our NLP server provides native and strongly typed interfaces for many programming languages. Multiple advantages arise from outsourcing such computations to a dedicated server:

  • If multiple apps rely on NLP, the underlying models (which are usually quite large) only need to be loaded once into the main memory.
  • All programming languages supported by gRPC get easy access to state-of-the-art NLP architectures (e.g., transformers).
  • The logic is consolidated at a central place, drastically decreasing the maintenance effort required.

In addition to the server, we also provide a client containing convenience functions. This makes it easier for Python applications to interact with the gRPC server. We will discuss the client at the end of this README.

Installation and Setup

We are using poetry to manage the dependencies. For easier setup, we also provide a Dockerfile and a docker-compose specification.

Poetry

# The server dependencies are optional, thus they have to be installed explicitly.
poetry install --extras server
# To get started, we recommend using the default spacy model.
# If you are dealing with English texts, you can run:
poetry run python -m spacy download en_core_web_lg
# To run the server, you need to specify the address it should listen on.
# In this example, it should listen on port 5678 on localhost.
poetry run python -m nlp_service "127.0.0.1:5678"

Docker

# You have to specify the host and port that the Docker service should use.
# We create a file called .env for this.
# Please note that inside Docker, the server has to listen on 0.0.0.0 instead of localhost, otherwise you will not be able to connect.
echo "PORT=5678" >> .env
# Now we can start the service.
docker-compose up

General Usage

Once the server is running, you are free to call any of the functions defined in the underlying protobuf file. The corresponding documentation is located in the same GitHub project. Please note: The examples here use the Python programming language, but are also directly applicable to any other language supported by gRPC.

import grpc
import numpy as np
from arg_services.nlp.v1 import nlp_pb2, nlp_pb2_grpc

# First of all, we create a channel (i.e., establish a connection to our server).
channel = grpc.insecure_channel("127.0.0.1:5678")

# The channel can now be used to create the actual client (allowing us to call all available functions)
client = nlp_pb2_grpc.NlpServiceStub(channel)

# Now the time has come to prepare our actual function call.
# We will start by creating a very simple NlpConfig with the default spacy model.
# For details about the parameters, please have a look at the next section.
config = nlp_pb2.NlpConfig(
  language="en",
  spacy_model="en_core_web_lg",
)

# Next, we will build a request to query vectors from our server.
request = nlp_pb2.VectorsRequest(
  # The first parameter is a list of strings that shall be embedded by our server.
  texts=["What a great tutorial!", "I will definitely recommend this to my friends."],
  # Now we need to specify which embeddings have to be computed. In this example, we create one vector for each text.
  embedding_levels=[nlp_pb2.EmbeddingLevel.EMBEDDING_LEVEL_DOCUMENT],
  # The only thing missing now is the spacy configuration we created in the previous step.
  config=config
)

# Having created the request, we can now send it to the server and retrieve the corresponding response.
response = client.Vectors(request)

# Due to technical constraints, we cannot directly transfer numpy arrays, thus we convert our response.
vectors = [np.array(entry.document.vector) for entry in response.vectors]
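
# As a purely client-side illustration (not part of the gRPC API), the retrieved vectors can be processed
# further with plain numpy, e.g., to compare our two example sentences via their cosine similarity.
similarity = np.dot(vectors[0], vectors[1]) / (np.linalg.norm(vectors[0]) * np.linalg.norm(vectors[1]))
print(similarity)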

Advanced Usage

A central piece of all available functions is the NlpConfig message, which allows you to compose even complex embedding models easily. In addition to its documentation, we present some examples in the following to demonstrate the possibilities you have.

from arg_services.nlp.v1 import nlp_pb2

# In the example above, we already introduced a quite basic config:
config = nlp_pb2.NlpConfig(
  # You have to provide a language for every config: https://spacy.io/usage/models#languages
  language="en",
  # Also, you need to specify the model that spacy should load: https://spacy.io/models/en
  spacy_model="en_core_web_lg",
)

# A central feature of our library is the possibility to combine multiple embedding models, potentially capturing more contextual information.
config = nlp_pb2.NlpConfig(
  language="en",
  # This parameter expects a list of models. If you pass more than one, the respective vectors are **concatenated** with each other
  # (e.g., two 300-dimensional embeddings will result in a 600-dimensional one).
  # This approach is based on https://arxiv.org/abs/1803.01400
  embedding_models=[
    nlp_pb2.EmbeddingModel(
      # First select the type of model you would like to use (e.g., SBERT/Sentence Transformers).
      model_type=nlp_pb2.EmbeddingType.EMBEDDING_TYPE_SENTENCE_TRANSFORMERS,
      # Then select the actual model.
      # Any of those specified on the website (https://www.sbert.net/docs/pretrained_models.html) are allowed.
      model_name="all-mpnet-base-v2"
    ),
    nlp_pb2.EmbeddingModel(
      # It is also possible to use a standard spacy model
      model_type=nlp_pb2.EmbeddingType.EMBEDDING_TYPE_SPACY,
      model_name="en_core_web_lg",
      # Since we have selected a word embedding (i.e., it cannot directly encode sentences), the token vectors need to be aggregated somehow.
      # The default strategy is to use the arithmetic mean, but you are free to use other strategies (e.g., the geometric mean).
      pooling_type=nlp_pb2.Pooling.POOLING_GMEAN
    ),
    nlp_pb2.EmbeddingModel(
      model_type=nlp_pb2.EmbeddingType.EMBEDDING_TYPE_SPACY,
      model_name="en_core_web_lg",
      # Alternatively, it is also possible to use the generalized mean / power mean.
      # In this example, the selected pmean corresponds to the geometric mean (thus this embedding is identical to the previous one).
      # This approach is based on https://arxiv.org/abs/1803.01400
      pmean=0
    )
  ],
  # When embedding_models are given, this setting is optional and only needed if you need spacy features (e.g., POS tagging) besides embeddings.
  # spacy_model="en_core_web_lg",
)
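
# A config with multiple embedding models is used exactly like the basic one from the General Usage section.
# As a small sketch (assuming the client and numpy import from the General Usage example are available),
# the resulting document vector is simply the concatenation of the individual model outputs:
request = nlp_pb2.VectorsRequest(
  texts=["What a great tutorial!"],
  embedding_levels=[nlp_pb2.EmbeddingLevel.EMBEDDING_LEVEL_DOCUMENT],
  config=config,
)
response = client.Vectors(request)
# The vector length equals the sum of the dimensions of the models configured above.
print(np.array(response.vectors[0].document.vector).shape)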

# When computing the similarity between strings, one additional parameter is available.
config = nlp_pb2.NlpConfig(
  language="en",
  # To keep the example simple, we will now only use a single spacy model instead of the more powerful embedding models.
  # However, it is of course possible to use them here as well.
  spacy_model="en_core_web_lg",
  # If not specified, we will always use the cosine similarity when comparing two strings.
  # As indicated in a recent paper (https://arxiv.org/abs/1904.13264), you may achieve better results with alternative approaches like DynaMax Jaccard.
  # Please note that this particular method ignores your selected pooling method because it operates on the individual word embeddings, which are not pooled at all.
  similarity_method=nlp_pb2.SimilarityMethod.SIMILARITY_METHOD_DYNAMAX_JACCARD
)
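
# To actually query similarities with such a config, the call is analogous to client.Vectors above.
# Please note: this is only a sketch; the names SimilaritiesRequest, TextTuple, text1/text2, and
# client.Similarities are assumptions and should be verified against the protobuf documentation.
request = nlp_pb2.SimilaritiesRequest(
  text_tuples=[
    nlp_pb2.TextTuple(
      text1="What a great tutorial!",
      text2="I will definitely recommend this to my friends.",
    )
  ],
  config=config,
)
response = client.Similarities(request)
print(response.similarities)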

# It is also possible to determine a similarity score without the use of embeddings.
config = nlp_pb2.NlpConfig(
  language="en",
  spacy_model="en_core_web_lg",
  # Traditional metrics (Jaccard similarity and Levenshtein edit distance) are also available.
  similarity_method=nlp_pb2.SimilarityMethod.SIMILARITY_METHOD_EDIT
)


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nlp_service-1.1.1.tar.gz (21.1 kB)

Uploaded Source

Built Distribution

nlp_service-1.1.1-py3-none-any.whl (19.9 kB)

Uploaded Python 3

File details

Details for the file nlp_service-1.1.1.tar.gz.

File metadata

  • Download URL: nlp_service-1.1.1.tar.gz
  • Upload date:
  • Size: 21.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.1 CPython/3.9.16 Linux/5.15.0-1024-azure

File hashes

Hashes for nlp_service-1.1.1.tar.gz

  • SHA256: 24a59fe88a4721c6c5769e9200c8a74b2ecb2cbe2df8cf73c08ecea17f1005a5
  • MD5: d7b66f6271193342c335abdc59a8e0b2
  • BLAKE2b-256: 5ad8f7b93b2008e6daee95b9c2372017930fc0442b6a0bdef80d687619708caa


File details

Details for the file nlp_service-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: nlp_service-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 19.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.1 CPython/3.9.16 Linux/5.15.0-1024-azure

File hashes

Hashes for nlp_service-1.1.1-py3-none-any.whl

  • SHA256: f53a9eeaa6b4c24416e998e0658f964b30db43691cafdb3205fccd5c30c1579c
  • MD5: af2217ba699a1dc36d932c3e00385171
  • BLAKE2b-256: 8b563919b5e377b0963cfe5b4be02cfe26ffbdc8e34a7a37743ebffb30fece52

