TextNormalizer performs fully supervised text normalization

Project description

License: MIT

TextNormalizer

TextNormalizer is a string normalizer that uses SentenceTransformers as a backbone to obtain vector representations of sentences. It is designed for repeatedly normalizing strings against a large corpus of reference strings.
The main contribution of TextNormalizer is to save time: the embeddings of the normalized strings are computed once and reused, rather than recomputed on every call.

Setup

pip install t-normalizer

Usage

  1. Create an instance of TextNormalizer. It can be initialized with a SentenceTransformer model or with the path to a SentenceTransformer model.
  2. Compute the vector representations of the normalized strings with the .fit method.
  3. Map each input string to its most similar normalized form with the .transform method.

from textnormalizer import TextNormalizer

normalizer = TextNormalizer()

normalized_text = ['senior software engineer', 'solutions architect', 'junior software developer']
to_normalize = ['experienced software engineer', 'software architect', 'entry level software engineer']

normalizer.fit(normalized_text)
transformed = normalizer.transform(to_normalize)
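
The fit/transform split used above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: CachedNormalizer and toy_embed are hypothetical names introduced here, a character-trigram vector stands in for the SentenceTransformer embedding so the example runs without downloading a model, and the use of cosine similarity as the matching score is an assumption.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a sentence encoder (assumption): unit-length bag of character trigrams."""
    padded = f"  {text.lower()} "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {gram: c / norm for gram, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors, i.e. their dot product."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b.get(gram, 0.0) for gram, weight in a.items())


class CachedNormalizer:
    """Hypothetical illustration of the caching idea, not TextNormalizer itself."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.labels = []
        self.vectors = []

    def fit(self, normalized_strings):
        # Embed the reference strings once and keep the vectors for later calls.
        self.labels = list(normalized_strings)
        self.vectors = [self.embed(s) for s in self.labels]
        return self

    def transform(self, strings):
        # Only the incoming strings are embedded here; the cached vectors are reused.
        result = []
        for s in strings:
            query = self.embed(s)
            best = max(
                range(len(self.labels)),
                key=lambda i: cosine(query, self.vectors[i]),
            )
            result.append(self.labels[best])
        return result
```

With this structure, calling transform many times costs one embedding per incoming string plus a similarity lookup, while the reference corpus is embedded only during fit.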

Serialization

The model, along with the normalized strings and their vector representations, can be saved and loaded with the .save and .load methods.

# save
normalizer.save('path/to/model')

# load
model = TextNormalizer.load('path/to/model')
