
TextNormalizer performs fully supervised text normalization


License: MIT

TextNormalizer

TextNormalizer is a string normalizer that uses SentenceTransformers as a backbone to obtain vector representations of sentences. It is designed for repeatedly normalizing strings against a large corpus of normalized strings.
Its main advantage is speed: the embeddings of the normalized strings are computed once and reused, so they do not have to be recomputed every time a string is normalized.
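To make the saving concrete, the sketch below counts how many strings an encoder has to embed with and without caching the normalized corpus. It is an illustration under stated assumptions, not the package's code: `CountingEncoder`, `calls_without_cache` and `calls_with_cache` are hypothetical names, and the stub returns dummy vectors instead of running a SentenceTransformer model.

```python
from typing import List, Sequence


class CountingEncoder:
    """Hypothetical stub standing in for a SentenceTransformer model.
    It returns dummy vectors and counts how many strings it has embedded,
    since the embedding forward pass is the expensive step being avoided."""

    def __init__(self) -> None:
        self.calls = 0

    def encode(self, texts: Sequence[str]) -> List[List[float]]:
        self.calls += len(texts)
        return [[float(len(t))] for t in texts]  # dummy vectors, never compared here


def calls_without_cache(corpus: Sequence[str], queries: Sequence[str]) -> int:
    """Re-embed the whole normalized corpus for every query."""
    enc = CountingEncoder()
    for q in queries:
        enc.encode([q])
        enc.encode(corpus)
    return enc.calls


def calls_with_cache(corpus: Sequence[str], queries: Sequence[str]) -> int:
    """Embed the corpus once up front (a fit-like step), then only the queries."""
    enc = CountingEncoder()
    enc.encode(corpus)
    enc.encode(queries)
    return enc.calls
```

For M normalized strings and Q queries, re-embedding costs Q·(M + 1) encoder calls while caching costs M + Q. With the three normalized strings and three queries from the Usage example that is 12 versus 6, and the gap grows with the size of the corpus.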

Setup

pip install t-normalizer

Usage

  1. Create an instance of TextNormalizer. It can be initialized with a SentenceTransformer model or with the path to one.
  2. Compute the vector representations of the normalized strings with the .fit method.
  3. Map each string to its most similar normalized form with the .transform method.
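from textnormalizer import TextNormalizer

# Optionally pass a SentenceTransformer model or a model path
normalizer = TextNormalizer()

# Canonical forms to normalize against, and the strings to normalize
normalized_text = ['senior software engineer', 'solutions architect', 'junior software developer']
to_normalize = ['experienced software engineer', 'software architect', 'entry level software engineer']

# Compute the embeddings of the normalized strings once
normalizer.fit(normalized_text)
# Map each string to its most similar normalized form
transformed = normalizer.transform(to_normalize)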
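The package's internals are not described here, but the fit/transform steps above can be sketched as a nearest-neighbour search, assuming cosine similarity as the similarity measure. `SketchNormalizer`, `trigram_encoder` and `cosine` are hypothetical names; the character-trigram encoder only stands in for a SentenceTransformer model so that the example runs without downloading one.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

SparseVector = Dict[str, float]


def trigram_encoder(text: str) -> SparseVector:
    """Hypothetical stand-in for a SentenceTransformer: character-trigram counts."""
    padded = f"  {text.lower()}  "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    return {gram: float(n) for gram, n in counts.items()}


def cosine(a: SparseVector, b: SparseVector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SketchNormalizer:
    """Hypothetical fit/transform normalizer; not the package's implementation."""

    def __init__(self, encoder: Callable[[str], SparseVector] = trigram_encoder) -> None:
        self.encoder = encoder
        self.normalized: List[str] = []
        self.vectors: List[SparseVector] = []

    def fit(self, normalized: Sequence[str]) -> "SketchNormalizer":
        # Embed the canonical strings once and keep the vectors for later calls.
        self.normalized = list(normalized)
        self.vectors = [self.encoder(s) for s in self.normalized]
        return self

    def transform(self, texts: Sequence[str]) -> List[str]:
        if not self.vectors:
            raise ValueError("call fit() before transform()")
        results = []
        for text in texts:
            query = self.encoder(text)  # only the query is embedded here
            scores = [cosine(query, v) for v in self.vectors]
            results.append(self.normalized[scores.index(max(scores))])
        return results
```

Passing the encoder to the constructor loosely mirrors the option of initializing TextNormalizer with a model or a model path; with a real SentenceTransformer, the encoder would return dense embeddings, but the fit-once, compare-many structure would stay the same.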

Serialization

The model, along with the normalized strings and their vector representations, can be saved with the .save method and loaded with the .load method.

# save
normalizer.save('path/to/model')

# load
model = TextNormalizer.load('path/to/model')
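The on-disk format used by .save is not documented here. As a hypothetical sketch of the idea, the snippet below stores the normalized strings and their vectors in a single JSON file so that a reload can skip re-encoding. `save_index`, `load_index` and the JSON layout are assumptions; the SentenceTransformer model itself would need to be stored separately (for example with its own `save` method).

```python
import json
from pathlib import Path
from typing import List, Tuple


def save_index(path: str, strings: List[str], vectors: List[List[float]]) -> None:
    """Write normalized strings and their embeddings to one JSON file (hypothetical format)."""
    if len(strings) != len(vectors):
        raise ValueError("each normalized string needs exactly one vector")
    payload = {"strings": strings, "vectors": vectors}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_index(path: str) -> Tuple[List[str], List[List[float]]]:
    """Read the strings and vectors back so nothing has to be re-encoded."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return payload["strings"], payload["vectors"]
```

JSON keeps the sketch dependency-free; for large corpora a binary format such as NumPy's .npy files would be more compact.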

Download files

Source Distribution

t-normalizer-0.0.1.tar.gz (5.0 kB)

Built Distribution

t_normalizer-0.0.1-py3-none-any.whl (6.1 kB, Python 3)

File details

Details for the file t-normalizer-0.0.1.tar.gz.

File metadata

  • Download URL: t-normalizer-0.0.1.tar.gz
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.10

File hashes

Hashes for t-normalizer-0.0.1.tar.gz
Algorithm Hash digest
SHA256 fff94108bb9eccd18ce99e6f1299d6709beba7e9ecda164eb1bd6db6a21a9a8e
MD5 61e99f41fbff244fa235cc09b1cfafde
BLAKE2b-256 6208161a0998101ad350ec5039849eb91964d4a51f8b1f53c7a74a1a04820045


File details

Details for the file t_normalizer-0.0.1-py3-none-any.whl.

File hashes

Hashes for t_normalizer-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3a7615f259118012a51c1d497f50d6764037294add7c0647f77cdfc2620c7f65
MD5 362a2d0d2fd375e5a74e5d865c8f20e1
BLAKE2b-256 721157b28f1e56b17ec458c0804bf1e71b9d39bd6f88d5056a42793935c94a95
