TextNormalizer performs fully supervised text normalization
TextNormalizer
TextNormalizer is a string normalizer that uses SentenceTransformers as a backbone to obtain vector representations of sentences.
It is designed for repeated normalization of strings against a large corpus of strings.
The main contribution of TextNormalizer is to save time: the embeddings of the normalized strings are computed once and reused, instead of being recomputed on every normalization call.
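The caching idea described above can be sketched in a few lines. This is a minimal illustration, not the TextNormalizer implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a SentenceTransformer encoder, and `CachedNormalizer` is a made-up class name used only to show where the reference embeddings are computed and where they are reused.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a sentence encoder: unit-length word counts."""
    counts = Counter(text.lower().split())
    # Fall back to 1.0 so an empty string does not divide by zero.
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. cosine similarity."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class CachedNormalizer:
    """Illustration only; assumed structure, not the library's code."""

    def fit(self, normalized_strings):
        # Embed the reference strings once and keep the vectors.
        self.labels = list(normalized_strings)
        self.vectors = [toy_embed(s) for s in self.labels]
        return self

    def transform(self, strings):
        # Only the incoming strings are embedded here; the cached
        # reference vectors are reused for every call.
        results = []
        for s in strings:
            query = toy_embed(s)
            scores = [cosine(query, v) for v in self.vectors]
            results.append(self.labels[scores.index(max(scores))])
        return results


normalizer = CachedNormalizer().fit(
    ["senior software engineer", "solutions architect", "junior software developer"]
)
result = normalizer.transform(["experienced software engineer", "software architect"])
```

With a real sentence encoder the expensive step is the embedding itself, so keeping the reference vectors around is what makes repeated calls against a large corpus cheap.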
Setup
pip install t-normalizer
Usage
- Create an instance of TextNormalizer. It can be initialized with a SentenceTransformer model or a SentenceTransformer model path.
- Obtain the vector representations of the normalized strings with the .fit method.
- Transform each string to its most similar normalized form using the .transform method.
from textnormalizer import TextNormalizer
normalizer = TextNormalizer()
normalized_text = ['senior software engineer', 'solutions architect', 'junior software developer']
to_normalize = ['experienced software engineer', 'software architect', 'entry level software engineer']
normalizer.fit(normalized_text)
transformed = normalizer.transform(to_normalize)
The model, along with the normalized strings and their vector representations, can be saved and loaded with the .save and .load methods.
Serialization
# save
normalizer.save('path/to/model')
# load
model = TextNormalizer.load('path/to/model')
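To make concrete what persisting "the normalized strings and their vector representations" buys, here is a small round-trip sketch. The on-disk format used by TextNormalizer is not documented here, so the JSON layout, the `save_state` and `load_state` helpers, and the placeholder vectors are all assumptions made for illustration.

```python
import json
import os
import tempfile


def save_state(path, labels, vectors):
    # Store the strings next to their vectors so neither has to be recomputed.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"labels": labels, "vectors": vectors}, fh)


def load_state(path):
    # Read back the strings and vectors exactly as they were saved.
    with open(path, encoding="utf-8") as fh:
        state = json.load(fh)
    return state["labels"], state["vectors"]


labels = ["senior software engineer", "solutions architect"]
vectors = [[0.1, 0.9], [0.8, 0.2]]  # made-up placeholder vectors, not real embeddings

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.json")
    save_state(path, labels, vectors)
    loaded_labels, loaded_vectors = load_state(path)
```

After loading, the reference vectors are available immediately, so new strings can be matched without embedding the reference strings again.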