TextNormalizer performs fully supervised text normalization
TextNormalizer
TextNormalizer is a string normalizer that uses SentenceTransformers as a backbone to obtain vector representations of sentences.
It is designed for repeated normalization of strings against a large corpus of strings.
The main contribution of TextNormalizer is to save time: the embeddings of the normalized strings are computed once and reused, instead of being recomputed on every normalization call.
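The idea of computing the reference embeddings once and reusing them can be sketched as follows. This is an illustration only and not TextNormalizer's actual implementation: the class name CachedNormalizerSketch and the toy_embed function (a bag of character trigrams standing in for a SentenceTransformer encoder) are hypothetical, chosen so the example runs with the standard library alone.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a sentence encoder: counts of character trigrams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CachedNormalizerSketch:
    """Embeds the normalized strings once in fit() and reuses them in transform()."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.labels = []
        self.vectors = []

    def fit(self, normalized_strings):
        # The only place where the normalized strings are embedded.
        self.labels = list(normalized_strings)
        self.vectors = [self.embed(s) for s in self.labels]
        return self

    def transform(self, strings):
        if not self.labels:
            raise ValueError("call fit() before transform()")
        results = []
        for s in strings:
            query = self.embed(s)  # only the incoming string is embedded here
            best = max(range(len(self.labels)),
                       key=lambda i: cosine(query, self.vectors[i]))
            results.append(self.labels[best])
        return results
```

With this layout, each call to transform costs one embedding per incoming string plus similarity lookups, regardless of how many normalized strings were fitted.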
Setup
pip install t-normalizer
Usage
- Create an instance of TextNormalizer. It can be initialized with a SentenceTransformer model or a SentenceTransformer model path.
- Obtain the vector representations of the normalized strings with the .fit method.
- Transform each string to its most similar normalized form using the .transform method.
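from textnormalizer import TextNormalizer
# Default construction; a SentenceTransformer model or model path can be supplied instead
normalizer = TextNormalizer()
# Canonical (normalized) forms to map onto
normalized_text = ['senior software engineer', 'solutions architect', 'junior software developer']
# Raw strings to be normalized
to_normalize = ['experienced software engineer', 'software architect', 'entry level software engineer']
# Compute the vector representations of the normalized strings once
normalizer.fit(normalized_text)
# Map each raw string to its most similar normalized form
transformed = normalizer.transform(to_normalize)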
The model, along with the normalized strings and their vector representations, can be saved and loaded with the .save and .load methods.
Serialization
# save
normalizer.save('path/to/model')
# load
model = TextNormalizer.load('path/to/model')
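Why persisting the vector representations matters can be illustrated with a small round trip: once the normalized strings and their vectors are written to disk, a later session can reload them without re-embedding anything. The sketch below is an assumption for illustration and does not reflect the on-disk format used by .save and .load; the helpers save_index and load_index, the JSON layout, and the example vectors are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_index(path, labels, vectors):
    """Write normalized strings and their precomputed vectors to one JSON file."""
    payload = {"labels": list(labels), "vectors": [list(v) for v in vectors]}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_index(path):
    """Read back the strings and vectors; no embedding model is involved."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return payload["labels"], payload["vectors"]


# Made-up two-dimensional vectors, used only to show the round trip.
example_labels = ["senior software engineer", "solutions architect"]
example_vectors = [[0.1, 0.9], [0.8, 0.2]]

with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "index.json"
    save_index(index_path, example_labels, example_vectors)
    restored_labels, restored_vectors = load_index(index_path)
```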