TextNormalizer performs fully supervised text normalization
Project description
TextNormalizer
TextNormalizer is a string normalizer that uses SentenceTransformers as a backbone to obtain vector representations of sentences.
It is designed for repeated normalization of strings against a large corpus of strings.
The main contribution of TextNormalizer is to save time: the embeddings of the normalized strings are computed once and reused, rather than recomputed on every call.
Setup
pip install t-normalizer
Usage
- Create an instance of TextNormalizer. It can be initialized with a SentenceTransformer model or a SentenceTransformer model path.
- Obtain the vector representations of the normalized strings with the .fit method.
- Transform each string to its most similar normalized form using the .transform method.
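from textnormalizer import TextNormalizer
normalizer = TextNormalizer()
# reference strings: the normalized forms to map onto
normalized_text = ['senior software engineer', 'solutions architect', 'junior software developer']
# raw strings to be normalized
to_normalize = ['experienced software engineer', 'software architect', 'entry level software engineer']
# embed the normalized strings once
normalizer.fit(normalized_text)
# map each raw string to its most similar normalized form
transformed = normalizer.transform(to_normalize)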
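The fit-then-transform pattern can be illustrated with a minimal, dependency-free sketch. This is not the t-normalizer implementation: `ToyNormalizer`, `embed`, and `cosine` are hypothetical names, and a bag-of-words count vector stands in for the SentenceTransformer embedding. The sketch only shows why caching the reference vectors in `fit` means repeated `transform` calls embed just the incoming strings.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyNormalizer:
    """Hypothetical illustration only; not the t-normalizer implementation."""

    def fit(self, normalized_strings):
        # Embed the reference strings once and keep the vectors.
        self.strings = list(normalized_strings)
        self.vectors = [embed(s) for s in self.strings]
        return self

    def transform(self, strings):
        # Only the incoming strings are embedded here; the cached
        # reference vectors are reused on every call.
        results = []
        for s in strings:
            query = embed(s)
            scores = [cosine(query, v) for v in self.vectors]
            best = max(range(len(scores)), key=scores.__getitem__)
            results.append(self.strings[best])
        return results


toy = ToyNormalizer().fit(["senior software engineer", "solutions architect"])
result = toy.transform(["software engineer", "cloud solutions architect"])
```

With this toy embedding, `result` is `['senior software engineer', 'solutions architect']`. A real sentence encoder would also match strings that share no words with the reference forms, which word counts cannot do.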
The model, along with the normalized strings and their vector representations, can be saved and loaded with the .save and .load methods.
Serialization
# save
normalizer.save('path/to/model')
# load
model = TextNormalizer.load('path/to/model')
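To make concrete what persisting the normalized strings and their vectors involves, here is a small sketch using only the standard library. The on-disk format used by t-normalizer is not described here, so the `state.json` file name, the JSON layout, and the `save_state` / `load_state` helpers are all assumptions made for illustration.

```python
import json
import os
import tempfile


def save_state(path, strings, vectors):
    """Write the reference strings and their vectors to a JSON file under path."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "state.json"), "w", encoding="utf-8") as f:
        json.dump({"strings": strings, "vectors": vectors}, f)


def load_state(path):
    """Read back the strings and vectors written by save_state."""
    with open(os.path.join(path, "state.json"), encoding="utf-8") as f:
        state = json.load(f)
    return state["strings"], state["vectors"]


# Round-trip through a temporary directory (hypothetical example data).
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model")
    save_state(target, ["solutions architect"], [[0.1, 0.2, 0.3]])
    strings, vectors = load_state(target)
```

Because the vectors are stored alongside the strings, loading restores the fitted state without re-embedding the reference corpus.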
Download files
Source Distribution
Built Distribution
File details
Details for the file t-normalizer-0.0.1.tar.gz.
File metadata
- Download URL: t-normalizer-0.0.1.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fff94108bb9eccd18ce99e6f1299d6709beba7e9ecda164eb1bd6db6a21a9a8e |
| MD5 | 61e99f41fbff244fa235cc09b1cfafde |
| BLAKE2b-256 | 6208161a0998101ad350ec5039849eb91964d4a51f8b1f53c7a74a1a04820045 |
File details
Details for the file t_normalizer-0.0.1-py3-none-any.whl.
File metadata
- Download URL: t_normalizer-0.0.1-py3-none-any.whl
- Upload date:
- Size: 6.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a7615f259118012a51c1d497f50d6764037294add7c0647f77cdfc2620c7f65 |
| MD5 | 362a2d0d2fd375e5a74e5d865c8f20e1 |
| BLAKE2b-256 | 721157b28f1e56b17ec458c0804bf1e71b9d39bd6f88d5056a42793935c94a95 |