This is a simple tool that automatically corrects Portuguese misspellings.
Project description
Spell Corrector PT
Automatically corrects words in Portuguese.
How to use
- Get a word list (preferably domain-specific words, to lower the computational cost)
- Train the model (see example-train.py)
- Specify the path where the model is saved so it can be reused afterward
- Load the model and correct the words (see example.py)
How the model works (high level)
- Preprocess the dictionary by removing accents and converting to lowercase
- Extract character n-grams from the dictionary
- Create a sparse matrix from the dictionary using the bag-of-words strategy
- Create a sparse matrix from the preprocessed input word
- Compare the two sparse matrices by cosine similarity
- Return the most similar word
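
The steps above can be sketched in plain Python. This is only an illustration of the general technique, not the package's actual code: the function names (preprocess, char_ngrams, cosine, train, correct), the n-gram sizes (2 to 3), and the space padding around each word are all assumptions made for the example, and a Counter stands in for a sparse matrix row.

```python
import math
import unicodedata
from collections import Counter


def preprocess(word):
    """Remove accents and lowercase the word (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()


def char_ngrams(word, n_min=2, n_max=3):
    """Bag of character n-grams, stored sparsely as a Counter.

    The n-gram range and the space padding are assumptions for this sketch.
    """
    padded = f" {word} "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(word_list):
    """Vectorize every dictionary word once, keeping the original spelling."""
    return [(word, char_ngrams(preprocess(word))) for word in word_list]


def correct(word, model):
    """Return the dictionary word whose n-gram vector is most similar."""
    query = char_ngrams(preprocess(word))
    best_word, _ = max(model, key=lambda entry: cosine(query, entry[1]))
    return best_word


# Usage: build the model from a small word list, then correct a misspelling.
model = train(["coração", "cachorro", "computador"])
print(correct("cachoro", model))
```

Because each word is compared against every dictionary entry, the cost of a lookup grows with the size of the word list, which is why a small, domain-specific list keeps it cheap.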