Machine learning spell-check package that combines a word's context with character similarity.

TakeSpellChecker

TakeSpellChecker is a package that uses machine learning to check the spelling of words in any language. To correct a misspelled word, it uses the surrounding words as context to predict a list of probable words, then picks the candidate with the highest character similarity to the original word. The context is learned from word embeddings, so you must pass the path to a word embedding file. Alternatively, if the embedding is stored in an Azure file share, set the parameter from_azure to True and pass the path to a configuration file instead.
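
The character-similarity step described above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the candidate list is hard-coded where the package would derive it from the word embedding, difflib.SequenceMatcher stands in for whatever similarity measure the package uses, and the pick_correction name and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher


def pick_correction(word, candidates, threshold=0.8):
    """Return the candidate most similar to `word`, or `word` itself.

    `candidates` stands in for the words an embedding model would predict
    from the surrounding context (hard-coded here; hypothetical).
    """
    best_word, best_score = word, 0.0
    for candidate in candidates:
        # ratio() is a character-level similarity in [0, 1].
        score = SequenceMatcher(None, word, candidate).ratio()
        if score > best_score:
            best_word, best_score = candidate, score
    # Keep the original word when no candidate is similar enough.
    return best_word if best_score >= threshold else word


# Hypothetical context predictions for the sentence "check the speling".
context_candidates = ["spelling", "spending", "sibling"]
print(pick_correction("speling", context_candidates))
print(pick_correction("xyz", context_candidates))
```

A higher threshold makes the sketch more conservative: fewer words are replaced, because a candidate must be closer to the original spelling to be accepted.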

TakeSpellChecker.SpellCheck: creates the spell checker

  • path: str
  • path is the full path to your word embedding model. If from_azure is True, pass the path to a configuration file instead.
  • from_azure: boolean
  • from_azure is an optional parameter. Set it to True to download the embedding model automatically from an Azure file share; in that case, pass a configuration file path to path instead of an embedding file.

TakeSpellChecker.set_data: sets the data

  • data: list, series, dataframe or a string that represents the file path
  • data is the content to be processed. It can be a list, a Series, a DataFrame, or a string containing the path to a text file.
  • content_column_name: str
  • content_column_name is an optional parameter. It only applies when data is a DataFrame or a path to a text file. If the column name is not set, set_data uses the first column as the content.
  • file_sep: str
  • file_sep is an optional parameter. It only applies when data is a path to a text file. If the file separator is not set, set_data uses ';'.
  • encoding: str
  • encoding is an optional parameter. It only applies when data is a path to a text file. If the file encoding is not set, set_data uses 'utf-8'.
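
The defaults listed above (first column, ';' separator, 'utf-8' encoding) can be illustrated with the standard csv module. This is a sketch under stated assumptions, not the package's code: the load_content helper is hypothetical, and it assumes the text file has a header row.

```python
import csv
import os
import tempfile


def load_content(path, content_column_name=None, file_sep=";", encoding="utf-8"):
    """Read one text column from a delimited file with a header row (assumed)."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter=file_sep)
        header = next(reader)
        # Fall back to the first column when no column name is given.
        if content_column_name is None:
            index = 0
        else:
            index = header.index(content_column_name)
        return [row[index] for row in reader if row]


# Write a small sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8", newline=""
) as tmp:
    tmp.write("id;message\n1;helo world\n2;good mornin\n")

first_column = load_content(tmp.name)
messages = load_content(tmp.name, content_column_name="message")
os.remove(tmp.name)

print(first_column)
print(messages)
```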

TakeSpellChecker.spell_check: checks the spelling of the data

  • window_limit: int
  • window_limit is an optional parameter. It determines how many words of the sentence are used as context.
  • threshold: float
  • threshold is an optional parameter. It determines how permissive the spell checker is.
  • save_result: boolean
  • save_result is an optional parameter. If save_result is True, a file (output_spell_check.csv) with the columns Original, SpellChecked and Corrected is created in the same directory. The last column is a boolean column indicating whether any word in the sentence was corrected.
  • output_file_name: str
  • output_file_name is an optional parameter. If save_result is True and output_file_name is set, the file with the columns Original, SpellChecked and Corrected is created in the same directory under that name instead of output_spell_check.csv.
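
The three-column result described above can be pictured with a small sketch. It only illustrates the layout: the corrected sentences are hard-coded rather than produced by the package, the build_result_rows helper is hypothetical, and the comma delimiter is an assumption because the source does not state which separator the output file uses.

```python
import csv
import io


def build_result_rows(originals, checked):
    """Pair each original sentence with its spell-checked version."""
    rows = []
    for original, corrected in zip(originals, checked):
        # Corrected is True when at least one word in the sentence changed.
        rows.append((original, corrected, original != corrected))
    return rows


originals = ["helo world", "good morning"]
checked = ["hello world", "good morning"]  # hypothetical spell-check output

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Original", "SpellChecked", "Corrected"])
writer.writerows(build_result_rows(originals, checked))
print(buffer.getvalue())
```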

config.yml

account_name: my_account_name
account_key: my_key
directory: my_directory_name
embedding_file: my_embedding_file_name
embedding_share: my_file_share_name
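
Before passing a configuration file to the constructor, it can help to confirm that the file contains the five keys shown above. The following is a minimal sketch using only the standard library: parse_flat_config is a hypothetical helper, it handles only flat key: value lines rather than full YAML, and treating all five keys as required is an assumption.

```python
REQUIRED_KEYS = {
    "account_name",
    "account_key",
    "directory",
    "embedding_file",
    "embedding_share",
}


def parse_flat_config(text):
    """Parse flat `key: value` lines; this is not a general YAML parser."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    # Assumption: every key from the sample config.yml must be present.
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    return config


sample = """\
account_name: my_account_name
account_key: my_key
directory: my_directory_name
embedding_file: my_embedding_file_name
embedding_share: my_file_share_name
"""
config = parse_flat_config(sample)
print(config["embedding_share"])
```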

Installation

Use the pip package manager to install TakeSpellChecker:

pip install TakeSpellChecker

Usage

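import TakeSpellChecker as sc

# Placeholder inputs: replace them with your own configuration file and sentences.
path = 'config.yml'  # a configuration file, because from_azure=True
data = ['first sentence to check', 'second sentence to check']

spell_checker = sc.SpellCheck(path, from_azure=True)
spell_checker.set_data(data)
corrected_df = spell_checker.spell_check(window_limit=5, threshold=0.94, save_result=True)
print(corrected_df)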

Author

Karina Tiemi Kato

License

MIT
