A script that automatically spell checks the comments of a codebase.

CommentSpellCheck

The CommentSpellCheck (CSC) package provides a script that automatically spell checks the comments of a code base. It was originally developed to be run on the SimpleITK and ITK code bases.

Here is how it is typically run:

python comment_spell_check.py --exclude Ancillary $SIMPLEITK_SOURCE_DIR/Code

This command will recursively find all the '.h' files in a directory, extract the C/C++ comments from the code, and run a spell checker on them. The '--exclude' flag tells the script to ignore any file that has 'Ancillary' in its full path name. This flag will accept any regular expression.
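The file selection step described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: the function name `find_files` and its parameters are hypothetical, and it assumes the exclude pattern is applied with a regular-expression search over the resolved full path.

```python
import re
from pathlib import Path


def find_files(root, suffix=".h", exclude_patterns=()):
    """Recursively collect files under `root` that end with `suffix`,
    skipping any whose full path matches one of the exclude regexes.

    Hypothetical helper for illustration only.
    """
    excludes = [re.compile(pattern) for pattern in exclude_patterns]
    selected = []
    for path in sorted(Path(root).rglob("*" + suffix)):
        if not path.is_file():
            continue
        full_path = str(path.resolve())
        # re.search matches anywhere in the path, so 'Ancillary' excludes
        # any file with that text in a directory name or file name.
        if any(rx.search(full_path) for rx in excludes):
            continue
        selected.append(path)
    return selected
```

Under these assumptions, `find_files("Code", ".h", ["Ancillary"])` would return every '.h' file below `Code` except those with 'Ancillary' somewhere in their path.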

In addition to pyenchant's English dictionary, we use the words in additional_dictionary.txt. These words are proper names and technical terms harvested by hand from the SimpleITK and ITK code bases.
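As a rough sketch of how a base dictionary and an additional word list can be combined, the example below uses plain Python sets as stand-ins. It assumes the additional dictionary lists one word per line, which is an assumption about the file format; the helper names are hypothetical. In the real script the base lookup would go through pyenchant (for example `enchant.Dict("en_US").check(word)`) rather than a set.

```python
def load_additional_words(lines):
    """Build a set of words from an iterable of lines.

    Assumes one word per line; blank lines are ignored.
    """
    return {line.strip() for line in lines if line.strip()}


def is_known(word, base_words, additional_words):
    """Return True if `word` is in either dictionary.

    `base_words` is a stand-in for the English dictionary; the lookup is
    case sensitive, matching the behavior described for the script.
    """
    return word in base_words or word in additional_words
```

Reading the file itself could then look like `load_additional_words(open("additional_dictionary.txt"))`.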

If a word is not found in the dictionaries, we try two additional checks.

  1. If the word starts with some known prefix, the prefix is removed and the remaining word is checked against the dictionary. The prefixes used by default are 'sitk', 'itk', and 'vtk'. Additional prefixes can be specified with the '--prefix' command line argument.

  2. We attempt to split the word by capitalization and check each sub-word against the dictionary. This method is an attempt to detect camel-case words such as 'GetArrayFromImage', which would get split into 'Get', 'Array', 'From', and 'Image'. Camel-case words are very commonly used for code elements.
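The two fallback checks above can be sketched as follows. This is an illustrative approximation rather than the script's own code: the function names are hypothetical, the dictionary lookup is passed in as a callable, and the camel-case regular expression is one possible way to split on capitalization (it drops digits and other non-letter characters).

```python
import re

# Default prefixes named in the description above.
DEFAULT_PREFIXES = ("sitk", "itk", "vtk")


def strip_prefix(word, prefixes=DEFAULT_PREFIXES):
    """Return `word` without the first matching prefix, or None if no
    prefix matches or nothing would remain."""
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) > len(prefix):
            return word[len(prefix):]
    return None


def split_camel_case(word):
    """Split a word at capital letters, keeping acronym runs together."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)


def passes_fallback_checks(word, is_known, prefixes=DEFAULT_PREFIXES):
    """Apply the prefix check, then the camel-case check.

    `is_known` is any callable that reports whether a single word is in
    the dictionary.
    """
    remainder = strip_prefix(word, prefixes)
    if remainder is not None and is_known(remainder):
        return True
    parts = split_camel_case(word)
    return bool(parts) and all(is_known(part) for part in parts)
```

With this sketch, `split_camel_case("GetArrayFromImage")` yields `['Get', 'Array', 'From', 'Image']`, and a word is accepted only if every sub-word is known.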

The script can also process other file types. With the '--suffix' option, the following file types are available: Python (.py), C/C++ (.c/.cxx), Text (.txt), reStructuredText (.rst), Markdown (.md), Ruby (.ruby), and Java (.java). Note that reStructuredText files are treated as standard text. Consequently, all markup keywords that are not actual words will need to be added to the additional/exception dictionary.
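To illustrate why the file type matters, here is a simplified sketch of choosing what text to spell check based on the suffix. It is an assumption-laden stand-in, not how the script extracts comments: the function name is hypothetical, only a few suffixes are handled, and the regular expressions are naive (they do not account for comment markers inside string literals).

```python
import re

# '//' line comments and '/* ... */' block comments (naive match).
C_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
# '#' comments to end of line (naive match).
HASH_COMMENT = re.compile(r"#[^\n]*")


def extract_text_to_check(source, suffix):
    """Return the pieces of `source` that would be spell checked.

    Text-like files are checked whole, which is why markup keywords in
    reStructuredText end up being checked too.
    """
    if suffix in (".txt", ".rst"):
        return [source]
    if suffix in (".h", ".c", ".cxx"):
        return [m.group(0) for m in C_COMMENT.finditer(source)]
    if suffix == ".py":
        return [m.group(0) for m in HASH_COMMENT.finditer(source)]
    raise ValueError(f"suffix not handled in this sketch: {suffix}")
```

Because a '.rst' file is returned whole, a directive name that is not an English word would be reported unless it is added to the additional dictionary.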

Dictionary notes

By default, on Linux and Mac systems, pyenchant uses GNU aspell as the underlying dictionary. The spell checking is case sensitive. While aspell allows arbitrary characters in a dictionary word, CSC may split up a word by non-alphanumeric characters. This split can occur if the word itself is not found in the dictionary.

If a dictionary word has non-alphanumeric characters, CSC prints a warning.
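The splitting and warning behavior described in this section might look roughly like the sketch below. The function names and the warning text are hypothetical, and treating "alphanumeric" as ASCII letters and digits is an assumption.

```python
import re

# Any run of characters other than ASCII letters and digits.
NON_ALNUM = re.compile(r"[^A-Za-z0-9]+")


def split_on_non_alnum(word):
    """Split a word at non-alphanumeric characters, dropping empty parts."""
    return [part for part in NON_ALNUM.split(word) if part]


def warn_on_dictionary_words(words):
    """Print a warning for each dictionary word that contains
    non-alphanumeric characters and return those words."""
    flagged = [word for word in words if NON_ALNUM.search(word)]
    for word in flagged:
        print(f"Warning: dictionary word '{word}' contains "
              "non-alphanumeric characters")
    return flagged
```

The warning is useful because a dictionary entry such as 'k-means' may never match if the checker splits the comment text into 'k' and 'means' before looking it up.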

Download files

Source Distribution

comment_spell_check-0.2.3.tar.gz (72.9 kB)

Uploaded Source

Built Distribution

comment_spell_check-0.2.3-py3-none-any.whl (6.8 kB)

Uploaded Python 3

File details

Details for the file comment_spell_check-0.2.3.tar.gz.

File metadata

  • Download URL: comment_spell_check-0.2.3.tar.gz
  • Upload date:
  • Size: 72.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for comment_spell_check-0.2.3.tar.gz
Algorithm    Hash digest
SHA256       25118809987a6677bc39a166ddb7bab080b35bb66cf2a04fb886d9c8be6974cf
MD5          afed5cdd4dab285a6c8626b89b1d7054
BLAKE2b-256  f64a28861466d275c73fdffadb61d1803c669edaa44c36c8b04098458fcf46cc

File details

Details for the file comment_spell_check-0.2.3-py3-none-any.whl.

File hashes

Hashes for comment_spell_check-0.2.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       49c2e01a4e3685adea63b5887429253d66990a345b6746dfa85f0cfa4905ebdf
MD5          6ebf1df78c3fa717160f64bbdf5be135
BLAKE2b-256  faf993f403361f65729f08e9c17b464011ee1dbfa9b5c5d933e3197ce6b42077
