Fix common misspellings in text files

Fix common misspellings in text files. It’s designed primarily for checking misspelled words in source code (backslash escapes are skipped), but it can be used with other files as well. It does not check for word membership in a complete dictionary, but instead looks for a set of common misspellings. Therefore it should catch errors like “adn”, but it will not catch “adnasdfasdf”. This also means it shouldn’t generate false positives when you use a niche term it doesn’t know about.
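The lookup-table approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not codetypo's actual implementation: the `MISSPELLINGS` table, the word regex, and the `find_typos` helper are hypothetical names made up for this example.

```python
import re

# Hypothetical miniature table of known misspellings (typo -> fix).
MISSPELLINGS = {"adn": "and", "teh": "the"}

# Assumed tokenization for this sketch: runs of ASCII letters.
WORD_RE = re.compile(r"[A-Za-z]+")


def find_typos(text):
    """Return (word, suggestion) pairs for words found in the table."""
    hits = []
    for word in WORD_RE.findall(text):
        fix = MISSPELLINGS.get(word.lower())
        if fix is not None:
            hits.append((word, fix))
    return hits


print(find_typos("salt adn pepper"))  # a known typo is reported
print(find_typos("adnasdfasdf"))      # an unknown word is not reported
```

Because only listed misspellings are reported, an unfamiliar identifier or niche term passes through silently.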

Requirements

Python 3.8 or above.

Installation

You can use pip to install codetypo with e.g.:

pip install codetypo

Usage

Below are some simple usage examples to demonstrate how the tool works. For exhaustive usage information, please check the output of codetypo -h.

Run codetypo on all files in the current directory:

codetypo

Run codetypo on specific files or directories (specified via their names or glob patterns):

codetypo some_file some_dir/ *.ext

Some noteworthy flags:

codetypo -w, --write-changes

The -w flag writes the fixes suggested by codetypo back to the files. Without -w, codetypo only reports what it finds, which amounts to a dry run. It is recommended to combine -w with the -i (--interactive) flag.

codetypo -I FILE, --ignore-words=FILE

The -I flag takes a file listing words that codetypo should allow even though they appear in its dictionaries. The file contains one word per line. For example, codetypo -I path/to/file.txt runs codetypo with that list of allowed words. See Ignoring words for more details.

codetypo -L word1,word2,word3,word4

The -L flag takes a comma-separated list of words to allow, placed immediately after the flag. See Ignoring words for more details.

codetypo -x FILE, --exclude-file=FILE

Ignore whole lines that match those in FILE. The lines in FILE should match the to-be-excluded lines exactly.
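As a rough illustration of exact line matching, the sketch below drops every input line that equals a line from the exclude file. It is not taken from codetypo's source: `filter_excluded` is a hypothetical helper, and the assumption that only the trailing newline is disregarded when comparing is ours.

```python
def filter_excluded(lines, excluded_lines):
    """Keep only the lines that do not exactly match an excluded line.

    Assumption for this sketch: the trailing newline is ignored when
    comparing, but all other whitespace is significant.
    """
    excluded = {line.rstrip("\n") for line in excluded_lines}
    return [line for line in lines if line.rstrip("\n") not in excluded]


source = ["int teh_counter = 0;\n", "    int teh_counter = 0;\n"]
exclude_file = ["int teh_counter = 0;\n"]

# Only the first line matches exactly; the indented copy is still checked.
print(filter_excluded(source, exclude_file))
```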

codetypo -S, --skip=

Comma-separated list of files to skip. It accepts globs as well. Examples:

  • to skip .eps & .txt files, invoke codetypo --skip="*.eps,*.txt"

  • to skip directories, invoke codetypo --skip="./src/3rd-Party,./src/Test"
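To show how a comma-separated skip list with globs can be evaluated, here is a small sketch built on Python's standard `fnmatch` module. The `is_skipped` helper is hypothetical, and the real tool may match paths differently (for example against base names as well as full paths).

```python
import fnmatch


def is_skipped(path, skip_option):
    """Return True if path matches any glob in a comma-separated list."""
    patterns = [p for p in skip_option.split(",") if p]
    return any(fnmatch.fnmatch(path, pattern) for pattern in patterns)


print(is_skipped("figure.eps", "*.eps,*.txt"))
print(is_skipped("main.py", "*.eps,*.txt"))
print(is_skipped("./src/Test", "./src/3rd-Party,./src/Test"))
```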

Useful commands:

codetypo -d -q 3 --skip="*.po,*.ts,./src/3rdParty,./src/Test"

List all typos found, skipping translation files and some directories. Display them without terminal colors and with a quiet level of 3.

codetypo -i 3 -w

Run interactive mode level 3 and write changes to file.

We ship a collection of dictionaries that improve on the list available on Wikipedia; they were refined by applying them to projects such as the Linux kernel, EFL, and oFono. You can provide your own version of the dictionary, but patches for new or different entries are very welcome.

Want to know if a word you’re proposing exists in codetypo already? You can test a word against the current set of dictionaries in codetypo/data/dictionary*.txt via:

echo "word" | codetypo -
echo "1stword,2ndword" | codetypo -

You can select the optional dictionaries with the --builtin option.

Ignoring words

When ignoring false positives, note that spelling errors are case-insensitive but words to ignore are case-sensitive. For example, the dictionary entry wrod will also match the typo Wrod, but to ignore it you must pass wrod.
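One way to picture this asymmetry is the sketch below, in which the typo lookup lowercases the word but the ignore list is compared exactly as written. The table and the `is_reported` helper are hypothetical and only model the behaviour described above.

```python
# Hypothetical one-entry dictionary for this sketch.
MISSPELLINGS = {"wrod": "word"}


def is_reported(word, ignore_words):
    """Return True if word would be flagged as a typo."""
    key = word.lower()              # typo lookup is case-insensitive
    if key not in MISSPELLINGS:
        return False
    return key not in ignore_words  # ignore entries are compared as written


print(is_reported("Wrod", set()))     # flagged
print(is_reported("Wrod", {"wrod"}))  # ignored: matches the dictionary entry
print(is_reported("Wrod", {"Wrod"}))  # still flagged: the case differs
```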

The words to ignore can be passed in two ways:

  1. -I: A file with a word per line to ignore:

    codetypo -I FILE, --ignore-words=FILE
  2. -L: A comma separated list of words to ignore on the command line:

    codetypo -L word1,word2,word3,word4

Inline ignore

Some situations might require ignoring a specific word in a specific location. This can be achieved by adding a comment in the source code. You can either ignore a single word or a list of words. The comment should be in the format codetypo:ignore <words>, with the words separated by commas.

  1. ignore specific word:

    def wrod():  # codetypo:ignore wrod
        pass
  2. ignore multiple words:

    def wrod(wrods):  # codetypo:ignore wrod,wrods
        pass
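The comment format can be parsed with a short regular expression, as in the following sketch. It is an illustration only: `IGNORE_RE` and `inline_ignored_words` are hypothetical names, and how the tool treats a marker without any words is deliberately left open here.

```python
import re

# Assumed syntax, following the format described above: the marker
# "codetypo:ignore" followed by an optional comma-separated word list.
IGNORE_RE = re.compile(r"codetypo:ignore\b[ \t]*([\w,]*)")


def inline_ignored_words(line):
    """Return the words listed after the marker, or None without a marker."""
    match = IGNORE_RE.search(line)
    if match is None:
        return None
    return {word for word in match.group(1).split(",") if word}


print(inline_ignored_words("def wrod():  # codetypo:ignore wrod"))
print(inline_ignored_words("def wrod(wrods):  # codetypo:ignore wrod,wrods"))
print(inline_ignored_words("def word(): pass"))
```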

Using a config file

Command line options can also be specified in a config file.

When codetypo runs, it checks the current directory for a file named setup.cfg or .codetyporc (or a file specified via --config) containing a section named [codetypo]. Each command line option can be specified in this file (without the leading dashes), for example:

[codetypo]
skip = *.po,*.ts,./src/3rdParty,./src/Test
count =
quiet-level = 3

The .codetyporc file is an INI file, which is read using Python’s configparser. For example, comments are possible using ; or # as the first character.
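To see what configparser makes of such a file, the sketch below parses the example section from a string. It only demonstrates the standard library parser; how codetypo maps the resulting values onto its options is not shown, and the `SAMPLE` text is made up for the example.

```python
import configparser

SAMPLE = """\
# comment lines may start with '#'
; or with ';'
[codetypo]
skip = *.po,*.ts,./src/3rdParty,./src/Test
count =
quiet-level = 3
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Every value comes back as a string; a flag such as "count =" is empty.
options = dict(parser["codetypo"])
print(options)
```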

Values in an INI file entry cannot start with a - character, so if you need to do this, structure your entries like this:

[codetypo]
dictionary = mydict,-
ignore-words = bar,-foo

instead of these invalid entries:

[codetypo]
dictionary = -,mydict
ignore-words = -foo,bar

Codetypo also checks the current directory for a pyproject.toml file (or a path specified via --toml <filename>) and uses its [tool.codetypo] entry. On versions of Python prior to 3.11, this only works if the tomli package is installed. For example:

[tool.codetypo]
skip = '*.po,*.ts,./src/3rdParty,./src/Test'
count = true
quiet-level = 3

These are both equivalent to running:

codetypo --quiet-level 3 --count --skip "*.po,*.ts,./src/3rdParty,./src/Test"
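The TOML variant can be inspected in the same spirit. The sketch below reads a [tool.codetypo] table with tomllib (standard library from Python 3.11) and falls back to tomli on older versions; it illustrates the parsing step only, and the `SAMPLE` string is made up for the example.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

SAMPLE = """
[tool.codetypo]
skip = '*.po,*.ts,./src/3rdParty,./src/Test'
count = true
quiet-level = 3
"""

data = tomllib.loads(SAMPLE)

# Unlike the INI form, TOML values are typed: a boolean and an integer here.
options = data.get("tool", {}).get("codetypo", {})
print(options)
```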

If several config files are present, they are read in the following order:

  1. pyproject.toml (only if the tomli library is available)

  2. setup.cfg

  3. .codetyporc

  4. any additional file supplied via --config

If a codetypo configuration is supplied in several of these files, the configuration from the most recently read file overwrites previously specified configurations.

Any options specified in the command line will override options from the config files.
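The precedence rules above amount to a layered merge, sketched below with plain dictionaries. `merge_options` is a hypothetical helper, and overriding option by option (rather than replacing a whole configuration at once) is an assumption of this sketch.

```python
def merge_options(file_configs, cli_options):
    """Merge configs in read order, then apply command line options."""
    merged = {}
    for config in file_configs:  # later files override earlier ones
        merged.update(config)
    merged.update(cli_options)   # the command line overrides everything
    return merged


pyproject = {"quiet-level": "1", "skip": "*.po"}
setup_cfg = {}
codetyporc = {"quiet-level": "3"}
command_line = {"skip": "*.ts"}

print(merge_options([pyproject, setup_cfg, codetyporc], command_line))
```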

pre-commit hook

codetypo also works with pre-commit, using

- repo: https://github.com/khulnasoft/codetypo
  rev: v2.2.4
  hooks:
  - id: codetypo

If you configure codetypo through the pyproject.toml file, use this instead:

- repo: https://github.com/khulnasoft/codetypo
  rev: v2.2.4
  hooks:
  - id: codetypo
    additional_dependencies:
      - tomli

Dictionary format

The format of the dictionaries follows that of their original source, Wikipedia. It differs in how multiple suggestions are handled, and in that the last argument is an optional reason why an entry should not be applied automatically but inspected manually instead. E.g.:

  1. Simple entry: one wrong word / one suggestion:

    calulated->calculated
  2. Entry with more than one suggested fix:

    fiel->feel, field, file, phial,

    Note the last comma! You need to use it, otherwise the last suggestion will be discarded (see below for why). When there is more than one suggestion, an automatic fix is not possible and the best we can do is to give the user the file and line where the error occurred as well as the suggestions.

  3. Entry with one word, but with automatic fix disabled:

    clas->class, disabled because of name clash in c++

    Note that there isn’t a comma at the end of the line. The last argument is treated as the reason why a suggestion cannot be automatically applied.

    There can also be multiple suggestions but any automatic fix will again be disabled:

    clas->class, clash, disabled because of name clash in c++
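The three entry forms can be told apart with a small parser, sketched below. This is a reading of the rules stated above, not codetypo's own parsing code, and `parse_entry` is a hypothetical helper.

```python
def parse_entry(line):
    """Split a dictionary line into (typo, suggestions, reason, auto_fix)."""
    typo, _, rhs = line.strip().partition("->")
    reason = ""
    if rhs.endswith(","):
        # Trailing comma: every item is a suggestion, no reason is given.
        suggestions = [s.strip() for s in rhs.split(",") if s.strip()]
    elif "," in rhs:
        # No trailing comma: the last item is the reason, not a suggestion.
        *suggestions, reason = [s.strip() for s in rhs.split(",")]
    else:
        suggestions = [rhs.strip()]
    # An automatic fix needs exactly one suggestion and no reason.
    auto_fix = len(suggestions) == 1 and not reason
    return typo, suggestions, reason, auto_fix


for entry in (
    "calulated->calculated",
    "fiel->feel, field, file, phial,",
    "clas->class, disabled because of name clash in c++",
):
    print(parse_entry(entry))
```

The second case shows why the trailing comma matters: without it, "phial" would be read as the reason instead of as a suggestion.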

Development setup

As suggested in the Python Packaging User Guide, ensure pip, setuptools, and wheel are up to date before installing from source. Specifically you will need recent versions of setuptools and setuptools_scm:

pip install --upgrade pip setuptools setuptools_scm wheel

You can install required dependencies for development by running the following within a checkout of the codetypo source:

pip install -e ".[dev]"

To run tests against the codebase run:

make check

Sending pull requests

If you have a suggested typo that you’d like to see merged please follow these steps:

  1. Make sure you read the instructions mentioned in the Dictionary format section above to submit correctly formatted entries.

  2. Choose the correct dictionary file to add your typo to. See codetypo --help for explanations of the different dictionaries.

  3. Sort the dictionaries. This is done by invoking (in the top level directory of codetypo/):

    make check-dictionaries

    If the make script finds that you need to sort a dictionary, please then run:

    make sort-dictionaries
  4. Only after this process is complete do we recommend you submit the PR.

Important Notes:

  • If the dictionaries are submitted without being pre-sorted the PR will fail via our various CI tools.

  • Not all PRs will be merged. This is at the discretion of the devs, maintainers, and the community.

Updating

To stay current with codetypo development, you can install the latest version from GitHub via:

pip install --upgrade git+https://github.com/khulnasoft/codetypo.git

Important Notes:

  • Sometimes installing via pip will complain about permissions. If this is the case then run with:

    pip install --user --upgrade git+https://github.com/khulnasoft/codetypo.git
  • It has been reported that after installing from pip, codetypo can’t be located. Please check the $PATH variable to see if ~/.local/bin is present. If it isn’t then add it to your path.

  • If you decide to install via pip then be sure to remove any previously installed versions of codetypo (via your platform’s preferred app manager).

Updating the dictionaries

If you prefer not to follow the development version of codetypo but still want to benefit from the frequently updated dictionary files, you can run:

wget https://raw.githubusercontent.com/khulnasoft/codetypo/master/codetypo/data/dictionary.txt
codetypo -D dictionary.txt

This downloads the latest dictionary.txt file and, through the -D flag, uses it as the custom dictionary in place of the default one.

You can also do the same thing for the other dictionaries listed here:

https://github.com/khulnasoft/codetypo/tree/master/codetypo/data

License

The Python script codetypo with its library codetypo is available under the following terms: (tl;dr: GPL v2)

Copyright (C) 2010-2011 KhulnaSoft DevOps <support@khulnasoft.com>

Copyright (C) 2011 ProFUSION embedded systems

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, see <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>.

dictionary.txt and the other dictionary_*.txt files are derivative works of English Wikipedia and are released under the Creative Commons Attribution-Share-Alike License 3.0.
