Check if Unicode text files are Unicode-normalized

Unicodecheck

Simple tool to check if Unicode text files are Unicode-normalized

Install

pip3 install unicodecheck

Usage

Run the program either with the unicodecheck command or as a Python module with python3 -m unicodecheck.

usage: unicodecheck [-h] [-V] [-m {NFC,NFD,NFKC,NFKD}] [-d] [-u [NUMBER]] [-r] [-i] [-v]
                    PATH [PATH ...]

positional arguments:
  PATH                  describe input file or directory (pass '-' to specify stdin)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m {NFC,NFD,NFKC,NFKD}, --mode {NFC,NFD,NFKC,NFKD}
                        target Unicode normalization (default: NFC)
  -d, --diff            show diffs between the original and normalized (default: False)
  -u [NUMBER], -U [NUMBER], --unified [NUMBER]
                        use unified diff with NUMBER lines of context [NUMBER=3] (default: False)
  -r, --recursive       follow the directory tree rooted in each PATH argument (default: False)
  -i, --include-hidden  include hidden files and directories (default: False)
  -v, --verbose         report non-essential logs (default: False)
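
To make the check concrete, here is a minimal Python sketch of testing a text against a normalization form and producing a unified diff, using only the standard library (unicodedata and difflib). It is not unicodecheck's implementation: the names check_text and normalization_diff are hypothetical, and escaping non-ASCII characters in the diff is an assumption made so that visually identical lines stay distinguishable.

```python
import difflib
import unicodedata


def check_text(text, form="NFC"):
    """Return True when text is already in the given normalization form."""
    return unicodedata.is_normalized(form, text)  # requires Python 3.8+


def normalization_diff(text, form="NFC", context=3):
    """Return unified-diff lines between text and its normalized form."""
    normalized = unicodedata.normalize(form, text)

    def escaped(s):
        # Escape non-ASCII characters so composed and decomposed
        # sequences do not render identically in the diff.
        return [line.encode("unicode_escape").decode("ascii")
                for line in s.splitlines()]

    return list(difflib.unified_diff(
        escaped(text), escaped(normalized),
        fromfile="original", tofile="normalized",
        n=context, lineterm="",
    ))


# Hypothetical sample: line 1 is composed (NFC), line 2 is decomposed (NFD).
sample = "caf\u00e9\ncafe\u0301\n"
for line in normalization_diff(sample, "NFC"):
    print(line)
```

An empty diff means the text is already in the target form, which corresponds to a passing check.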

Tips

Check whether filenames are normalized

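For filenames, the convmv command is suitable. Without --notest it only reports what it would rename, so the commands below serve as a check.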

NFC

convmv -f utf8 -t utf8 --nfc -r ./

NFD

convmv -f utf8 -t utf8 --nfd -r ./
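
Where convmv is not available, a similar read-only check can be sketched in Python. This is an illustration only and not part of unicodecheck or convmv: the helper names unnormalized_names and scan_tree are hypothetical, and the sketch reports names without renaming anything.

```python
import os
import unicodedata


def unnormalized_names(names, form="NFC"):
    """Return the names that are not in the given normalization form."""
    return [name for name in names
            if not unicodedata.is_normalized(form, name)]


def scan_tree(root, form="NFC"):
    """Walk root and return paths whose final component is not normalized."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in unnormalized_names(dirnames + filenames, form):
            hits.append(os.path.join(dirpath, name))
    return hits


# Report every file or directory under the current directory
# whose name is not NFC-normalized.
for path in scan_tree(".", "NFC"):
    print(path)
```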

Notes

  • This tool does not normalize files in place (it never writes), because Unicode normalization does not guarantee that the result is equivalent to the original content.
  • Binary files are detected with a procedure based on Git's algorithm.
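
Both notes can be illustrated with a short Python sketch. The first part shows normalization changing content: the compatibility forms fold distinct characters together, and even NFC replaces some code points. The second part is a Git-style binary heuristic as commonly described (a NUL byte within the first 8000 bytes). Whether unicodecheck uses exactly this rule and threshold is an assumption, and looks_binary is a hypothetical name.

```python
import unicodedata

# Compatibility normalization is lossy: the superscript and the
# ligature are replaced by plain characters.
assert unicodedata.normalize("NFKC", "x\u00b2") == "x2"
assert unicodedata.normalize("NFKC", "\ufb01le") == "file"

# Canonical normalization also changes code points: ANGSTROM SIGN (U+212B)
# becomes LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5).
assert unicodedata.normalize("NFC", "\u212b") == "\u00c5"


def looks_binary(data: bytes, probe_size: int = 8000) -> bool:
    """Treat data as binary if a NUL byte appears in the first probe_size bytes."""
    return b"\x00" in data[:probe_size]
```

Because such replacements cannot be undone from the normalized text alone, reporting differences and leaving the rewrite to the user is the safer design.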

License

MIT

Download files

Source Distribution

unicodecheck-0.3.0.tar.gz (7.3 kB)

Uploaded Source

Built Distribution

unicodecheck-0.3.0-py3-none-any.whl (7.6 kB)

Uploaded Python 3

File details

Details for the file unicodecheck-0.3.0.tar.gz.

File metadata

  • Download URL: unicodecheck-0.3.0.tar.gz
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.0

File hashes

Hashes for unicodecheck-0.3.0.tar.gz

Algorithm    Hash digest
SHA256       c97cfcdd781a6d113f7adac5c001c80220e85239f2db532503991fce0a599cd7
MD5          b8fae5ac89eba67fd2e6159d5ac073a8
BLAKE2b-256  40e839f960fd30f373d9ac61c70dbcfb9bfac2705dffb84e4d6ec03d96bdfb59

File details

Details for the file unicodecheck-0.3.0-py3-none-any.whl.

File hashes

Hashes for unicodecheck-0.3.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       c8c65ff5ea84a46ff9ec31b137ae782e64a947c4877a804922609472cb837bbf
MD5          b9e3ca5f8bfcf8c03f00872feb4ffd4d
BLAKE2b-256  f5d2649f4c823401175f11b5958e29c54c0e001afd197ebb83980a44c1d96f24
