Check if Unicode text files are Unicode-normalized

Unicodecheck

Simple tool to check if Unicode text files are Unicode-normalized

Install

pip3 install unicodecheck

Usage

Quickstart

unicodecheck -iv SPAM.txt

To check files in a directory recursively:

unicodecheck -ivr Ham/Eggs/
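As a rough illustration of what "Unicode-normalized" means, the following sketch compares a string with its normalized form using Python's standard `unicodedata` module. The helper name `is_normalized` is hypothetical and is not claimed to be how unicodecheck works internally.

```python
import unicodedata


def is_normalized(text: str, form: str = "NFC") -> bool:
    """Return True if text is already in the given normalization form.

    Hypothetical helper for illustration; not part of unicodecheck's API.
    """
    return unicodedata.normalize(form, text) == text


composed = "caf\u00e9"     # 'é' as one code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent (U+0301)

print(is_normalized(composed, "NFC"))    # True
print(is_normalized(decomposed, "NFC"))  # False
print(is_normalized(decomposed, "NFD"))  # True
```

Both strings render identically, which is why a dedicated check is useful.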

Synopsis

The main program can be invoked either as the unicodecheck command or as a Python module with python3 -m unicodecheck.
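The module form is handy when calling the tool from a script. The sketch below builds such a command line with `subprocess`; the function names `build_command` and `run_check` are hypothetical, only the `-m` and `-r` flags and the PATH arguments documented in this README are used, and nothing is assumed about the tool's exit codes or output format.

```python
import subprocess
import sys


def build_command(paths, mode="NFC", recursive=False):
    """Build an argument list for `python3 -m unicodecheck` (hypothetical helper)."""
    # The first "-m" belongs to Python, the second is unicodecheck's --mode.
    cmd = [sys.executable, "-m", "unicodecheck", "-m", mode]
    if recursive:
        cmd.append("-r")
    cmd.extend(paths)
    return cmd


def run_check(paths, mode="NFC", recursive=False):
    """Run unicodecheck and return the CompletedProcess (requires the package)."""
    return subprocess.run(
        build_command(paths, mode, recursive),
        capture_output=True,
        text=True,
    )
```

For example, `run_check(["Ham/Eggs/"], recursive=True)` would run the recursive check from the Quickstart, minus the `-i` and `-v` flags.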

usage: unicodecheck [-h] [-V] [-m {NFC,NFD,NFKC,NFKD}] [-d] [-u [NUMBER]] [-r] [-i]
                    [-b PATTERN [PATTERN ...]] [-v]
                    PATH [PATH ...]

positional arguments:
  PATH                  describe input file or directory (pass '-' to specify stdin)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m {NFC,NFD,NFKC,NFKD}, --mode {NFC,NFD,NFKC,NFKD}
                        target Unicode normalization (default: NFC)
  -d, --diff            show diffs between the original and normalized (default: False)
  -u [NUMBER], -U [NUMBER], --unified [NUMBER]
                        show unified diffs with NUMBER lines of context [NUMBER=3] (default: False)
  -r, --recursive       follow the directory tree rooted in each PATH argument (default: False)
  -i, --include-hidden  include hidden files and directories (default: False)
  -b PATTERN [PATTERN ...], --blacklist PATTERN [PATTERN ...]
                        notify if having PATTERN (default: None)
  -v, --verbose         report non-essential logs (default: False)
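To give a feel for what a unified diff between original and normalized text can look like, here is a minimal sketch using the standard `difflib` module. The function `normalization_diff` is hypothetical, and escaping non-ASCII characters is a choice made here so that visually identical lines become distinguishable; the tool's actual output format may differ.

```python
import difflib
import unicodedata


def normalization_diff(text: str, form: str = "NFC", context: int = 3) -> list[str]:
    """Return unified-diff lines between text and its normalized form.

    Hypothetical helper; non-ASCII characters are shown as escapes so that
    composed and decomposed sequences can be told apart.
    """
    def escape(line: str) -> str:
        return line.encode("unicode_escape").decode("ascii")

    original = [escape(line) for line in text.splitlines()]
    normalized = [
        escape(line) for line in unicodedata.normalize(form, text).splitlines()
    ]
    return list(
        difflib.unified_diff(
            original,
            normalized,
            fromfile="original",
            tofile="normalized",
            n=context,
            lineterm="",
        )
    )


sample = "spam\ncafe\u0301\neggs\n"
for line in normalization_diff(sample):
    print(line)
```

The `context` parameter plays the same role as NUMBER in the `-u` option: it sets how many unchanged lines surround each change.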

Tips

Check whether filenames are normalized

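The convmv command is suitable for this. Without the --notest option it only performs a dry run, so the commands below report the filenames that would be renamed without changing anything.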

NFC

convmv -f utf8 -t utf8 --nfc -r ./

NFD

convmv -f utf8 -t utf8 --nfd -r ./
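Where convmv is not available, a similar read-only check can be sketched in Python with `os.walk`. The names `name_is_normalized` and `unnormalized_names` are hypothetical; note that some filesystems normalize names on their own, so results may vary by platform.

```python
import os
import unicodedata


def name_is_normalized(name: str, form: str = "NFC") -> bool:
    """Return True if a single path component is in the given form."""
    return unicodedata.normalize(form, name) == name


def unnormalized_names(root: str, form: str = "NFC"):
    """Yield paths under root whose last component is not normalized.

    Hypothetical helper; it only reports and never renames anything.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not name_is_normalized(name, form):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in unnormalized_names("."):
        print(path)
```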

Notes

  • This tool does not normalize files in place (it never writes to them), because Unicode normalization does not guarantee that the content stays equivalent.
  • Binary files are detected using a procedure based on Git's algorithm.
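Both notes can be illustrated with a short sketch. It first shows normalizations that change characters irreversibly, then a binary check modeled on Git's heuristic of looking for a NUL byte near the start of the data. The 8000-byte window mirrors Git's behavior as commonly described; whether unicodecheck uses the same threshold is an assumption, and `looks_binary` is a hypothetical name.

```python
import unicodedata

# Normalization can replace characters with different ones.
superscript = "x\u00b2"  # 'x' followed by SUPERSCRIPT TWO
angstrom = "\u212b"      # ANGSTROM SIGN

print(unicodedata.normalize("NFKC", superscript))               # x2
print(unicodedata.normalize("NFC", angstrom) == "\u00c5")       # True

FIRST_FEW_BYTES = 8000  # assumed window size, following Git's heuristic


def looks_binary(data: bytes) -> bool:
    """Treat data as binary if a NUL byte appears in its first bytes."""
    return b"\0" in data[:FIRST_FEW_BYTES]


print(looks_binary(b"plain text\n"))   # False
print(looks_binary(b"\x89PNG\0\0\0"))  # True
```

Once "x²" has become "x2", or the angstrom sign has become "Å", the original cannot be recovered from the normalized text, which is why rewriting files automatically would be risky.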

License

MIT
