Check if Unicode text files are Unicode-normalized

Unicodecheck

Simple tool to check if Unicode text files are Unicode-normalized

Install

pip install unicodecheck

Usage

Quickstart

unicodecheck -iv SPAM.txt

To check files in a directory recursively:

unicodecheck -ivr Ham/Eggs/

Synopsis

The program can be invoked either as the unicodecheck command or as a Python module with python3 -m unicodecheck.

usage: unicodecheck [-h] [-V] [-m {NFC,NFD,NFKC,NFKD}] [-d] [-u [NUMBER]] [-r] [-i]
                    [-b PATTERN [PATTERN ...]] [-e] [-v]
                    PATH [PATH ...]

Options

positional arguments:
  PATH                  describe input file or directory (pass '-' to specify stdin)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m {NFC,NFD,NFKC,NFKD}, --mode {NFC,NFD,NFKC,NFKD}
                        target Unicode normalization (default: NFC)
  -d, --diff            show diffs between the original and normalized (default: False)
  -u [NUMBER], -U [NUMBER], --unified [NUMBER]
                        show unified diffs with NUMBER lines of context [NUMBER=3] (default: False)
  -r, --recursive       follow the directory tree rooted in each PATH argument (default: False)
  -i, --include-hidden  include hidden files and directories (default: False)
  -b PATTERN [PATTERN ...], --blacklist PATTERN [PATTERN ...]
                        notify if having PATTERN (case-sensitive) (default: None)
  -e, --error           return non-zero exit code on detection (default: False)
  -v, --verbose         report non-essential logs (default: False)
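
To make the --mode and --diff/--unified options more concrete, here is a minimal sketch of the kind of check they describe, written with Python's standard unicodedata and difflib modules. The function name check_text is hypothetical and this is not the tool's actual implementation; it only illustrates what "normalized under a given form" means.

```python
import difflib
import unicodedata


def check_text(text, mode="NFC", context=3):
    """Return (is_normalized, diff_lines) for text under the given form.

    Hypothetical helper for illustration; not part of unicodecheck's API.
    """
    normalized = unicodedata.normalize(mode, text)
    if normalized == text:
        return True, []
    # Unified diff between the original and the normalized text.
    diff = list(
        difflib.unified_diff(
            text.splitlines(keepends=True),
            normalized.splitlines(keepends=True),
            fromfile="original",
            tofile="normalized",
            n=context,
        )
    )
    return False, diff


decomposed = "cafe\u0301\n"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = "caf\u00e9\n"     # precomposed U+00E9

ok, diff = check_text(decomposed, "NFC")
print(ok)                 # False: the text is in decomposed form, not NFC
print("".join(diff))
```

The same string can pass under one form and fail under another: the decomposed string above is already NFD, and a ligature such as U+FB01 is left alone by NFC but rewritten to "fi" by NFKC.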

Tips

Check whether filenames are normalized

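To check filenames rather than file contents, the convmv command is a good alternative to this application. By default convmv only reports what it would rename; add --notest to actually rename the files.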

NFC

convmv -f utf8 -t utf8 --nfc -r ./

NFD

convmv -f utf8 -t utf8 --nfd -r ./
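
If convmv is not available, a read-only filename check can also be sketched in a few lines of Python. The function names below are hypothetical and this is not a feature of unicodecheck; the sketch only reports names and never renames anything.

```python
import os
import unicodedata


def is_name_normalized(name, mode="NFC"):
    """True if the file or directory name is already in the given form."""
    return unicodedata.normalize(mode, name) == name


def find_unnormalized_names(root, mode="NFC"):
    """Walk root and return paths whose final component is not normalized."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_name_normalized(name, mode):
                hits.append(os.path.join(dirpath, name))
    return hits


if __name__ == "__main__":
    for path in find_unnormalized_names(".", "NFC"):
        print(path)
```

Note that some filesystems normalize names themselves, so the result can differ between platforms for the same set of files.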

Notes

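  • This tool does not normalize files in place (it never writes to them), because Unicode normalization does not guarantee content equivalence. The compatibility forms NFKC and NFKD in particular replace characters such as ligatures with different ones.
  • Binary files are detected with a procedure based on Git's algorithm.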
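
For reference, Git's heuristic treats a file as binary when a NUL byte appears within roughly the first 8000 bytes. A minimal sketch of that idea follows; the function name looks_binary is hypothetical, and whether unicodecheck matches Git's behaviour in every detail is an assumption.

```python
def looks_binary(data: bytes, window: int = 8000) -> bool:
    """Git-style heuristic: a NUL byte in the leading window means binary.

    Hypothetical helper for illustration; the window size mirrors the
    value Git uses and is an assumption about this tool.
    """
    return b"\0" in data[:window]


print(looks_binary("caf\u00e9\n".encode("utf-8")))  # False: UTF-8 text has no NUL
print(looks_binary(b"\x89PNG\r\n\x1a\n\0\0\0\rIHDR"))  # True: NUL in the header
```

One consequence of this heuristic is that UTF-16 and UTF-32 text, which usually contains NUL bytes, is classified as binary.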

License

MIT
