Check if Unicode text files are Unicode-normalized

Unicodecheck

Simple tool to check if Unicode text files are Unicode-normalized
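As a rough illustration of what "Unicode-normalized" means here, the following minimal sketch tests a string by comparing it with its normalized form. It uses only Python's standard unicodedata module; the helper name is_normalized is hypothetical and this is not necessarily how unicodecheck is implemented.

```python
import unicodedata


def is_normalized(text: str, form: str = "NFC") -> bool:
    """Return True if text is unchanged by normalization to the given form.

    Hypothetical helper for illustration, not part of unicodecheck.
    """
    return unicodedata.normalize(form, text) == text


composed = "\u00e9"     # "é" as a single precomposed code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

print(is_normalized(composed, "NFC"))    # True
print(is_normalized(decomposed, "NFC"))  # False
print(is_normalized(decomposed, "NFD"))  # True
```

Both strings render identically, which is why such differences are hard to spot by eye.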

Install

pip3 install unicodecheck

Usage

Quickstart

unicodecheck -iv SPAM.txt

To check files in a directory recursively:

unicodecheck -ivr Ham/Eggs/
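The recursive traversal with optional hidden entries can be pictured with the sketch below. It assumes that "hidden" means a name starting with a dot, which may differ from the tool's own rule; iter_files and its include_hidden parameter are hypothetical names chosen to mirror the -r and -i options.

```python
import os
from typing import Iterator


def iter_files(root: str, include_hidden: bool = False) -> Iterator[str]:
    """Yield file paths under root, optionally skipping dot-prefixed entries.

    Hypothetical helper; the dot-prefix rule for "hidden" is an assumption.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_hidden:
            # Prune in place so os.walk does not descend into hidden directories.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in iter_files(".", include_hidden=True):
        print(path)
```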

Synopsis

Run the program either as the unicodecheck command or as a Python module with python3 -m unicodecheck.

usage: unicodecheck [-h] [-V] [-m {NFC,NFD,NFKC,NFKD}] [-d] [-u [NUMBER]] [-r] [-i]
                    [-b PATTERN [PATTERN ...]] [-e] [-v]
                    PATH [PATH ...]

Options

positional arguments:
  PATH                  describe input file or directory (pass '-' to specify stdin)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m {NFC,NFD,NFKC,NFKD}, --mode {NFC,NFD,NFKC,NFKD}
                        target Unicode normalization (default: NFC)
  -d, --diff            show diffs between the original and normalized (default: False)
  -u [NUMBER], -U [NUMBER], --unified [NUMBER]
                        show unified diffs with NUMBER lines of context [NUMBER=3] (default: False)
  -r, --recursive       follow the directory tree rooted in each PATH argument (default: False)
  -i, --include-hidden  include hidden files and directories (default: False)
  -b PATTERN [PATTERN ...], --blacklist PATTERN [PATTERN ...]
                        notify if having PATTERN (case-sensitive) (default: None)
  -e, --error           return non-zero exit code on detection (default: False)
  -v, --verbose         report non-essential logs (default: False)
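To give a feel for what a unified diff between the original and the normalized text looks like, here is a minimal sketch built on the standard difflib and unicodedata modules. The function normalization_diff is a hypothetical name, and the output format of unicodecheck itself may differ.

```python
import difflib
import unicodedata


def normalization_diff(text: str, form: str = "NFC", context: int = 3) -> list[str]:
    """Return unified-diff lines between text and its normalized form.

    Hypothetical helper; an empty list means the text is already normalized.
    """
    normalized = unicodedata.normalize(form, text)
    return list(
        difflib.unified_diff(
            text.splitlines(keepends=True),
            normalized.splitlines(keepends=True),
            fromfile="original",
            tofile=f"normalized ({form})",
            n=context,  # number of context lines, like NUMBER for --unified
        )
    )


sample = "cafe\u0301\nplain line\n"  # first line uses a combining accent
print("".join(normalization_diff(sample)))
```

The changed line appears twice in the diff and looks the same in most terminals, because the two forms differ only in their code points.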

Tips

Check whether filenames are normalized

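To check filenames rather than file contents, the convmv command is a good alternative to this application. Without the --notest option, convmv only prints the renames it would perform and changes nothing.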

NFC

convmv -f utf8 -t utf8 --nfc -r ./

NFD

convmv -f utf8 -t utf8 --nfd -r ./
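If convmv is not available, a similar filename check can be sketched in Python. The function names below are hypothetical and this is not a feature of unicodecheck. Note that some filesystems normalize names themselves, so results depend on the platform.

```python
import os
import unicodedata


def name_is_normalized(name: str, form: str = "NFC") -> bool:
    """Return True if a single file or directory name is in the given form."""
    return unicodedata.normalize(form, name) == name


def unnormalized_names(root: str, form: str = "NFC") -> list[str]:
    """Collect paths under root whose final component is not normalized."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not name_is_normalized(name, form):
                found.append(os.path.join(dirpath, name))
    return found


if __name__ == "__main__":
    for path in unnormalized_names("."):
        print(path)
```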

Notes

  • This tool deliberately does not normalize files in place, because Unicode normalization does not guarantee that the content stays equivalent.
  • Binary files are detected with a procedure modeled on Git's algorithm.
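Both notes can be illustrated with a short sketch. The binary check assumes the heuristic is Git's test for a NUL byte within the first 8000 bytes of a buffer; the name looks_binary is hypothetical and the tool's actual procedure may differ in detail. The second part shows one way normalization can change content: the compatibility forms replace a ligature with separate letters.

```python
import unicodedata

FIRST_FEW_BYTES = 8000  # assumed limit, following Git's heuristic


def looks_binary(data: bytes) -> bool:
    """Treat data as binary if a NUL byte occurs near its start.

    Hypothetical helper in the style of Git's check, not unicodecheck's code.
    """
    return b"\0" in data[:FIRST_FEW_BYTES]


print(looks_binary(b"\x89PNG\r\n\x1a\n\0\0\0\rIHDR"))  # True
print(looks_binary("h\u00e9llo\n".encode("utf-8")))    # False

# Normalization is not always content-preserving.
ligature = "\ufb01le"  # starts with U+FB01 LATIN SMALL LIGATURE FI
print(unicodedata.normalize("NFKC", ligature))             # file
print(unicodedata.normalize("NFC", ligature) == ligature)  # True
```

After NFKC the ligature is gone for good, so the original bytes cannot be recovered from the normalized file.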

License

MIT
