Check if Unicode text files are Unicode-normalized

Unicodecheck

A simple tool that checks whether Unicode text files are Unicode-normalized.
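
As a rough illustration of what "Unicode-normalized" means here, the sketch below compares a string with its normalized form using Python's standard unicodedata module. The helper name is_normalized is hypothetical and is not part of unicodecheck; the tool's own implementation may differ.

```python
import unicodedata


def is_normalized(text: str, form: str = "NFC") -> bool:
    """Return True if text is already in the given normalization form.

    Hypothetical helper for illustration; not unicodecheck's API.
    """
    return unicodedata.normalize(form, text) == text


composed = "caf\u00e9"     # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent (U+0301)

print(is_normalized(composed, "NFC"))    # True
print(is_normalized(decomposed, "NFC"))  # False
print(is_normalized(decomposed, "NFD"))  # True
```

Both strings render identically, which is why this kind of mismatch is hard to spot by eye.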

Install

pip3 install unicodecheck

Usage

Quickstart

unicodecheck -iv SPAM.txt

To check files in a directory recursively:

unicodecheck -ivr Ham/Eggs/

Synopsis

The program can be invoked either as the unicodecheck command or as a Python module with python3 -m unicodecheck.

usage: unicodecheck [-h] [-V] [-m {NFC,NFD,NFKC,NFKD}] [-d] [-u [NUMBER]] [-r] [-i] [-v]
                    PATH [PATH ...]

positional arguments:
  PATH                  describe input file or directory (pass '-' to specify stdin)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m {NFC,NFD,NFKC,NFKD}, --mode {NFC,NFD,NFKC,NFKD}
                        target Unicode normalization (default: NFC)
  -d, --diff            show diffs between the original and normalized (default: False)
  -u [NUMBER], -U [NUMBER], --unified [NUMBER]
                        show unified diffs with NUMBER lines of context [NUMBER=3] (default: False)
  -r, --recursive       follow the directory tree rooted in each PATH argument (default: False)
  -i, --include-hidden  include hidden files and directories (default: False)
  -v, --verbose         report non-essential logs (default: False)
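
To give a feel for what a diff between original and normalized text can look like, here is a minimal sketch built on Python's standard difflib module. The function normalization_diff and its parameters are assumptions made for illustration; the actual output format of --diff and --unified may differ.

```python
import difflib
import unicodedata


def normalization_diff(text: str, form: str = "NFC", context: int = 3,
                       name: str = "input") -> str:
    """Return a unified diff between text and its normalized form.

    Hypothetical helper; returns an empty string when nothing changes.
    """
    original = text.splitlines(keepends=True)
    normalized = unicodedata.normalize(form, text).splitlines(keepends=True)
    diff = difflib.unified_diff(
        original,
        normalized,
        fromfile=name,
        tofile=f"{name} ({form})",
        n=context,  # number of context lines, like NUMBER in --unified
    )
    return "".join(diff)


# The second line uses a decomposed 'é' and is the only line reported.
print(normalization_diff("plain line\ncafe\u0301\n"))
```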

Tips

Check whether filenames are normalized

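The convmv command is suitable for this. Without the --notest option it only reports the renames it would perform, so the commands below act as a check.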

NFC

convmv -f utf8 -t utf8 --nfc -r ./

NFD

convmv -f utf8 -t utf8 --nfd -r ./
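
Where convmv is not available, the same kind of filename check can be sketched in Python with os.walk. The function unnormalized_names is hypothetical and is not part of unicodecheck or convmv. Note that some filesystems normalize names themselves, so the names read back may differ from the names originally written.

```python
import os
import unicodedata


def unnormalized_names(root: str, form: str = "NFC"):
    """Yield paths under root whose final component is not in the given form.

    Hypothetical helper for illustration; it only reports and never renames.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if unicodedata.normalize(form, name) != name:
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in unnormalized_names("."):
        print(path)
```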

Notes

  • This tool deliberately does not normalize files in place (it never writes to them), because Unicode normalization does not guarantee that the content stays equivalent.
  • Binary files are detected with a procedure based on Git's algorithm.
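
Both notes can be illustrated with a short sketch. The first part shows a compatibility normalization (NFKC) changing what the text says. The second part mirrors Git's heuristic, which looks for a NUL byte within the first 8000 bytes of the data. The function looks_binary is a hypothetical name, and whether unicodecheck follows Git's heuristic exactly is an assumption.

```python
import unicodedata

# Normalization can change content: NFKC folds the superscript two
# (U+00B2) into a plain digit, so "x squared" becomes "x2".
original = "x\u00b2"
print(unicodedata.normalize("NFKC", original))  # x2

# Git's heuristic inspects only the first 8000 bytes and treats the
# data as binary if a NUL byte appears there.
FIRST_FEW_BYTES = 8000


def looks_binary(data: bytes) -> bool:
    """Hypothetical helper: Git-style binary detection (assumed, not
    taken from unicodecheck's source)."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


print(looks_binary("caf\u00e9\n".encode("utf-8")))  # False
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00"))   # True
```

Because the check is a heuristic, text encoded as UTF-16 would typically be classified as binary, since it usually contains NUL bytes.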

License

MIT
