Unicodecheck

A simple tool to check if Unicode text files are Unicode-normalized

Install

pip install unicodecheck

Usage

Quickstart

unicodecheck -iv SPAM.txt

To check files in a directory recursively:

unicodecheck -ivr Ham/Eggs/
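Conceptually, checking a file means decoding it and testing whether the text already equals its normalized form. The sketch below shows that idea with Python's standard unicodedata module (is_normalized needs Python 3.8+). It is not unicodecheck's implementation. The function names are hypothetical, it assumes UTF-8 input, it skips files that fail to decode instead of using the tool's binary detection, and its hidden-entry handling only loosely mirrors -r and -i.

```python
import os
import unicodedata
from pathlib import Path


def is_file_normalized(path, form="NFC"):
    """Return True/False for a UTF-8 text file, or None if it cannot be decoded."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return None  # not valid UTF-8; this sketch simply skips such files
    return unicodedata.is_normalized(form, text)


def find_unnormalized(root, form="NFC", include_hidden=False):
    """Recursively list files under root whose contents are not in `form`."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_hidden:
            # Prune hidden directories in place so os.walk does not descend into them.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if is_file_normalized(path, form) is False:
                hits.append(path)
    return hits
```

For example, find_unnormalized("Ham/Eggs/", include_hidden=True) returns roughly the set of files that unicodecheck -ivr Ham/Eggs/ would flag, without the verbose logging.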

Synopsis

The main program can be invoked either through the unicodecheck command or as a Python module with python3 -m unicodecheck.

usage: unicodecheck [-h] [-V] [-m {NFC,NFD,NFKC,NFKD}] [-d] [-u [NUMBER]] [-r] [-i]
                    [-b PATTERN [PATTERN ...]] [-e] [-v]
                    PATH [PATH ...]
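The four values accepted by -m are the standard Unicode normalization forms. As an illustration that is not part of the tool, the snippet below tests a few sample strings against each form using Python's unicodedata module; the helper name normalization_report is hypothetical.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalization_report(text):
    """Map each normalization form to whether `text` is already in that form."""
    return {form: unicodedata.is_normalized(form, text) for form in FORMS}


samples = {
    "precomposed e-acute": "\u00e9",  # é as a single code point
    "decomposed e-acute": "e\u0301",  # e + COMBINING ACUTE ACCENT
    "fi ligature": "\ufb01",          # ﬁ, a compatibility character
    "circled digit one": "\u2460",    # ①, also a compatibility character
}

for label, text in samples.items():
    print(f"{label:20} {normalization_report(text)}")

# Compatibility normalization rewrites the characters themselves:
print(unicodedata.normalize("NFKC", "\ufb01"), unicodedata.normalize("NFKC", "\u2460"))  # fi 1
```

Only the first two samples differ between NFC and NFD. The ligature and the circled digit pass NFC and NFD unchanged but fail NFKC and NFKD, which is why the compatibility forms are the most likely to alter a file's meaning.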

Options

positional arguments:
  PATH                  specify an input file or directory (pass '-' to specify stdin)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m {NFC,NFD,NFKC,NFKD}, --mode {NFC,NFD,NFKC,NFKD}
                        target Unicode normalization (default: NFC)
  -d, --diff            show diffs between the original and normalized (default: False)
  -u [NUMBER], -U [NUMBER], --unified [NUMBER]
                        show unified diffs with NUMBER lines of context [NUMBER=3] (default: False)
  -r, --recursive       follow the directory tree rooted in each PATH argument (default: False)
  -i, --include-hidden  include hidden files and directories (default: False)
  -b PATTERN [PATTERN ...], --blacklist PATTERN [PATTERN ...]
                        notify if it contains PATTERN (case-sensitive) (default: None)
  -e, --error           return non-zero exit code on detection (default: False)
  -v, --verbose         report non-essential logs (default: False)
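The -d and -u options compare each file with its normalized version. Similar output can be produced with difflib.unified_diff from the standard library, as in this sketch; the function name and the "original"/form labels are hypothetical, and unicodecheck's actual diff output may be formatted differently.

```python
import difflib
import unicodedata


def normalization_diff(text, form="NFC", context=3):
    """Return a unified diff (as one string) between text and its normalized form."""
    normalized = unicodedata.normalize(form, text)
    diff = difflib.unified_diff(
        text.splitlines(keepends=True),
        normalized.splitlines(keepends=True),
        fromfile="original",
        tofile=form,
        n=context,  # lines of context, like the NUMBER argument of -u
    )
    return "".join(diff)  # empty string when the text is already normalized
```

Canonically equivalent lines usually look identical on screen, so passing each diff line through ascii() before printing can make the differing code points visible.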

Tips

Check whether filenames are normalized

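This tool checks file contents, not filenames. For filenames, the convmv command is a good alternative: without --notest it only reports which names it would convert and leaves the files untouched.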

NFC

convmv -f utf8 -t utf8 --nfc -r ./

NFD

convmv -f utf8 -t utf8 --nfd -r ./
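If convmv is not available, a similar dry-run check can be approximated in Python by testing each file and directory name in a tree. The helper below is hypothetical. It assumes names decode to str as they normally do on Linux; some macOS file systems (such as HFS+) store names in a decomposed form themselves, so results there may differ.

```python
import os
import unicodedata


def unnormalized_names(root, form="NFC"):
    """Return sorted paths under root whose own file or directory name is not in `form`."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not unicodedata.is_normalized(form, name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Like convmv without --notest, this only reports names and never renames anything.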

Notes

  • This tool doesn't normalize files in place, because Unicode normalization doesn't guarantee that the content stays equivalent.
  • The procedure for determining binary files is based on Git's algorithm.
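The note doesn't spell out the algorithm. Git's own heuristic (its buffer_is_binary() check) treats content as binary when a NUL byte appears within the first 8000 bytes. The following is a Python sketch of that heuristic; whether unicodecheck uses exactly this window or adds further checks is an assumption, and the function names are hypothetical.

```python
FIRST_FEW_BYTES = 8000  # window size inspected by Git's buffer_is_binary()


def looks_binary(data: bytes) -> bool:
    """Git-style heuristic: binary if a NUL byte occurs in the first 8000 bytes."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


def is_binary_file(path) -> bool:
    """Apply the heuristic to the start of a file without reading all of it."""
    with open(path, "rb") as f:
        return looks_binary(f.read(FIRST_FEW_BYTES))
```

Only the leading window is inspected, so a NUL byte that appears later in a long file does not mark it as binary.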

License

MIT

Download files

Source Distribution

unicodecheck-1.2.6.tar.gz (8.2 kB)

Uploaded Source

Built Distribution

unicodecheck-1.2.6-py3-none-any.whl (8.7 kB)

Uploaded Python 3

File details

Details for the file unicodecheck-1.2.6.tar.gz.

File metadata

  • Download URL: unicodecheck-1.2.6.tar.gz
  • Upload date:
  • Size: 8.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for unicodecheck-1.2.6.tar.gz
Algorithm    Hash digest
SHA256       046ec131bf4c395ecdc2637644286611788b7c7f91b71779c13633dba3b9eb5f
MD5          ed35430ba3cd7e0cf50a4646ad46b738
BLAKE2b-256  01d16b540389024f7d891629d9f3b0ac29b718b256ae8dd3e37ba35820aea08e
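To confirm that a downloaded archive matches the published SHA256 digest, the file can be hashed locally with Python's hashlib. A minimal sketch; the helper names are hypothetical, and the constant is the sdist digest from the table above.

```python
import hashlib

# SHA256 digest published above for unicodecheck-1.2.6.tar.gz.
SDIST_SHA256 = "046ec131bf4c395ecdc2637644286611788b7c7f91b71779c13633dba3b9eb5f"


def sha256_of_file(path, chunk_size=1 << 16):
    """Compute a file's SHA256 hex digest, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected=SDIST_SHA256):
    """True if the file at `path` has the expected SHA256 digest."""
    return sha256_of_file(path) == expected.lower()
```

pip can enforce the same check itself in hash-checking mode, by adding a --hash=sha256:<digest> option to the requirement line in a requirements file.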

File details

Details for the file unicodecheck-1.2.6-py3-none-any.whl.

File metadata

  • Download URL: unicodecheck-1.2.6-py3-none-any.whl
  • Upload date:
  • Size: 8.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for unicodecheck-1.2.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       d83c0b3ffe1a6c7d2a5aec2e5a99212676df91daee5ce637d1bee0ae052b1f85
MD5          bfd9c77a5981a220e071ce5d9ca966af
BLAKE2b-256  f27efde094197e8918abbbab9ac92915d2b7f213ea789320a1f4b25ff8fc4f93
