Check if Unicode text files are Unicode-normalized

Project description

Unicodecheck

A simple tool to check whether Unicode text files are Unicode-normalized.
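The core idea behind such a check can be sketched with Python's standard unicodedata module. This is a minimal illustration, not unicodecheck's implementation: the helper name find_unnormalized_lines is hypothetical, and it simply reports which lines would change under the chosen normalization form.

```python
import unicodedata


def find_unnormalized_lines(text: str, form: str = "NFC") -> list[int]:
    """Return 1-based numbers of lines that are not in the given normalization form.

    Hypothetical helper for illustration; not part of unicodecheck's API.
    """
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        # unicodedata.is_normalized (Python 3.8+) can often answer without
        # building a normalized copy of the line.
        if not unicodedata.is_normalized(form, line)
    ]


# Line 1 uses a precomposed "é" (U+00E9); line 2 uses "e" + COMBINING ACUTE ACCENT (U+0301).
sample = "caf\u00e9\ncafe\u0301\n"
print(find_unnormalized_lines(sample))         # [2]
print(find_unnormalized_lines(sample, "NFD"))  # [1]
```

Both lines render identically as "café", which is exactly why such differences are hard to spot by eye.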

Install

pip3 install unicodecheck

Usage

Quickstart

unicodecheck -iv SPAM.txt

To check files in a directory recursively:

unicodecheck -ivr Ham/Eggs/
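The combination of -r and -i can be pictured as a directory walk that optionally skips hidden entries. The sketch below assumes that "hidden" means a name starting with "." (the POSIX convention); unicodecheck's own rule may differ, and iter_files is a hypothetical name.

```python
import os
from collections.abc import Iterator


def iter_files(root: str, include_hidden: bool = False) -> Iterator[str]:
    """Yield file paths under root, roughly mirroring -r with or without -i.

    Assumption: a hidden file or directory is one whose name starts with ".".
    """
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_hidden:
            # Pruning dirnames in place stops os.walk from descending into hidden directories.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


for path in iter_files("Ham/Eggs/"):
    print(path)
```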

Synopsis

The program can be invoked either as the unicodecheck command or as a Python module with python3 -m unicodecheck.

usage: unicodecheck [-h] [-V] [-m {NFC,NFD,NFKC,NFKD}] [-d] [-u [NUMBER]] [-r] [-i]
                    [-b PATTERN [PATTERN ...]] [-e] [-v]
                    PATH [PATH ...]

Options

positional arguments:
  PATH                  describe input file or directory (pass '-' to specify stdin)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m {NFC,NFD,NFKC,NFKD}, --mode {NFC,NFD,NFKC,NFKD}
                        target Unicode normalization (default: NFC)
  -d, --diff            show diffs between the original and normalized (default: False)
  -u [NUMBER], -U [NUMBER], --unified [NUMBER]
                        show unified diffs with NUMBER lines of context [NUMBER=3] (default: False)
  -r, --recursive       follow the directory tree rooted in each PATH argument (default: False)
  -i, --include-hidden  include hidden files and directories (default: False)
  -b PATTERN [PATTERN ...], --blacklist PATTERN [PATTERN ...]
                        notify if having PATTERN (case-sensitive) (default: None)
  -e, --error           return non-zero exit code on detection (default: False)
  -v, --verbose         report non-essential logs (default: False)
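The -m/--mode option selects which of the four Unicode normalization forms the input is checked against. The difference between the canonical forms (NFC, NFD) and the compatibility forms (NFKC, NFKD) can be seen with Python's unicodedata module; this illustrates the forms themselves, not unicodecheck's output format.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")  # the choices accepted by -m/--mode


def codepoints(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# "ﬁ" (U+FB01 LATIN SMALL LIGATURE FI), a space, then "café" with a precomposed é (U+00E9).
text = "\ufb01 caf\u00e9"

for form in FORMS:
    print(form, unicodedata.is_normalized(form, text), codepoints(unicodedata.normalize(form, text)))

# NFC True U+FB01 U+0020 U+0063 U+0061 U+0066 U+00E9
# NFD False U+FB01 U+0020 U+0063 U+0061 U+0066 U+0065 U+0301
# NFKC False U+0066 U+0069 U+0020 U+0063 U+0061 U+0066 U+00E9
# NFKD False U+0066 U+0069 U+0020 U+0063 U+0061 U+0066 U+0065 U+0301
```

The canonical forms only compose or decompose "é", while the compatibility forms also replace the ligature with two separate letters.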

Tips

Check whether filenames are normalized

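This tool checks file contents, not filenames. To check (or convert) the normalization of filenames, the convmv command is a good alternative. By default, convmv only prints the renames it would perform; add --notest to actually rename the files.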

NFC

convmv -f utf8 -t utf8 --nfc -r ./

NFD

convmv -f utf8 -t utf8 --nfd -r ./
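If convmv is not available, a read-only check of filenames can be sketched in Python. The helper unnormalized_names is hypothetical; it only reports paths whose last component is not in the requested form and never renames anything.

```python
import os
import unicodedata
from collections.abc import Iterator


def unnormalized_names(root: str, form: str = "NFC") -> Iterator[str]:
    """Yield paths under root whose file or directory name is not in the given form.

    Hypothetical helper; read-only.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not unicodedata.is_normalized(form, name):
                yield os.path.join(dirpath, name)


for path in unnormalized_names("./"):
    print(path)
```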

Notes

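  • This tool does not normalize files in place (automatic rewriting), because Unicode normalization does not guarantee that the content remains equivalent. For example, the compatibility forms NFKC and NFKD replace the ligature "ﬁ" with the two letters "fi".
  • Binary files are detected with a procedure based on Git's algorithm.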
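For reference, Git's buffer_is_binary() (in xdiff-interface.c) treats data as binary if a NUL byte appears within the first 8000 bytes. The sketch below reproduces that heuristic; unicodecheck's exact procedure may add further rules, and the function names are hypothetical.

```python
# Git inspects only the first FIRST_FEW_BYTES bytes of a buffer.
FIRST_FEW_BYTES = 8000


def looks_binary(data: bytes) -> bool:
    """Git-style heuristic: a NUL byte near the start suggests binary data."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


def file_looks_binary(path: str) -> bool:
    """Read only the prefix Git would inspect and apply the same test."""
    with open(path, "rb") as f:
        return looks_binary(f.read(FIRST_FEW_BYTES))
```

A NUL byte placed after the first 8000 bytes is not seen, so such a file is still treated as text.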

License

MIT

Download files

Source Distribution

unicodecheck-1.2.3.tar.gz (8.2 kB)

Uploaded Source

Built Distribution

unicodecheck-1.2.3-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file unicodecheck-1.2.3.tar.gz.

File metadata

  • Download URL: unicodecheck-1.2.3.tar.gz
  • Size: 8.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.12.7

File hashes

Hashes for unicodecheck-1.2.3.tar.gz
Algorithm    Hash digest
SHA256       ea5eee39b6f8bab9fb07d920ee0f1116c718ce01fa2346c7f7f5a798c4c32cba
MD5          25c042bcbf8aa63868a35017750f1ed6
BLAKE2b-256  5318b5ee445a2f9ba8f213afb848895876363af3d4224f4c32959c78bc6d5aba
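A downloaded file can be checked against the SHA256 digest listed above with Python's hashlib. This is a minimal sketch; sha256_of is a hypothetical helper, and the file is read in chunks so memory use stays bounded.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "ea5eee39b6f8bab9fb07d920ee0f1116c718ce01fa2346c7f7f5a798c4c32cba"

# After downloading the source distribution into the current directory:
# print(sha256_of("unicodecheck-1.2.3.tar.gz") == EXPECTED)
```

pip can perform the same check at install time in hash-checking mode, using a requirements line such as unicodecheck==1.2.3 --hash=sha256:<digest> together with --require-hashes.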

File details

Details for the file unicodecheck-1.2.3-py3-none-any.whl.

File hashes

Hashes for unicodecheck-1.2.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       72788103a423856fe45f8f1508d1c86bc3809d279ec6b0357d45cffb4ca0fa55
MD5          baaf4790865e092737ba15f9ee090d95
BLAKE2b-256  4f586afc456298dcd3f312031cbd3c5807282b082ce2d1d5e522d10200cceef2
