Unicodecheck
A simple tool to check if Unicode text files are Unicode-normalized
Install
pip install unicodecheck
Usage
Quickstart
unicodecheck -iv SPAM.txt
To check files in a directory recursively:
unicodecheck -ivr Ham/Eggs/
Synopsis
The program can be invoked either through the unicodecheck command or as a Python module with python3 -m unicodecheck.
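Because the tool is reachable as a module, it can also be driven from a Python script, for example in a CI step. The sketch below is one possible way to do that, assuming unicodecheck is installed for the current interpreter. The helper names `build_command` and `run_unicodecheck` are hypothetical and not part of the package; only the `-m`, `-r`, and `-e` options listed under Options are used.

```python
import subprocess
import sys


def build_command(paths, mode="NFC", recursive=False):
    """Build an argv list for `python -m unicodecheck` (hypothetical helper)."""
    # -e asks the tool to return a non-zero exit code on detection.
    command = [sys.executable, "-m", "unicodecheck", "-e", "-m", mode]
    if recursive:
        command.append("-r")
    # PATH arguments follow the options.
    return command + [str(path) for path in paths]


def run_unicodecheck(paths, mode="NFC", recursive=False):
    """Run the tool and return True when its exit code is zero."""
    result = subprocess.run(build_command(paths, mode, recursive), check=False)
    return result.returncode == 0
```

A call such as `run_unicodecheck(["Ham/Eggs/"], recursive=True)` would then report a detection (or any other failure) as `False`. Treating every non-zero exit code the same way is a simplification made for this sketch.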
usage: unicodecheck [-h] [-V] [-m {NFC,NFD,NFKC,NFKD}] [-d] [-u [NUMBER]] [-r] [-i]
                    [-b PATTERN [PATTERN ...]] [-e] [-v]
                    PATH [PATH ...]
Options
positional arguments:
  PATH                  specify an input file or directory (pass '-' to specify stdin)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m {NFC,NFD,NFKC,NFKD}, --mode {NFC,NFD,NFKC,NFKD}
                        target Unicode normalization (default: NFC)
  -d, --diff            show diffs between the original and normalized (default: False)
  -u [NUMBER], -U [NUMBER], --unified [NUMBER]
                        show unified diffs with NUMBER lines of context [NUMBER=3] (default: False)
  -r, --recursive       follow the directory tree rooted in each PATH argument (default: False)
  -i, --include-hidden  include hidden files and directories (default: False)
  -b PATTERN [PATTERN ...], --blacklist PATTERN [PATTERN ...]
                        notify if it contains PATTERN (case-sensitive) (default: None)
  -e, --error           return non-zero exit code on detection (default: False)
  -v, --verbose         report non-essential logs (default: False)
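To make the --mode and --unified options more concrete, here is a minimal sketch of the underlying idea using only the Python standard library (`unicodedata` and `difflib`). It is an illustration, not the tool's actual implementation, and the function names are made up for this example.

```python
import difflib
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def is_normalized(text, form="NFC"):
    """Return True if `text` is already in the given normalization form."""
    if form not in FORMS:
        raise ValueError(f"unknown normalization form: {form}")
    # Comparing against the normalized copy works on every Python 3 version;
    # unicodedata.is_normalized() (Python 3.8+) performs the same check.
    return unicodedata.normalize(form, text) == text


def normalization_diff(text, form="NFC", context=3, name="input"):
    """Return unified-diff lines between `text` and its normalized form."""
    normalized = unicodedata.normalize(form, text)
    return list(
        difflib.unified_diff(
            text.splitlines(keepends=True),
            normalized.splitlines(keepends=True),
            fromfile=name,
            tofile=f"{name} ({form})",
            n=context,
        )
    )


decomposed = "cafe\u0301\n"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
print(is_normalized(decomposed, "NFC"))  # False
print(is_normalized(decomposed, "NFD"))  # True
```

For text that is already normalized, `normalization_diff` returns an empty list, which mirrors the idea of a check that reports nothing when there is nothing to report.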
Tips
Check whether filenames are normalized
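To check file names rather than file contents, the convmv command is a good alternative to this application. By default convmv only reports what it would rename; add --notest to actually rename the files.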
NFC
convmv -f utf8 -t utf8 --nfc -r ./
NFD
convmv -f utf8 -t utf8 --nfd -r ./
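If you only want to list offending file names without renaming anything, a few lines of standard-library Python can do a similar check. The sketch below is not part of unicodecheck or convmv. The function names are hypothetical, and it assumes the file system returns names exactly as they were stored, which is not true everywhere.

```python
import os
import unicodedata


def non_normalized_names(names, form="NFC"):
    """Return the names that differ from their normalized form."""
    return [name for name in names if unicodedata.normalize(form, name) != name]


def find_non_normalized_paths(root, form="NFC"):
    """Walk `root` and yield paths whose final component is not normalized."""
    for directory, subdirectories, files in os.walk(root):
        for name in non_normalized_names(subdirectories + files, form):
            yield os.path.join(directory, name)
```

Calling `list(find_non_normalized_paths("./"))` would then give the paths that a rename to NFC would touch.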
Notes
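- This tool does not normalize files in place, because Unicode normalization does not guarantee that the content stays equivalent. For example, NFKC and NFKD replace compatibility characters such as the ligature "ﬁ" (U+FB01) with the plain letters "fi".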
- The procedure for determining binary files is based on Git's algorithm.
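For reference, Git's heuristic treats a buffer as binary when a NUL byte occurs within its first 8000 bytes. A minimal Python sketch of that idea follows. It illustrates the heuristic only and is not copied from unicodecheck, whose details may differ; the names `looks_binary` and `file_looks_binary` are assumptions made for this example.

```python
FIRST_FEW_BYTES = 8000  # size of the prefix that Git inspects


def looks_binary(data: bytes) -> bool:
    """Return True if a NUL byte occurs in the first 8000 bytes of `data`."""
    return b"\0" in data[:FIRST_FEW_BYTES]


def file_looks_binary(path) -> bool:
    """Apply the same heuristic to the beginning of a file."""
    with open(path, "rb") as handle:
        return looks_binary(handle.read(FIRST_FEW_BYTES))
```

One consequence of this kind of heuristic is that UTF-16 or UTF-32 text usually looks binary, because ASCII-range characters are encoded with NUL bytes in those encodings.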
License
MIT
File details
Details for the file unicodecheck-1.2.6.tar.gz.
File metadata
- Download URL: unicodecheck-1.2.6.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 046ec131bf4c395ecdc2637644286611788b7c7f91b71779c13633dba3b9eb5f |
| MD5 | ed35430ba3cd7e0cf50a4646ad46b738 |
| BLAKE2b-256 | 01d16b540389024f7d891629d9f3b0ac29b718b256ae8dd3e37ba35820aea08e |
File details
Details for the file unicodecheck-1.2.6-py3-none-any.whl.
File metadata
- Download URL: unicodecheck-1.2.6-py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d83c0b3ffe1a6c7d2a5aec2e5a99212676df91daee5ce637d1bee0ae052b1f85 |
| MD5 | bfd9c77a5981a220e071ce5d9ca966af |
| BLAKE2b-256 | f27efde094197e8918abbbab9ac92915d2b7f213ea789320a1f4b25ff8fc4f93 |