A Unicode and character set explorer.
Project description
charex is a Unicode and character set explorer for understanding issues with character set translation and Unicode normalization.
Why Did I Make This?
I find the ambiguity of text data interesting. In memory it’s all ones and zeros. There is nothing inherent to the data that makes 0x20 mean a space character, but we’ve mostly agreed that it does. That “mostly” part is what’s interesting to me, and where a lot of fun problems lie.
How Do I Use This?
It’s on PyPI, so you can install it with pip, as long as you are using Python 3.11 or higher:
pip install charex
charex has four modes of operation:
Direct command line invocation,
An interactive shell,
A graphical user interface (GUI),
An application programming interface (API).
Command Line
To get help for direct invocation from the command line:
$ charex -h
Interactive Shell
To launch the interactive shell:
$ charex
That will bring you to the charex shell:
Welcome to the charex shell.
Press ? for a list of comands.
charex>
From here you can type ? to see the list of available commands:
Welcome to the charex shell.
Press ? for a list of comands.
charex> ?
The following commands are available:

  * cd: Decode the given address in all codecs.
  * ce: Encode the given character in all codecs.
  * cl: List registered character sets.
  * clear: Clear the terminal.
  * ct: Count denormalization results.
  * dm: Build a denormalization map.
  * dn: Perform denormalizations.
  * dt: Display details for a code point.
  * el: List the registered escape schemes.
  * es: Escape a string using the given scheme.
  * fl: List registered normalization forms.
  * nl: Perform normalizations.
  * sh: Run in an interactive shell.

For help on individual commands, use "help {command}".
charex>
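To give a feel for what the ce and cd commands describe, here is a minimal sketch that encodes a character and decodes a byte sequence across a handful of codecs. It uses only Python's built-in codecs, not charex itself; the function names `encode_in_codecs` and `decode_in_codecs` and the codec list are assumptions made for illustration, and charex's own output and codec coverage may differ.

```python
# Hypothetical helpers, not part of charex: they only use built-in codecs.
SAMPLE_CODECS = ("ascii", "cp1252", "latin-1", "utf-8", "utf-16-be")


def encode_in_codecs(char, codecs=SAMPLE_CODECS):
    """Map each codec name to the hex bytes for char, or None if unencodable."""
    results = {}
    for name in codecs:
        try:
            results[name] = char.encode(name).hex()
        except UnicodeEncodeError:
            results[name] = None
    return results


def decode_in_codecs(data, codecs=SAMPLE_CODECS):
    """Map each codec name to the text decoded from data, or None if invalid."""
    results = {}
    for name in codecs:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    for codec, value in encode_in_codecs("é").items():
        print(f"{codec}: {value}")
    for codec, value in decode_in_codecs(b"\x80").items():
        print(f"{codec}: {value!r}")
```

The same byte can be a different character, or no character at all, depending on the codec, which is the kind of ambiguity the shell commands are meant to surface.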
Then type help followed by the name of a command to learn what it does:
charex> help dn
usage: charex dn [-h] [-m MAXDEPTH] [-n NUMBER] [-r] [-s SEED] form base

Denormalize a string.

positional arguments:
  form                  The normalization form for the denormalization.
                        Valid options are: casefold, nfc, nfd, nfkc, nfkd.
  base                  The base normalized string.

options:
  -h, --help            show this help message and exit
  -m MAXDEPTH, --maxdepth MAXDEPTH
                        Maximum number of reverse normalizations to use for
                        each character.
  -n NUMBER, --number NUMBER
                        Maximum number of results to return.
  -r, --random          Randomize the denormalization.
  -s SEED, --seed SEED  Seed the randomized denormalization.

charex>
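The idea behind a "reverse normalization" can be sketched with the standard library alone: scan the code points and collect every character that normalizes to a given target. This is not how charex implements dn; `reverse_normalize` and its `limit` parameter are hypothetical names for this illustration, and it covers only the four forms that `unicodedata.normalize` supports (not casefold).

```python
import unicodedata


def reverse_normalize(form, target, limit=0x110000):
    """Return characters other than target that normalize to target.

    form is one of nfc, nfd, nfkc, nfkd (case-insensitive). limit caps
    the scanned code point range; it is an assumption of this sketch.
    """
    form = form.upper()
    found = []
    for cp in range(limit):
        # Surrogates are not characters on their own, so skip them.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        char = chr(cp)
        if char != target and unicodedata.normalize(form, char) == target:
            found.append(char)
    return found


if __name__ == "__main__":
    # Scan only the Basic Multilingual Plane to keep the run short.
    for char in reverse_normalize("nfkc", "a", limit=0x10000):
        print(f"U+{ord(char):04X} {unicodedata.name(char, '<unnamed>')}")
```

Any of the characters it finds could stand in for the target in a string that still normalizes to the same base, which is why normalizing input before validating it matters.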
GUI
To launch the charex GUI:
$ charex gui
API
To import charex into a Python script and get a summary of a Unicode character:
>>> import charex
>>>
>>> value = 'a'
>>> char = charex.Character(value)
>>> print(char.summarize())
a U+0061 (LATIN SMALL LETTER A)
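For comparison, a summary line in the same shape can be assembled from the standard library's `unicodedata` module. This is only a sketch of where such information comes from, not charex's implementation; the function name `summarize` and the `<unnamed>` fallback are assumptions for this example.

```python
import unicodedata


def summarize(char):
    """Build 'char U+XXXX (NAME)' for a single character."""
    # Code points without a name (such as controls) get a placeholder.
    name = unicodedata.name(char, "<unnamed>")
    return f"{char} U+{ord(char):04X} ({name})"


if __name__ == "__main__":
    print(summarize("a"))
```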