
A Unicode and character set explorer.

Project description

charex is a Unicode and character set explorer for understanding issues with character set translation and Unicode normalization.

Why Did I Make This?

I find the ambiguity of text data interesting. In memory it’s all ones and zeros. There is nothing inherent to the data that makes 0x20 mean a space character, but we’ve mostly agreed that it does. That “mostly” part is what’s interesting to me, and where a lot of fun problems lie.
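As a small illustration of that ambiguity, the sketch below uses only Python's built-in codecs (it does not use charex) to show that the meaning of a byte depends on the character set used to read it. The code pages chosen here, cp037 (an EBCDIC variant), latin-1, and cp437, are just convenient examples.

```python
# Standard library only; this does not use charex.
space = b"\x20"

# In ASCII (and most encodings derived from it), 0x20 is a space.
ascii_space = space.decode("ascii")

# In EBCDIC code page 037 the space character is stored as 0x40,
# so the byte 0x20 decodes to something other than a space.
ebcdic_space_byte = " ".encode("cp037")
ebcdic_0x20 = space.decode("cp037")

# The same byte can also map to different printable characters.
latin1_e9 = b"\xe9".decode("latin-1")  # LATIN SMALL LETTER E WITH ACUTE
cp437_e9 = b"\xe9".decode("cp437")     # GREEK CAPITAL LETTER THETA

print(repr(ascii_space), ebcdic_space_byte, repr(ebcdic_0x20))
print(latin1_e9, cp437_e9)
```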

How Do I Use This?

It’s on PyPI, so you can install it with pip, as long as you are using Python 3.11 or higher:

pip install charex

charex has four modes of operation:

  • Direct command line invocation,

  • An interactive shell,

  • A graphical user interface (GUI),

  • An application programming interface (API).

Command Line

To get help for direct invocation from the command line:

$ charex -h

Interactive Shell

To launch the interactive shell:

$ charex

That will bring you to the charex shell:

Welcome to the charex shell.
Press ? for a list of comands.

charex>

From here you can type ? to see the list of available commands:

Welcome to the charex shell.
Press ? for a list of comands.

charex> ?
The following commands are available:

  * cd: Decode the given address in all codecs.
  * ce: Encode the given character in all codecs.
  * cl: List registered character sets.
  * clear: Clear the terminal.
  * ct: Count denormalization results.
  * dm: Build a denormalization map.
  * dn: Perform denormalizations.
  * dt: Display details for a code point.
  * el: List the registered escape schemes.
  * es: Escape a string using the given scheme.
  * fl: List registered normalization forms.
  * nl: Perform normalizations.
  * sh: Run in an interactive shell.

For help on individual commands, use "help {command}".

charex>

Then type help followed by the name of a command to learn what it does:

charex> help dn
usage: charex dn [-h] [-m MAXDEPTH] [-n NUMBER] [-r] [-s SEED] form base

Denormalize a string.

positional arguments:
  form                  The normalization form for the denormalization. Valid
                        options are: casefold, nfc, nfd, nfkc, nfkd.
  base                  The base normalized string.

options:
  -h, --help            show this help message and exit
  -m MAXDEPTH, --maxdepth MAXDEPTH
                        Maximum number of reverse normalizations to use for
                        each character.
  -n NUMBER, --number NUMBER
                        Maximum number of results to return.
  -r, --random          Randomize the denormalization.
  -s SEED, --seed SEED  Seed the randomized denormalization.

charex>
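
To make the idea of denormalization concrete, here is a rough sketch that uses only the standard library's unicodedata module. It is not how charex implements dn. The function name reverse_normalize is hypothetical, and the search is limited to the Basic Multilingual Plane to keep it quick. It finds the characters that a given normalization form collapses into a target character.

```python
import unicodedata


def reverse_normalize(target: str, form: str = "NFKC") -> list[str]:
    """Return each BMP character, other than target, that normalizes to target."""
    matches = []
    for code_point in range(0x10000):
        # Skip surrogates, which are not characters on their own.
        if 0xD800 <= code_point <= 0xDFFF:
            continue
        char = chr(code_point)
        if char != target and unicodedata.normalize(form, char) == target:
            matches.append(char)
    return matches


variants = reverse_normalize("a")
for char in variants:
    print(f"U+{ord(char):04X} {unicodedata.name(char, '<unnamed>')}")
```

Any of the characters printed could stand in for "a" in a string that still normalizes to the same text under NFKC, which is the kind of substitution the dn command explores.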

GUI

To launch the charex GUI:

$ charex gui

API

To import charex into a Python script and get a summary of a Unicode character:

>>> import charex
>>>
>>>
>>> value = 'a'
>>> char = charex.Character(value)
>>> print(char.summarize())
a U+0061 (LATIN SMALL LETTER A)
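
For comparison, a similar one-line summary can be approximated with the standard library alone. This is only a sketch of the output format shown above, not charex's implementation, and the standalone summarize function here is a hypothetical helper.

```python
import unicodedata


def summarize(char: str) -> str:
    """Format a single character as 'char U+XXXX (NAME)'."""
    code_point = ord(char)
    # Fall back to a placeholder for code points without a Unicode name.
    name = unicodedata.name(char, "<unnamed>")
    return f"{char} U+{code_point:04X} ({name})"


print(summarize("a"))
```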
