A Unicode and character set explorer.

charex is a Unicode and character set explorer for understanding issues with character set translation and Unicode normalization.
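
As a small illustration of the kind of normalization issue meant here, the sketch below uses only Python's standard `unicodedata` module, not charex itself. It shows two strings that render identically but compare unequal until they are normalized. The variable names are chosen for this example only.

```python
import unicodedata

# Two ways to write "é": one precomposed code point, or "e" plus a
# combining acute accent. They look the same but are different strings.
composed = "\u00e9"
decomposed = "e\u0301"

same_before = composed == decomposed

# After normalizing both to the same form, they compare equal.
same_after = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# Compatibility forms (NFKC/NFKD) go further and fold characters such as
# the "fi" ligature (U+FB01) into plain letters.
ligature_folded = unicodedata.normalize("NFKC", "\ufb01")

print(same_before, same_after, ligature_folded)
```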

Why Did I Make This?

I find the ambiguity of text data interesting. In memory it’s all ones and zeros. There is nothing inherent to the data that makes 0x20 mean a space character, but we’ve mostly agreed that it does. That “mostly” part is what’s interesting to me, and where a lot of fun problems lie.
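
To make the "mostly" concrete, here is a minimal sketch that decodes the same byte with a few of the codecs that ship with Python. It is not how charex is implemented. The helper name `decode_everywhere` is made up for this example, and the codec list is an arbitrary selection.

```python
from typing import Dict, Optional, Sequence


def decode_everywhere(raw: bytes, codecs: Sequence[str]) -> Dict[str, Optional[str]]:
    """Decode the same bytes with each codec; None means the codec rejected them."""
    results: Dict[str, Optional[str]] = {}
    for codec in codecs:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None
    return results


# 0x20 is a space in ASCII-derived codecs, but not in EBCDIC (cp037),
# where the space character lives at 0x40 instead.
space_results = decode_everywhere(b"\x20", ("ascii", "latin_1", "cp1252", "cp037"))

# 0xE9 is invalid ASCII, "é" in Latin-1, and a Greek theta in cp437.
e9_results = decode_everywhere(b"\xe9", ("ascii", "latin_1", "cp437"))

for name, results in (("0x20", space_results), ("0xE9", e9_results)):
    for codec, text in results.items():
        print(name, codec, repr(text))
```

The bytes never change; only the agreed-upon mapping does.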

How Do I Use This?

Right now, the best way to use it is to clone the repository. Then, from the root of the repository, run charex as a module:

python -m charex

That will bring you to the charex shell:

Welcome to the charex shell.
Press ? for a list of comands.

charex>

From here you can type ? to see the list of available commands:

Welcome to the charex shell.
Press ? for a list of comands.

charex> ?
The following commands are available:

* cd: Decode the given address in all codecs.
* ce: Encode the given character in all codecs.
* cl: List registered character sets.
* ct: Count denormalization results.
* dm: Build a denormalization map.
* dn: Perform denormalizations.
* dt: Display details for a code point.
* el: List the registered escape schemes.
* es: Escape a string using the given scheme.
* fl: List registered normalization forms.
* help: Display command list.
* nl: Perform normalizations.
* sh: Run in an interactive shell.

For help on individual commands, use "help {command}".

charex>

Then type help followed by the name of a command to learn what it does:

charex> help dn
usage: charex dn [-h] [-m MAXDEPTH] [-n NUMBER] [-r] [-s SEED]
                 {nfc,nfd,nfkc,nfkd} base

Denormalize a string.

positional arguments:
  {nfc,nfd,nfkc,nfkd}   The Unicode normalization form for the
                        denormalization.
  base                  The base normalized string.

options:
  -h, --help            show this help message and exit
  -m MAXDEPTH, --maxdepth MAXDEPTH
                        Maximum number of reverse normalizations to use for
                        each character.
  -n NUMBER, --number NUMBER
                        Maximum number of results to return.
  -r, --random          Randomize the denormalization.
  -s SEED, --seed SEED  Seed the randomized denormalization.

charex>
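
For a rough idea of what "denormalizing" means, the sketch below searches for every other code point that normalizes to a given character, using Python's standard `unicodedata` module. This is an assumption about the general idea, not charex's actual algorithm: it handles a single character rather than a whole string, and it does not reproduce the maxdepth, number, random, or seed options shown above. The function name `reverse_normalizations` is hypothetical.

```python
import sys
import unicodedata
from typing import List


def reverse_normalizations(char: str, form: str = "NFKC") -> List[str]:
    """Return every other single code point that normalizes to `char` in `form`."""
    found = []
    for cp in range(sys.maxunicode + 1):
        # Skip the surrogate range; those are not standalone characters.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        candidate = chr(cp)
        if candidate != char and unicodedata.normalize(form, candidate) == char:
            found.append(candidate)
    return found


# Brute force over all of Unicode, so this takes a moment to run.
variants = reverse_normalizations("A", "NFKC")
print(len(variants), [f"U+{ord(c):04X}" for c in variants[:5]])
```

Each character in `variants` looks different from "A" as data but collapses to "A" once the text is normalized with NFKC, which is why such characters are interesting when testing filters that run before normalization.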
