Unlimited UTF-8!

UTF-8000

ASCII ⊆ UTF-8 ⊆ UTF-8000

  • UTF-8000 is the correct way to extend UTF-8 indefinitely.
  • This repository contains a Python implementation of UTF-8000.
  • See UTF-8000-Website (utf-8000.jb2170.com) for the full write-up.
  • UTF-8000 is in no way endorsed by or representative of the Unicode Consortium. This is a standalone project.

Installing

Available on PyPI as UTF-8000.

The recommended way to install it is with pipx:

$ pipx install UTF-8000

This provides the utf-8000(1) command, with the subcommands:

utf-8000 info
utf-8000 encode
utf-8000 decode

TL;DR Examples

Using utf-8000 info

Example output (screenshot: utf-8000-info-example.png, truncated):

11111111 10111111 10110011 10011110 10101011 10011011 10111011 10101111 ...

Color key:

  • Bright Magenta: start sequence bits 111...110
  • Bright Cyan: continuation byte prefix 10
  • Bright Green: mandatory content bits
  • Green: content bits

Observe that the first byte is completely filled with 1s from the start sequence bits. The start sequence continues into the continuation bytes: the payload of the first continuation byte (after its 10 prefix) is also completely filled. The third byte contains the end of the start sequence, that is, two more 1s and the terminating 0. As in UTF-8, the total number of 1s in the start sequence (here 8 + 6 + 2 = 16) is the length of the whole sequence in bytes. The five mandatory content bits are in bright green, and the rest of the content is in green. At least one of the mandatory content bits must be a 1; otherwise the value would also fit in a shorter sequence, and the encoding would be overlong.
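The layout described above can be checked mechanically. The following Python is a minimal sketch, not part of the package: split_sequence is a hypothetical helper, and it assumes, as in UTF-8, that a 2-byte sequence has only four mandatory bits (five for longer sequences, as described above). It drops the 10 prefixes, measures the start sequence, and returns the length, the mandatory bits and the content bits. Applied to the twelve bytes from the encode example, it finds a 12-byte sequence whose mandatory bits 01101 contain a 1, so that encoding is not overlong.

```python
def split_sequence(seq: bytes) -> tuple[int, str, str]:
    """Split one multi-byte UTF-8000 sequence into (length, mandatory bits, content bits).

    Hypothetical helper for illustration, not the package's API. Assumes the
    length equals the number of leading 1s and, as in UTF-8, that a 2-byte
    sequence has four mandatory bits rather than five.
    """
    bits = f"{seq[0]:08b}"  # the first byte contributes all 8 of its bits
    for b in seq[1:]:
        if b >> 6 != 0b10:  # every continuation byte must start with 10
            raise ValueError(f"bad continuation byte {b:#04x}")
        bits += f"{b & 0x3F:06b}"  # keep only the 6 payload bits
    n = len(bits) - len(bits.lstrip("1"))  # the run of leading 1s is the start sequence
    if n < 2 or n != len(seq):
        raise ValueError("start sequence does not match the number of bytes")
    content = bits[n + 1:]  # skip the 1s and the terminating 0
    mandatory = content[: 4 if n == 2 else 5]
    return n, mandatory, content


n, mandatory, content = split_sequence(bytes.fromhex("ffbcb7aab6bebbbbab9f808d"))
print(n, mandatory, len(content))  # 12 01101 61
print(hex(int(content, 2)))  # 0xdeadbeefbadf00d
print("overlong" if "1" not in mandatory else "minimal")  # minimal
```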

Using utf-8000 encode

$ echo 'U+DEADBEEFBADF00D' | utf-8000 encode | hexdump -C
00000000  ff bc b7 aa b6 be bb bb  ab 9f 80 8d              |............|
0000000c
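The bytes above can be reproduced with a short encoder. This is a minimal sketch written from the bit layout in the info section, not the package's encode(); encode_sketch is a hypothetical name. Out of the 8n bits in an n-byte sequence (n >= 2), n + 1 go to the start sequence and 2(n - 1) to continuation prefixes, leaving 5n + 1 content bits, so the sketch picks the smallest n whose capacity holds the value and keeps values below 0x80 as single ASCII bytes.

```python
def encode_sketch(x: int) -> bytes:
    """Encode an unsigned integer following the UTF-8000 layout (hypothetical sketch)."""
    if x < 0:
        raise ValueError("only unsigned integers can be encoded")
    if x < 0x80:
        return bytes([x])  # ASCII: a single 0xxxxxxx byte
    n = 2
    while x >= 1 << (5 * n + 1):  # an n-byte sequence carries 5n + 1 content bits
        n += 1
    # Start sequence (n ones and a terminating 0) followed by the padded content bits.
    bits = "1" * n + "0" + format(x, f"0{5 * n + 1}b")
    out = [int(bits[:8], 2)]  # the first byte holds 8 bits with no prefix
    for i in range(8, len(bits), 6):
        out.append(0x80 | int(bits[i:i + 6], 2))  # continuation bytes: 10xxxxxx
    return bytes(out)


print(encode_sketch(0xDEADBEEFBADF00D).hex(" "))  # ff bc b7 aa b6 be bb bb ab 9f 80 8d
```

Building an explicit bit string keeps the sketch close to the colored diagram; shifts and masks would be faster for very large values.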

Using utf-8000 decode

Feeding the bytes from the encode example above back in:

$ echo -ne '\xff\xbc\xb7\xaa\xb6\xbe\xbb\xbb\xab\x9f\x80\x8d' | utf-8000 decode
U+DEADBEEFBADF00D
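Decoding reverses those steps. Below is a minimal sketch, not the package's decoder: decode_sketch is a hypothetical name, it assumes the whole input is available at once, and it rejects overlong sequences with the mandatory-bit rule described for utf-8000 info (assuming four mandatory bits for 2-byte sequences, as in UTF-8). Formatting each value as U+ plus at least four uppercase hex digits matches the CLI output above.

```python
def decode_sketch(data: bytes) -> list[int]:
    """Decode a complete buffer of UTF-8000 sequences into unsigned integers.

    Hypothetical sketch, not the package's decoder. Assumes the sequence length
    equals the number of leading 1s and, as in UTF-8, that 2-byte sequences
    have four mandatory bits instead of five.
    """
    values, i = [], 0
    while i < len(data):
        if data[i] < 0x80:  # 0xxxxxxx: a single ASCII byte
            values.append(data[i])
            i += 1
            continue
        bits, j = f"{data[i]:08b}", i + 1
        # The start sequence can spill past the first byte, so keep appending
        # continuation payloads until it ends and all of its bytes are read.
        while "0" not in bits or j - i < bits.index("0"):
            if j >= len(data) or data[j] >> 6 != 0b10:
                raise ValueError(f"truncated or malformed sequence at offset {i}")
            bits += f"{data[j] & 0x3F:06b}"
            j += 1
        n = bits.index("0")  # number of leading 1s = length in bytes
        if n < 2:
            raise ValueError(f"stray continuation byte at offset {i}")
        content = bits[n + 1:]  # skip the 1s and the terminating 0
        if "1" not in content[: 4 if n == 2 else 5]:
            raise ValueError(f"overlong encoding at offset {i}")
        values.append(int(content, 2))
        i = j
    return values


for value in decode_sketch(bytes.fromhex("ffbcb7aab6bebbbbab9f808d")):
    print(f"U+{value:04X}")  # U+DEADBEEFBADF00D
```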

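Another example: since UTF-8 ⊆ UTF-8000, plain UTF-8 text decodes too (the trailing U+000A is the newline appended by echo):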

$ echo 'שלום' | utf-8000 decode
U+05E9
U+05DC
U+05D5
U+05DD
U+000A
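Because ASCII ⊆ UTF-8 ⊆ UTF-8000, the UTF-8 bytes that echo writes are already valid UTF-8000. One way to test that claim against the layout described earlier is to compare a sketch encoder with Python's built-in UTF-8 codec over every Unicode scalar value. encode_with_shifts is a hypothetical name and not the package's encode(); it packs the start sequence and content into one integer and slices off 6 bits per continuation byte. Surrogates (U+D800 to U+DFFF) are skipped because they are not scalar values and Python refuses to encode them. The full loop takes a few seconds.

```python
def encode_with_shifts(x: int) -> bytes:
    """Hypothetical UTF-8000-style encoder using shifts and masks (not the package's encode())."""
    if x < 0x80:
        return bytes([x])  # ASCII stays a single byte
    n = 2
    while x >= 1 << (5 * n + 1):  # n bytes carry 5n + 1 content bits
        n += 1
    # n ones, then a 0, then the 5n + 1 content bits: 6n + 2 bits in total.
    word = (((1 << n) - 1) << (5 * n + 2)) | x
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (word & 0x3F))  # continuation byte 10xxxxxx
        word >>= 6
    return bytes([word, *reversed(tail)])  # the remaining 8 bits form the lead byte


checked = 0
for cp in range(0x110000):
    if 0xD800 <= cp <= 0xDFFF:
        continue  # surrogates are not Unicode scalar values
    assert encode_with_shifts(cp) == chr(cp).encode("utf-8"), hex(cp)
    checked += 1
print(f"{checked} scalar values encode identically")  # 1112064 scalar values encode identically
```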

Package Contents

  • encode.py
    • encode(x: int) -> bytes: Encode an unsigned integer in UTF-8000 and return the bytes
    • fancy_encode(x: int) -> tuple[UTF8000Byte]: Encode an unsigned integer in UTF-8000 and return 'fancy' UTF8000Bytes, useful for education and inspection.
  • decode.py
    • UTF8000IncrementalDecoder: An incremental decoder class that can be fed bytes and iterated over, yielding UTF8000Ints once complete byte sequences have been supplied and decoded.
  • UTF8000Byte.py
    • UTF8000Byte: a 'fancy' wrapper around a single UTF-8000 byte (the element type returned by fancy_encode)
    • Various constants and utility functions
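The incremental decoder above is described only in outline, so the following is not UTF8000IncrementalDecoder's real API. It is a minimal sketch of the same idea under a hypothetical name, ToyIncrementalDecoder: feed() appends bytes to a buffer, and iterating yields every value whose sequence is complete, leaving any partial sequence buffered until more bytes arrive. The usage lines split the encode example's bytes, followed by the UTF-8 for ש, across arbitrary chunk boundaries.

```python
from collections.abc import Iterator
from typing import Optional


class ToyIncrementalDecoder:
    """Hypothetical streaming decoder sketch; not the package's UTF8000IncrementalDecoder."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> None:
        self._buf += data  # only buffer here; decoding happens during iteration

    def __iter__(self) -> Iterator[int]:
        # Yield values until the buffer is empty or holds only a partial sequence.
        while (value := self._take_one()) is not None:
            yield value

    def _take_one(self) -> Optional[int]:
        """Remove and return one complete value, or return None if more bytes are needed."""
        buf = self._buf
        if not buf:
            return None
        if buf[0] < 0x80:  # a single ASCII byte
            value = buf[0]
            del buf[:1]
            return value
        bits, used = f"{buf[0]:08b}", 1
        # Pull in continuation payloads until the start sequence has ended
        # and all n bytes of the sequence have been read.
        while "0" not in bits or used < bits.index("0"):
            if used == len(buf):
                return None  # incomplete: keep the bytes and wait for more
            if buf[used] >> 6 != 0b10:
                raise ValueError("expected a continuation byte")
            bits += f"{buf[used] & 0x3F:06b}"
            used += 1
        n = bits.index("0")  # number of leading 1s = sequence length
        if n < 2:
            raise ValueError("unexpected continuation byte")
        content = bits[n + 1:]
        # Assumption: four mandatory bits for 2-byte sequences (as in UTF-8), five otherwise.
        if "1" not in content[: 4 if n == 2 else 5]:
            raise ValueError("overlong encoding")
        del buf[:used]
        return int(content, 2)


dec = ToyIncrementalDecoder()
stream = bytes.fromhex("ffbcb7aab6bebbbbab9f808d") + "ש".encode("utf-8")
for chunk in (stream[:5], stream[5:13], stream[13:]):
    dec.feed(chunk)
    print([f"U+{v:04X}" for v in dec])  # [] then ['U+DEADBEEFBADF00D'] then ['U+05E9']
```

For simplicity a partial sequence is rescanned from its lead byte on each attempt, which is quadratic for very long sequences; a real decoder would remember its progress. Malformed input raises ValueError and stays in the buffer.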

See Also

The main UTF-8000 specification, including its derivation, history, statistics, trivia and rejected alternatives, is available at UTF-8000-Website, hosted at utf-8000.jb2170.com.

Download files

Source Distribution

utf_8000-2.0.0.tar.gz (51.9 kB)

Built Distribution

utf_8000-2.0.0-py3-none-any.whl (41.2 kB)

File details

Details for the file utf_8000-2.0.0.tar.gz.

File metadata

  • Download URL: utf_8000-2.0.0.tar.gz
  • Upload date:
  • Size: 51.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for utf_8000-2.0.0.tar.gz
Algorithm Hash digest
SHA256 88d9b3b66e7426034f5ca70b60a87d4f46a3692afcbf04f25984aec21ac8d56c
MD5 257f181652c9904d0c70f3fa31a080d6
BLAKE2b-256 4c21e26bbac58a985c85cc93b55858ce2d2ab6c70974a365da6adcf74bae568e

File details

Details for the file utf_8000-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: utf_8000-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 41.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for utf_8000-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 decff8b55e6bd2ef6eb96f721044c29bc3727fc8a04ee84076bae45782b88b91
MD5 451a86ba77de20d05dfcaad3ac65b0da
BLAKE2b-256 c35cfed2254f7130ca6182649b6310955f780361c519b7d15ed12b74cd88be5c
