Unlimited UTF-8!

Project description

UTF-8000

ASCII ⊆ UTF-8 ⊆ UTF-8000

UTF-8000 is the correct way to expand UTF-8 indefinitely.
This repository contains a Python implementation of UTF-8000.
See UTF-8000-Website for the full writeup.
UTF-8000 is in no way endorsed by or representative of the Unicode Consortium. This is a standalone project.

Installing

Available on PyPI as UTF-8000

Recommended install using pipx:

$ pipx install UTF-8000

This provides the `utf-8000(1)` command, with subcommands:

utf-8000 info
utf-8000 encode
utf-8000 decode

TLDR Examples

Using `utf-8000 info`

(Image: utf-8000-info-example.png)

11111111 10111111 10110011 10011110 10101011 10011011 10111011 10101111 ...

Color key:

Bright Cyan: self-synchronization prefix (0, 11, or 10)
Bright Magenta: start bits 111...110
Bright Green: mandatory content bits
Green: content bits

Observe that the first byte has a self-synchronization prefix of 11 and is otherwise completely filled with 1s from the start bits. The start bits continue into the continuation bytes, each of which has a self-synchronization prefix of 10. The second byte is also completely filled with start bits. The third byte contains the end of the start bits, that is, the final two 1s and the terminating 0. The five mandatory content bits are in bright green, and the rest of the content is in green. At least one of the mandatory content bits must be a 1; otherwise the value would fit in a shorter sequence, making this an overlong encoding.
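The way the start bits announce a sequence's length can be sketched in a few lines. This is only an illustration under an assumption: by analogy with UTF-8, and consistent with the 12-byte `utf-8000 encode` example in this README, a multi-byte sequence whose start bits contain k ones before the terminating 0 is taken to be k + 2 bytes long. `announced_length` is a hypothetical helper name, not part of the package.

```python
def announced_length(data: bytes) -> int:
    """Return the total sequence length implied by the start bits in `data`.

    Assumption (inferred from the README examples, not from the package
    source): k ones followed by a 0 in the start bits means k + 2 bytes.
    """
    first = data[0]
    if first < 0x80:
        return 1  # prefix 0: a single ASCII byte
    if first < 0xC0:
        raise ValueError("a byte with prefix 10 cannot start a sequence")

    ones = 0
    for index, byte in enumerate(data):
        if index > 0 and (byte & 0xC0) != 0x80:
            raise ValueError(f"expected a continuation byte at offset {index}")
        # Walk the six payload bits after the 2-bit prefix, high bit first.
        for shift in range(5, -1, -1):
            if (byte >> shift) & 1:
                ones += 1
            else:
                return ones + 2  # terminating 0 found
    raise ValueError("start bits not terminated; more bytes are needed")


# The first three bytes of the info example: 11111111 10111111 10110011
print(announced_length(bytes([0b11111111, 0b10111111, 0b10110011])))
```

For the three leading bytes of the example, this counts 6 + 6 + 2 = 14 ones, so the sequence would be 16 bytes long under the stated assumption.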

Using `utf-8000 encode`

$ echo 'U+DEADBEEFBADF00D' | utf-8000 encode | hexdump -C
00000000  ff bc b7 aa b6 be bb bb  ab 9f 80 8d              |............|
0000000c
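The bytes in this output can be reproduced with a short sketch. It is not the package's `encode` function; `encode_sketch` is a hypothetical name, and the layout is an assumption inferred from the examples in this README: an n-byte sequence (n ≥ 2) carries n - 1 start bits (n - 2 ones and a terminating 0) followed by 5n + 1 content bits, packed six bits per byte after the 11 or 10 prefix.

```python
def encode_sketch(x: int) -> bytes:
    """Encode a non-negative integer using the layout inferred above."""
    if x < 0:
        raise ValueError("only non-negative integers can be encoded")
    if x < 0x80:
        return bytes([x])  # prefix 0: plain ASCII

    # Pick the shortest length n whose 5n + 1 content bits can hold x.
    n = 2
    while x.bit_length() > 5 * n + 1:
        n += 1

    # Start bits: (n - 2) ones followed by a single 0, n - 1 bits in total.
    start_bits = ((1 << (n - 2)) - 1) << 1
    # Full payload: start bits followed by the content bits, 6n bits in total.
    payload = (start_bits << (5 * n + 1)) | x

    out = bytearray()
    for i in range(n):
        six_bits = (payload >> (6 * (n - 1 - i))) & 0x3F
        prefix = 0xC0 if i == 0 else 0x80  # 11 for the first byte, 10 after
        out.append(prefix | six_bits)
    return bytes(out)


print(encode_sketch(0xDEADBEEFBADF00D).hex(" "))
# ff bc b7 aa b6 be bb bb ab 9f 80 8d
```

Under this layout, code points that fit in four bytes or fewer come out identical to standard UTF-8, which matches the README's claim that UTF-8 ⊆ UTF-8000.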

Using `utf-8000 decode`

Using the bytes from the encode example above:

$ echo -ne '\xff\xbc\xb7\xaa\xb6\xbe\xbb\xbb\xab\x9f\x80\x8d' | utf-8000 decode
U+DEADBEEFBADF00D

Or another example, with ordinary UTF-8 input (the trailing U+000A is the newline added by echo):

$ echo 'שלום' | utf-8000 decode
U+05E9
U+05DC
U+05D5
U+05DD
U+000A
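Both decode examples can be reproduced with a small sketch. This is not the package's `UTF8000IncrementalDecoder`; `decode_sketch` and `_announced_length` are hypothetical names, and the layout is an assumption inferred from the examples: k ones in the start bits mean a sequence of k + 2 bytes, and an n-byte sequence carries 5n + 1 content bits. The overlong check follows the rule stated for the mandatory content bits.

```python
from typing import Iterator


def _announced_length(data: bytes, start: int) -> int:
    """Count start-bit ones from offset `start`; k ones means k + 2 bytes."""
    ones = 0
    for j in range(start, len(data)):
        if j > start and (data[j] & 0xC0) != 0x80:
            raise ValueError(f"expected a continuation byte at offset {j}")
        for shift in range(5, -1, -1):  # six payload bits, high bit first
            if (data[j] >> shift) & 1:
                ones += 1
            else:
                return ones + 2
    raise ValueError("input ended inside the start bits")


def decode_sketch(data: bytes) -> Iterator[int]:
    """Yield the integers encoded in `data` (assumed layout, see above)."""
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:  # prefix 0: plain ASCII
            yield first
            i += 1
            continue
        if first < 0xC0:
            raise ValueError(f"unexpected continuation byte at offset {i}")

        n = _announced_length(data, i)
        sequence = data[i:i + n]
        if len(sequence) < n:
            raise ValueError("input ended inside a sequence")

        payload = 0
        for k, byte in enumerate(sequence):
            if k > 0 and (byte & 0xC0) != 0x80:
                raise ValueError(f"expected a continuation byte at offset {i + k}")
            payload = (payload << 6) | (byte & 0x3F)

        # Drop the start bits; keep the low 5n + 1 content bits.
        value = payload & ((1 << (5 * n + 1)) - 1)

        # Overlong check: the value must not fit in a shorter sequence.
        shorter_capacity = 7 if n == 2 else 5 * (n - 1) + 1
        if value.bit_length() <= shorter_capacity:
            raise ValueError(f"overlong encoding at offset {i}")

        yield value
        i += n


big = bytes.fromhex("ffbcb7aab6bebbbbab9f808d")
text = "\u05e9\u05dc\u05d5\u05dd\n".encode("utf-8")  # the Hebrew word plus a newline
for cp in decode_sketch(big + text):
    print(f"U+{cp:04X}")
```

The sketch works on a complete byte string for simplicity; an incremental decoder would instead buffer bytes until a full sequence has arrived.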

Package Contents

encode.py
- encode(x: int) -> bytes: Encode an unsigned integer in UTF-8000 and return the bytes
- fancy_encode(x: int) -> tuple[UTF8000Byte]: Encode an unsigned integer in UTF-8000 and return 'fancy' UTF8000Bytes, useful for education and inspection.
decode.py
- UTF8000IncrementalDecoder: An incremental decoder class that can be fed bytes, and can be iterated over, yielding UTF8000Ints when full byte sequences have been supplied and decoded.
UTF8000Byte.py
- UTF8000Byte: a 'fancy' byte wrapper around UTF-8000 bytes that
- Various constants and utility functions

Project details

Release history

- 2.1.0 (this version): Jun 3, 2026
- 2.0.0: May 20, 2026
- 1.0.0: Nov 26, 2024

Download files

Source Distribution

utf_8000-2.1.0.tar.gz (27.8 kB)

Uploaded Jun 3, 2026 Source

Built Distribution

utf_8000-2.1.0-py3-none-any.whl (28.9 kB)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file utf_8000-2.1.0.tar.gz.

File metadata

Download URL: utf_8000-2.1.0.tar.gz
Upload date: Jun 3, 2026
Size: 27.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for utf_8000-2.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3a3c26c61a856fdbc75c578c572b4ccc43657779b935cf8fa4a499dc6f10b5ff`
MD5	`9e2a56dffecb86448e301933788ecaf3`
BLAKE2b-256	`67e66747cc5372928054f204b0a4e44116be4878ee58b3aa4280fed8eda1615e`

File details

Details for the file utf_8000-2.1.0-py3-none-any.whl.

File metadata

Download URL: utf_8000-2.1.0-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 28.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for utf_8000-2.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`65b2b8905e1642637b3331d1b304b7f4da3fd0aeac2f9cf5271a65277fd0f821`
MD5	`a5bd303ab667cdf3de02ec538a65afa2`
BLAKE2b-256	`6f46010c4d8972818d85b5037cb05a1dfad33598de190d27733491d95a337047`
