UTF-8000
Unlimited UTF-8!
ASCII ⊆ UTF-8 ⊆ UTF-8000
- UTF-8000 is the correct way to expand UTF-8 indefinitely.
- This repository contains a Python implementation of UTF-8000.
- See UTF-8000-Website for the full writeup.
- UTF-8000 is in no way endorsed by or representative of the Unicode Consortium. This is a standalone project.
Installing
Available on PyPI as UTF-8000
Recommended install using pipx:
$ pipx install UTF-8000
This provides the utf-8000(1) command with the subcommands:
- utf-8000 info
- utf-8000 encode
- utf-8000 decode
TLDR Examples
Using utf-8000 info
Color key:
- Bright Cyan: self-synchronization prefix (0 or 11 or 10)
- Bright Magenta: start bits (111...110)
- Bright Green: mandatory content bits
- Green: content bits
Observe that the first byte has a self-synchronization prefix of 11 and is otherwise completely filled with 1s from the start bits. The start bits continue into the continuation bytes, which have a self-synchronization prefix of 10. The second byte is also completely filled with start bits. The third byte contains the end of the start bits, that is, the final two 1s and the terminating 0. The five mandatory content bits are in bright green, and the rest of the content is in green. Notice that at least one of the mandatory content bits is a 1: if all five were 0, the value would fit in a sequence one byte shorter, making this an overlong encoding.
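The roles in the color key can be sketched in Python. This is a hypothetical illustration, not the package's `utf-8000 info` implementation. It assumes the layout inferred from the encode example below: every byte is a 2-bit prefix plus 6 payload bits, an n-byte sequence spends its payload on n - 2 start 1s, a terminating 0 and 5n + 1 content bits, and the mandatory bits are taken to be the content bits that a one-byte-shorter sequence could not hold. The names `content_bits`, `annotate` and `mandatory_bits` are made up for this sketch.

```python
def content_bits(n: int) -> int:
    """Content bits carried by an n-byte sequence (assumed layout)."""
    return 7 if n == 1 else 5 * n + 1  # ASCII keeps its 7 bits


def annotate(seq: bytes) -> list[tuple[str, str]]:
    """Pair each byte's bits with role letters, assuming a well-formed
    multi-byte sequence: P = prefix, S = start bits (including the
    terminating 0), M = mandatory content bits, C = other content bits."""
    n = len(seq)
    if n < 2:
        raise ValueError("expected a multi-byte sequence")
    mandatory = content_bits(n) - content_bits(n - 1)
    # Payload stream: n - 2 start 1s plus the terminating 0 (n - 1 bits),
    # then the mandatory bits, then the remaining content: 6n bits in total.
    roles = "S" * (n - 1) + "M" * mandatory + "C" * (content_bits(n) - mandatory)
    return [
        (format(byte, "08b"), "PP" + roles[6 * i : 6 * i + 6])
        for i, byte in enumerate(seq)
    ]


def mandatory_bits(seq: bytes) -> str:
    """Concatenate the mandatory content bits; all 0s would be overlong."""
    return "".join(
        bit
        for bits, roles in annotate(seq)
        for bit, role in zip(bits, roles)
        if role == "M"
    )


seq = bytes.fromhex("ffbcb7aab6bebbbbab9f808d")  # bytes from the encode example
for bits, roles in annotate(seq)[:4]:
    print(bits, roles)
print(mandatory_bits(seq))
```

For the 12-byte example the first rows are `11111111 PPSSSSSS` and `10111100 PPSSSSSM`, and the five mandatory bits come out as `01101`, which contain a 1 as required.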
Using utf-8000 encode
$ echo 'U+DEADBEEFBADF00D' | utf-8000 encode | hexdump -C
00000000  ff bc b7 aa b6 be bb bb  ab 9f 80 8d              |............|
0000000c
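The bytes above can be reproduced with a short encoder. This is a sketch under the same inferred layout, not the package's `encode()`: pick the smallest length n ≥ 2 whose 5n + 1 content bits hold the value, put n - 2 ones and a 0 in front of it, and cut the resulting 6n payload bits into groups of six behind an 11 (first byte) or 10 (continuation byte) prefix. `encode_sketch` is a hypothetical name.

```python
def encode_sketch(x: int) -> bytes:
    """Encode an unsigned integer under the inferred UTF-8000 layout."""
    if x < 0:
        raise ValueError("only unsigned integers can be encoded")
    if x < 0x80:
        return bytes([x])  # ASCII bytes are unchanged
    n = 2
    while x.bit_length() > 5 * n + 1:  # smallest n whose content bits fit x
        n += 1
    content = 5 * n + 1
    start = ((1 << (n - 2)) - 1) << 1  # n - 2 ones followed by a 0
    payload = (start << content) | x  # 6n payload bits in total
    out = bytearray()
    for i in range(n):
        six = (payload >> (6 * (n - 1 - i))) & 0x3F  # next 6 payload bits
        out.append((0xC0 if i == 0 else 0x80) | six)  # '11' or '10' prefix
    return bytes(out)


print(encode_sketch(0xDEADBEEFBADF00D).hex(" "))
```

This prints `ff bc b7 aa b6 be bb bb ab 9f 80 8d`. For code points up to U+10FFFF (outside the surrogate range, which Python's codec rejects) the sketch gives the same bytes as `chr(cp).encode("utf-8")`, consistent with UTF-8 ⊆ UTF-8000.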
Using utf-8000 decode
Using the bytes from the encode example above:
$ echo -ne '\xff\xbc\xb7\xaa\xb6\xbe\xbb\xbb\xab\x9f\x80\x8d' | utf-8000 decode
U+DEADBEEFBADF00D
Ordinary UTF-8 text decodes as well, since UTF-8 ⊆ UTF-8000 (the final U+000A is the newline appended by echo):
$ echo 'שלום' | utf-8000 decode
U+05E9
U+05DC
U+05D5
U+05DD
U+000A
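Decoding reverses the process: the total number of leading 1s (counting the first byte's 11 prefix and running on through the continuation payloads) gives the sequence length n, and the low 5n + 1 payload bits are the value. Below is a sketch under the inferred layout, not the package's decoder; `decode_sketch` is a hypothetical name. It reproduces both examples above and rejects overlong encodings.

```python
def decode_sketch(data: bytes) -> list[int]:
    """Decode a complete UTF-8000 byte string under the inferred layout."""
    out: list[int] = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:  # prefix 0: a single ASCII byte
            out.append(data[i])
            i += 1
            continue
        if data[i] & 0xC0 != 0xC0:
            raise ValueError(f"stray continuation byte at offset {i}")
        # n = total leading 1s: the '11' prefix plus the start 1s, which may
        # run on through the payloads of several continuation bytes.
        n, j = 2, i
        while True:
            if j >= len(data):
                raise ValueError(f"truncated sequence at offset {i}")
            run = 6 - (~data[j] & 0x3F).bit_length()  # leading 1s in payload
            n += run
            if run < 6:
                break
            j += 1
        seq = data[i : i + n]
        if len(seq) < n or any(b & 0xC0 != 0x80 for b in seq[1:]):
            raise ValueError(f"bad or truncated sequence at offset {i}")
        payload = 0
        for b in seq:
            payload = (payload << 6) | (b & 0x3F)
        value = payload & ((1 << (5 * n + 1)) - 1)  # drop the start bits
        shorter = 7 if n == 2 else 5 * (n - 1) + 1  # capacity of n - 1 bytes
        if value.bit_length() <= shorter:
            raise ValueError(f"overlong encoding at offset {i}")
        out.append(value)
        i += n
    return out


print(", ".join(f"U+{cp:04X}" for cp in decode_sketch("שלום\n".encode("utf-8"))))
```

This prints `U+05E9, U+05DC, U+05D5, U+05DD, U+000A`, matching the CLI output above.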
Package Contents
- encode.py
  - encode(x: int) -> bytes: Encode an unsigned integer in UTF-8000 and return the bytes.
  - fancy_encode(x: int) -> tuple[UTF8000Byte]: Encode an unsigned integer in UTF-8000 and return 'fancy' UTF8000Bytes, useful for education and inspection.
- decode.py
  - UTF8000IncrementalDecoder: An incremental decoder class that can be fed bytes and iterated over, yielding UTF8000Ints when full byte sequences have been supplied and decoded.
- UTF8000Byte.py
  - UTF8000Byte: a 'fancy' byte wrapper around UTF-8000 bytes that …
  - Various constants and utility functions
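The feed-then-iterate pattern of an incremental decoder can be sketched as follows. `IncrementalDecoderSketch`, its `feed()` method and the plain `int` results are assumptions for illustration, not the API of the package's `UTF8000IncrementalDecoder`. The key point is that a sequence's length is only known once its start bits end, so a long sequence split across chunks stays buffered until all n bytes have arrived.

```python
from collections.abc import Iterator
from typing import Optional


class IncrementalDecoderSketch:
    """Buffer bytes and yield code points once whole sequences are present."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> None:
        self._buf += data

    def _needed(self) -> Optional[int]:
        """Length of the sequence at the buffer head, or None while its
        start bits have not ended yet."""
        if self._buf[0] < 0x80:
            return 1
        if self._buf[0] & 0xC0 != 0xC0:
            raise ValueError("stray continuation byte")
        n = 2  # the lead byte's '11' prefix
        for byte in self._buf:
            run = 6 - (~byte & 0x3F).bit_length()  # leading 1s in payload
            n += run
            if run < 6:
                return n
        return None

    def __iter__(self) -> Iterator[int]:
        while self._buf:
            n = self._needed()
            if n is None or len(self._buf) < n:
                return  # incomplete: wait for the next feed()
            seq = bytes(self._buf[:n])
            del self._buf[:n]
            if n == 1:
                yield seq[0]
                continue
            if any(b & 0xC0 != 0x80 for b in seq[1:]):
                raise ValueError("bad continuation byte")
            payload = 0
            for b in seq:
                payload = (payload << 6) | (b & 0x3F)
            value = payload & ((1 << (5 * n + 1)) - 1)  # drop the start bits
            if value.bit_length() <= (7 if n == 2 else 5 * n - 4):
                raise ValueError("overlong encoding")
            yield value


dec = IncrementalDecoderSketch()
data = bytes.fromhex("ffbcb7aab6bebbbbab9f808d") + "שלום".encode("utf-8")
for k in range(0, len(data), 5):  # feed in 5-byte chunks
    dec.feed(data[k : k + 5])
    for cp in dec:
        print(f"U+{cp:04X}")
```

Nothing is yielded until the third chunk completes the 12-byte sequence; the output is then `U+DEADBEEFBADF00D` followed by `U+05E9`, `U+05DC`, `U+05D5` and `U+05DD`.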
See Also
The main UTF-8000 specification, including how to derive it, its history, statistics, trivia, rejected alternatives, etc., is located at UTF-8000-Website, hosted at utf-8000.jb2170.com.
Download files
File details
Details for the file utf_8000-2.1.0.tar.gz.
File metadata
- Download URL: utf_8000-2.1.0.tar.gz
- Upload date:
- Size: 27.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a3c26c61a856fdbc75c578c572b4ccc43657779b935cf8fa4a499dc6f10b5ff |
| MD5 | 9e2a56dffecb86448e301933788ecaf3 |
| BLAKE2b-256 | 67e66747cc5372928054f204b0a4e44116be4878ee58b3aa4280fed8eda1615e |
File details
Details for the file utf_8000-2.1.0-py3-none-any.whl.
File metadata
- Download URL: utf_8000-2.1.0-py3-none-any.whl
- Upload date:
- Size: 28.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 65b2b8905e1642637b3331d1b304b7f4da3fd0aeac2f9cf5271a65277fd0f821 |
| MD5 | a5bd303ab667cdf3de02ec538a65afa2 |
| BLAKE2b-256 | 6f46010c4d8972818d85b5037cb05a1dfad33598de190d27733491d95a337047 |