Python library to create and inspect the UTF-8 encoding of Unicode code points.

Table of UTF-8 encoding rules

The design of UTF-8 can be seen in the following table, which was originally proposed by Dave Prosser and subsequently modified by Ken Thompson.

Bits in code point   First code point   Last code point   Bytes in sequence
7                    U+0000             U+007F            1
11                   U+0080             U+07FF            2
16                   U+0800             U+FFFF            3
21                   U+10000            U+1FFFFF          4
26                   U+200000           U+3FFFFFF         5
31                   U+4000000          U+7FFFFFFF        6

You can read more about the table above at the linked page.
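The ranges in the table can be turned into a small lookup. The following is a minimal sketch in plain Python, independent of this package. The names RANGES and sequence_length are hypothetical, chosen only for illustration, and the sketch follows the six-row design above rather than the modern UTF-8 limit of U+10FFFF.

```python
# Last code point of each row in the table above, paired with the
# number of bytes in the sequence. These names are illustrative only
# and are not part of utf8_codepoint.
RANGES = [
    (0x7F, 1),
    (0x7FF, 2),
    (0xFFFF, 3),
    (0x1FFFFF, 4),
    (0x3FFFFFF, 5),
    (0x7FFFFFFF, 6),
]


def sequence_length(code_point):
    """Return how many bytes the table assigns to code_point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    for last, length in RANGES:
        if code_point <= last:
            return length
    raise ValueError("code point does not fit in 31 bits")


# "U+20AC" -> 0x20AC, which falls in the U+0800..U+FFFF row.
euro = int("U+20AC"[2:], 16)
print(sequence_length(euro))  # 3
```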

Installation

pip install utf8_codepoint

Documentation and Contributing

More documentation is available on GitHub.

Example

Simple examples of using this package.

Quick Start

from utf8_codepoint import CodePoint

# Unicode code point of the euro sign
euro_money = "U+20AC"

# create a CodePoint instance
cp = CodePoint(euro_money)

# print the UTF-8 bytes of the code point as integers
print(cp.to_int())

The result is:

226 130 172
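As a cross-check that does not depend on this package, Python's built-in UTF-8 codec produces the same three bytes for U+20AC. This is a sketch using only the standard library; the variable names are illustrative.

```python
# Parse the "U+XXXX" notation into an integer code point.
code_point = int("U+20AC"[2:], 16)

# Encode the character with Python's built-in UTF-8 codec.
encoded = chr(code_point).encode("utf-8")

as_ints = list(encoded)
as_hex = [format(b, "02X") for b in encoded]
as_bits = [format(b, "08b") for b in encoded]

print(*as_ints)  # 226 130 172
print(*as_hex)   # E2 82 AC
print(*as_bits)  # 11100010 10000010 10101100
```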

To a hexadecimal representation

from utf8_codepoint import CodePoint
...

print(cp.to_hex())

The result is:

E2 82 AC

To a binary string representation

from utf8_codepoint import CodePoint
...

print(cp.to_string())

The result is:

11100010 10000010 10101100

To a list of binary strings

from utf8_codepoint import CodePoint
...

print(cp.to_list())

The result is:

['11100010', '10000010', '10101100']
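The first byte above begins with 1110, which marks a three-byte sequence, and the other two bytes begin with 10. The remaining bit positions carry the 16 bits of the code point. A minimal sketch of that layout in plain Python, independent of this package (the function name encode_three_bytes is hypothetical):

```python
def encode_three_bytes(code_point):
    """Encode a code point in U+0800..U+FFFF as 1110xxxx 10xxxxxx 10xxxxxx."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("expected a code point between U+0800 and U+FFFF")
    first = 0b11100000 | (code_point >> 12)           # top 4 bits
    second = 0b10000000 | ((code_point >> 6) & 0x3F)  # middle 6 bits
    third = 0b10000000 | (code_point & 0x3F)          # low 6 bits
    return [first, second, third]


bits = [format(b, "08b") for b in encode_three_bytes(0x20AC)]
print(bits)  # ['11100010', '10000010', '10101100']
```

The result matches the list printed by the package for U+20AC.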

Pretty-print all the data

from utf8_codepoint import CodePoint
...

cp.bprint()

The result is:

{'0x20AC': {'bit_list': ['11100010', '10000010', '10101100'],
    'code_point': 16,
    'hexa_list': ['0xe2', '0x82', '0xac'],
    'initial_bit': '1110',
    'integer_list': [226, 130, 172]}}

Get all data

from utf8_codepoint import CodePoint
...

print(cp.get_all())

The result is:

{'0x20AC':
        {
                'bit_list': ['11100010', '10000010', '10101100'],
                'integer_list': [226, 130, 172],
                'initial_bit': '1110',
                'hexa_list': ['0xe2', '0x82', '0xac'],
                'code_point': 16
        }
}

To get the result in JSON format, pass True as the argument to get_all:

cp.get_all(True)
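For comparison, the dictionary shown above can also be serialized with the standard json module. This sketch copies the printed output into a literal rather than calling the package, and it is only an assumption that get_all(True) yields equivalent JSON.

```python
import json

# Data copied from the get_all() output shown above.
data = {
    "0x20AC": {
        "bit_list": ["11100010", "10000010", "10101100"],
        "integer_list": [226, 130, 172],
        "initial_bit": "1110",
        "hexa_list": ["0xe2", "0x82", "0xac"],
        "code_point": 16,
    }
}

# Serialize to a JSON string with readable indentation.
as_json = json.dumps(data, indent=4)
print(as_json)

# Round-trip back to a dictionary to confirm nothing is lost.
restored = json.loads(as_json)
```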

Project details

Version: 1.1.0

Download files

utf8_codepoint-1.1.0.tar.gz (4.2 kB), source distribution
