

Project description

Python library to compute and inspect the UTF-8 encoding of Unicode code points.

Table of rules for UTF-8 encoding

The design of UTF-8 can be seen in the following table, which was originally proposed by Dave Prosser and subsequently modified by Ken Thompson.

Bits of code point   First code point   Last code point   Bytes in sequence
7                    U+0000             U+007F            1
11                   U+0080             U+07FF            2
16                   U+0800             U+FFFF            3
21                   U+10000            U+1FFFFF          4
26                   U+200000           U+3FFFFFF         5
31                   U+4000000          U+7FFFFFFF        6

You can read more on the table above in a link.
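The rules in the table can be sketched in a few lines of plain Python. This is only an illustration of the table, not the package's own implementation; the function name `utf8_encode` is hypothetical. It follows the original six-byte layout shown above, whereas the current UTF-8 standard (RFC 3629) stops at U+10FFFF and four bytes.

```python
def utf8_encode(code_point: int) -> list[int]:
    """Encode one code point using the original (up to 6-byte) UTF-8 layout.

    Hypothetical helper for illustration; not part of utf8_codepoint.
    """
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("code point out of range")
    if code_point <= 0x7F:
        # 7-bit code points are stored unchanged in a single byte.
        return [code_point]

    # (last code point of the range, bytes in sequence, leading-byte marker)
    ranges = [
        (0x7FF, 2, 0xC0),
        (0xFFFF, 3, 0xE0),
        (0x1FFFFF, 4, 0xF0),
        (0x3FFFFFF, 5, 0xF8),
        (0x7FFFFFFF, 6, 0xFC),
    ]
    for last, length, marker in ranges:
        if code_point <= last:
            break

    encoded = []
    # Continuation bytes carry 6 bits each, prefixed with the bits 10.
    for _ in range(length - 1):
        encoded.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # The remaining high bits go into the leading byte.
    encoded.append(marker | code_point)
    return encoded[::-1]


print(utf8_encode(0x20AC))  # [226, 130, 172]
```

For code points up to U+10FFFF (excluding surrogates), the result should match Python's built-in `chr(code_point).encode("utf-8")`.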

Installation

pip install utf8_codepoint

Documentation and Contributing

More documentation is available on GitHub.

Example

Simple examples using this package.

Quick Start

from utf8_codepoint import CodePoint

# Unicode code point of the euro sign
euro_money = "U+20AC"

# create a CodePoint instance
cp = CodePoint(euro_money)

# get the UTF-8 encoded bytes of the code point as integers
print(cp.to_int())

The result is:

226 130 172

To get a hexadecimal representation:

from utf8_codepoint import CodePoint
...

print(cp.to_hex())

The result is:

E2 82 AC

To get a string with the binary representation:

from utf8_codepoint import CodePoint
...

print(cp.to_string())

The result is:

11100010 10000010 10101100

To get a list of binary strings:

from utf8_codepoint import CodePoint
...

print(cp.to_list())

The result is:

['11100010', '10000010', '10101100']
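The four representations shown so far can be cross-checked with the standard library alone. The sketch below does not use `utf8_codepoint`; the variable names are illustrative, and it assumes the input is in the same `U+XXXX` notation as in the Quick Start.

```python
# Parse the "U+XXXX" notation into an integer code point.
notation = "U+20AC"
code_point = int(notation[2:], 16)

# Encode the character with Python's built-in UTF-8 codec.
encoded = chr(code_point).encode("utf-8")

integers = list(encoded)                             # byte values as integers
hex_string = " ".join(f"{b:02X}" for b in encoded)   # two hex digits per byte
bit_list = [f"{b:08b}" for b in encoded]             # eight bits per byte
bit_string = " ".join(bit_list)

print(integers)    # [226, 130, 172]
print(hex_string)  # E2 82 AC
print(bit_string)  # 11100010 10000010 10101100
print(bit_list)    # ['11100010', '10000010', '10101100']
```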

To display all the data in a pretty-printed style:

from utf8_codepoint import CodePoint
...

cp.bprint()

The result is:

{'0x20AC': {'bit_list': ['11100010', '10000010', '10101100'],
    'code_point': 16,
    'hexa_list': ['0xe2', '0x82', '0xac'],
    'initial_bit': '1110',
    'integer_list': [226, 130, 172]}}

Get all data

from utf8_codepoint import CodePoint
...

print(cp.get_all())

The result is:

{'0x20AC':
        {
                'bit_list': ['11100010', '10000010', '10101100'],
                'integer_list': [226, 130, 172],
                'initial_bit': '1110',
                'hexa_list': ['0xe2', '0x82', '0xac'],
                'code_point': 16
        }
}

If you want the result in JSON format, pass True as the argument to the get_all method:

cp.get_all(True)
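What the JSON form might look like can be sketched with the standard `json` module. This assumes the JSON output is a serialization of the same dictionary shown above; the exact string returned by `get_all(True)` is not documented here, so treat the layout as illustrative.

```python
import json

# Dictionary copied from the get_all() output shown above.
data = {
    "0x20AC": {
        "bit_list": ["11100010", "10000010", "10101100"],
        "integer_list": [226, 130, 172],
        "initial_bit": "1110",
        "hexa_list": ["0xe2", "0x82", "0xac"],
        "code_point": 16,
    }
}

# Serialize to a JSON string, then parse it back to confirm a lossless round trip.
as_json = json.dumps(data, indent=4)
print(as_json)
restored = json.loads(as_json)
```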


Source Distribution

utf8_codepoint-1.1.0.tar.gz (4.2 kB)

