ZiCutter: cut characters into smaller pieces

usage

pip install ZiCutter

from ZiCutter import ZiCutter

line = "'〇㎡[คุณจะจัดพิธีแต่งงานเมื่อไรคะัีิ์ื็ํึ]Ⅷpays-g[ran]d-blanc-élevé » (白高大夏國)😀熇'"

# build
cutter = ZiCutter(dir="")
cutter.build()

# use
cutter = ZiCutter(dir="")
for c in line:
    print(cutter.cutChar(c))

background

Unicode 14.0 adds 838 characters, for a total of 144,697 characters (https://www.unicode.org/versions/Unicode14.0.0/). About 2/3 of them are HanZi. To shrink the vocabulary size, we cut each character into smaller pieces.

vocab

minimum vocabulary size:

az	26
number	10
Gram (az + number)	36
YuanZi	2366
total (Gram + YuanZi)	2402

cut rare characters by name

name = Unicode name of the character 'x'
tokens=[name[:2],"#"+name[-1]]
base: Grams, [a~z][a~z],[0~9][0~9],#[a~z],#[0~9]

'😀' : name is 'GRINNING FACE'
'😀' -> ["##gr","ce"]
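The token formula above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the helper name `cut_by_name` is hypothetical, lowercasing the name is an assumption, and the markers ZiCutter actually emits (such as the prefixes in the example above) may differ from the plain formula.

```python
import unicodedata

def cut_by_name(ch):
    """Hypothetical helper: two tokens from a character's Unicode name."""
    # unicodedata.name returns the default ("") for unnamed code points.
    name = unicodedata.name(ch, "").lower()  # lowercasing is an assumption
    if not name:
        return [ch]  # no name available: keep the character as-is
    # First two letters of the name, plus "#" and its last letter.
    return [name[:2], "#" + name[-1]]

tokens = cut_by_name("😀")  # the name is 'GRINNING FACE'
print(tokens)
```

Because every token is drawn from a small alphabet of letter and digit grams, any named character can be represented without adding an entry of its own to the vocabulary.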

cut HanZi by IDS (Ideographic Description Sequence)

base: YuanZi (minimum)

熇	⿰火高
'熇' -> ['⿰','火','高']
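The IDS lookup can be sketched as a recursive expansion over a small table. Everything here is an assumption for illustration: the `IDS` and `YUANZI` tables are hypothetical miniatures holding only the entry shown above, `cut_hanzi` is not the package's API, and whether ZiCutter expands components recursively is not stated in this description.

```python
# Hypothetical miniature tables; the real package ships its own data.
IDS = {"熇": "⿰火高"}   # character -> Ideographic Description Sequence
YUANZI = {"火", "高"}    # base components that are never split further

def cut_hanzi(ch):
    """Expand a character into IDS operators and base components."""
    # Stop at base components, unknown characters, and self-mapped entries.
    if ch in YUANZI or ch not in IDS or IDS[ch] == ch:
        return [ch]
    tokens = []
    for part in IDS[ch]:
        tokens.extend(cut_hanzi(part))  # operators such as ⿰ pass through
    return tokens

print(cut_hanzi("熇"))
```

Stopping at the YuanZi set keeps the output vocabulary bounded by the size of that set plus the handful of IDS operators.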

Release history

version	date
0.0.10	Feb 24, 2023 (this version)
0.0.9	Feb 19, 2023
0.0.8	Jan 10, 2023
0.0.7	Jan 2, 2023
0.0.6	Jan 2, 2023
0.0.5	Aug 31, 2022
0.0.4	Aug 15, 2022
0.0.3	Jul 9, 2022
0.0.2	Jul 5, 2022
0.0.1	Jun 29, 2022
0.0.0	Jun 28, 2022

Download files

Source Distribution: ZiCutter-0.0.10.tar.gz (1.4 MB), uploaded Feb 24, 2023

Built Distribution: ZiCutter-0.0.10-py3-none-any.whl (1.4 MB), uploaded Feb 24, 2023, Python 3

Hashes for ZiCutter-0.0.10.tar.gz
Algorithm	Hash digest
SHA256	`d4d03609d8083a7fd8a57858660089e8786f1764ca4ec024c2ec236349596c9b`
MD5	`d91b7d5a5e5931f7a65abb159da7ce57`
BLAKE2b-256	`ca5716025be5e484da835c1f77cce7cb88f47b6d2e6c79645d6a7d0cf359cf9e`

Hashes for ZiCutter-0.0.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8ee06158216385ca5eed2601c8edb38e76b60c4b0b3a5fbeca95d785500339fc`
MD5	`4d86227600afd9921ffffe801c6eb550`
BLAKE2b-256	`cfdadecdafaa11e25e349dfc2a53caa45f67192a2ed09149bc786c6338a63e80`