Look up Unicode character name or code point label and search in Unicode character names
This package supports Unicode version 15.0, released on September 13, 2022.
The library provides:
- A function to get the character name (the normative character property “Name”) or the code point label (for characters that do not have character names) of a single Unicode character.
- A function to get the code point value (in the usual 4- to 6-digit hexadecimal format) corresponding to a Unicode character name; the search is case-sensitive and requires exact string match.
- A function to search characters by character name; the search is case-insensitive but requires exact substring match.
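The two matching behaviours described above can be illustrated with a small sketch built on the standard library's `unicodedata` module. This is not the package's implementation: the names `NAME_TO_CODEPOINT`, `exact_lookup`, and `substring_search` are hypothetical, the table covers only code points below U+3000 to keep it small, and the names come from whatever Unicode version the running Python interpreter ships.

```python
import unicodedata

# Illustrative table only: character names for code points below U+3000,
# mapped to their 4-digit hexadecimal code point values.
NAME_TO_CODEPOINT = {}
for cp in range(0x3000):
    name = unicodedata.name(chr(cp), None)
    if name is not None:
        NAME_TO_CODEPOINT[name] = f"{cp:04X}"


def exact_lookup(name):
    """Case-sensitive, exact string match; returns None when nothing matches."""
    return NAME_TO_CODEPOINT.get(name)


def substring_search(fragment):
    """Case-insensitive match on an exact substring of the name."""
    needle = fragment.upper()  # Unicode character names are uppercase
    return [(cp, name) for name, cp in NAME_TO_CODEPOINT.items() if needle in name]


print(exact_lookup("LATIN CAPITAL LETTER E WITH ACUTE"))  # 00C9
print(exact_lookup("latin capital letter e with acute"))  # None
```

In this sketch, the lowercase query fails in `exact_lookup` but would match in `substring_search`, which mirrors the difference between the second and third functions listed above.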
The generic term “character name” refers to the Unicode character “Name” property value for an encoded Unicode character. For code points that do not have character names (unassigned, reserved code points and other special code point types), the Unicode standard uses constructed Unicode code point labels, displayed between angle brackets, to stand in for character names.
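As a rough illustration of how a code point label stands in for a missing name, the sketch below falls back to a constructed label when the standard library's `unicodedata` module has no name for a character. The function `name_or_label` is hypothetical and is not this package's code; it assumes the five label types from Section 4.8 of the Unicode core specification (control, private-use, surrogate, noncharacter, reserved) and depends on the Unicode version bundled with the running Python interpreter.

```python
import unicodedata


def name_or_label(char):
    """Return the character name, or a constructed code point label."""
    name = unicodedata.name(char, None)
    if name is not None:
        return name

    cp = ord(char)
    category = unicodedata.category(char)
    if category == "Cc":
        kind = "control"
    elif category == "Co":
        kind = "private-use"
    elif category == "Cs":
        kind = "surrogate"
    elif category == "Cn":
        # Noncharacters are U+FDD0..U+FDEF and the last two code points
        # of every plane; every other unassigned code point is reserved.
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            kind = "noncharacter"
        else:
            kind = "reserved"
    else:
        # An assigned character whose name this Python build cannot supply.
        raise LookupError(f"no name available for U+{cp:04X}")
    return f"<{kind}-{cp:04X}>"


print(name_or_label("\u00E5"))  # LATIN SMALL LETTER A WITH RING ABOVE
print(name_or_label("\u0002"))  # <control-0002>
```

The label embeds the code point in the same 4- to 6-digit hexadecimal format used elsewhere in this document.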
Installation or upgrade
The easiest method to install is using pip:
pip install unicode-charnames
To update the package to the latest version:
pip install --upgrade unicode-charnames
To get the version of the Unicode character database currently used:
>>> from unicode_charnames import UCD_VERSION
>>> UCD_VERSION
'15.0.0'
Examples
from unicode_charnames import charname, codepoint, search_charnames

# charname
for char in '龠💓\u00E5\u0002':
    print(charname(char))

# Output:
# CJK UNIFIED IDEOGRAPH-9FA0
# BEATING HEART
# LATIN SMALL LETTER A WITH RING ABOVE
# <control-0002>

# codepoint
for name in [
    'LATIN CAPITAL LETTER E WITH ACUTE',
    'SQUARE ERA NAME REIWA',
    'SUPERCALIFRAGILISTICEXPIALIDOCIOUS'
]:
    print(codepoint(name))

# Output:
# 00C9
# 32FF
# None

# search_charnames
for x in search_charnames('break'):
    print('\t'.join(x))

# Output:
# 00A0 NO-BREAK SPACE
# 2011 NON-BREAKING HYPHEN
# 202F NARROW NO-BREAK SPACE
# 4DEA HEXAGRAM FOR BREAKTHROUGH
# FEFF ZERO WIDTH NO-BREAK SPACE
This implementation is based on the following resource: Section 4.8, Name, in the Unicode core specification, version 15.0.0.
The code is available under the MIT license.