
Look up Unicode character names or code point labels, and search within Unicode character names


unicode-charnames

This package supports Unicode version 15.0, released on September 13, 2022.

The library provides:

  • A function to get the character name (the normative character property “Name”) or the code point label (for characters that do not have character names) of a single Unicode character.
  • A function to get the code point value (in the usual 4- to 6-digit hexadecimal format) corresponding to a Unicode character name; the lookup is case-sensitive and requires an exact string match.
  • A function to search characters by character name; the search is case-insensitive but requires an exact substring match.
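
The difference between the two matching rules can be illustrated with the standard library alone. The following is a minimal sketch, not this package's implementation: the helper names (build_name_table, exact_lookup, substring_search) are hypothetical, and the results depend on the Unicode version bundled with your Python interpreter rather than on the data shipped with unicode-charnames.

```python
import sys
import unicodedata


def build_name_table():
    """Map every character name known to the stdlib to its hex code point."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            # 4- to 6-digit uppercase hexadecimal, e.g. '00C9' or '1F493'
            table[name] = f"{cp:04X}"
    return table


NAMES = build_name_table()  # hypothetical module-level cache


def exact_lookup(name):
    """Case-sensitive, exact string match; None when nothing matches."""
    return NAMES.get(name)


def substring_search(fragment):
    """Case-insensitive, exact substring match, in code point order."""
    # Character names only use uppercase letters, digits, spaces and
    # hyphens, so upper-casing the query is enough.
    needle = fragment.upper()
    return [(cp, name) for name, cp in NAMES.items() if needle in name]
```

With this sketch, exact_lookup('latin capital letter e with acute') returns None because the case differs, while substring_search('break') and substring_search('BREAK') return the same list.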

The generic term “character name” refers to the Unicode character “Name” property value for an encoded Unicode character. For code points that do not have character names (unassigned, reserved code points and other special code point types), the Unicode standard uses constructed Unicode code point labels, displayed between angle brackets, to stand in for character names.
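
To make the label construction concrete, here is a minimal sketch that falls back to a code point label when the standard library has no name for a character. It is an illustration under stated assumptions, not this package's code: name_or_label is a hypothetical helper, the five label types (control, private-use, surrogate, noncharacter, reserved) follow Section 4.8 of the core specification, and classification relies on the Unicode version bundled with your Python interpreter.

```python
import unicodedata


def name_or_label(char):
    """Return the character name, or a constructed code point label."""
    name = unicodedata.name(char, None)
    if name is not None:
        return name

    cp = ord(char)
    category = unicodedata.category(char)
    if category == "Cc":
        kind = "control"
    elif category == "Co":
        kind = "private-use"
    elif category == "Cs":
        kind = "surrogate"
    elif category == "Cn":
        # Noncharacters: U+FDD0..U+FDEF and the last two code points
        # of every plane (U+xxFFFE and U+xxFFFF).
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            kind = "noncharacter"
        else:
            kind = "reserved"
    else:
        # Assigned character whose name this interpreter cannot supply.
        raise ValueError(f"no name available for U+{cp:04X}")
    return f"<{kind}-{cp:04X}>"
```

For example, name_or_label('\u0002') gives '<control-0002>' and name_or_label('\uE000') gives '<private-use-E000>'. A code point assigned in a Unicode version newer than the interpreter's data would be reported as reserved by this sketch.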

Installation or upgrade

The easiest way to install the package is with pip:

pip install unicode-charnames

To update the package to the latest version:

pip install --upgrade unicode-charnames

UCD version

To get the version of the Unicode Character Database (UCD) currently used:

>>> from unicode_charnames import UCD_VERSION
>>> UCD_VERSION
'15.0.0'

Example usage

    from unicode_charnames import charname, codepoint, search_charnames

    # charname: character name or code point label of a single character
    for char in '龠💓\u00E5\u0002':
        print(charname(char))
    # Output:
    # CJK UNIFIED IDEOGRAPH-9FA0
    # BEATING HEART
    # LATIN SMALL LETTER A WITH RING ABOVE
    # <control-0002>

    # codepoint: hexadecimal code point value for an exact character name,
    # or None if no character has that name
    for name in [
            'LATIN CAPITAL LETTER E WITH ACUTE',
            'SQUARE ERA NAME REIWA',
            'SUPERCALIFRAGILISTICEXPIALIDOCIOUS'
    ]:
        print(codepoint(name))
    # Output:
    # 00C9
    # 32FF
    # None

    # search_charnames: case-insensitive substring search in character names
    for x in search_charnames('break'):
        print('\t'.join(x))
    # Output:
    # 00A0    NO-BREAK SPACE
    # 2011    NON-BREAKING HYPHEN
    # 202F    NARROW NO-BREAK SPACE
    # 4DEA    HEXAGRAM FOR BREAKTHROUGH
    # FEFF    ZERO WIDTH NO-BREAK SPACE
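
As the output above shows, the code point value is a hexadecimal string (or None when no name matches), not a character. A minimal sketch of turning such a value back into a character with built-ins only; char_from_hex is a hypothetical helper and not part of this package:

```python
def char_from_hex(hex_value):
    """Convert a hex code point string such as '00C9' into a character."""
    if hex_value is None:
        # Mirror the "no match" result instead of raising.
        return None
    return chr(int(hex_value, 16))


print(char_from_hex('00C9'))  # É
print(char_from_hex(None))    # None
```

Passing None through unchanged keeps the helper safe to chain directly after a lookup that may not find a match.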

Related resource

This implementation is based on the following resource: Section 4.8, Name, in the Unicode core specification, version 15.0.0.

Licenses

The code is available under the MIT license.

Usage of Unicode data files is governed by the UNICODE TERMS OF USE, a copy of which is included as UNICODE-LICENSE.
