Look up Unicode character names or code point labels, and search within Unicode character names

unicode-charnames

Unicode characters have names that serve as unique identifiers for each character. The character names in the Unicode Standard are identical to those of ISO/IEC 10646.

The unicode-charnames package looks up the character name or code point label of a given Unicode character, and looks up the code point of a given character name. It also supports substring searches across Unicode character names. This package supports version 13.0 of the Unicode Standard (143,859 characters).

The generic term “character name” refers to the Unicode character “Name” property value for an encoded Unicode character. For code points that do not have character names (unassigned or reserved code points and other special code point types), the Unicode Standard uses constructed code point labels, displayed between angle brackets (for example, <control-0002>), to stand in for character names.
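
To make the label convention concrete, here is a minimal sketch that builds code point labels using only the standard library's `unicodedata` module. It is not this package's implementation: the helper name `codepoint_label` is hypothetical, and the split between "reserved" and other types depends on the Unicode version bundled with your Python interpreter, which may differ from the version this package supports.

```python
import unicodedata


def codepoint_label(char):
    """Return a label such as '<control-0002>' for a code point without a
    character name, or None when the character has a regular name.

    Hypothetical helper for illustration; not part of unicode-charnames.
    """
    cp = ord(char)
    category = unicodedata.category(char)
    if category == "Cc":
        kind = "control"
    elif category == "Co":
        kind = "private-use"
    elif category == "Cs":
        kind = "surrogate"
    elif category == "Cn":
        # Noncharacters are U+FDD0..U+FDEF plus the last two code points
        # of every plane (U+xxFFFE and U+xxFFFF).
        is_noncharacter = 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE
        kind = "noncharacter" if is_noncharacter else "reserved"
    else:
        return None
    # Labels use the code point as uppercase hex, at least four digits.
    return "<{}-{:04X}>".format(kind, cp)


print(codepoint_label("\u0002"))
print(codepoint_label("\uE000"))
print(codepoint_label("\uFFFF"))
```

The sketch derives the label type from the General_Category property, so it needs no data beyond what ships with Python.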

Installation

pip install unicode-charnames

Features

The library provides:

  • A function to get the character name (the normative character property “Name”) or the code point label (for characters that do not have character names) of a single Unicode character.
  • A function to get the code point value (in the usual 4- to 6-digit hexadecimal format) corresponding to a Unicode character name; the search is case-sensitive and requires an exact match on the full name.
  • A function to search characters by character name; the search is case-insensitive but requires an exact substring match.
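
As a rough illustration of what a case-insensitive substring search over character names involves, the sketch below scans every code point with the standard library's `unicodedata` module. The function name `search_names` is hypothetical and this is not how the package is implemented; results also reflect the Unicode version bundled with your Python interpreter rather than the version this package supports.

```python
import sys
import unicodedata


def search_names(substring):
    """Return (hex code point, name) pairs whose character name contains
    `substring`, ignoring case.

    Hypothetical helper for illustration; not part of unicode-charnames.
    """
    # Character names are uppercase, so upper-casing the query is enough
    # to make the comparison case-insensitive.
    needle = substring.upper()
    matches = []
    for cp in range(sys.maxunicode + 1):
        # The second argument is returned for code points without a name.
        name = unicodedata.name(chr(cp), "")
        if needle in name:
            matches.append(("{:04X}".format(cp), name))
    return matches


for code, name in search_names("sextile"):
    print(code, name, sep="\t")
```

A full scan like this visits more than a million code points on every call, so a real implementation would more plausibly search a prebuilt name table.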

Example usage

    # -*- coding: utf-8 -*-
    from unicode_charnames import charname, codepoint, search_charnames

    # charname()

    for item in ['龠', '💓', '\u00E5', '\u0002']:
        print(charname(item))
    # Output:
    # CJK UNIFIED IDEOGRAPH-9FA0
    # BEATING HEART
    # LATIN SMALL LETTER A WITH RING ABOVE
    # <control-0002>

    # codepoint()

    names = [
        'LATIN CAPITAL LETTER E WITH ACUTE',
        'SQUARE ERA NAME REIWA',
        'SUPERCALIFRAGILISTICEXPIALIDOCIOUS'
    ]
    for item in names:
        print(codepoint(item))
    # Output:
    # 00C9
    # 32FF
    # None

    # search_charnames()

    for x in search_charnames('sextile'):
        print('\t'.join(x))
    # Output:
    # 26B9	SEXTILE
    # 26BA	SEMISEXTILE
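
Because codepoint() returns a hexadecimal string (or None when no name matches, as in the output above), turning the result back into a character takes one extra step. The following is a small sketch using only built-ins; the helper names `char_from_hex` and `format_codepoint` are hypothetical and not part of the package.

```python
def char_from_hex(hex_code):
    """Convert a hex code point string such as '00C9' to its character.

    Returns None when given None, mirroring a failed name lookup.
    Hypothetical helper for illustration; not part of unicode-charnames.
    """
    if hex_code is None:
        return None
    return chr(int(hex_code, 16))


def format_codepoint(char):
    """Format a character's code point in the conventional U+XXXX notation."""
    return "U+{:04X}".format(ord(char))


print(format_codepoint(char_from_hex("00C9")))
# Output:
# U+00C9
```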

License

unicode-charnames is released under an MIT license. The full text of the license is available here.

The Unicode Standard “DerivedName.txt” file is licensed under the Unicode License Agreement for Data Files and Software. Please consult the UNICODE, INC. LICENSE AGREEMENT prior to use.
