Converts code point sequences to and from Unicode strings

Project description

Unicode Code Points for Python

Before Python 3.3, the Python runtime could be compiled in one of two Unicode modes:

  1. sys.maxunicode == 0x10FFFF

    In this mode, Python’s Unicode strings support the full range of Unicode code points from U+0000 to U+10FFFF. One code point is represented by one string element:

    >>> import sys
    >>> hex(sys.maxunicode)
    '0x10ffff'
    >>> len(u'\U0001F40D')
    1
    >>> [c for c in u'\U0001F40D']
    [u'\U0001f40d']

    This is the default for Python 2.7 on Linux, as well as universally on Python 3.3 and later across all operating systems.

  2. sys.maxunicode == 0xFFFF

    In this mode, Python’s Unicode strings only support the range of Unicode code points from U+0000 to U+FFFF. Any code points from U+10000 through U+10FFFF are represented using a pair of string elements in the UTF-16 encoding:

    >>> import sys
    >>> hex(sys.maxunicode)
    '0xffff'
    >>> len(u'\U0001F40D')
    2
    >>> [c for c in u'\U0001F40D']
    [u'\ud83d', u'\udc0d']

    This is the default for Python 2.7 on macOS and Windows.

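Because of this runtime difference, the same string can have a different length and different elements depending on the build, which makes it inconvenient to write Python modules that manipulate Unicode strings as sequences of code points.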
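The two string elements shown for the second mode follow from standard UTF-16 surrogate arithmetic. The sketch below reproduces that calculation for U+1F40D; the function name `to_surrogate_pair` is made up for illustration and is not part of any module described here.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code_point)
    offset = code_point - 0x10000      # 20-bit value in the range 0..0xFFFFF
    high = 0xD800 + (offset >> 10)     # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits select the low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F40D)
print(hex(high), hex(low))  # 0xd83d 0xdc0d
```

The result matches the pair `u'\ud83d', u'\udc0d'` that a 0xFFFF build reports for the snake character.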

The codepoints module

This module solves the problem by exposing APIs to convert Unicode strings to and from lists of code points, regardless of the underlying setting for sys.maxunicode:

>>> import sys
>>> import codepoints
>>> hex(sys.maxunicode)
'0xffff'
>>> snake = tuple(codepoints.from_unicode(u'\U0001F40D'))
>>> len(snake)
1
>>> snake[0]
128013
>>> hex(snake[0])
'0x1f40d'
>>> codepoints.to_unicode(snake)
u'\U0001f40d'
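To illustrate the idea behind such a conversion, here is a minimal sketch of how surrogate pairs could be merged into code points and turned back into a string. It is not the module's actual implementation: the names `sketch_from_unicode` and `sketch_to_unicode` are hypothetical, and the code assumes Python 3, where `chr` covers the full range up to U+10FFFF.

```python
def sketch_from_unicode(text):
    """Yield code points from text, merging any UTF-16 surrogate pairs."""
    i = 0
    while i < len(text):
        unit = ord(text[i])
        i += 1
        # A high surrogate followed by a low surrogate encodes one code point.
        if 0xD800 <= unit <= 0xDBFF and i < len(text):
            nxt = ord(text[i])
            if 0xDC00 <= nxt <= 0xDFFF:
                unit = 0x10000 + ((unit - 0xD800) << 10) + (nxt - 0xDC00)
                i += 1
        yield unit


def sketch_to_unicode(code_points):
    """Build a string from an iterable of code points (Python 3 only)."""
    return "".join(chr(cp) for cp in code_points)


# The same character, written as a surrogate pair and as a single element.
as_pair = list(sketch_from_unicode("\ud83d\udc0d"))
as_single = list(sketch_from_unicode("\U0001F40D"))
print(as_pair == as_single == [0x1F40D])  # True
```

A lone surrogate with no partner is passed through unchanged in this sketch; a real implementation might prefer to raise an error instead.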

Download files

Source Distribution

codepoints-1.0.zip (6.1 kB)

File details

  • File: codepoints-1.0.zip
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

  • SHA256: 241fa77a03e6f0340cc6d18a875ef4888d10b20aa67f0128d0603482a11e16eb
  • MD5: 037c7b5e8d1c0e60670e9b5de673b97f
  • BLAKE2b-256: 8d2849bbe1e81240ca9cfa825c02cee48bac2facd0f8cdf6a5a17f71856a19a3
