
Unicode category database


unicategories

Unicode category database, generated on setup.

This module exposes a categories dictionary mapping Unicode category names (such as 'Lu') to RangeGroup instances.
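The database is generated when the package is set up, but the page does not say from which source. The sketch below shows what such a dictionary could contain. It assumes Python's own unicodedata module as the data source, although unicategories may parse the Unicode Character Database files instead and may target a different Unicode version. build_ranges is a hypothetical helper, not part of unicategories.

```python
import sys
import unicodedata
from collections import defaultdict


def build_ranges(max_code=sys.maxunicode):
    """Hypothetical helper: map each two-letter category to a list of
    half-open (start, end) code point ranges, using the Unicode data
    bundled with the running Python (not unicategories' own database)."""
    ranges = defaultdict(list)
    start = 0
    current = unicodedata.category(chr(0))
    for code in range(1, max_code + 2):
        # One past the last code point acts as a sentinel closing the final run.
        category = unicodedata.category(chr(code)) if code <= max_code else None
        if category != current:
            ranges[current].append((start, code))  # end is exclusive
            start, current = code, category
    return dict(ranges)


categories_sketch = build_ranges()
print(categories_sketch["Zl"])  # [(8232, 8233)]
```

Runs of consecutive code points that share a category collapse into a single tuple. This is why a range-based representation stays small.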

Example

from unicategories import categories

upperchars = categories['Lu'].characters()  # iterator
print('Unicode uppercase characters are "%s"' % ''.join(upperchars))
# Unicode uppercase characters are "ABCDEF..."

RangeGroup

Immutable iterable (based on tuple, with some useful methods) of (start, end) tuples which, like Python's range, are half-open: start is included and end is excluded.

This representation was chosen for memory efficiency: storing every character individually would take a lot of memory.
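As a rough illustration of the memory argument, here is a minimal tuple-based stand-in. RangeGroupSketch and its size() method are hypothetical and are not the real RangeGroup class. The sketch stores half-open ranges and shows that two tuples are enough to describe all 65 control characters (Cc).

```python
class RangeGroupSketch(tuple):
    """Hypothetical stand-in for RangeGroup: an immutable tuple of
    half-open (start, end) code point ranges."""

    def __new__(cls, ranges):
        # Normalise input into a sorted tuple of (int, int) pairs.
        return super().__new__(cls, sorted((int(s), int(e)) for s, e in ranges))

    def size(self):
        # Number of code points covered; end is exclusive, as with range().
        return sum(end - start for start, end in self)


# Control characters (Cc): U+0000..U+001F and U+007F..U+009F.
cc = RangeGroupSketch([(0x00, 0x20), (0x7F, 0xA0)])
print(len(cc), "ranges cover", cc.size(), "code points")  # 2 ranges cover 65 code points
```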

The RangeGroup class provides the following methods:

range_group.characters()

Get an iterator over all characters in this range group.

:yields: iterator of characters (str of size 1)
:ytype: str
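Given the half-open ranges described above, characters() can be thought of as a nested loop over each range. The function below is a hypothetical sketch of that behaviour, not the library's source. The example group, which mixes ASCII digits and uppercase letters, is made up for illustration and does not correspond to a real category.

```python
def characters(range_group):
    """Sketch of RangeGroup.characters(): yield each character (a str of
    length 1) in every half-open (start, end) range, in order."""
    for start, end in range_group:
        for code in range(start, end):  # end excluded, like range()
            yield chr(code)


# Hypothetical group: ASCII digits and ASCII uppercase letters.
ranges = ((0x30, 0x3A), (0x41, 0x5B))
print(''.join(characters(ranges)))  # 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
```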

range_group.codes()

Get an iterator over all Unicode code points contained in this range group.

:yields: iterator of code points (int)
:ytype: int
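codes() is the integer counterpart of characters(). A lazy version can be sketched with itertools.chain.from_iterable, so that no full list of code points is ever built. This is an assumed implementation, not the library's.

```python
from itertools import chain


def codes(range_group):
    """Sketch of RangeGroup.codes(): lazily yield every integer code point
    of every half-open (start, end) range, without building a list."""
    return chain.from_iterable(range(start, end) for start, end in range_group)


# U+2028 LINE SEPARATOR is the only Zl code point, so Zl is a single range.
zl = ((0x2028, 0x2029),)
print(list(codes(zl)))  # [8232]
```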

range_group.has(character)

Check whether a character (or code point) is contained in any range of
this range group.

:param character: character or unicode code point to look for
:type character: str or int
:returns: True if character is contained by any range, False otherwise
:rtype: bool
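Because the ranges are sorted and non-overlapping, a membership test does not have to scan every range. The sketch below uses a binary search via the standard bisect module. This is a swapped-in technique: the page does not say how the real has() is implemented. It accepts either a one-character str or an int code point, as the documented parameter does.

```python
from bisect import bisect_right


def has(range_group, character):
    """Sketch of RangeGroup.has(): accept a 1-character str or an int code
    point, and binary-search sorted, non-overlapping half-open ranges."""
    code = ord(character) if isinstance(character, str) else character
    # Index of the last range whose start is <= code.
    index = bisect_right(range_group, (code, float('inf'))) - 1
    if index < 0:
        return False
    start, end = range_group[index]
    return start <= code < end


cc = ((0x00, 0x20), (0x7F, 0xA0))  # control characters (Cc)
print(has(cc, '\t'), has(cc, 'A'), has(cc, 0x9F), has(cc, 0xA0))  # True False True False
```

The last call returns False because U+00A0 is the exclusive end of the second range.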

Unicode categories


Taken from Wikipedia.

| Value | Category (major, minor) | Basic type | Character assigned | Count | Remarks |
|---|---|---|---|---|---|
| Lu | Letter, uppercase | Graphic | Character | | |
| Ll | Letter, lowercase | Graphic | Character | | |
| Lt | Letter, titlecase | Graphic | Character | | Ligatures containing uppercase followed by lowercase letters (e.g., Dž, Lj, Nj, and Dz) |
| Lm | Letter, modifier | Graphic | Character | | |
| Lo | Letter, other | Graphic | Character | | |
| Mn | Mark, nonspacing | Graphic | Character | | |
| Mc | Mark, spacing combining | Graphic | Character | | |
| Me | Mark, enclosing | Graphic | Character | | |
| Nd | Number, decimal digit | Graphic | Character | | All these, and only these, have Numeric Type = De |
| Nl | Number, letter | Graphic | Character | | Numerals composed of letters or letterlike symbols (e.g., Roman numerals) |
| No | Number, other | Graphic | Character | | E.g., vulgar fractions, superscript and subscript digits |
| Pc | Punctuation, connector | Graphic | Character | | Includes "_" underscore |
| Pd | Punctuation, dash | Graphic | Character | | Includes several hyphen characters |
| Ps | Punctuation, open | Graphic | Character | | Opening bracket characters |
| Pe | Punctuation, close | Graphic | Character | | Closing bracket characters |
| Pi | Punctuation, initial quote | Graphic | Character | | Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage |
| Pf | Punctuation, final quote | Graphic | Character | | Closing quotation mark. May behave like Ps or Pe depending on usage |
| Po | Punctuation, other | Graphic | Character | | |
| Sm | Symbol, math | Graphic | Character | | |
| Sc | Symbol, currency | Graphic | Character | | |
| Sk | Symbol, modifier | Graphic | Character | | |
| So | Symbol, other | Graphic | Character | | |
| Zs | Separator, space | Graphic | Character | | Includes the space, but not TAB, CR, or LF, which are Cc |
| Zl | Separator, line | Format | Character | | Only U+2028 LINE SEPARATOR (LSEP) |
| Zp | Separator, paragraph | Format | Character | | Only U+2029 PARAGRAPH SEPARATOR (PSEP) |
| Cc | Other, control | Control | Character | Fixed: 65 | No name, <control> |
| Cf | Other, format | Format | Character | | Includes the soft hyphen, control characters to support bi-directional text, and language tag characters |
| Cs | Other, surrogate | Surrogate | Not (but abstract) | Fixed: 2,048 | No name, <surrogate> |
| Co | Other, private use | Private-use | Not (but abstract) | Fixed: 137,468 total (6,400 in BMP, 131,068 in Planes 15–16) | No name, <private-use> |
| Cn | Other, not assigned | Noncharacter | Not | Fixed: 66 | No name, <noncharacter> |
| Cn | Other, not assigned | Reserved | Not | Not fixed | No name, <reserved> |
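The fixed counts in the table can be cross-checked against Python's standard unicodedata module. This sketch uses the data bundled with the running interpreter, not unicategories' database, and count_categories is a hypothetical helper. Cn is left out because unicodedata.category() reports both noncharacters and reserved code points as 'Cn', so the 66 noncharacters cannot be separated this way.

```python
import sys
import unicodedata
from collections import Counter


def count_categories(names):
    """Count code points per general category using the Unicode data
    bundled with this Python build (not the unicategories database)."""
    counts = Counter()
    for code in range(sys.maxunicode + 1):
        category = unicodedata.category(chr(code))
        if category in names:
            counts[category] += 1
    return counts


counts = count_categories({'Cc', 'Cs', 'Co'})
print(counts['Cc'], counts['Cs'], counts['Co'])  # 65 2048 137468
```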

In addition, unicategories provides the one-letter major categories L, M, N, P, S, Z and C.
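A major category can be thought of as the union of its two-letter subcategories; for example, L covers Lu, Ll, Lt, Lm and Lo. The page does not say how unicategories builds these groups. The sketch below shows one way to merge half-open range lists into a single sorted, non-overlapping list. merge is a hypothetical helper. The sample data is only the Latin-1 portion of Lu and Ll, not the full categories.

```python
def merge(*groups):
    """Union several lists of half-open (start, end) ranges into one sorted
    list, joining ranges that overlap or touch."""
    merged = []
    for start, end in sorted(r for group in groups for r in group):
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Latin-1 portions of Lu and Ll (excerpt only, not the full categories).
lu = [(0x41, 0x5B), (0xC0, 0xD7), (0xD8, 0xDF)]
ll = [(0x61, 0x7B), (0xB5, 0xB6), (0xDF, 0xF7), (0xF8, 0x100)]
print(merge(lu, ll))
```

Touching ranges such as (0xD8, 0xDF) and (0xDF, 0xF7) collapse into one. This keeps the union as compact as its parts.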

Download files


Files for unicategories, version 0.1.0:

unicategories-0.1.0-py2.py3-none-any.whl (47.7 kB), wheel, Python py2.py3
unicategories-0.1.0.tar.gz (9.5 kB), source distribution
