PRECIS-i18n: Internationalized Usernames and Passwords
If you want your application to accept Unicode usernames and passwords, you must be careful about how you validate and compare them. The PRECIS framework makes internationalized usernames and passwords safer for applications to use. PRECIS profiles transform Unicode strings into a canonical form suitable for comparison.
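To illustrate why naive comparison is risky, the following sketch uses only the standard library unicodedata module (not this package) to compare three strings that all render like "Kevin". The variable names are illustrative. It shows that plain equality and a single normalization form do not catch every look-alike, which is the kind of gap a PRECIS profile is meant to close. It is not a substitute for a PRECIS profile.

```python
import unicodedata

ascii_name = 'Kevin'
kelvin_name = '\u212Aevin'     # starts with U+212A KELVIN SIGN
fullwidth_name = '\uFF2Bevin'  # starts with U+FF2B FULLWIDTH LATIN CAPITAL LETTER K

# Plain equality treats the look-alikes as different strings.
naive_kelvin = kelvin_name == ascii_name
naive_fullwidth = fullwidth_name == ascii_name

# NFC folds the Kelvin sign into 'K' but leaves the fullwidth letter alone.
nfc_kelvin = unicodedata.normalize('NFC', kelvin_name) == ascii_name
nfc_fullwidth = unicodedata.normalize('NFC', fullwidth_name) == ascii_name

# NFKC also folds the fullwidth letter.
nfkc_fullwidth = unicodedata.normalize('NFKC', fullwidth_name) == ascii_name

print(naive_kelvin, naive_fullwidth)  # False False
print(nfc_kelvin, nfc_fullwidth)      # True False
print(nfkc_fullwidth)                 # True
```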
This module implements the PRECIS Framework as described in:
- PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols (RFC 8264)
- Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords (RFC 8265)
- Preparation, Enforcement, and Comparison of Internationalized Strings Representing Nicknames (RFC 8266)
Requires Python 3.3 or later.
Use the get_profile function to obtain a profile object, then use its enforce method. The enforce method returns a Unicode string.
    >>> from precis_i18n import get_profile
    >>> username = get_profile('UsernameCaseMapped')
    >>> username.enforce('Kevin')
    'kevin'
    >>> username.enforce('\u212Aevin')
    'kevin'
    >>> username.enforce('\uFF2Bevin')
    'kevin'
    >>> username.enforce('\U0001F17Aevin')
    Traceback (most recent call last):
        ...
    UnicodeEncodeError: 'UsernameCaseMapped' codec can't encode character '\U0001f17a' in position 0: DISALLOWED/symbols
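Because enforce returns a canonical string, one way to compare two inputs is to enforce both and compare the results. The helper below is a minimal sketch of that idea. The name same_username is hypothetical and not part of this package, and treating a disallowed input as "no match" is an assumption about what an application would want. It accepts any object with an enforce method, such as the result of get_profile('UsernameCaseMapped').

```python
def same_username(profile, left, right):
    """Return True if both inputs enforce to the same canonical string.

    `profile` is any object with an `enforce(str) -> str` method, for
    example precis_i18n.get_profile('UsernameCaseMapped').
    An input that the profile rejects never matches anything (assumption).
    """
    try:
        return profile.enforce(left) == profile.enforce(right)
    except UnicodeEncodeError:
        # The profile disallowed one of the inputs.
        return False
```

With the UsernameCaseMapped profile shown above, same_username(username, 'Kevin', '\u212Aevin') would compare 'kevin' with 'kevin'.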
Alternatively, you can use the Python str.encode API. Import the precis_i18n.codec module to register the PRECIS codec names. Once the codecs are registered, str.encode accepts a PRECIS codec name for any Unicode string. The result is a UTF-8 encoded byte string; a disallowed string raises UnicodeEncodeError.
    >>> import precis_i18n.codec
    >>> 'Kevin'.encode('UsernameCasePreserved')
    b'Kevin'
    >>> '\u212Aevin'.encode('UsernameCasePreserved')
    b'Kevin'
    >>> '\uFF2Bevin'.encode('UsernameCasePreserved')
    b'Kevin'
    >>> '\u212Aevin'.encode('UsernameCaseMapped')
    b'kevin'
    >>> '\uFF2Bevin'.encode('OpaqueString')
    b'\xef\xbc\xabevin'
    >>> '\U0001F17Aevin'.encode('UsernameCasePreserved')
    Traceback (most recent call last):
        ...
    UnicodeEncodeError: 'UsernameCasePreserved' codec can't encode character '\U0001f17a' in position 0: DISALLOWED/symbols
Supported Profiles and Codecs
Each PRECIS profile has a corresponding codec name. The CaseMapped variant converts the string to lower case for implementing case-insensitive comparison.
The CaseMapped profiles use Unicode ToLower per the latest RFC. Previous versions of this package used Unicode Default Case Folding. There are CaseMapped variants for different case transformations. These profile names are deprecated:
The PRECIS base string classes are also available as codecs:
Userparts and Space Delimited Usernames
The Username profiles in this implementation do not allow spaces. The Username profiles correspond to the definition of “userparts” in RFC 8265. If you want to allow spaces in your application’s usernames, you must split the string first.
    import precis_i18n

    def enforce_app_username(name):
        # Enforce each space-delimited userpart separately, then rejoin.
        profile = precis_i18n.get_profile('UsernameCasePreserved')
        userparts = [profile.enforce(userpart) for userpart in name.split(' ')]
        return ' '.join(userparts)
Be aware that a username constructed this way can contain bidirectional text in the separate userparts.
A PRECIS profile raises a UnicodeEncodeError exception if a string is disallowed. The reason field specifies the kind of error.
|DISALLOWED/arabic_indic||Arabic-Indic digits cannot be mixed with Extended Arabic-Indic Digits. (Context)|
|DISALLOWED/bidi_rule||Right-to-left string cannot contain left-to-right characters due to the “Bidi” rule. (Context)|
|DISALLOWED/controls||Control character is not allowed.|
|DISALLOWED/empty||After applying the profile, the result cannot be empty.|
|DISALLOWED/exceptions||Exception character is not allowed.|
|DISALLOWED/extended_arabic_indic||Extended Arabic-Indic digits cannot be mixed with Arabic-Indic Digits. (Context)|
|DISALLOWED/greek_keraia||Greek keraia must be followed by a Greek character. (Context)|
|DISALLOWED/has_compat||Compatibility characters are not allowed.|
|DISALLOWED/hebrew_punctuation||Hebrew punctuation geresh or gershayim must be preceded by Hebrew character. (Context)|
|DISALLOWED/katakana_middle_dot||Katakana middle dot must be accompanied by a Hiragana, Katakana, or Han character. (Context)|
|DISALLOWED/middle_dot||Middle dot must be surrounded by the letter ‘l’. (Context)|
|DISALLOWED/not_idempotent||After reapplying the profile, the result is not stable.|
|DISALLOWED/old_hangul_jamo||Conjoining Hangul Jamo is not allowed.|
|DISALLOWED/other||Other character is not allowed.|
|DISALLOWED/other_letter_digits||Non-traditional letter or digit is not allowed.|
|DISALLOWED/precis_ignorable_properties||Default ignorable or non-character is not allowed.|
|DISALLOWED/punctuation||Non-ASCII punctuation character is not allowed.|
|DISALLOWED/spaces||Space character is not allowed.|
|DISALLOWED/symbols||Non-ASCII symbol character is not allowed.|
|DISALLOWED/unassigned||Unassigned Unicode character is not allowed.|
|DISALLOWED/zero_width_joiner||Zero width joiner must immediately follow a combining virama. (Context)|
|DISALLOWED/zero_width_nonjoiner||Zero width non-joiner must immediately follow a combining virama, or appear where it breaks a cursive connection in a formally cursive script. (Context)|
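An application can catch the exception and read its reason attribute to decide how to respond. The sketch below is one possible shape for that. The helper name try_enforce is hypothetical and not part of this package. It takes any object with an enforce method, such as a profile returned by get_profile.

```python
def try_enforce(profile, value):
    """Return (canonical, None) on success or (None, reason) if disallowed.

    `profile` is any object with an `enforce(str) -> str` method.
    `reason` is the exception's reason string, e.g. 'DISALLOWED/symbols'.
    """
    try:
        return profile.enforce(value), None
    except UnicodeEncodeError as ex:
        # UnicodeEncodeError exposes the failure kind as `ex.reason`.
        return None, ex.reason
```

A caller could then branch on the returned reason, for example to show a specific message when it equals 'DISALLOWED/spaces'.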