Unicode casefold support for python 2.
Python 3 has str.casefold(). Python 2 doesn’t. py2casefold brings casefolding support to Python 2.
pip install py2casefold
>>> from py2casefold import casefold >>> print casefold(u"tschüß") tschüss >>> casefold(u"ΣίσυφοςﬁÆ") == casefold(u"ΣΊΣΥΦΟσFIæ") == u"σίσυφοσfiæ" True
Note that casefold does not normalize the string. Casefolding and normalization are different operations. For more info see http://www.w3.org/International/wiki/Case_folding, and http://www.w3.org/TR/charmod-norm/.
If you are looking for string similarity you will also probably want to consider one of the unicode normalization options (NFC, NFKC, NFD, NFKD) that are available with Python’s built in unicodedata.normalize().
At the moment, this pure Python casefold implementation is significantly (> 20x) slower than the optimized py3 C implementation. This can be improved later, but it is currently more than sufficient for basic case folding. As a rough estimate, case folding 100 characters clocks in at ~25μs on an old developer laptop.
To run the tests on all supported Python version simple use tox.
You will need to have Python 2.7, Python 3.4, Python 3.5 and Python 3.6 installed.
BSD and the Unicode license agreement. This module includes data from the Unicode consortium which should include the appropriate notice (see http://unicode.org/copyright.html).
See LICENSE file for details.
|Filename, Size & Hash SHA256 Hash Help||File Type||Python Version||Upload Date|
(20.7 kB) Copy SHA256 Hash SHA256
|Wheel||py2.py3||Oct 26, 2017|