Text sorting function for the Czech language
Czech Sort
This is a pure-Python library for Czech-language alphabetical sorting.
Quick Use
From Python:
>>> import czech_sort
>>> czech_sort.sorted(['sídliště', 'shoda', 'schody'])
['shoda', 'schody', 'sídliště']
>>> sorted(['sídliště', 'shoda', 'schody'], key=czech_sort.key)
['shoda', 'schody', 'sídliště']
On the command line:
$ python -m czech_sort < file.txt
shoda
schody
sídliště
Why another sorting library?
To sort Python strings in the Czech language, there are three other options:
- Use PyICU. This can sort really well, and do all kinds of wonderful, standards-compliant Unicode things. Perfect for publication-quality results. Unfortunately, ICU can be a major pain to install, making it overkill if you just want to sort a list of strings.
- Set the locale, then use locale.strxfrm. (Yes, strxfrm! Try saying that ten times fast!) This depends on the Czech POSIX locale being available, so it's hardly portable.
- Just use Python's built-in string sort. This sorts lexicographically by Unicode codepoints. It might be good enough for you? Maybe?
Scope
The czech-sort library is a compromise. It should give you good results in the 99% case.
Do not use this if you need proper sorting of symbols, non-Latin scripts, or diacritics other than Czech/Slovak.
Any other deviation from the relevant standard, ČSN 97 6030, should be considered a bug. However, neither the author nor the community at large has access to the standard, which makes finding such bugs somewhat difficult.
Full API
czech_sort.sorted(iterable)
Takes an iterable of strings, and returns a list of them, sorted.
czech_sort.key(s)
Returns a sort key object for a given string.
This function is suitable as the key for functions like the built-in sorted or list.sort.
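A key function is also handy when the strings are fields of larger records. The sketch below is illustrative only: sort_people and the record layout are hypothetical, and the key function is passed in as a parameter so the helper does not depend on any particular library.

```python
def sort_people(people, string_key):
    """Return people sorted by surname, then by first name.

    string_key is any function that maps a string to a comparable
    sort key; the per-field keys are combined into a tuple so that
    ties on surname are broken by first name.
    """
    return sorted(
        people,
        key=lambda p: (string_key(p["surname"]), string_key(p["first_name"])),
    )
```

With czech-sort installed, it would be called as sort_people(people, czech_sort.key), assuming the key objects compare correctly inside tuples.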
czech_sort.bytes_key(s)
Returns a sort key for a given string, as bytes.
This is suitable for registering as a custom SQL function with a DB-API driver, for example through the create_function method of the built-in sqlite3 connection.
WARNING: Do not store the results of this function. The format can change in future versions of czech_sort.
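As a rough sketch of the sqlite3 use case: the helper below registers a bytes-returning key function and uses it in ORDER BY. The names select_sorted and sort_key and the table layout are made up for this example. The key is computed at query time and never written to the database, in line with the warning above.

```python
import sqlite3


def select_sorted(words, bytes_key):
    """Insert words into an in-memory table and read them back sorted.

    bytes_key maps a string to bytes; SQLite compares the resulting
    BLOB values byte by byte, so the byte order defines the sort order.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Register the Python function as a one-argument SQL function.
        conn.create_function("sort_key", 1, bytes_key)
        conn.execute("CREATE TABLE words (word TEXT)")
        conn.executemany(
            "INSERT INTO words (word) VALUES (?)",
            [(w,) for w in words],
        )
        cursor = conn.execute(
            "SELECT word FROM words ORDER BY sort_key(word)"
        )
        return [row[0] for row in cursor]
    finally:
        conn.close()
```

With czech-sort installed, the call would be select_sorted(words, czech_sort.bytes_key).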
Compatibility
Versions of czech-sort before 1.1.0 can be used with Python 2.6+ and 3.5+. Version 1.1.0 dropped support for Python 2 (see the Changelog).
Under Python 2, it only accepts unicode strings.
Installation
Install this into your virtualenv by running:
$ pip install czech-sort
Contribute
Bug reports and comments are welcome at GitHub.
Patches are also welcome! Source code is hosted at GitHub:
$ git clone http://github.com/encukou/czech-sort
To run the included tests:
$ python -m pip install -e.[test]
$ python -m pytest
If you would like to contribute, but are confused by the above, then please e-mail encukou at gmail dot com.
License
The project is licensed under the MIT license. May it serve you well.
Changelog
1.1.0 (2023-07-11)
- Add bytes_key (Thanks to @honzajavorek!)
- Drop support for Python 2
1.0.1 (2021-08-30)
- Fix bug that prevented sorting strings that contain 'Ł' and/or 'Ø'. (Thanks to @dark-light-cz for reporting and @jiri-one for the PR!)
1.0.0 (2020-09-14)
No code changes. Since this has been stable for five years I decided to call it 1.0.
- Packaging improvements
- Tested with Python 2.7 and 3.5-3.9
0.4 (2015-09-05)
- First general release