
Text sorting function for the Czech language

Project description

Czech Sort

This is a pure-Python library for Czech-language alphabetical sorting.

Quick Use

From Python:

>>> import czech_sort

>>> czech_sort.sorted(['sídliště', 'shoda', 'schody'])
['shoda', 'schody', 'sídliště']

>>> sorted(['sídliště', 'shoda', 'schody'], key=czech_sort.key)
['shoda', 'schody', 'sídliště']

On the command line:

$ python -m czech_sort < file.txt
shoda
schody
sídliště
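The same behavior from inside a Python script can be sketched roughly as follows. This assumes the command simply sorts whole input lines with czech_sort.key; the real command-line implementation may treat encodings or blank lines differently. The helper name sort_stream is hypothetical, and the key is passed in as a parameter so the sketch runs even without czech-sort installed:

```python
import io
from typing import Callable, List, Optional, TextIO


def sort_stream(stream: TextIO, key: Optional[Callable[[str], object]] = None) -> List[str]:
    """Read all lines from a text stream and return them sorted, without newlines."""
    # Hypothetical helper; pass key=czech_sort.key for Czech ordering.
    lines = [line.rstrip("\n") for line in stream]
    return sorted(lines, key=key)


# An in-memory "file". Without a key this is plain codepoint order.
sample = io.StringIO("sídliště\nshoda\nschody\n")
print("\n".join(sort_stream(sample)))
```

With the library installed, sort_stream(open("file.txt", encoding="utf-8"), key=czech_sort.key) should give the same order as the command above.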

Why another sorting library?

To sort Python strings in the Czech language, there are three other options:

  • Use PyICU. This can sort really well, and do all kinds of wonderful, standards-compliant Unicode things. Perfect for publication-quality results. Unfortunately, ICU can be a major pain to install, making it overkill if you just want to sort a list of strings.
  • Set the locale, then use locale.strxfrm. (Yes, strxfrm! Try saying that ten times fast!) This depends on the Czech POSIX locale being available, so it's hardly portable.
  • Just use Python's built-in string sort. This sorts lexicographically by Unicode codepoints. It might be good enough for you? Maybe?
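To see why the last two options fall short, here is a small comparison using only the standard library. The locale name "cs_CZ.UTF-8" is an assumption: locale names vary between platforms, and the Czech locale may not be installed at all. Note that locale.setlocale changes process-wide state.

```python
import locale

words = ["sídliště", "shoda", "schody"]

# Built-in sort: compares Unicode codepoints, so "schody" ("c" < "h") comes
# before "shoda", even though Czech treats "ch" as one letter sorted after "h".
codepoint_order = sorted(words)
print(codepoint_order)  # ['schody', 'shoda', 'sídliště']

# locale.strxfrm: only works if a Czech locale exists on this system.
try:
    locale.setlocale(locale.LC_COLLATE, "cs_CZ.UTF-8")  # assumed locale name
except locale.Error:
    print("Czech locale not available on this system")
else:
    print(sorted(words, key=locale.strxfrm))
```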

Scope

The czech-sort library is a compromise. It should give you good results in the 99% case.

Do not use this if you need proper sorting of symbols, non-Latin scripts, or diacritics other than Czech/Slovak.

Any other deviation from the relevant standard, ČSN 97 6030, should be considered a bug. However, neither the author nor the community at large has access to the standard, which makes such bugs somewhat difficult to find.

Full API

czech_sort.sorted(iterable)

Takes an iterable of strings and returns a new list containing them in Czech alphabetical order.

czech_sort.key(s)

Returns a sort key object for a given string.

This function is suitable as the key for functions like the built-in sorted or list.sort.
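To illustrate what a sort key does, here is a deliberately simplified, hypothetical key function. toy_czech_key is not part of czech_sort, and this is not the library's algorithm. It assumes only a few Czech rules: "ch" is a single letter sorted after "h"; č, ř, š and ž are separate letters after c, r, s and z; and the other accented letters (á, ď, é, ě, í, ň, ó, ť, ú, ů, ý) only break ties with their unaccented forms. It does not distinguish upper and lower case and ignores most of ČSN 97 6030.

```python
# A deliberately simplified, hypothetical Czech sort key; NOT czech_sort's algorithm.
ALPHABET = [
    "a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l", "m",
    "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v", "w", "x", "y", "z", "ž",
]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

# Accented letters that sort like their base letter and only break ties.
BASE_LETTER = {
    "á": "a", "ď": "d", "é": "e", "ě": "e", "í": "i", "ň": "n",
    "ó": "o", "ť": "t", "ú": "u", "ů": "u", "ý": "y",
}


def toy_czech_key(s):
    """Return (primary, secondary) lists; Python compares them element by element."""
    s = s.lower()
    primary, secondary = [], []
    i = 0
    while i < len(s):
        if s.startswith("ch", i):
            # "ch" is treated as a single letter between "h" and "i".
            primary.append(RANK["ch"])
            secondary.append(0)
            i += 2
            continue
        char = s[i]
        base = BASE_LETTER.get(char, char)
        # Characters outside ALPHABET sort after "ž", ordered by codepoint.
        primary.append(RANK.get(base, len(ALPHABET) + ord(base)))
        # 0 = unaccented, 1 = accented; only matters when primaries are equal.
        secondary.append(0 if base == char else 1)
        i += 1
    return primary, secondary


print(sorted(["sídliště", "shoda", "schody"], key=toy_czech_key))
# ['shoda', 'schody', 'sídliště']
print(sorted(["chata", "hrad", "čaj", "cukr"], key=toy_czech_key))
# ['cukr', 'čaj', 'hrad', 'chata']
```

Splitting the key into levels is what lets "dej" and "děj" stay next to each other: they tie on the letters and are only then ordered by accent.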

czech_sort.bytes_key(s)

Returns a sort key for a given string, as bytes.

This is suitable for use as a custom database function, for example one registered with the create_function method of a connection from the built-in sqlite3 module.

WARNING: Do not store the results of this function. The format can change in future versions of czech_sort.
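A minimal sketch of the SQLite use case, using only the standard sqlite3 module and its Connection.create_function(name, narg, func). The helper sorted_via_sqlite and the SQL function name czech_key are hypothetical. Keys are computed at query time in an in-memory database and never written to a table, in line with the warning above. The key function is a parameter so the sketch runs without czech-sort installed:

```python
import sqlite3
from typing import Callable, Iterable, List


def sorted_via_sqlite(words: Iterable[str], key_func: Callable[[str], bytes]) -> List[str]:
    """Sort words inside SQLite using a Python function that returns a bytes key."""
    conn = sqlite3.connect(":memory:")
    try:
        # Register key_func as a one-argument SQL function; the name is arbitrary.
        conn.create_function("czech_key", 1, key_func)
        conn.execute("CREATE TABLE words (word TEXT)")
        conn.executemany("INSERT INTO words VALUES (?)", [(w,) for w in words])
        # bytes returned by key_func become BLOBs, which SQLite compares byte by byte.
        rows = conn.execute("SELECT word FROM words ORDER BY czech_key(word)")
        return [word for (word,) in rows]
    finally:
        conn.close()
```

With czech-sort installed, sorted_via_sqlite(words, czech_sort.bytes_key) is expected to give the same order as sorting with czech_sort.key.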

Compatibility

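Since version 1.1.0, the czech-sort library supports only Python 3; Python 2 support was dropped in that release.

Earlier releases (up to 1.0.1) can be used with Python 2.6+ and 3.5+. Under Python 2, they only accept unicode strings.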

Installation

Install this into your virtualenv by running:

$ pip install czech-sort

Contribute

Bug reports and comments are welcome at GitHub.

Patches are also welcome! Source code is hosted at GitHub:

$ git clone http://github.com/encukou/czech-sort

To run the included tests:

$ python -m pip install -e ".[test]"
$ python -m pytest

If you would like to contribute, but are confused by the above, then please e-mail encukou at gmail dot com.

License

The project is licensed under the MIT license. May it serve you well.

Changelog

1.1.0 (2023-07-11)

  • Add bytes_key (Thanks to @honzajavorek!)
  • Drop support for Python 2

1.0.1 (2021-08-30)

  • Fix bug that prevented sorting strings that contain 'Ł' and/or 'Ø'. (Thanks to @dark-light-cz for reporting and @jiri-one for the PR!)

1.0.0 (2020-09-14)

No code changes. Since this has been stable for five years, I decided to call it 1.0.

  • Packaging improvements
  • Tested with Python 2.7 and 3.5-3.9

0.4 (2015-09-05)

  • First general release



Download files


Source Distribution

czech-sort-1.1.0.tar.gz (8.0 kB)

Built Distribution

czech_sort-1.1.0-py3-none-any.whl (8.3 kB)

File details

Details for the file czech-sort-1.1.0.tar.gz.

File metadata

  • Download URL: czech-sort-1.1.0.tar.gz
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.4

File hashes

Hashes for czech-sort-1.1.0.tar.gz
Algorithm Hash digest
SHA256 fc8b5217bf065b99df4a006e7fc5326d1e584480945242b39721c941724f0ebe
MD5 c708635481b1549ba44f535d5276cd07
BLAKE2b-256 425417244d0a363626071f39a5e7d726226b9bc021e24d348fc4f30c75daa48f


File details

Details for the file czech_sort-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: czech_sort-1.1.0-py3-none-any.whl
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.4

File hashes

Hashes for czech_sort-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9c838dc8b39dd76ca95b0a01ad9f4b177bead96fa04bb52a06f558efd8a07a3c
MD5 2dd4af70f51d3316b92f26816e3f5b14
BLAKE2b-256 4180ab38870957c5a6c93f296ea51b5800542ae01441b67e46e5ac6ee76e5128

