Skip to main content

NoneType Safe Multi Column Sorting For Python

Project description

multisort - NoneType Safe Multi Column Sorting For Python

Simplified multi-column sorting of lists of tuples, dicts, lists or objects that are NoneType safe.

Installation

python3 -m pip install multisort

Dependencies

None

Performance

Average over 10 iterations with 1000 rows.

Test Secs
superfast 0.0005
multisort 0.0035
pandas 0.0079
cmp_func 0.0138
reversor 0.037

Hands down the fastest is the superfast methdology shown below. You do not need this library to accomplish this as its just core python.

multisort from this library gives reasonable performance for large data sets; eg. its better than pandas up to about 5,500 records. It is also much simpler to read and write, and it has error handling that does its best to give useful error messages.

Note on NoneType and sorting

If your data may contain None, it would be wise to ensure your sort algorithm is tuned to handle them. This is because sorted uses < comparisons; which is not supported by NoneType. For example, the following error will result: TypeError: '>' not supported between instances of 'NoneType' and 'str'. All examples given on this page are tuned to handle None values.

Methodologies

Method Descr Notes
multisort Simple one-liner designed after multisort example from python docs Second fastest of the bunch but most configurable and easy to read.
cmp_func Multi column sorting in the model java.util.Comparator Reasonable speed
superfast NoneType safe sample implementation of multi column sorting as mentioned in example from python docs Fastest by orders of magnitude but a bit more complex to write.

Dictionary Examples

For data:

rows_before = [
     {'idx': 0, 'name': 'joh', 'grade': 'C', 'attend': 100}
    ,{'idx': 1, 'name': 'jan', 'grade': 'a', 'attend': 80}
    ,{'idx': 2, 'name': 'dav', 'grade': 'B', 'attend': 85}
    ,{'idx': 3, 'name': 'bob' , 'grade': 'C', 'attend': 85}
    ,{'idx': 4, 'name': 'jim' , 'grade': 'F', 'attend': 55}
    ,{'idx': 5, 'name': 'joe' , 'grade': None, 'attend': 55}
]

multisort

Sort rows_before by grade, descending, then attend, ascending and put None first in results:

from multisort import multisort, mscol
rows_sorted = multisort(rows_before, [
        mscol('grade', reverse=False),
        'attend'
])

-or- without mscol

from multisort import multisort
rows_sorted = multisort(rows_before, [
        ('grade', False),
        'attend'
])

Sort rows_before by grade, descending, then attend and call upper() for grade:

from multisort import multisort, mscol
rows_sorted = multisort(rows_before, [
        mscol('grade', reverse=False, clean=lambda s: None if s is None else s.upper()),
        'attend'
])

-or- without mscol

from multisort import multisort
rows_sorted = multisort(rows_before, [
        ('grade', False, lambda s: None if s is None else s.upper()),
        'attend'
])

multisort parameters:

option dtype description
rows int or str Key to access data. int for tuple or list
spec str, int, list Sort specification. Can be as simple as a column key / index or mscol
reverse bool Reverse order of final sort (defalt = False)

spec entry options:

option position dtype description
key 0 int or str Key to access data. int for tuple or list
reverse 1 bool Reverse sort of column
clean 2 func Function / lambda to clean the value. These calls can cause a significant slowdown.
default 3 any Value to substitute if required==False and key does not exist or None is found. Can be used to achive similar functionality to pandas na_position
required 4 bool Default True. If False, will substitute None or default if key not found (not applicable for list or tuple rows)

* spec entries can be passed as:

    type description
    String Column name
    tuple Tuple of 1 or more spec options in their order as listed (see position)
    mscol() Importable helper to aid in readability. Suggested for three or more of the options.



sorted with cmp_func

Sort rows_before by grade, descending, then attend and call upper() for grade:

def cmp_student(a,b):
    k='grade'; va=a[k]; vb=b[k]
    if va != vb: 
        if va is None: return -1
        if vb is None: return 1
        return -1 if va > vb else 1
    k='attend'; va=a[k]; vb=b[k]; 
    if va != vb: return -1 if va < vb else 1
    return 0
rows_sorted = sorted(rows_before, key=cmp_func(cmp_student), reverse=True)

For reference: superfast methodology with list of dicts:

def key_grade(student):
    grade = student['grade']
    return grade is None, grade
def key_attend(student):
    attend = student['attend']
    return attend is None, attend
students_sorted = sorted(students, key=key_attend)
students_sorted.sort(key=key_grade, reverse=True)

Object Examples

For data:

class Student():
    def __init__(self, idx, name, grade, attend):
        self.idx = idx
        self.name = name
        self.grade = grade
        self.attend = attend
    def __str__(self): return f"name: {self.name}, grade: {self.grade}, attend: {self.attend}"
    def __repr__(self): return self.__str__()

rows_before = [
     Student(0, 'joh', 'C', 100)
    ,Student(1, 'jan', 'a', 80)
    ,Student(2, 'dav', 'B', 85)
    ,Student(3, 'bob', 'C', 85)
    ,Student(4, 'jim', 'F', 55)
    ,Student(5, 'joe', None, 55)
]

multisort

(Same syntax as with Dictionary example above)


sorted with cmp_func

Sort rows_before by grade, descending, then attend and call upper() for grade:

def cmp_student(a,b):
    if a.grade != b.grade: 
        if a.grade is None: return -1
        if b.grade is None: return 1
        return -1 if a.grade > b.grade else 1
    if a.attend != b.attend: 
        return -1 if a.attend < b.attend else 1
    return 0
rows_sorted = sorted(rows_before, key=cmp_func(cmp_student), reverse=True)

List / Tuple Examples

For data:

rows_before = [
     (0, 'joh', 'a'  , 100)
    ,(1, 'joe', 'B'  , 80)
    ,(2, 'dav', 'A'  , 85)
    ,(3, 'bob', 'C'  , 85)
    ,(4, 'jim', None , 55)
    ,(5, 'jan', 'B'  , 70)
]
(COL_IDX, COL_NAME, COL_GRADE, COL_ATTEND) = range(0,4)

multisort

(Same syntax as with Dictionary example above)


sorted with cmp_func

Sort rows_before by grade, descending, then attend and call upper() for grade:

def cmp_student(a,b):
    k=COL_GRADE; va=a[k]; vb=b[k]
    if va != vb: 
        if va is None: return -1
        if vb is None: return 1
        return -1 if va > vb else 1
    k=COL_ATTEND; va=a[k]; vb=b[k]; 
    if va != vb: 
        return -1 if va < vb else 1
    return 0
rows_sorted = sorted(rows_before, key=cmp_func(cmp_student), reverse=True)



Basic sorting

multisort can be used as a basic non-destructive sorter of lists where a traditional sort does so destructively:

_orig = [1, 4, 3, 6, 5]
_orig.sort(reverse=True)

This will sort _orig in-place

In builtin python, to do a non-destructive sort it takes two lines:

_orig = [1, 4, 3, 6, 5]
_clone = [:]
_clone.sort(reverse=True)

With Multisort its just one line:

_orig = [1, 4, 3, 6, 5]
_sorted = multisort(_orig, reverse=True)

Where _orig is left unchanged


multisort library Test / Sample files (/tests)

Name Descr Other
tests/test_multisort.py multisort unit tests -
tests/performance_tests.py Tunable performance tests using asyncio requires pandas
tests/hand_test.py Hand testing -

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

multisort-0.1.5.tar.gz (6.4 kB view details)

Uploaded Source

Built Distribution

multisort-0.1.5-py3-none-any.whl (7.1 kB view details)

Uploaded Python 3

File details

Details for the file multisort-0.1.5.tar.gz.

File metadata

  • Download URL: multisort-0.1.5.tar.gz
  • Upload date:
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.11.2 Linux/5.15.0-1045-oracle

File hashes

Hashes for multisort-0.1.5.tar.gz
Algorithm Hash digest
SHA256 6f4f314722367e9e52c4fbb2520bc9ca5d11272835882966cbdc05089e2273dc
MD5 9e513572d90685f0ef923d84b0afd84b
BLAKE2b-256 8f82f616854edc8a992ee4ab41007e9d250b577fb647f1d32b9d59b974d83256

See more details on using hashes here.

File details

Details for the file multisort-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: multisort-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 7.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.11.2 Linux/5.15.0-1045-oracle

File hashes

Hashes for multisort-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 01b816ef4eedd2eeec2c82d78efe43e191cc559deca45af8598e1639c57fde52
MD5 bed94ad48b5eeb3f7cce8a6803b455de
BLAKE2b-256 9627dcb94ab8e964970c9c67b028f311790ce59cd4d8457bf45c7500f52a6829

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page