Adaptive ultra-sort: auto-dispatching, correctness-first sorting that beats np.sort by a mile on the array fast path

These details have not been verified by PyPI

Project links

Project description

quill-sort

Adaptive Ultra-Sort — Quill auto-dispatches to the fastest correct sort for your data and your machine, and never loses to the baseline.

import numpy as np
import quill

# The list API: a drop-in for sorted() / list.sort(), ~3-4x faster on numeric.
quill.quill_sort([3, 1, 4, 1, 5, 9])              # → [1, 1, 3, 4, 5, 9]
quill.quill_sort(records, key=lambda r: r['age']) # sort objects by a key

# The array API: the "by a mile" path — dispatches to Rust radix / GPU / polars.
a = np.random.randint(0, 2**40, 20_000_000)
quill.sort_array(a)                                # ~5x faster than np.sort

The one thing to know: list API vs. array API

Quill has two doors, and which one you use determines how big the win is.

API	Input / output	Speed (numeric, this machine)	Why
`quill_sort(list)`	`list` in, `list` out	matches `np.sort`, ~3.8x vs `sorted()` / `list.sort()`	It does `np.asarray(...)` then `.tolist()` around a fast kernel. Those two conversions dominate the runtime (Amdahl's law), so the kernel speedup is mostly hidden — you get np.sort-class throughput, not the "by a mile" number.
`quill.sort_array(ndarray)`	`ndarray` in, `ndarray` out	~5.3x int64, ~3.3x float64, ~4x GPU vs `np.sort`	Zero conversions. It dispatches the raw buffer straight to the fastest installed backend. This is where Quill actually pulls away from numpy.

Plain version: if your data starts and ends as a Python list, the conversion wall caps the win — quill_sort is np.sort-fast and ~4x faster than the built-in sorted(), which is already a great deal, but it is not 5x. If you keep your data in numpy arrays and call sort_array, you get the full multi-threaded / GPU speedup. Don't believe any library that claims a 5x list-in / list-out sort — the conversion alone forbids it.

All numbers below are measured on this development box (Windows 11, 28-core CPU, RTX 4060 Ti, numpy 2.4.6) with a freshly-shuffled array on every timed run, so they reflect real cold-sort throughput, not warm/already-sorted artifacts. Your mileage will vary with CPU, GPU, array size, and dtype.

Installation

pip install quill-sort              # core — ZERO required deps, always correct
pip install quill-sort[fast]        # + numpy + psutil (the baseline fast path)
pip install quill-sort[polars]      # + polars  (no-compile multi-threaded sort)
pip install quill-sort[gpu]         # + cupy    (NVIDIA GPU sort)
pip install quill-sort[all]         # numpy + pandas + psutil + polars

The compiled Rust backend (the ~5x path) ships as a pre-built binary wheel for common platforms — pip install quill-sort just gets it, no Rust toolchain needed. If no wheel matches your platform, pip falls back to the pure-Python sdist and Quill transparently uses the next-fastest backend (GPU → polars → numpy → np.sort). The install never fails for lack of a compiler.

How it picks (the dispatch chain)

sort_array() profiles the array and walks a priority chain, taking the first backend that is installed and supports the array. Every step has a measured crossover (min_n) below which it isn't worth it, and any backend error falls straight back to np.sort — correctness is never at risk.

sort_array(ndarray)
        │
        ▼
  eligible?  (1-D, dtype kind i/u/f, itemsize ≤ 8, value-only, C-contiguous)
        │ no ──────────────────────────────────────────────► np.sort
        │ yes
        ▼
  dense bounded int64/uint64?  ──► counting sort (np.bincount)   ~1.7-2.8x, single-thread
        │ no
        ▼
  ┌──────────────────────────────────────────────────────────┐
  │  try, in priority order, the first available + supporting │
  │    rust_voracious   (compiled radix, n ≥ 1M)              │
  │    cupy_gpu         (GPU, n ≥ 2M, fits in free VRAM)      │
  │    polars           (no-compile MT sort, n ≥ 200k)        │
  │    numpy_parallel   (int-only partition sort, n ≥ 5M)     │
  └──────────────────────────────────────────────────────────┘
        │ any error / nothing eligible
        ▼
  np.sort   ← the floor; always correct, never slower than numpy

NaN is stripped before any backend runs and re-appended at the end (numpy convention), so a backend that can't order NaN never sees one. Descending is a post-sort reverse. quill.available_backends() shows what your machine will actually use, in this order.

Backends

Backend	How to get it	Measured (int64, vs `np.sort`)	Notes
`rust_voracious`	`pip install quill-sort` (bundled wheel)	~5.3x (float64 ~3.3x)	Compiled PyO3 + `voracious_radix_sort`, multi-threaded radix. Fastest measured path; leads the chain.
`cupy_gpu`	`pip install quill-sort[gpu]`	~4x (float64 ~2.5x)	GPU radix via CuPy. Counts the host→device→host round trip and still wins for large arrays that fit in VRAM.
`polars`	`pip install quill-sort[polars]`	~2.3x (float64 ~1.7x)	Delegates to polars' multi-threaded Rust sort. No compiler needed — the "fast for everyone" fallback.
`numpy_parallel`	`pip install quill-sort[fast]`	~1.1x (int only)	Thread-parallel `np.partition` sample-sort. Small but reliable int win; deliberately uses few workers (the gain saturates at memory bandwidth).
counting sort	`pip install quill-sort[fast]`	~1.7-2.8x	`np.bincount` for dense bounded int64/uint64. O(n+k), single-threaded, beats every comparison backend in its niche.
`np.sort`	always (with numpy)	1.0x (the floor)	The never-lose baseline. Reached on any error or ineligible dtype.

Without numpy at all, Quill still sorts correctly via the stdlib Timsort — it is simply slower. Correctness has zero required dependencies.

The array API — `sort_array()`

import numpy as np
import quill

a = np.random.randint(0, 2**40, 20_000_000)

s = quill.sort_array(a)                      # sorted COPY (a untouched), fastest backend
quill.sort_array(a, inplace=True)            # sort a IN PLACE, returns a
d = quill.sort_array(a, descending=True)     # reverse order

quill.available_backends()
# → ['rust_voracious', 'cupy_gpu', 'polars', 'numpy_parallel']  (priority order)

sort_array matches np.sort exactly — including negatives, mixed int/float promotion, and NaN-to-end — and is never slower than np.sort on any input.

Top-k — `quill_topk()`

When you only need the k smallest (or largest), don't sort the whole array. quill_topk uses numpy's argpartition (introselect, O(n)) instead of a full O(n log n) sort:

quill.quill_topk(scores, 10)                  # 10 smallest, sorted ascending
quill.quill_topk(scores, 10, largest=True)    # 10 largest,  sorted descending
quill.quill_topk(rows, 5, key=lambda r: r.size)

Measured ~8x faster than sorted(data, reverse=True)[:10] for k=10 over 10M elements on this machine.

`analyze()` — inspect the profile

quill.analyze([3, 1, 4, 1, 5, 9])
# {'n': 6, 'dtype': 'int_pos', 'presorted': False, 'dense': True, ...}

The list API — `quill_sort()` / `quill_sorted()`

A drop-in replacement for sorted() / list.sort() that routes numeric lists through the fast numeric kernel and everything else through Timsort. It returns a list, so it pays the conversion tax described above — expect np.sort-class speed and ~3-4x over the built-ins on numeric data, not the array-API number.

quill.quill_sort([3, 1, 4, 1, 5, 9])              # in-place, returns the list
quill.quill_sort(data, key=lambda x: x['score'])  # objects via key
quill.quill_sort(data, reverse=True)              # descending
quill.quill_sort(data, inplace=False)             # return a new list, leave input alone
quill.quill_sort(data, parallel=True)             # force multi-core
quill.quill_sort(data, stable=False)              # unstable, max speed on numeric
result = quill.quill_sorted(iterable)             # non-mutating, mirrors sorted()

quill.quill_sort(
    data,                        # list, generator, range, ndarray, Series, DataFrame
    key=None,                    # sort key function (like sorted())
    reverse=False,               # descending order
    inplace=True,                # mutate in place (False → return a new list)
    parallel=False,              # use multiple cores (auto on big numeric data)
    high_performance_mode=False, # skip the interactive prompt on the external-sort path
    silent=False,                # suppress status output
    stable=True,                 # True = matches sorted() exactly; False = max speed
    stats=False,                 # return (sorted_list, stats_dict)
)

`stable` parameter

stable=True (default) guarantees equal elements keep their original relative order — identical to Python's sorted(). stable=False allows the benchmark-optimal unstable kernels on numeric data for extra speed when order among equal elements doesn't matter.

quill.quill_sort(data)                 # stable, safe for all use cases
quill.quill_sort(data, stable=False)   # unstable, faster for large int32/int64

`stats` — timing

result, stats = quill.quill_sort(data, stats=True)
# stats = {'time_ms': 12.3, 'n': 1000000}

The never-lose guarantee

Every fast path in Quill is wrapped so that any failure — a missing backend, a GPU OOM, a compiled-extension panic, an unsupported dtype — falls back to np.sort (or stdlib Timsort if numpy is absent). The Rust extension is built with panic = "unwind", so even a native panic becomes a catchable Python exception and the process never aborts. You can never get a result that is wrong or a sort that is slower than the numpy baseline.

# To debug a backend instead of silently falling back, surface the error:
QUILL_BACKEND_DEBUG=1 python your_script.py

Correctness contract

Quill's output matches sorted() / np.sort on every supported input:

None values sort to the end (start with reverse=True):

quill.quill_sort([3, None, 1, None, 2])
# → [1, 2, 3, None, None]

NaN values in float data go to the end (start with reverse=True), matching numpy's convention:

quill.quill_sort([3.0, float('nan'), 1.0, 2.0])
# → [1.0, 2.0, 3.0, nan]

Mixed int/float lists are promoted to float64 and take the fast float path (not the slow object fallback):

quill.quill_sort([1, 2.5, 3, 4.0])
# → [1, 2.5, 3, 4.0]

Negative integers are handled natively (two's-complement radix; no object fallback), and keyed sorts are stable by default.

Supported types

int, float, str, bytes — native fast paths
Mixed int + float — promoted to float64
Negative integers — handled natively
None values — sorted to the end (start with reverse=True)
float('nan') — sorted to the end (start with reverse=True)
numpy.ndarray — sorted via the backend chain (use sort_array for the full win)
pandas.Series — sorted, returned as a new Series
pandas.DataFrame — sorted by column(s) via key='column_name'
Any generator / iterator — materialised to a list, sorted, returned

Plugin system

Teach Quill to sort your own types. A plugin declares which types it handles and a prepare() that converts them to something Quill sorts natively, plus an optional postprocess to rebuild your objects.

from quill import register_plugin, QuillPlugin

class MyPlugin(QuillPlugin):
    handles = (MyCustomClass,)
    name    = "my_custom_class"

    @staticmethod
    def prepare(data, key, reverse):
        items       = [x.value for x in data]
        postprocess = lambda sorted_vals: [MyCustomClass(v) for v in sorted_vals]
        return items, key, postprocess   # (list_to_sort, key_fn, postprocess_fn)

register_plugin(MyPlugin)
quill.quill_sort(list_of_my_objects)

Built-in plugins cover numpy.ndarray, pandas.Series, pandas.DataFrame, range, and any generator/iterator. You can also drop in a custom backend (not just a type plugin) with quill._backends.register_backend(...) if you have a faster kernel for some dtype.

CLI / demo / setup

python -m quill       # animated benchmark demo across data types
quill                 # same, if installed via pip
quill setup           # calibration wizard: measures the parallel/GPU crossovers
                      #   on YOUR box and persists them to ~/.quill/config.json

Quill also ships a small visualizer/wizard and add-on helpers layered on the core engine (the calibration wizard above, profile inspection via analyze(), and the quill_topk add-on). They are optional conveniences — the core sort API needs none of them.

How it works (the list path, in detail)

For a list, Quill profiles the data at intake and routes to the best strategy:

Data shape	Strategy	Complexity
Dense bounded integer range	`np.bincount` counting sort	O(n + k)
int64 / float64 (value-only)	fastest backend via the array chain	~O(n) radix / O(n log n)
Mixed int/float	promote to float64, then numeric path	O(n log n)
Strings / bytes	Python Timsort	O(n log n)
Objects with a `key`	extract keys → numpy argsort (stable)	O(n log n)
Nearly sorted (asc_ratio > 0.90)	Timsort bypass	~O(n)
Larger than RAM	external radix-partition + parallel bucket sort	disk-bound

The conversion to/from numpy is the fixed cost that caps the list-path speedup — see the very first table. For maximum throughput, keep your data in arrays and use sort_array.

Performance tuning

Install the right extra. [fast] (numpy + psutil) is the baseline; add [polars] for a no-compile multi-threaded sort, or [gpu] if you have an NVIDIA card. The bundled Rust wheel already gives you the top path.
Use sort_array on arrays. It skips the conversion wall entirely.
stable=False on the list path for extra speed on numeric data when you don't need stable ordering.
quill setup measures your machine's crossovers so auto-parallel and GPU engage exactly where they win.
Install psutil for accurate RAM sensing (avoids spurious external-sort triggers); without it, Quill assumes 2 GB available.

Troubleshooting

"quill_sort(list) isn't 5x faster." It can't be — the asarray/tolist round trip dominates (Amdahl). It is np.sort-fast and ~3-4x over sorted(). For the big win, keep arrays and call sort_array.
"sort_array isn't using the GPU / Rust path." Check quill.available_backends(). The backend may not be installed, the array may be below its crossover (min_n), or the dtype/itemsize may be ineligible. Set QUILL_BACKEND_DEBUG=1 to surface the real reason instead of falling back silently.
"GPU path errors on a huge array." It only engages when the array fits in free VRAM with headroom; otherwise it falls back to the CPU radix or np.sort.
"Equal elements reordered." Use stable=True (the default). stable=False is explicitly allowed to reorder equal elements.
"External sort triggered unexpectedly." Install psutil for accurate RAM sensing; pass high_performance_mode=True to skip the interactive prompt.

Development

git clone https://github.com/invariant-games/quill.git
cd quill
pip install -e ".[all,dev]"
pytest                          # full suite
pytest -m slow                  # parallel / large-data tests
python -m quill                 # animated benchmark demo

# Build the compiled Rust backend locally (optional; CI ships wheels):
cd rustext && maturin develop --release

The CI workflow .github/workflows/wheels.yml builds the rustext crate into abi3 binary wheels (one wheel per platform covers Python ≥ 3.8) for Windows, manylinux x86_64/aarch64, and macOS x86_64/arm64, plus a pure-Python sdist as the never-fail-install fallback.

Requirements

Python 3.8+
numpy — optional but strongly recommended (pip install quill-sort[fast])
psutil — optional, for accurate RAM sensing
polars — optional backend (pip install quill-sort[polars])
cupy — optional GPU backend (pip install quill-sort[gpu], NVIDIA only)
pandas — optional (Series/DataFrame support)

A C/Rust toolchain is not required to install — the compiled backend is shipped as a pre-built wheel, with a pure-Python sdist fallback.

License

MIT — by Isaiah Tucker

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

7.1.3

Jun 27, 2026

7.1.2

Jun 27, 2026

7.1.1

Jun 27, 2026

7.1.0

Jun 27, 2026

7.0.7

Jun 27, 2026

7.0.6

Jun 27, 2026

7.0.5

Jun 27, 2026

7.0.4

Jun 27, 2026

7.0.3

Jun 27, 2026

7.0.1

Jun 25, 2026

7.0.0

Jun 25, 2026

6.0.18

Jun 16, 2026

6.0.17

Jun 16, 2026

6.0.16

Jun 16, 2026

6.0.15

Jun 16, 2026

6.0.14

Jun 16, 2026

6.0.13

Jun 16, 2026

6.0.12

Jun 16, 2026

6.0.11

Jun 16, 2026

6.0.10

Jun 16, 2026

6.0.9

Jun 16, 2026

6.0.8

Jun 16, 2026

6.0.7

Jun 16, 2026

6.0.6

Jun 16, 2026

6.0.5

Jun 16, 2026

This version

6.0.4

Jun 16, 2026

5.0.0

Apr 13, 2026

4.0.23

Apr 1, 2026

4.0.22

Apr 1, 2026

4.0.21

Apr 1, 2026

4.0.20

Apr 1, 2026

4.0.19

Apr 1, 2026

4.0.18

Apr 1, 2026

4.0.17

Apr 1, 2026

4.0.16

Apr 1, 2026

4.0.15

Apr 1, 2026

4.0.14

Apr 1, 2026

4.0.13

Apr 1, 2026

4.0.12

Apr 1, 2026

4.0.11

Apr 1, 2026

4.0.10

Apr 1, 2026

4.0.9

Apr 1, 2026

4.0.8

Apr 1, 2026

4.0.7

Apr 1, 2026

4.0.6

Apr 1, 2026

4.0.5

Apr 1, 2026

4.0.4

Apr 1, 2026

4.0.3

Apr 1, 2026

4.0.2

Apr 1, 2026

4.0.0

Mar 31, 2026

3.3.6

Mar 31, 2026

3.3.5

Mar 31, 2026

3.3.4

Mar 31, 2026

3.3.3

Mar 31, 2026

3.3.2

Mar 31, 2026

3.3.1

Mar 31, 2026

3.3.0

Mar 31, 2026

3.2.9

Mar 31, 2026

3.2.8

Mar 31, 2026

3.2.7

Mar 31, 2026

3.2.6

Mar 31, 2026

3.2.5

Mar 31, 2026

3.2.4

Mar 31, 2026

3.2.3

Mar 31, 2026

3.2.2

Mar 31, 2026

3.2.1

Mar 31, 2026

3.2.0

Mar 31, 2026

3.1.5

Mar 31, 2026

3.1.4

Mar 31, 2026

3.1.3

Mar 31, 2026

3.1.2

Mar 31, 2026

3.1.1

Mar 31, 2026

3.1.0

Mar 31, 2026

3.0.7

Mar 31, 2026

3.0.6

Mar 31, 2026

3.0.5

Mar 30, 2026

3.0.4

Mar 30, 2026

3.0.3

Mar 30, 2026

3.0.2

Mar 30, 2026

3.0.1

Mar 30, 2026

3.0.0

Mar 30, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

quill_sort-6.0.4.tar.gz (89.5 kB view details)

Uploaded Jun 16, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

quill_sort-6.0.4-py3-none-any.whl (88.0 kB view details)

Uploaded Jun 16, 2026 Python 3

File details

Details for the file quill_sort-6.0.4.tar.gz.

File metadata

Download URL: quill_sort-6.0.4.tar.gz
Upload date: Jun 16, 2026
Size: 89.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for quill_sort-6.0.4.tar.gz
Algorithm	Hash digest
SHA256	`8136591d47a4ec6bde955d4f60abd19be27b4ee2b7fe176b7ee0c4abd04da497`
MD5	`8e9090dcb564bfa116ea8c2c3d9108b1`
BLAKE2b-256	`c5effad4ad3f160de042b9aeaff4acb0927ac285f8d32e0918f0cfc187aef73c`

See more details on using hashes here.

File details

Details for the file quill_sort-6.0.4-py3-none-any.whl.

File metadata

Download URL: quill_sort-6.0.4-py3-none-any.whl
Upload date: Jun 16, 2026
Size: 88.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for quill_sort-6.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b8b7fb63b6e71e1390354e8bdc51542ed5397e8b70814c915b7be8eced0dd132`
MD5	`67a574369575cac6eccc0156187e44ec`
BLAKE2b-256	`199c352ed1edce83b7859119a16d9c175108ad7ca58a45fcf5e370c2d0c16941`

See more details on using hashes here.

quill-sort 6.0.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

quill-sort

The one thing to know: list API vs. array API

Installation

How it picks (the dispatch chain)

Backends

The array API — sort_array()

Top-k — quill_topk()

analyze() — inspect the profile

The list API — quill_sort() / quill_sorted()

stable parameter

stats — timing

The never-lose guarantee

Correctness contract

Supported types

Plugin system

CLI / demo / setup

How it works (the list path, in detail)

Performance tuning

Troubleshooting

Development

Requirements

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

The array API — `sort_array()`

Top-k — `quill_topk()`

`analyze()` — inspect the profile

The list API — `quill_sort()` / `quill_sorted()`

`stable` parameter

`stats` — timing