Up to 40 percent faster sorting of a pandas index/Series
The MSVC C++ x64/x86 build tools must be installed.
This module is built on npfastsortcpp (https://pypi.org/project/npfastsortcpp/); see that page for complete installation instructions.
Important: only float and int data are supported.
Tested against Windows 10 / Python 3.9.13
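Because only float and int data are supported, it can help to check a Series' dtype before reaching for the fast-sort methods. The following is a minimal sketch using pandas' own dtype helpers; the function name `is_fastsort_candidate` is hypothetical and not part of this package. Nullable extension dtypes such as `Int64` also pass this check, and whether the package handles them is not stated here.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def is_fastsort_candidate(series: pd.Series) -> bool:
    """Return True if the Series holds integer or float data (hypothetical helper)."""
    return bool(is_integer_dtype(series.dtype) or is_float_dtype(series.dtype))


numbers = pd.Series([3, 1, 2])
names = pd.Series(["c", "a", "b"])

print(is_fastsort_candidate(numbers))  # integer data passes the check
print(is_fastsort_candidate(names))    # string data does not
```

If the check fails, the regular `sort_values()` / `sort_index()` methods remain the safe choice.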
import pandas as pd
from a_pandas_ex_fastsort import pd_add_fastsort
pd_add_fastsort()
dafra = "https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv"
df5 = pd.read_csv(dafra)
# Speed gain even for small DataFrames
df = pd.concat([df5.copy() for x in range(10)], ignore_index=True)
df = df.sample(len(df))
%timeit df.d_fast_reindex() # Index values must be unique
846 µs ± 37.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit df.sort_index()
933 µs ± 25.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# The larger the DataFrame, the bigger the speedup
df = pd.concat([df5.copy() for x in range(100)], ignore_index=True)
df = df.sample(len(df))
%timeit df.d_fast_reindex() # Index values must be unique
11.1 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.sort_index()
15 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
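The `%timeit` lines above are IPython magics and will not run in a plain Python script. As a rough sketch, a comparable measurement can be taken with the standard `timeit` module. The helper `best_time_ms` and the synthetic DataFrame are assumptions made for illustration; only the pandas baseline `sort_index()` is timed here, and the same helper could be pointed at `d_fast_reindex()` once the package is installed. Note that it reports the best run rather than the mean and standard deviation that `%timeit` prints.

```python
import timeit

import numpy as np
import pandas as pd


def best_time_ms(func, repeat=5, number=20):
    """Return the best average time per call in milliseconds (hypothetical helper)."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number * 1000.0


# Synthetic stand-in for the shuffled Titanic data used above.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({"value": rng.random(10_000)})
df_demo = df_demo.sample(len(df_demo), random_state=0)

# The fast reindex requires unique values, so verify that first.
assert df_demo.index.is_unique

baseline_ms = best_time_ms(lambda: df_demo.sort_index())
print(f"sort_index: {baseline_ms:.3f} ms per call")
```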
df = pd.concat([df5.copy() for x in range(100)], ignore_index=True)
df = df.sample(len(df))
%timeit df.Pclass.sort_values()
2.08 ms ± 66 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.Pclass.s_fastsort_copy() # Be careful: original index will be dropped!
583 µs ± 5.85 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
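To make the "original index will be dropped" warning concrete, here is a small sketch in plain pandas/NumPy. It does not call `s_fastsort_copy()`; it only imitates the effect with `np.sort`, on the assumption that the result carries a fresh default index.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 1, 2], index=[10, 20, 30], name="Pclass")

# Regular pandas sort: each value keeps its original index label.
kept = s.sort_values()

# Sorting the raw values and wrapping them again loses the labels.
dropped = pd.Series(np.sort(s.to_numpy()), name=s.name)

print(kept.index.tolist())     # [20, 30, 10]
print(dropped.index.tolist())  # [0, 1, 2]
```

If the sorted values need to be matched back to their rows, keep using `sort_values()`.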
# Be careful: df.Pclass.s_fastsort_inplace() sorts only that one Series in place;
# the values in the other columns are not reordered!
df # starting with:
Out[19]:
       PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked
34102          245         0       3  ...    7.2250   NaN         C
28329          709         1       1  ...  151.5500   NaN         S
50018          123         0       2  ...   30.0708   NaN         C
51258          472         0       3  ...    8.6625   NaN         S
51813          136         0       2  ...   15.0458   NaN         C
...            ...       ...     ...  ...       ...   ...       ...
36357          718         1       2  ...   10.5000  E101         S
78608          201         0       3  ...    9.5000   NaN         S
64989          838         0       3  ...    8.0500   NaN         S
20824          332         0       1  ...   28.5000  C124         S
21108          616         1       2  ...   65.0000   NaN         S

[89100 rows x 12 columns]
df.Pclass.s_fastsort_inplace()
df # Result - Only Pclass has been sorted
Out[21]:
       PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked
34102          245         0       1  ...    7.2250   NaN         C
28329          709         1       1  ...  151.5500   NaN         S
50018          123         0       1  ...   30.0708   NaN         C
51258          472         0       1  ...    8.6625   NaN         S
51813          136         0       1  ...   15.0458   NaN         C
...            ...       ...     ...  ...       ...   ...       ...
36357          718         1       3  ...   10.5000  E101         S
78608          201         0       3  ...    9.5000   NaN         S
64989          838         0       3  ...    8.0500   NaN         S
20824          332         0       3  ...   28.5000  C124         S
21108          616         1       3  ...   65.0000   NaN         S

[89100 rows x 12 columns]
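The output above shows why the in-place variant is risky on a DataFrame column: `Pclass` is sorted while every other column stays where it was, so rows no longer describe the same passenger. The sketch below reproduces that effect with plain NumPy (it does not call `s_fastsort_inplace()`) on three rows taken from the table above, and contrasts it with `DataFrame.sort_values`, which keeps rows intact.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {"PassengerId": [245, 709, 123], "Pclass": [3, 1, 2]},
    index=[34102, 28329, 50018],
)

# Imitate a column-only sort: Pclass is reordered, PassengerId is not.
broken = df_demo.copy()
broken["Pclass"] = np.sort(broken["Pclass"].to_numpy())
# Passenger 245 now appears with Pclass 1 although the original value was 3.

# Row-preserving alternative: reorder whole rows by Pclass.
safe = df_demo.sort_values("Pclass")

print(broken)
print(safe)
```

Use the in-place method only on a standalone Series, or when the column is deliberately being detached from the rest of the row.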