Up to 40 percent faster sorting of a pandas Index/Series
The MSVC C++ x64/x86 build tools must be installed.
This module uses npfastsortcpp (https://pypi.org/project/npfastsortcpp/); see that page for the full setup instructions.
Important: only float and int data is supported.
Tested against Windows 10 / Python 3.9.13
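Because only float/int data is supported, it can help to check a Series' dtype before handing it to a fast sort. The following is a minimal sketch in plain pandas; `is_fastsort_candidate` and `sort_with_fallback` are hypothetical helper names, not part of this package, and the dtype check is an assumption about what "float/int" covers.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def is_fastsort_candidate(series: pd.Series) -> bool:
    """Return True if the Series holds int or float values."""
    return is_integer_dtype(series.dtype) or is_float_dtype(series.dtype)


def sort_with_fallback(series: pd.Series, fast_sort=None) -> pd.Series:
    """Use fast_sort for int/float data, otherwise pandas' own sort_values()."""
    if fast_sort is not None and is_fastsort_candidate(series):
        return fast_sort(series)
    return series.sort_values()


numbers = pd.Series([3, 1, 2])
names = pd.Series(["c", "a", "b"])
print(is_fastsort_candidate(numbers))  # True
print(is_fastsort_candidate(names))    # False
```

With the package installed, something like `fast_sort=lambda s: s.s_fastsort_copy()` could be passed in; without it, the helper simply falls back to `sort_values()`.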
import pandas as pd
from a_pandas_ex_fastsort import pd_add_fastsort
pd_add_fastsort()
dafra = "https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv"
df5 = pd.read_csv(dafra)
# Speed gain even for small DataFrames
df = pd.concat([df5.copy() for x in range(10)], ignore_index=True)
df = df.sample(len(df))
%timeit df.d_fast_reindex() # Values must be unique
846 µs ± 37.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit df.sort_index()
933 µs ± 25.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# The larger the DataFrame, the larger the speedup
df = pd.concat([df5.copy() for x in range(100)], ignore_index=True)
df = df.sample(len(df))
%timeit df.d_fast_reindex() # Values must be unique
11.1 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.sort_index()
15 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
df = pd.concat([df5.copy() for x in range(100)], ignore_index=True)
df = df.sample(len(df))
%timeit df.Pclass.sort_values()
2.08 ms ± 66 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df.Pclass.s_fastsort_copy() # Be careful: original index will be dropped!
583 µs ± 5.85 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
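To illustrate what "original index will be dropped" means in practice, here is a small sketch that uses only pandas and NumPy. It imitates the described behaviour with `np.sort` as a stand-in; it is not the package's implementation, and the sample values and labels are invented for the example.

```python
import numpy as np
import pandas as pd

# Small stand-in for a shuffled column; the labels 10/20/30 are made up.
s = pd.Series([3, 1, 2], index=[10, 20, 30], name="Pclass")

# pandas keeps each value's original label attached.
kept = s.sort_values()

# Sorting the raw values and wrapping them again yields a fresh 0..n-1 index.
# (Plain NumPy stand-in, not the package's actual code.)
dropped = pd.Series(np.sort(s.to_numpy()), name=s.name)

print(kept.index.tolist())     # [20, 30, 10]
print(dropped.index.tolist())  # [0, 1, 2]
```

If the original labels are needed later (for example to join the sorted values back onto the DataFrame), `sort_values()` is the safer choice despite being slower.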
# Be careful: df.Pclass.s_fastsort_inplace() sorts only that one Series in place;
# the values in the other columns are not reordered!
df # starting with:
Out[19]:
       PassengerId  Survived  Pclass  ...      Fare  Cabin  Embarked
34102          245         0       3  ...    7.2250    NaN         C
28329          709         1       1  ...  151.5500    NaN         S
50018          123         0       2  ...   30.0708    NaN         C
51258          472         0       3  ...    8.6625    NaN         S
51813          136         0       2  ...   15.0458    NaN         C
...            ...       ...     ...  ...       ...    ...       ...
36357          718         1       2  ...   10.5000   E101         S
78608          201         0       3  ...    9.5000    NaN         S
64989          838         0       3  ...    8.0500    NaN         S
20824          332         0       1  ...   28.5000   C124         S
21108          616         1       2  ...   65.0000    NaN         S
[89100 rows x 12 columns]
df.Pclass.s_fastsort_inplace()
df # Result - Only Pclass has been sorted
Out[21]:
       PassengerId  Survived  Pclass  ...      Fare  Cabin  Embarked
34102          245         0       1  ...    7.2250    NaN         C
28329          709         1       1  ...  151.5500    NaN         S
50018          123         0       1  ...   30.0708    NaN         C
51258          472         0       1  ...    8.6625    NaN         S
51813          136         0       1  ...   15.0458    NaN         C
...            ...       ...     ...  ...       ...    ...       ...
36357          718         1       3  ...   10.5000   E101         S
78608          201         0       3  ...    9.5000    NaN         S
64989          838         0       3  ...    8.0500    NaN         S
20824          332         0       3  ...   28.5000   C124         S
21108          616         1       3  ...   65.0000    NaN         S
[89100 rows x 12 columns]
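The pitfall shown above (one column sorted, the rest of each row left where it was) can be reproduced on a tiny frame with plain pandas and NumPy. This is only a sketch: `np.sort` stands in for the in-place Series sort, and the three rows are a cut-down sample based on the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Pclass": [3, 1, 2], "Fare": [7.2250, 151.5500, 30.0708]},
    index=[34102, 28329, 50018],
)

# Sorting one column on its own: Pclass is reordered, Fare stays put,
# so the rows no longer describe the same passengers.
broken = df.copy()
broken["Pclass"] = np.sort(broken["Pclass"].to_numpy())

# Sorting whole rows by Pclass keeps every Fare next to its own Pclass.
aligned = df.sort_values("Pclass")

print(broken["Fare"].tolist())   # [7.225, 151.55, 30.0708]
print(aligned["Fare"].tolist())  # [151.55, 30.0708, 7.225]
```

When the rows have to stay consistent, sorting the whole DataFrame (`df.sort_values("Pclass")`) is the appropriate tool; an in-place sort of a single Series only makes sense when that column is used independently of the others.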