Up to 40 percent faster sorting of a pandas index or Series

Project description

MSVC C++ x64/x86 build tools must be installed.

This module relies on npfastsortcpp (https://pypi.org/project/npfastsortcpp/). See that page for the full installation instructions.

Important: works only with float and int data.

Tested against Windows 10 / Python 3.9.13
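Because only float and int data are supported, it can help to check dtypes before choosing the fast path. The following is a minimal sketch using plain pandas; `is_fastsort_candidate` is a hypothetical helper name and is not part of a_pandas_ex_fastsort.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def is_fastsort_candidate(values) -> bool:
    """Return True if `values` (a Series or Index) has an int or float dtype.

    Hypothetical helper for illustration, not part of a_pandas_ex_fastsort.
    """
    return is_integer_dtype(values) or is_float_dtype(values)


frame = pd.DataFrame(
    {"Pclass": [3, 1, 2], "Fare": [7.225, 151.55, 30.0708], "Embarked": ["C", "S", "C"]}
)

# Collect the columns whose dtype is int or float.
candidates = [name for name in frame.columns if is_fastsort_candidate(frame[name])]
print(candidates)  # ['Pclass', 'Fare']
```

Columns that fail the check (here the text column Embarked) can still be sorted with the regular pandas methods.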

import pandas as pd

from a_pandas_ex_fastsort import pd_add_fastsort

pd_add_fastsort()

dafra = "https://github.com/pandas-dev/pandas/raw/main/doc/data/titanic.csv"

df5 = pd.read_csv(dafra)
# There is a speed gain even for small DataFrames

df = pd.concat([df5.copy() for x in range(10)], ignore_index=True)

df = df.sample(len(df))

%timeit df.d_fast_reindex() # Index values must be unique

846 µs ± 37.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit df.sort_index()

933 µs ± 25.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
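The `%timeit` lines are IPython magics. Outside IPython, a comparable baseline measurement can be sketched with the standard-library `timeit` module. The frame below is a synthetic stand-in (an assumption, not the Titanic data) with the same row count as the ten-copy example, and the numbers it prints will differ from those shown here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the shuffled frame: 8,910 rows with a unique,
# shuffled integer index (assumption: 10 copies of the 891-row Titanic data).
rng = np.random.default_rng(0)
n_rows = 8910
frame = pd.DataFrame(
    {"Pclass": rng.integers(1, 4, size=n_rows)},
    index=rng.permutation(n_rows),
)

# Time the pandas baseline; the same pattern applies to any other callable.
n_loops = 200
total_seconds = timeit.timeit(frame.sort_index, number=n_loops)
per_loop_us = total_seconds / n_loops * 1e6
print(f"sort_index: {per_loop_us:.0f} µs per loop")
```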
# The larger the DataFrame, the greater the gain

df = pd.concat([df5.copy() for x in range(100)], ignore_index=True)

df = df.sample(len(df))

%timeit df.d_fast_reindex() # Index values must be unique

11.1 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.sort_index()

15 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
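Since `d_fast_reindex()` requires unique values, a small guard can fall back to `sort_index()` when that condition does not hold. This is a sketch under two assumptions: that the method is only present after `pd_add_fastsort()` has been called, and that it returns the sorted frame the way `sort_index()` does. `sort_index_safely` is a hypothetical name.

```python
import pandas as pd


def sort_index_safely(frame: pd.DataFrame) -> pd.DataFrame:
    """Use d_fast_reindex() if it is registered and the index is unique,
    otherwise fall back to the regular pandas sort_index().

    Hypothetical helper for illustration, not part of a_pandas_ex_fastsort.
    """
    # The attribute is assumed to exist only after pd_add_fastsort() was called.
    fast_sort = getattr(frame, "d_fast_reindex", None)
    if fast_sort is not None and frame.index.is_unique:
        return fast_sort()
    return frame.sort_index()


shuffled = pd.DataFrame({"Fare": [7.225, 151.55, 30.0708]}, index=[2, 0, 1])
print(sort_index_safely(shuffled).index.tolist())

# A duplicated index label always takes the sort_index() fallback.
duplicated = pd.DataFrame({"Fare": [1.0, 2.0, 3.0]}, index=[1, 0, 1])
print(sort_index_safely(duplicated).index.tolist())  # [0, 1, 1]
```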
# Sorting a single Series
df = pd.concat([df5.copy() for x in range(100)], ignore_index=True)

df = df.sample(len(df))

%timeit df.Pclass.sort_values()

2.08 ms ± 66 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.Pclass.s_fastsort_copy() # Be careful: the original index is dropped!

583 µs ± 5.85 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
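To make the "original index will be dropped" warning concrete, the sketch below contrasts `sort_values()` with a plain pandas/NumPy approximation of a sorted copy that discards the index. The approximation is an assumption used for illustration and is not the package's own implementation.

```python
import numpy as np
import pandas as pd

pclass = pd.Series([3, 1, 2], index=[34102, 28329, 50018], name="Pclass")

# pandas keeps each value attached to its original index label.
kept = pclass.sort_values()
print(kept.index.tolist())  # [28329, 50018, 34102]

# Approximation of a sorted copy without the original index: sort the raw
# values and wrap them in a new Series, which gets a default 0..n-1 index.
dropped = pd.Series(np.sort(pclass.to_numpy()), name=pclass.name)
print(dropped.index.tolist())  # [0, 1, 2]
print(dropped.tolist())        # [1, 2, 3]
```

If the index labels are needed later (for example to join the result back to the DataFrame), `sort_values()` remains the safer choice.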
# Be careful: df.Pclass.s_fastsort_inplace() sorts only that one Series in place.

# Values in the other columns are not reordered, so the rows no longer line up!



df # starting with:

Out[19]: 

       PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked

34102          245         0       3  ...    7.2250   NaN         C

28329          709         1       1  ...  151.5500   NaN         S

50018          123         0       2  ...   30.0708   NaN         C

51258          472         0       3  ...    8.6625   NaN         S

51813          136         0       2  ...   15.0458   NaN         C

            ...       ...     ...  ...       ...   ...       ...

36357          718         1       2  ...   10.5000  E101         S

78608          201         0       3  ...    9.5000   NaN         S

64989          838         0       3  ...    8.0500   NaN         S

20824          332         0       1  ...   28.5000  C124         S

21108          616         1       2  ...   65.0000   NaN         S

[89100 rows x 12 columns]

df.Pclass.s_fastsort_inplace()



df # Result: only Pclass has been sorted

Out[21]: 

       PassengerId  Survived  Pclass  ...      Fare Cabin  Embarked

34102          245         0       1  ...    7.2250   NaN         C

28329          709         1       1  ...  151.5500   NaN         S

50018          123         0       1  ...   30.0708   NaN         C

51258          472         0       1  ...    8.6625   NaN         S

51813          136         0       1  ...   15.0458   NaN         C

            ...       ...     ...  ...       ...   ...       ...

36357          718         1       3  ...   10.5000  E101         S

78608          201         0       3  ...    9.5000   NaN         S

64989          838         0       3  ...    8.0500   NaN         S

20824          332         0       3  ...   28.5000  C124         S

21108          616         1       3  ...   65.0000   NaN         S

[89100 rows x 12 columns]
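The effect shown in the two outputs above can be reproduced on a three-row frame with plain pandas and NumPy. The in-place column sort here is a stand-in written with `np.sort` (an assumption, not the package's code); `sort_values("Pclass")` is shown as the row-preserving alternative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"PassengerId": [245, 709, 123], "Pclass": [3, 1, 2]},
    index=[34102, 28329, 50018],
)

# Stand-in for sorting a single column in place: only Pclass is reordered.
column_only = frame.copy()
column_only["Pclass"] = np.sort(column_only["Pclass"].to_numpy())
print(column_only["Pclass"].tolist())       # [1, 2, 3]
print(column_only["PassengerId"].tolist())  # [245, 709, 123], rows no longer match

# Sorting whole rows keeps every passenger with the correct class.
row_sorted = frame.sort_values("Pclass")
print(row_sorted["PassengerId"].tolist())   # [709, 123, 245]
```

In the column-only result, passenger 245 is now paired with class 1 although the original row had class 3, which is why the in-place variant suits only a Series that is used on its own.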

Version: 0.10
