A helper class that makes appending to a Pandas DataFrame efficient

pandas-appender

License: Apache License 2.0

Have you ever wanted to append a bunch of rows to a Pandas DataFrame? It turns out that doing so row by row is extremely inefficient for a large dataframe, because every append copies the existing data. You're supposed to build multiple dataframes and pd.concat them instead.
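
For reference, here is a minimal sketch of the "make multiple dataframes and pd.concat them" pattern using plain pandas. The helper name rows_to_dataframe and its chunk_size parameter are made up for this illustration; this is not how pandas-appender is implemented internally.

```python
import pandas as pd


def rows_to_dataframe(rows, chunk_size=10_000):
    """Build a DataFrame from an iterable of dict rows, one chunk at a time.

    rows_to_dataframe and chunk_size are hypothetical names for this sketch.
    """
    chunks = []  # completed chunk DataFrames
    buffer = []  # rows waiting to become the next chunk
    for row in rows:
        buffer.append(row)
        if len(buffer) >= chunk_size:
            chunks.append(pd.DataFrame(buffer))
            buffer = []
    if buffer:
        # Flush the final, partially filled chunk.
        chunks.append(pd.DataFrame(buffer))
    if not chunks:
        return pd.DataFrame()
    # One concat at the end, instead of growing a DataFrame row by row.
    return pd.concat(chunks, ignore_index=True)


df = rows_to_dataframe({'i': i} for i in range(25_000))
```

Appending to a Python list is cheap, so the per-row cost stays small, and the DataFrame data is only assembled once per chunk and once more in the final concat.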

So... helper function? Pandas doesn't seem to have one. Roll your own? OK then. Here's that helper. It can append around 1 million very small rows per CPU-second, and has a modest additional memory usage of around 5 megabytes, dynamically growing with the number of rows appended.

Install

pip install pandas-appender

Usage

from pandas_appender import DF_Appender

dfa = DF_Appender(ignore_index=True)  # note that ignore_index moves to the init
for i in range(1_000_000):
    dfa = dfa.append({'i': i})

df = dfa.finalize()

Type hints and category detection

Using narrower types and categories can often dramatically reduce the size of a DataFrame. There are two ways to do this in pandas-appender. One is to append to an existing dataframe:

dfa = DF_Appender(df, ignore_index=True)

and the second is to pass in a dtypes= argument:

dfa = DF_Appender(ignore_index=True, dtypes=another_dataframe.dtypes)
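
To get a feel for how much narrower types and categories can save, here is a small plain-pandas sketch. The column names and values are invented for the illustration, and the exact byte counts depend on your pandas version, so none are quoted here.

```python
import numpy as np
import pandas as pd

# Made-up sample data: a small-valued integer column and a repetitive string column.
wide = pd.DataFrame({
    'count': np.arange(100_000, dtype='int64') % 100,
    'color': ['red', 'green', 'blue', 'green'] * 25_000,
})

# Values 0..99 fit in int8, and 'color' has only three distinct values.
narrow = wide.astype({'count': 'int8', 'color': 'category'})

wide_bytes = wide.memory_usage(deep=True).sum()
narrow_bytes = narrow.memory_usage(deep=True).sum()
print(f'wide: {wide_bytes} bytes, narrow: {narrow_bytes} bytes')
print(narrow.dtypes)
```

A Series like narrow.dtypes is the same kind of object as the another_dataframe.dtypes value passed to dtypes= above.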

pandas-appender also offers a way to infer which columns would be smaller if they were categories. This code will either analyze an existing dataframe that you're appending to:

dfa = DF_Appender(df, ignore_index=True, infer_categories=True)

or it will analyze the first chunk of appended lines:

dfa = DF_Appender(ignore_index=True, infer_categories=True)

These inferred categories will override existing types or a dtypes= argument.
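
One way to decide "would this column be smaller as a category?" is simply to measure it both ways. The sketch below does that with plain pandas. The function smaller_as_category is hypothetical, and the rule pandas-appender actually applies when infer_categories=True may well be different.

```python
import pandas as pd


def smaller_as_category(df):
    """Return the names of columns that use fewer bytes as 'category'.

    Hypothetical helper for illustration; not part of pandas-appender.
    """
    winners = []
    for name in df.columns:
        column = df[name]
        if isinstance(column.dtype, pd.CategoricalDtype):
            continue  # already a category, nothing to gain
        before = column.memory_usage(index=False, deep=True)
        after = column.astype('category').memory_usage(index=False, deep=True)
        if after < before:
            winners.append(name)
    return winners


# Made-up sample: 'city' repeats a few values, 'serial' is unique per row.
sample = pd.DataFrame({
    'city': ['Oslo', 'Lima', 'Pune', 'Oslo'] * 2_500,
    'serial': range(10_000),
})
candidates = smaller_as_category(sample)
```

Columns with few distinct values (like 'city' here) tend to win, while columns where nearly every value is unique (like 'serial') tend to get bigger as categories.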

Incompatibilities with pandas.DataFrame.append()

pandas.DataFrame.append returns a new object and leaves the original alone, DF_Appender.append modifies in place

  • Pandas: df_new = df.append(row) # df is not changed
  • DF_Appender: dfa_new = dfa.append(row) # modifies dfa, and dfa_new == dfa
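
The difference between the two styles can be sketched with plain pandas plus a toy class. Since DataFrame.append was removed in pandas 2.0, pd.concat stands in for it here. TinyAppender is a made-up stand-in used only to show the "modify and return self" behavior, not the real DF_Appender.

```python
import pandas as pd

# Non-mutating style: the original frame is left untouched.
df = pd.DataFrame({'i': [0, 1]})
df_new = pd.concat([df, pd.DataFrame({'i': [2]})], ignore_index=True)


class TinyAppender:
    """Hypothetical toy class; not the DF_Appender implementation."""

    def __init__(self):
        self.rows = []

    def append(self, row):
        self.rows.append(row)  # mutates this object...
        return self            # ...and returns it, so `ta = ta.append(row)` works


ta = TinyAppender()
ta_new = ta.append({'i': 0})
```

Here df still has two rows after the concat, while ta and ta_new are the same object and both see the appended row.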

pandas.DataFrame.append will promote types, while DF_Appender is strict

  • Pandas: append 0.1 to an integer column, and the column will be promoted to float
  • DF_Appender: when initialized with dtypes= or an existing DataFrame, appending 0.1 to an integer column causes 0.1 to be cast to an integer, i.e. 0.
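
The two behaviors can be reproduced with plain pandas. The sketch uses pd.concat in place of the removed DataFrame.append, and imitates the strict behavior by casting the incoming row to the existing dtypes with astype. That cast is an assumption made for the illustration, not necessarily what DF_Appender does internally.

```python
import pandas as pd

ints = pd.DataFrame({'x': [1, 2]}, dtype='int64')
new_row = pd.DataFrame({'x': [0.1]})

# Promotion: mixing int64 and float64 gives a float64 column.
promoted = pd.concat([ints, new_row], ignore_index=True)

# Strict: cast the incoming row to the existing dtypes first, so 0.1 becomes 0.
strict_row = new_row.astype(ints.dtypes.to_dict())
strict = pd.concat([ints, strict_row], ignore_index=True)

print(promoted['x'].dtype, strict['x'].dtype)
```

The strict version keeps the column narrow but silently drops the fractional part, so it pays to check that your dtypes match the data you intend to append.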
