A helpful script to optimize a Pandas DataFrame.

Project description

pd-helper

A helpful package to streamline Pandas DataFrame optimization.

Save 50-75% on DataFrame memory usage by running the optimizer.

Automatically configures an appropriate dtype for each column.
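
To make the idea of per-column dtype selection concrete, here is a minimal sketch in plain pandas. It is not pd-helper's implementation: the function name `downcast_column` and the 50% uniqueness threshold for categoricals are assumptions made for this example.

```python
import pandas as pd
from pandas.api import types as ptypes


def downcast_column(col: pd.Series, max_unique_ratio: float = 0.5) -> pd.Series:
    """Return `col` with a smaller dtype when one fits (hypothetical helper)."""
    if ptypes.is_integer_dtype(col):
        # Pick the smallest integer type that can hold every value.
        return pd.to_numeric(col, downcast="integer")
    if ptypes.is_float_dtype(col):
        # float64 -> float32 where pandas judges it safe; precision can drop.
        return pd.to_numeric(col, downcast="float")
    if (ptypes.is_object_dtype(col) or ptypes.is_string_dtype(col)) and len(col) > 0:
        # Low-cardinality text is usually cheaper as a categorical.
        # Assumes the values are hashable (strings, not lists or dicts).
        if col.nunique() / len(col) <= max_unique_ratio:
            return col.astype("category")
    return col


df = pd.DataFrame(
    {
        "count": [1, 2, 3, 4],
        "ratio": [0.5, 1.5, 2.25, 3.0],
        "label": ["x", "y", "x", "x"],
    }
)
small = pd.DataFrame({name: downcast_column(df[name]) for name in df.columns})
print(small.dtypes)
```

Downcasting floats trades precision for space, so a real optimizer may want to leave float columns alone unless the caller opts in.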

Basic Usage to Iterate over DataFrame

from pd_helper.helper import optimize

if __name__ == "__main__":
    # df is any existing pandas DataFrame
    df = optimize(df)

Better Usage With Multiprocessing

from pd_helper.helper import optimize

if __name__ == "__main__":
    # df is any existing pandas DataFrame
    df = optimize(df, enable_mp=True)
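
The package does not document how `enable_mp=True` divides the work. One plausible approach, sketched below with only the standard library and pandas, is to hand each column to a worker process. `shrink_column` and `optimize_parallel` are hypothetical names, not part of pd-helper.

```python
from concurrent.futures import ProcessPoolExecutor

import pandas as pd


def shrink_column(col: pd.Series) -> pd.Series:
    """Downcast one numeric column; other columns pass through unchanged."""
    if pd.api.types.is_integer_dtype(col):
        return pd.to_numeric(col, downcast="integer")
    if pd.api.types.is_float_dtype(col):
        return pd.to_numeric(col, downcast="float")
    return col


def optimize_parallel(df: pd.DataFrame, workers: int = 2) -> pd.DataFrame:
    """Shrink every column of `df`, using processes when workers > 1."""
    columns = [df.iloc[:, i] for i in range(df.shape[1])]
    if not columns:
        return df.copy()
    if workers > 1:
        # Each column is pickled, sent to a worker, and sent back.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            shrunk = list(pool.map(shrink_column, columns))
    else:
        shrunk = [shrink_column(col) for col in columns]
    # Reassemble the columns in their original order.
    return pd.concat(shrunk, axis=1)


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    frame = pd.DataFrame({"count": range(1000), "ratio": [0.5] * 1000})
    print(optimize_parallel(frame, workers=2).dtypes)
```

Sending columns between processes has a cost of its own, so on small DataFrames the serial path (`workers=1`) may well be faster.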

Install

pip install pd-helper

Sample Results

Starting with 175.63 MB memory.

After optimization.

Ending with 65.33 MB memory.
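
Before-and-after figures of this kind can be measured with pandas' own `DataFrame.memory_usage`. The sketch below is an assumption about how such a report could be produced, not the package's logging code. `memory_mb` is a hypothetical helper, and it takes 1 MB to mean 1024**2 bytes, which may differ from the unit used in the sample output above.

```python
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory of `df` in MB, counting the contents of object columns."""
    return df.memory_usage(deep=True).sum() / 1024**2


df = pd.DataFrame({"id": range(100_000), "flag": ["yes", "no"] * 50_000})
before = memory_mb(df)

# Stand-in for the optimizer: downcast the integers, categorize the strings.
df["id"] = pd.to_numeric(df["id"], downcast="integer")
df["flag"] = df["flag"].astype("category")
after = memory_mb(df)

print(f"Starting with {before:.2f} MB memory.")
print(f"Ending with {after:.2f} MB memory.")
```

Passing `deep=True` matters for text columns: without it pandas may count only the pointers or offsets rather than the strings themselves, which understates the starting figure.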

TODO

  • Improve efficiency of iterating on DataFrame.

  • Allow user to toggle logging.

  • Provide tools for imputing missing data.

Download files

Source Distribution

pd_helper-0.1.1.tar.gz (5.9 kB)

Built Distribution

pd_helper-0.1.1-py2.py3-none-any.whl (6.0 kB, Python 2 and Python 3)
