A helpful script to optimize a Pandas DataFrame.

pd-helper

A helpful package to streamline Pandas DataFrame optimization.

Save 50-75% on DataFrame memory usage by running the optimizer.

Automatically assign an appropriate dtype to each column with helper.

Generate a test DataFrame of controlled random variables with maker.
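To give a sense of where the memory savings come from, here is a minimal sketch of dtype downcasting in plain pandas. It is not pd-helper's implementation: the function name `downcast_sketch` and the `max_unique_ratio` threshold are assumptions made for illustration, and the package's actual ruleset may differ.

```python
import numpy as np
import pandas as pd


def downcast_sketch(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes (hypothetical helper, not pd-helper's API)."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series.dtype):
            # Pick the smallest integer type that holds every value.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series.dtype):
            # Downcast to float32 where pandas allows it; this can lose precision.
            out[col] = pd.to_numeric(series, downcast="float")
        elif pd.api.types.is_string_dtype(series.dtype):
            # Low-cardinality text is stored far more compactly as a category.
            if series.nunique() <= max_unique_ratio * len(series):
                out[col] = series.astype("category")
    return out


# Small illustrative frame; the column names are made up for this example.
example = pd.DataFrame({
    "quantity": np.arange(100, dtype="int64"),
    "price": np.arange(100, dtype="float64") * 0.5,
    "item_name": ["apple", "pear"] * 50,
})
optimized = downcast_sketch(example)
print(optimized.dtypes)
```

Working on a copy keeps the input untouched, so the before and after frames can be compared directly.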

Install

pip install pd-helper

Basic Usage to Iterate over DataFrame

from pd_helper.maker import MakeData 
from pd_helper.helper import optimize
faker = MakeData()

if __name__ == "__main__":
   # MakeData() generates a fake dataframe, convenient for testing
   df = faker.make_df()
   df = optimize(df)

Better Usage With Multiprocessing

from pd_helper.maker import MakeData 
from pd_helper.helper import optimize
faker = MakeData()

if __name__ == "__main__":
   # MakeData() generates a fake dataframe, convenient for testing
   df = faker.make_df()
   df = optimize(df, enable_mp=True)

Specify Special Mappings

from pd_helper.maker import MakeData 
from pd_helper.helper import optimize
faker = MakeData()

if __name__ == "__main__":
   # MakeData() generates a fake dataframe, convenient for testing
   df = faker.make_df()
   special_mappings = {'string': ['object_id'],
                       'category': ['item_name']}

   # Columns listed in special_mappings are cast to the given dtypes instead of
   # the ones the optimize ruleset would choose, and are returned with those dtypes.
   df = optimize(df,
                 enable_mp=True,
                 special_mappings=special_mappings
                 )
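The special_mappings dictionary maps a dtype name to the list of columns that should receive it. As a rough illustration of that shape, the sketch below applies the same kind of mapping with plain pandas `astype`. The helper `apply_special_mappings` is hypothetical and is not part of pd-helper; how the package handles missing columns or invalid dtypes is not covered here.

```python
import pandas as pd


def apply_special_mappings(df: pd.DataFrame, special_mappings: dict) -> pd.DataFrame:
    """Cast the listed columns to the requested dtypes (hypothetical helper)."""
    # Invert {dtype: [columns]} into the {column: dtype} form that astype expects.
    column_dtypes = {
        column: dtype
        for dtype, columns in special_mappings.items()
        for column in columns
        if column in df.columns  # skip names the frame does not have
    }
    return df.astype(column_dtypes)


# Illustrative data only; the column names follow the example above.
frame = pd.DataFrame({
    "object_id": [101, 102, 103],
    "item_name": ["apple", "pear", "apple"],
    "quantity": [1, 2, 3],
})
mapped = apply_special_mappings(
    frame, {"string": ["object_id"], "category": ["item_name"]}
)
print(mapped.dtypes)
```

Columns that are not named in the mapping, such as quantity here, keep their original dtype.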

Sample Results

Starting with 175.63 MB memory.

After optimization.

Ending with 65.33 MB memory.
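The sample figures above work out to a reduction of roughly 63%, which sits inside the advertised 50-75% range. The sketch below shows one way to measure a DataFrame's footprint and compute that percentage in plain pandas. The helpers `memory_mb` and `reduction_pct` are hypothetical names, and it is an assumption that the package counts a megabyte as 1024 ** 2 bytes.

```python
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total DataFrame memory in MB, including the contents of object columns."""
    return df.memory_usage(deep=True).sum() / 1024 ** 2


def reduction_pct(before_mb: float, after_mb: float) -> float:
    """Percentage of memory saved going from before_mb to after_mb."""
    return (before_mb - after_mb) / before_mb * 100


# Measuring an arbitrary small frame.
small = pd.DataFrame({"a": range(1000)})
print(f"{memory_mb(small):.4f} MB")

# Figures taken from the sample results above.
saved = reduction_pct(175.63, 65.33)
print(f"Saved {saved:.1f}% of memory")  # Saved 62.8% of memory
```

Passing deep=True matters for object columns, because the default only counts the pointers and not the Python strings they refer to.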

TODO

  • Improve efficiency of iterating on DataFrame.

  • Allow user to toggle logging.

  • Provide tools for imputing missing data.

Download files

Source Distribution

pd_helper-0.1.4.tar.gz (10.0 kB)

Built Distribution

pd_helper-0.1.4-py2.py3-none-any.whl (11.2 kB)
