
A helpful script to optimize a Pandas DataFrame.


pd-helper

A helpful package to streamline Pandas DataFrame optimization.

Save 50-75% on DataFrame memory usage by running the optimizer.

Automatically assign an appropriate dtype to each column with helper.

Generate a DataFrame of controlled random variables for testing with maker.

Install

pip install pd-helper

Basic Usage to Iterate over DataFrame

from pd_helper.maker import MakeData 
from pd_helper.helper import optimize
faker = MakeData()

if __name__ == "__main__":
   # MakeData() generates a fake dataframe, convenient for testing
   df = faker.make_df()
   df = optimize(df)

Better Usage With Multiprocessing

from pd_helper.maker import MakeData 
from pd_helper.helper import optimize
faker = MakeData()

if __name__ == "__main__":
   # MakeData() generates a fake dataframe, convenient for testing
   df = faker.make_df()
   df = optimize(df, enable_mp=True)

Specify Special Mappings

from pd_helper.maker import MakeData 
from pd_helper.helper import optimize
faker = MakeData()

if __name__ == "__main__":
   # MakeData() generates a fake dataframe, convenient for testing
   df = faker.make_df()
   special_mappings = {'string': ['object_id'],
                       'category': ['item_name']}

   # Columns listed in special_mappings are cast to the given dtype instead of
   # the dtype the optimize ruleset would pick; they are returned with the rest of the DataFrame.
   df = optimize(df,
                 enable_mp=True,
                 special_mappings=special_mappings)
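
To show what the 'string' and 'category' targets mean in pandas terms, here is a minimal sketch that applies the same {dtype: [columns]} mapping with plain pandas. It assumes the mapping amounts to a per-column cast. That is an illustration, not a description of how optimize() handles special_mappings internally, and the small DataFrame is made up for the example.

```python
import pandas as pd

# Hypothetical stand-in data; only the column names come from the example above.
df = pd.DataFrame({
    "object_id": ["a1", "b2", "c3", "a1"],
    "item_name": ["Toaster", "Lighter", "Toaster", "Toaster"],
    "sector": [1, 2, 1, 2],
})

special_mappings = {"string": ["object_id"], "category": ["item_name"]}

# Invert {dtype: [columns]} into the {column: dtype} form that DataFrame.astype accepts.
column_dtypes = {
    column: dtype
    for dtype, columns in special_mappings.items()
    for column in columns
}

# Columns not named in the mapping keep their existing dtype.
converted = df.astype(column_dtypes)
print(converted.dtypes)
```

A category column stores each distinct value once plus a small integer code per row, which is why it suits columns with few distinct values such as item names.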

Sample Results with Helper

Starting with 175.63 MB memory.

After optimization.

Ending with 65.33 MB memory.
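
A before/after comparison like the one above can be reproduced with plain pandas. The sketch below measures memory with DataFrame.memory_usage(deep=True) and applies two common savings by hand: integer downcasting and a category conversion. The memory_mb helper and the generated data are hypothetical, and the steps are not claimed to match the ruleset optimize() applies.

```python
import numpy as np
import pandas as pd


def memory_mb(frame: pd.DataFrame) -> float:
    """Return the deep memory footprint of a DataFrame in megabytes."""
    return frame.memory_usage(deep=True).sum() / 1024 ** 2


rng = np.random.default_rng(0)
n_rows = 100_000
df = pd.DataFrame({
    "sector": rng.integers(1, 10, size=n_rows),                    # 64-bit integers
    "item_name": rng.choice(["Toaster", "Lighter"], size=n_rows),  # repeated strings
})

before = memory_mb(df)

slim = df.copy()
# Pick the smallest integer type that can hold every value in the column.
slim["sector"] = pd.to_numeric(slim["sector"], downcast="integer")
# Store each distinct string once and keep a small integer code per row.
slim["item_name"] = slim["item_name"].astype("category")

after = memory_mb(slim)
print(f"Starting with {before:.2f} MB memory.")
print(f"Ending with {after:.2f} MB memory.")
```

The size of the saving depends on the data: columns with few distinct values or small integer ranges shrink the most.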

Generating a Randomly Imperfect DataFrame with Maker

Maker provides a class, MakeData(), to generate a table of made-up records.

Each row is an event where an item was retrieved.

Options let you make the table imperfectly random in various ways.

Sample table below:

           Retrieved Date          Item Name         Retrieved    Condition        Sector
Example    2019-01-01, 2019-03-4   Toaster, Lighter  True, False  Junk, Excellent  1, 2
Data Type  String                  String            String       String           Integer
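
For readers who want to experiment with a table of this shape without generating one, the sketch below builds the two example rows by hand in pandas. It does not call MakeData(), and the values are copied from the sample table. It also shows one cleanup such a table invites: the Retrieved flag is stored as a string and has to be mapped to a real boolean.

```python
import pandas as pd

# Hand-written rows mirroring the sample table; this does not use MakeData().
records = pd.DataFrame({
    "Retrieved Date": ["2019-01-01", "2019-03-4"],
    "Item Name": ["Toaster", "Lighter"],
    "Retrieved": ["True", "False"],   # booleans stored as strings
    "Condition": ["Junk", "Excellent"],
    "Sector": [1, 2],
})

# Map the string flags onto real booleans; any unexpected value would become NaN.
retrieved_flags = records["Retrieved"].map({"True": True, "False": False})
print(retrieved_flags.tolist())
```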


TODO

  • Improve efficiency of iterating on DataFrame.

  • Allow user to toggle logging.

  • Provide tools for imputing missing data.


Files for pd-helper, version 1.0.0

Filename                              Size     File type  Python version
pd_helper-1.0.0-py2.py3-none-any.whl  12.9 kB  Wheel      py2.py3
pd_helper-1.0.0.tar.gz                11.6 kB  Source     None
