A helpful script to optimize a Pandas DataFrame.

Project description

pd-helper

A helpful package to streamline Pandas DataFrame optimization.

Save 50-75% on DataFrame memory usage by running the optimizer; the exact savings depend on the column types and values in your data.
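To measure the effect on your own data, compare pandas' `DataFrame.memory_usage(deep=True)` before and after optimizing. `deep=True` matters: without it, pandas counts only the pointers in object (string) columns, not the strings themselves. Below is a minimal sketch using only pandas and NumPy. The manual downcasting is a hypothetical stand-in for `optimize`, not pd-helper's implementation, and the example columns are made up for illustration.

```python
import numpy as np
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory in MB, including the contents of string columns."""
    return df.memory_usage(deep=True).sum() / 1024**2


# Illustrative data: small integers, floats, and a repetitive text column.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "sector": rng.integers(1, 10, size=n),                           # int64
    "price": rng.random(n) * 100,                                    # float64
    "condition": rng.choice(["Junk", "Good", "Excellent"], size=n),  # text
})

before = memory_mb(df)

# Hypothetical stand-in for an optimizer (NOT pd-helper's code).
small = df.copy()
small["sector"] = pd.to_numeric(small["sector"], downcast="integer")  # int8 here
small["price"] = pd.to_numeric(small["price"], downcast="float")      # float32; tiny precision loss possible
small["condition"] = small["condition"].astype("category")            # 3 distinct labels

after = memory_mb(small)
print(f"{before:.2f} MB -> {after:.2f} MB ({1 - after / before:.0%} saved)")
```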

Automatically assign an appropriate dtype to each column with helper.
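The README does not spell out helper's ruleset. As an illustration only, here is a minimal sketch of the kind of per-column rules such a tool might apply, using plain pandas: downcast integers and floats, and store low-cardinality text as `category`. `choose_dtype`, `optimize_columns`, and the `max_unique_ratio` threshold are hypothetical names and values, not pd-helper's API.

```python
import pandas as pd


def choose_dtype(s: pd.Series, max_unique_ratio: float = 0.5) -> pd.Series:
    """Return s in a smaller dtype where these (hypothetical) rules allow."""
    if pd.api.types.is_bool_dtype(s):
        return s  # already compact
    if pd.api.types.is_integer_dtype(s):
        # Smallest integer type that holds every value; unsigned if none are negative.
        kind = "unsigned" if s.min() >= 0 else "integer"
        return pd.to_numeric(s, downcast=kind)
    if pd.api.types.is_float_dtype(s):
        # May downcast float64 to float32; small precision loss is possible.
        return pd.to_numeric(s, downcast="float")
    if pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s):
        # Few distinct values relative to the row count: store as a category.
        if s.nunique(dropna=True) / max(len(s), 1) <= max_unique_ratio:
            return s.astype("category")
    return s


def optimize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Apply choose_dtype to each column in turn, keeping column order and index."""
    return pd.DataFrame({name: choose_dtype(col) for name, col in df.items()}, index=df.index)


df = pd.DataFrame({
    "sector": [1, 2, 2, 1],
    "item_name": ["Toaster", "Lighter", "Toaster", "Toaster"],
    "weight": [1.5, 2.0, 0.5, 1.0],
})
optimized = optimize_columns(df)
print(optimized.dtypes)
```

A ratio threshold like this trades off two risks: converting a nearly unique column (such as an ID) to `category` would use more memory, not less.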

Generate a DataFrame of controlled random variables for testing with maker.

Install

pip install pd-helper

Basic Usage (Iterating over the DataFrame)

from pd_helper.maker import MakeData
from pd_helper.helper import optimize

faker = MakeData()

if __name__ == "__main__":
    # MakeData() generates a fake DataFrame, convenient for testing
    df = faker.make_df()
    df = optimize(df)

Better Usage With Multiprocessing

from pd_helper.maker import MakeData
from pd_helper.helper import optimize

faker = MakeData()

if __name__ == "__main__":
    # MakeData() generates a fake DataFrame, convenient for testing
    df = faker.make_df()
    df = optimize(df, enable_mp=True)
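`enable_mp=True` presumably spreads the per-column work over several processes. The sketch below shows that general pattern with the standard library's `concurrent.futures.ProcessPoolExecutor`; `shrink_column` and `optimize_parallel` are hypothetical names, and this is not pd-helper's code. It also shows why the examples here wrap their calls in `if __name__ == "__main__":`: with the spawn start method (the default on Windows and macOS), each worker re-imports the main module, so unguarded top-level code would run again in every worker.

```python
from concurrent.futures import ProcessPoolExecutor

import pandas as pd


def shrink_column(item):
    """Downcast one (name, Series) pair; runs inside a worker process."""
    name, s = item
    if pd.api.types.is_integer_dtype(s):
        s = pd.to_numeric(s, downcast="integer")
    elif pd.api.types.is_float_dtype(s):
        s = pd.to_numeric(s, downcast="float")
    return name, s


def optimize_parallel(df: pd.DataFrame, max_workers=None) -> pd.DataFrame:
    """Process columns in parallel, then reassemble them in the original order."""
    # Each column is pickled, sent to a worker, converted, and pickled back.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(shrink_column, df.items()))
    return pd.DataFrame({name: results[name] for name in df.columns}, index=df.index)


if __name__ == "__main__":
    # Required with the spawn start method: workers re-import this module,
    # and this block must not run again inside them.
    df = pd.DataFrame({"a": range(1_000), "b": [0.5] * 1_000})
    print(optimize_parallel(df).dtypes)
```

Because every column is pickled in both directions, the inter-process overhead can outweigh the gain on small DataFrames; parallelism tends to pay off only when per-column work is substantial.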

Specify Special Mappings

from pd_helper.maker import MakeData
from pd_helper.helper import optimize

faker = MakeData()

if __name__ == "__main__":
    # MakeData() generates a fake DataFrame, convenient for testing
    df = faker.make_df()
    special_mappings = {'string': ['object_id'],
                        'category': ['item_name']}

    # Columns listed in special_mappings are converted to the dtype you name
    # instead of the one optimize's ruleset would choose; they are still
    # included in the returned DataFrame.
    df = optimize(df, enable_mp=True, special_mappings=special_mappings)
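`special_mappings` maps a dtype name to the list of columns that should receive it. A minimal sketch of applying such overrides with plain pandas `astype`, returning the set of handled columns so that automatic rules can skip them; `apply_special_mappings` is a hypothetical helper, not part of pd-helper:

```python
import pandas as pd


def apply_special_mappings(df, special_mappings):
    """Cast the listed columns to the requested dtype (hypothetical helper).

    special_mappings has the same shape as the example above, e.g.
    {'string': ['object_id'], 'category': ['item_name']}.
    Returns the converted copy and the set of columns that were handled.
    """
    out = df.copy()
    handled = set()
    for dtype, columns in special_mappings.items():
        for col in columns:
            out[col] = out[col].astype(dtype)
            handled.add(col)
    return out, handled


df = pd.DataFrame({
    "object_id": ["a1", "b2", "c3"],
    "item_name": ["Toaster", "Lighter", "Toaster"],
    "sector": [1, 2, 1],
})
mapped, skip = apply_special_mappings(
    df, {"string": ["object_id"], "category": ["item_name"]}
)
# Only the columns not named in the mappings are left for automatic rules.
remaining = [c for c in mapped.columns if c not in skip]
print(mapped.dtypes)
print("left for automatic rules:", remaining)
```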

Sample Results with Helper

Starting with 175.63 MB memory.

After optimization:

Ending with 65.33 MB memory, a reduction of about 63%.

Generating a Randomly Imperfect DataFrame with Maker

Maker provides a class, MakeData(), to generate a table of made-up records.

Each row is an event where an item was retrieved.

MakeData() offers options to make the table imperfectly random in various ways.

Sample table below:

Column          | Example                 | Data Type
Retrieved Date  | 2019-01-01, 2019-03-4   | String
Item Name       | Toaster, Lighter        | String
Retrieved       | True, False             | String
Condition       | Junk, Excellent         | String
Sector          | 1, 2                    | Integer
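A minimal sketch of generating a table shaped like the sample above, with two kinds of deliberate imperfection: dates that sometimes lose the zero padding on the day (similar to `2019-03-4`) and randomly missing values in the text columns. `make_fake_df`, `missing_rate`, the 10% padding rate, and the value ranges are assumptions for illustration; this is not how `MakeData()` is implemented.

```python
import numpy as np
import pandas as pd


def make_fake_df(n_rows=1_000, missing_rate=0.05, seed=None):
    """Hypothetical stand-in for MakeData().make_df(); not pd-helper's code.

    Columns follow the sample table: every column holds text except Sector.
    """
    rng = np.random.default_rng(seed)

    # Random dates in 2019 as 'YYYY-MM-DD' strings.
    start = np.datetime64("2019-01-01")
    dates = [str(start + int(d)) for d in rng.integers(0, 365, size=n_rows)]
    # Imperfection 1: roughly 10% of dates are rewritten without zero padding
    # on the day, e.g. '2019-03-04' -> '2019-03-4' (days 10-31 are unaffected).
    dates = [f"{d[:8]}{int(d[8:])}" if rng.random() < 0.1 else d for d in dates]

    df = pd.DataFrame({
        "Retrieved Date": dates,
        "Item Name": rng.choice(["Toaster", "Lighter"], size=n_rows),
        "Retrieved": rng.choice(["True", "False"], size=n_rows),  # text, not bool
        "Condition": rng.choice(["Junk", "Excellent"], size=n_rows),
        "Sector": rng.integers(1, 3, size=n_rows),  # 1 or 2
    })

    # Imperfection 2: blank out a random share of each text column.
    for col in ["Retrieved Date", "Item Name", "Retrieved", "Condition"]:
        df[col] = df[col].mask(rng.random(n_rows) < missing_rate)
    return df


df = make_fake_df(n_rows=10, seed=42)
print(df)
```

Passing a fixed `seed` makes the "random" table reproducible, which is usually what you want when the DataFrame feeds a test.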

TODO

  • Improve the efficiency of iterating over the DataFrame.

  • Allow user to toggle logging.

  • Provide tools for imputing missing data.

Download files

Source Distribution

pd_helper-1.0.0.tar.gz (11.6 kB)

Built Distribution

pd_helper-1.0.0-py2.py3-none-any.whl (12.9 kB), for Python 2 and Python 3
