A helpful script to optimize a Pandas DataFrame.

Project description

pd-helper

A helpful package to streamline Pandas DataFrame optimization.

Save 50-75% of DataFrame memory usage, depending on the data, by running the optimizer.

Automatically set an appropriate dtype for each column with the helper module.

Generate a DataFrame of controlled random variables for testing with the maker module.
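The README does not document the rules that helper applies, so the following is only a sketch of the general technique in plain pandas: integer columns are downcast with pd.to_numeric, floats move to float32 only when pandas finds the values essentially unchanged, and text columns with few distinct values become category. The function names and the 0.5 cardinality threshold are hypothetical and are not pd_helper's API.

```python
import pandas as pd
from pandas.api import types as ptypes


def optimize_column(s: pd.Series, max_category_ratio: float = 0.5) -> pd.Series:
    """Return `s` stored in a smaller dtype when a hypothetical rule applies.

    Illustrative rules only, not pd_helper's ruleset.
    """
    if ptypes.is_bool_dtype(s):
        return s  # numpy bool already uses 1 byte per value
    if ptypes.is_integer_dtype(s):
        # Smallest signed integer dtype that holds every value (e.g. int8).
        return pd.to_numeric(s, downcast="integer")
    if ptypes.is_float_dtype(s):
        # pandas only switches to float32 if the values survive the cast
        # within a small tolerance; otherwise the dtype is left unchanged.
        return pd.to_numeric(s, downcast="float")
    if ptypes.is_object_dtype(s) or ptypes.is_string_dtype(s):
        n = len(s)
        # Few distinct values relative to rows: store integer codes plus a lookup table.
        if n and s.nunique(dropna=True) / n <= max_category_ratio:
            return s.astype("category")
    return s


def optimize_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Apply optimize_column to every column independently."""
    return pd.DataFrame({col: optimize_column(df[col]) for col in df.columns}, index=df.index)
```

Treating each column independently keeps the rules simple and is also what makes column-level parallelism possible.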

Install

pip install pd-helper

Basic Usage: Iterating over the DataFrame

from pd_helper.maker import MakeData
from pd_helper.helper import optimize

if __name__ == "__main__":
    faker = MakeData()
    # make_df() generates a fake DataFrame, convenient for testing
    df = faker.make_df()
    # optimize() returns the DataFrame with optimized dtypes
    df = optimize(df)

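Usage With Multiprocessing

Passing enable_mp=True turns on multiprocessing. Keep the if __name__ == "__main__": guard, since Python's multiprocessing requires it on platforms that start workers by spawning a new interpreter (Windows, and macOS by default).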

from pd_helper.maker import MakeData
from pd_helper.helper import optimize

if __name__ == "__main__":
    faker = MakeData()
    # make_df() generates a fake DataFrame, convenient for testing
    df = faker.make_df()
    # enable_mp=True spreads the optimization work across processes
    df = optimize(df, enable_mp=True)
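The README does not explain how enable_mp splits the work. One common approach, shown here as an assumption rather than pd_helper's implementation, is to treat each column as an independent task, send the columns to a pool of worker processes with concurrent.futures, and rebuild the DataFrame from the results. optimize_parallel and its parameters are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Optional

import pandas as pd


def optimize_parallel(
    df: pd.DataFrame,
    column_func: Callable[[pd.Series], pd.Series],
    max_workers: Optional[int] = None,
    use_processes: bool = True,
) -> pd.DataFrame:
    """Run column_func on every column concurrently and rebuild the frame.

    Hypothetical helper, not pd_helper's API. With processes, each Series is
    pickled to a worker and back, so column_func must be a module-level
    function and the calling script needs an `if __name__ == "__main__":` guard.
    """
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    columns = [df[col] for col in df.columns]
    with pool_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so they line up with df.columns.
        results = list(pool.map(column_func, columns))
    return pd.DataFrame(dict(zip(df.columns, results)), index=df.index)
```

Because every column is copied to a worker and back, this pays off mainly when the per-column work is expensive relative to moving the data. use_processes=False swaps in a thread pool, which is handy for debugging.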

Specify Special Mappings

from pd_helper.maker import MakeData
from pd_helper.helper import optimize

if __name__ == "__main__":
    faker = MakeData()
    # make_df() generates a fake DataFrame, convenient for testing
    df = faker.make_df()
    special_mappings = {'string': ['object_id'],
                        'category': ['item_name']}

    # Columns listed in special_mappings are cast to the given dtype instead
    # of going through the optimize ruleset; they are still part of the
    # returned DataFrame.
    df = optimize(df,
                  enable_mp=True,
                  special_mappings=special_mappings)
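The README shows special_mappings as a dict from a dtype name to a list of column names. A minimal sketch of that idea, assuming the listed columns are simply cast with astype and then excluded from the general rules, could look like the following; apply_special_mappings is a hypothetical name, not pd_helper's API.

```python
from typing import Dict, List, Set, Tuple

import pandas as pd


def apply_special_mappings(
    df: pd.DataFrame, special_mappings: Dict[str, List[str]]
) -> Tuple[pd.DataFrame, Set[str]]:
    """Cast each listed column to its mapped dtype (hypothetical helper).

    Returns the converted copy plus the set of handled column names, so a
    general optimizer could skip those columns afterwards.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    handled: Set[str] = set()
    for dtype, columns in special_mappings.items():
        for col in columns:
            out[col] = out[col].astype(dtype)  # e.g. "string", "category"
            handled.add(col)
    return out, handled
```

Columns outside the returned set would then go through the regular per-column rules.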

Sample Results with Helper

Starting with 175.63 MB memory.

After optimization, ending with 65.33 MB memory (a reduction of about 63%).
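The README does not say how these figures were measured. One way to produce comparable numbers with plain pandas is DataFrame.memory_usage(deep=True), which also counts the Python string objects held in object columns. The helper names and the choice of 1 MB = 1024**2 bytes are assumptions.

```python
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory of df in megabytes (assumed here: 1 MB = 1024**2 bytes).

    deep=True inspects the objects inside object columns; a shallow
    measurement would only count the pointers and understate the total.
    """
    return float(df.memory_usage(deep=True).sum()) / 1024**2


def percent_saved(before_mb: float, after_mb: float) -> float:
    """Percentage of memory saved going from before_mb to after_mb."""
    return (before_mb - after_mb) / before_mb * 100
```

For the sample above, percent_saved(175.63, 65.33) is about 62.8.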

Generating a Randomly Imperfect DataFrame with Maker

Maker provides a class, MakeData(), to generate a table of made-up records.

Each row is an event where an item was retrieved.

MakeData() offers options to make the table imperfectly random in various ways.

Sample table below:

               Retrieved Date          Item Name          Retrieved      Condition          Sector
Example        2019-01-01, 2019-03-4   Toaster, Lighter   True, False    Junk, Excellent    1, 2
Data Type      String                  String             String         String             Integer
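The README does not list MakeData()'s options, so the following is a hypothetical stand-in rather than its implementation: it builds a table shaped like the sample above with NumPy's random Generator and makes it imperfect by blanking a random fraction of text cells. make_fake_events, its parameters, and the item and condition values beyond those in the sample ("Lamp", "Fair") are assumptions.

```python
import numpy as np
import pandas as pd


def make_fake_events(n_rows: int = 1000, missing_rate: float = 0.05, seed: int = 0) -> pd.DataFrame:
    """Generate retrieval events shaped like the sample table (hypothetical)."""
    rng = np.random.default_rng(seed)
    # Random days within 2019, formatted as strings like the sample's dates.
    dates = pd.Timestamp("2019-01-01") + pd.to_timedelta(rng.integers(0, 365, n_rows), unit="D")
    df = pd.DataFrame({
        "Retrieved Date": dates.strftime("%Y-%m-%d"),
        "Item Name": rng.choice(["Toaster", "Lighter", "Lamp"], n_rows),
        "Retrieved": rng.choice(["True", "False"], n_rows),  # strings, as in the sample
        "Condition": rng.choice(["Junk", "Fair", "Excellent"], n_rows),
        "Sector": rng.integers(1, 10, n_rows),  # integers 1..9
    })
    # Blank out roughly missing_rate of the text cells to mimic messy input.
    for col in ["Item Name", "Condition"]:
        mask = rng.random(n_rows) < missing_rate
        df.loc[mask, col] = None
    return df
```

A fixed seed makes the "random" table reproducible, which helps when comparing memory before and after optimization.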

References

TODO

  • Improve efficiency of iterating over the DataFrame.

  • Allow user to toggle logging.

  • Provide tools for imputing missing data.

Download files

Source Distribution

pd_helper-1.0.0.tar.gz (11.6 kB)

Built Distribution

pd_helper-1.0.0-py2.py3-none-any.whl (12.9 kB)

File details

Details for the file pd_helper-1.0.0.tar.gz.

File metadata

  • Download URL: pd_helper-1.0.0.tar.gz
  • Upload date:
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8

File hashes

Hashes for pd_helper-1.0.0.tar.gz
Algorithm Hash digest
SHA256 b5c0e29c24beea2c1fa0b753ad2c7553dab1e167694a003c3335ec83affb572f
MD5 94d0e1ee5ebbcec038bfd5adfc91ec97
BLAKE2b-256 9190e3db69d9c398cecc805a93885b8494974a7f1f579a5a62340148379be1d5

File details

Details for the file pd_helper-1.0.0-py2.py3-none-any.whl.

File metadata

  • Download URL: pd_helper-1.0.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 12.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8

File hashes

Hashes for pd_helper-1.0.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 ffd42252fa4c1f2c69d43567bba2e3526910d01882779c34c674302c1f3ce657
MD5 59e12406d6a08ea6f7b73a3f088e8640
BLAKE2b-256 9e23e71854e166a8a70f9918ab5fb9a7eecee5b1e954bc6d6165f8bbb7ff7c07
