A helpful script to optimize a Pandas DataFrame.
pd-helper
A helpful package to streamline Pandas DataFrame optimization.
Save 50-75% on DataFrame memory usage by running the optimizer.
Automatically assign an appropriate dtype to each column.
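The README does not describe how the optimizer chooses dtypes. As a rough illustration of the general technique, the sketch below uses only pandas: it downcasts numeric columns with `pd.to_numeric` and converts low-cardinality text columns to `category`. The function name `downcast_frame` and the `max_unique_ratio` threshold are hypothetical and are not part of pd-helper; the package's own logic may differ.

```python
import pandas as pd


def downcast_frame(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where the values allow it.

    Hypothetical helper for illustration; not pd-helper's implementation.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if pd.api.types.is_integer_dtype(col):
            # Pick the smallest integer type that holds every value.
            out[name] = pd.to_numeric(col, downcast="integer")
        elif pd.api.types.is_float_dtype(col):
            # float32 is the smallest type pandas will downcast floats to.
            out[name] = pd.to_numeric(col, downcast="float")
        elif pd.api.types.is_string_dtype(col) or pd.api.types.is_object_dtype(col):
            # Repetitive text is cheaper to store as a categorical.
            if col.nunique() / max(len(col), 1) <= max_unique_ratio:
                out[name] = col.astype("category")
    return out


frame = pd.DataFrame(
    {
        "count": [1, 2, 3, 4] * 250,
        "price": [0.5, 1.5, 2.5, 3.5] * 250,
        "city": ["Oslo", "Lima", "Pune", "Kyiv"] * 250,
    }
)
smaller = downcast_frame(frame)
print(smaller.dtypes)
```

Note that downcasting floats to `float32` reduces precision, so a real optimizer may want to make that step optional.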
Basic Usage to Iterate over DataFrame
from pd_helper.helper import optimize
if __name__ == "__main__":
    # some DataFrame, df
    df = optimize(df)
Better Usage With Multiprocessing
from pd_helper.helper import optimize
if __name__ == "__main__":
    # some DataFrame, df
    df = optimize(df, enable_mp=True)
Install
pip install pd-helper
Sample Results
Starting with 175.63 MB memory.
After optimization.
Ending with 65.33 MB memory.
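To check figures like these on your own data, one option is to measure the DataFrame before and after optimizing. The sketch below uses pandas' `memory_usage(deep=True)`; the helper names `memory_mb` and `percent_saved` are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption, since the README does not say how the package computes its numbers.

```python
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory used by df in MB, including the contents of text columns."""
    # Assumption: 1 MB = 1024 * 1024 bytes.
    return float(df.memory_usage(deep=True).sum()) / 1024 ** 2


def percent_saved(before_mb: float, after_mb: float) -> float:
    """Percentage reduction going from before_mb to after_mb."""
    return 100 * (before_mb - after_mb) / before_mb


example = pd.DataFrame({"value": range(1000)})
print(f"Starting with {memory_mb(example):.2f} MB memory.")

# Applied to the sample figures above: 175.63 MB down to 65.33 MB.
print(f"Saved {percent_saved(175.63, 65.33):.1f}%")
```

For the sample run this works out to a reduction of about 63%, which falls inside the 50-75% range quoted in the description.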
TODO
- Improve efficiency of iterating on DataFrame.
- Allow user to toggle logging.
- Provide tools for imputing missing data.