A helpful script to optimize a Pandas DataFrame.
pd-helper
A helpful package to streamline Pandas DataFrame optimization.
Save 50-75% on DataFrame memory usage by running the optimizer.
Automatically configure an appropriate dtype for each column with helper.
Generate a random DataFrame of controlled random variables for testing with maker.
Install
pip install pd-helper
Basic Usage to Iterate over DataFrame
from pd_helper.maker import MakeData
from pd_helper.helper import optimize

faker = MakeData()

if __name__ == "__main__":
    # MakeData() generates a fake DataFrame, convenient for testing
    df = faker.make_df()
    df = optimize(df)
Better Usage With Multiprocessing
from pd_helper.maker import MakeData
from pd_helper.helper import optimize

faker = MakeData()

if __name__ == "__main__":
    # MakeData() generates a fake DataFrame, convenient for testing
    df = faker.make_df()
    df = optimize(df, enable_mp=True)
Specify Special Mappings
from pd_helper.maker import MakeData
from pd_helper.helper import optimize

faker = MakeData()

if __name__ == "__main__":
    # MakeData() generates a fake DataFrame, convenient for testing
    df = faker.make_df()

    # Map each target dtype to the columns that should receive it
    special_mappings = {'string': ['object_id'],
                        'category': ['item_name']}

    # Special mappings are applied in place of the optimize ruleset;
    # the mapped columns are still returned with the rest of the DataFrame.
    df = optimize(df,
                  enable_mp=True,
                  special_mappings=special_mappings)
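To make the shape of special_mappings concrete, here is a minimal sketch in plain pandas of how a dtype-to-columns dictionary could be applied. The function name apply_special_mappings is hypothetical and is not part of pd-helper; how optimize handles the mappings internally is not documented here, so treat this only as an illustration of the idea.

```python
import pandas as pd


def apply_special_mappings(df, special_mappings):
    """Hypothetical helper (not part of pd-helper): cast the listed columns.

    special_mappings maps a dtype name to the list of columns that should
    receive it, e.g. {'string': ['object_id'], 'category': ['item_name']}.
    """
    # Invert {dtype: [columns]} into {column: dtype}, skipping unknown columns
    col_to_dtype = {
        col: dtype
        for dtype, cols in special_mappings.items()
        for col in cols
        if col in df.columns
    }
    # astype returns a new DataFrame; unlisted columns keep their dtype
    return df.astype(col_to_dtype)


df = pd.DataFrame({
    "object_id": ["a1", "b2", "c3"],
    "item_name": ["Toaster", "Lighter", "Toaster"],
    "sector": [1, 2, 1],
})
special_mappings = {"string": ["object_id"], "category": ["item_name"]}

typed = apply_special_mappings(df, special_mappings)
print(typed.dtypes)
```

Keeping the mapping keyed by dtype makes it short to write when many columns share one type; the inversion step turns it into the column-keyed form that DataFrame.astype expects.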
Sample Results with Helper
Starting with 175.63 MB memory.
After optimization.
Ending with 65.33 MB memory.
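For readers who want to reproduce this kind of before/after measurement, the sketch below uses plain pandas. The functions memory_mb and shrink_dtypes are hypothetical names introduced for illustration; the rules shown (downcast numbers, convert low-cardinality text to category) are a common approach and are an assumption, not pd-helper's actual ruleset.

```python
import numpy as np
import pandas as pd


def memory_mb(df):
    """Total memory footprint in MB, counting the contents of text columns."""
    return df.memory_usage(deep=True).sum() / 1024 ** 2


def shrink_dtypes(df, max_unique_ratio=0.5):
    """Illustrative stand-in for an optimizer; NOT pd-helper's ruleset."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # Pick the smallest integer type that holds every value
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # May reduce precision, since float32 is tried first
            out[col] = pd.to_numeric(series, downcast="float")
        elif (pd.api.types.is_object_dtype(series)
              or pd.api.types.is_string_dtype(series)):
            # Categories pay off only when values repeat often
            if series.nunique() / max(len(series), 1) <= max_unique_ratio:
                out[col] = series.astype("category")
    return out


rng = np.random.default_rng(0)
n = 10_000
sample = pd.DataFrame({
    "sector": rng.integers(1, 5, size=n),
    "item_name": rng.choice(["Toaster", "Lighter"], size=n),
    "score": rng.random(n),
})

before = memory_mb(sample)
small = shrink_dtypes(sample)
after = memory_mb(small)
print(f"Starting with {before:.2f} MB memory.")
print(f"Ending with {after:.2f} MB memory.")
```

Passing deep=True to memory_usage matters for text columns, because the default only counts the pointers rather than the strings themselves.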
Generating a Randomly Imperfect DataFrame with Maker
Maker provides a class, MakeData(), to generate a table of made-up records.
Each row is an event where an item was retrieved.
Options are available to make the table imperfectly random in various ways.
Sample table below:
 | Retrieved Date | Item Name | Retrieved | Condition | Sector |
---|---|---|---|---|---|
Example | 2019-01-01, 2019-03-4 | Toaster, Lighter | True, False | Junk, Excellent | 1, 2 |
Data Type | String | String | String | String | Integer |
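As a rough picture of what such a table looks like in code, the sketch below builds a similar DataFrame with NumPy and pandas. It does not use MakeData; the function make_records and its parameters n_rows, missing_rate, and seed are hypothetical, and blanking out random cells is just one assumed way of making the data imperfect.

```python
import numpy as np
import pandas as pd


def make_records(n_rows=1000, missing_rate=0.1, seed=0):
    """Hypothetical stand-in for a record generator (not MakeData itself)."""
    rng = np.random.default_rng(seed)
    # Candidate dates, kept as strings to match the sample table
    dates = (pd.date_range("2019-01-01", "2019-03-31", freq="D")
             .strftime("%Y-%m-%d")
             .to_numpy())
    df = pd.DataFrame({
        "Retrieved Date": rng.choice(dates, size=n_rows),
        "Item Name": rng.choice(["Toaster", "Lighter"], size=n_rows),
        "Retrieved": rng.choice(["True", "False"], size=n_rows),
        "Condition": rng.choice(["Junk", "Excellent"], size=n_rows),
        "Sector": rng.integers(1, 3, size=n_rows),  # values 1 or 2
    })
    # Introduce imperfection: blank out a random share of Condition cells
    mask = rng.random(n_rows) < missing_rate
    df.loc[mask, "Condition"] = None
    return df


records = make_records(n_rows=500, missing_rate=0.2, seed=1)
print(records.head())
```

Fixing the seed keeps the generated table reproducible, which is useful when comparing memory usage across optimizer runs.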
References
- Pandas Categorical: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html
- Pandas Pickle: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html
- Pandas CSV: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
- Pandas Datetime: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
TODO
- Improve efficiency of iterating on DataFrame.
- Allow user to toggle logging.
- Provide tools for imputing missing data.