Reduce pandas' dataframe memory usage
Dataframe Memory Project
This tool aims to provide a simple solution for saving memory when using pandas DataFrames. It is heavily inspired by this Kaggle post.
[!IMPORTANT] The basic principle: for each column, this tool reduces int and float precision as much as possible, subject to the chosen method:
- Approximate method (method='approx'): no duplicated values appear and the minimum and maximum can be re-encoded.
- Exact method (method='exact'): absolute information is preserved by testing every value.

For the object data type, the function tries to create a category.
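The exact principle described above can be sketched with plain pandas and NumPy. This is a minimal illustration, not the package's actual implementation: the helper name `downcast_exact` is hypothetical, and it only handles signed integers, floats, and object columns.

```python
import numpy as np
import pandas as pd


def downcast_exact(series: pd.Series) -> pd.Series:
    """Hypothetical helper: smallest dtype that round-trips every value exactly."""
    dtype = series.dtype
    if not isinstance(dtype, np.dtype):
        return series  # leave pandas extension dtypes untouched
    if dtype.kind == "O":
        # Object columns: try a categorical encoding instead of downcasting.
        return series.astype("category")
    candidates = {
        "i": [np.int8, np.int16, np.int32, np.int64],
        "f": [np.float16, np.float32, np.float64],
    }.get(dtype.kind, [])
    values = series.to_numpy()
    for candidate in candidates:
        with np.errstate(over="ignore"):  # out-of-range floats become inf
            converted = values.astype(candidate)
        # "Testing every value": cast back and compare with the original.
        if np.array_equal(converted.astype(dtype), values, equal_nan=True):
            return series.astype(candidate)
    return series


df = pd.DataFrame({
    "small_int": [1, 2, 300],        # needs int16 (300 does not fit in int8)
    "half": [0.5, 1.5, np.nan],      # exactly representable in float16
    "decimal": [0.1, 0.2, 0.3],      # only exact in float64
    "label": pd.Series(["a", "b", "a"], dtype=object),
})
reduced = df.apply(downcast_exact)
print(reduced.dtypes)
```

Note that a categorical encoding only saves memory when values repeat; for a column of mostly unique strings it can cost more than the original object column.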
Usage
from data_memory import reduce_memory
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(
        [[1, 2, "aaa"],
         [4, 5, "bbb"],
         [7, 8, "ccc"]] * 10000),
    columns=['a', 'b', 'c'])
reduce_memory(df, method="exact", verbose=True)
This yields the following decrease in memory usage:
Memory usage input: 5.04 MB
Memory usage output: 0.09 MB
Decreased by: 98.28 %
df.info()
 #   Column  Non-Null Count  Dtype
---  ------  --------------  --------
 0   a       30000 non-null  category
 1   b       30000 non-null  category
 2   c       30000 non-null  category
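To check a reduction like the one reported above independently of the tool, memory can be measured with pandas' own `DataFrame.memory_usage`. The sketch below is an assumption about how such figures could be computed, not a description of what `reduce_memory` does internally; the helper `memory_mb` is hypothetical, and the conversion to category stands in for the tool's output.

```python
import pandas as pd


def memory_mb(frame: pd.DataFrame) -> float:
    """Hypothetical helper: total memory in MB, counting string contents (deep=True)."""
    return frame.memory_usage(deep=True).sum() / 1024**2


# Same shape as the usage example: three string-valued columns, 30000 rows.
df = pd.DataFrame({
    "a": ["1", "4", "7"] * 10000,
    "b": ["2", "5", "8"] * 10000,
    "c": ["aaa", "bbb", "ccc"] * 10000,
})

before = memory_mb(df)
after = memory_mb(df.astype("category"))
decrease_pct = 100 * (before - after) / before

print(f"Memory usage input: {before:.2f} MB")
print(f"Memory usage output: {after:.2f} MB")
print(f"Decreased by: {decrease_pct:.2f} %")
```

The exact numbers depend on the pandas version and platform, so they may not match the figures shown above.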
[!WARNING] With method='approx':
- This tool destroys information and should not be applied automatically to every dataframe, only to large ones.
- It preserves relative but not absolute information.
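The difference between relative and absolute information can be illustrated with a plain float downcast. This is only a sketch of the general effect using standard pandas calls; it does not reproduce the checks that method='approx' performs.

```python
import pandas as pd

original = pd.Series([0.1, 0.2, 0.3], dtype="float64")
approx = original.astype("float32")  # lower precision, half the memory

# Absolute information: do the stored values still equal the originals?
absolute_preserved = bool((approx.astype("float64") == original).all())

# Relative information: same ordering, and no values collapsed together.
order_preserved = bool((approx.rank() == original.rank()).all())
no_new_duplicates = approx.nunique() == original.nunique()

print(absolute_preserved, order_preserved, no_new_duplicates)  # False True True
```

Here every value shifts slightly when stored as float32, so equality tests against the original numbers fail, while sorting and comparisons between rows still behave the same.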
File details
Details for the file dataframe_memory-2023.3.1.tar.gz.
File metadata
- Download URL: dataframe_memory-2023.3.1.tar.gz
- Upload date:
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | c0e5c25fb4fc4259447472d0f3d12661ec4004a0ed55d47ed6939aab3bcc3ac0 |
MD5 | f2992d918e4d594ff48f24c324efa5c1 |
BLAKE2b-256 | ae5075f3dc4c64d9ddacfaa49dde202774e5e400d31d4e43e254ee361be752c3 |
File details
Details for the file dataframe_memory-2023.3.1-py3-none-any.whl.
File metadata
- Download URL: dataframe_memory-2023.3.1-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | f4c95977fc91b4fe7b73ee2fd2d8cf807e441dcad3ea949843f5bc796b2c34ce |
MD5 | 6644b4d034f9ed2b7412b679398a5a8e |
BLAKE2b-256 | db382be0715587f232ee9b9806e6bbf074c99078393699fe59354ed60ff558a0 |