Tools to evaluate pandas performance when saving dataframes in different file formats.
Pandas Save Profiler
pandas_save_profiler helps you evaluate and compare the performance of different pandas read and write methods.
Install
pip install pandas-save-profiler
Usage
Import pandas and load the dataframe you want to save.
import pandas as pd
data = pd.util.testing.makeMissingDataframe()
Import pandas_save_profiler and use it to evaluate pandas performance when saving a pickle file:
import pandas_save_profiler
data.save_profiler('to_pickle')
The output is a pandas series:
format pickle
writer to_pickle
reader read_pickle
writer_args {'path': '/tmp/tmppk7nkivk'}
reader_args {'filepath_or_buffer': '/tmp/tmppk7nkivk'}
writer_time 0.0798338
reader_time 0.0294895
writer_memory 1.09087e+08
reader_memory 1.09118e+08
df_memory 288
file_size 1122
writer_memory_h 109.1 MB
reader_memory_h 109.1 MB
df_memory_h 288 Bytes
file_size_h 1.1 kB
repeats 5
reads_the_same True
dtype: object
Values in the series indicate:
- The format used to persist the dataframe and the writing and reading options.
- Writing and reading times in seconds.
- Memory increment while writing and reading.
- Size of the dataframe in memory.
- Size of the saved file.
Memory and size values are in bytes; a "humanized" version of each is also reported in the fields ending in _h.
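As an illustration of how the byte counts relate to their humanized forms, here is a minimal sketch using decimal (power-of-1000) units, which is consistent with the sample output (1122 bytes shown as 1.1 kB). The function name humanize_bytes is hypothetical and this is not necessarily how pandas_save_profiler formats its values.

```python
def humanize_bytes(num_bytes: float) -> str:
    """Format a byte count with decimal (power-of-1000) units.

    Hypothetical helper for illustration only; not part of pandas_save_profiler.
    """
    if num_bytes < 1000:
        return f"{int(num_bytes)} Bytes"
    units = ["kB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        value /= 1000
        # Stop at the first unit that keeps the value below 1000,
        # or at the largest unit available.
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"


# Values taken from the sample series above.
print(humanize_bytes(288))
print(humanize_bytes(1122))
print(humanize_bytes(1.09087e08))
```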
The saving and reloading process is repeated 5 times and average values are returned.
The flag reads_the_same indicates whether the reloaded dataframe is exactly the same as the original one or has some differences.
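To make the measured quantities concrete, the following is a rough sketch of a write and read round trip timed by hand with plain pandas and the standard library. It is not the package's implementation: the function name profile_pickle_roundtrip is hypothetical, memory increments are not measured, and DataFrame.equals is only assumed to be a reasonable stand-in for the reads_the_same check.

```python
import os
import tempfile
import time

import pandas as pd


def profile_pickle_roundtrip(df: pd.DataFrame, repeats: int = 5) -> dict:
    """Average to_pickle / read_pickle times over several repeats (illustrative)."""
    writer_times, reader_times = [], []
    reads_the_same = True
    file_size = 0
    for _ in range(repeats):
        # Reserve a temporary path and close the handle so pandas can reopen it.
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            start = time.perf_counter()
            df.to_pickle(path)
            writer_times.append(time.perf_counter() - start)

            start = time.perf_counter()
            reloaded = pd.read_pickle(path)
            reader_times.append(time.perf_counter() - start)

            file_size = os.path.getsize(path)
            # DataFrame.equals treats NaNs in the same positions as equal.
            reads_the_same = reads_the_same and df.equals(reloaded)
        finally:
            os.remove(path)
    return {
        "writer_time": sum(writer_times) / repeats,
        "reader_time": sum(reader_times) / repeats,
        "file_size": file_size,
        "repeats": repeats,
        "reads_the_same": reads_the_same,
    }


# Small example dataframe with a missing value.
example = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", "z"]})
result = profile_pickle_roundtrip(example)
print(pd.Series(result))
```

Timings from a sketch like this vary between runs and machines, which is why averaging over several repeats is useful.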
To compare several writing options, call save_profiler once for each of them and combine the results into a single dataframe:
pd.DataFrame([
    data.save_profiler('to_csv'),
    data.save_profiler('to_pickle'),
    data.save_profiler('to_parquet'),
])
returns:
format writer reader writer_args \
0 csv to_csv read_csv {'path_or_buf': '/tmp/tmpsedehjob'}
1 pickle to_pickle read_pickle {'path': '/tmp/tmp_vhue2q7'}
2 parquet to_parquet read_parquet {'path': '/tmp/tmp0zn8qsnk'}
reader_args writer_time reader_time \
0 {'filepath_or_buffer': '/tmp/tmpsedehjob'} 0.031842 0.039830
1 {'filepath_or_buffer': '/tmp/tmp_vhue2q7'} 0.025705 0.028469
2 {'path': '/tmp/tmp0zn8qsnk'} 0.039009 0.052447
writer_memory reader_memory df_memory file_size writer_memory_h \
0 110149632.0 110599372.8 288 139 110.1 MB
1 110813184.0 110813184.0 288 1122 110.8 MB
2 116892467.2 118014771.2 288 3449 116.9 MB
reader_memory_h df_memory_h file_size_h repeats reads_the_same
0 110.6 MB 288 Bytes 139 Bytes 5 False
1 110.8 MB 288 Bytes 1.1 kB 5 True
2 118.0 MB 288 Bytes 3.4 kB 5 True
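Once the results are in a dataframe, ordinary pandas operations can be used to rank the formats. The sketch below rebuilds a few columns from the example table above (the values are copied from it) and ranks the formats that round-trip exactly by total write plus read time. The column name total_time is introduced here for illustration only.

```python
import pandas as pd

# Values copied from the example results table above.
results = pd.DataFrame(
    {
        "format": ["csv", "pickle", "parquet"],
        "writer_time": [0.031842, 0.025705, 0.039009],
        "reader_time": [0.039830, 0.028469, 0.052447],
        "file_size": [139, 1122, 3449],
        "reads_the_same": [False, True, True],
    }
)

# Keep only formats that reload exactly, then rank by total write + read time.
lossless = results[results["reads_the_same"]].copy()
lossless["total_time"] = lossless["writer_time"] + lossless["reader_time"]
ranking = lossless.sort_values("total_time")[["format", "total_time", "file_size"]]
print(ranking.to_string(index=False))
```

In this example csv produces the smallest file but is filtered out because it does not reload identically, which illustrates why reads_the_same is worth checking before comparing times or sizes.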