Tools to evaluate pandas performance when saving dataframes in different file formats.

Pandas Save Profiler

pandas_save_profiler helps you evaluate and compare the performance of different pandas read and write methods.

Install

pip install pandas-save-profiler

Usage

Import pandas and create the dataframe you want to save.

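import pandas as pd
# pandas.util.testing was deprecated in pandas 1.0 and removed in 2.0;
# on newer versions, use any dataframe of your own instead.
data = pd.util.testing.makeMissingDataframe()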

Import pandas_save_profiler and call save_profiler on the dataframe to evaluate how pandas performs when saving a pickle file:

import pandas_save_profiler
data.save_profiler('to_pickle')

The output is a pandas series:

format                                                 pickle
writer                                              to_pickle
reader                                            read_pickle
writer_args                      {'path': '/tmp/tmppk7nkivk'}
reader_args        {'filepath_or_buffer': '/tmp/tmppk7nkivk'}
writer_time                                         0.0798338
reader_time                                         0.0294895
writer_memory                                     1.09087e+08
reader_memory                                     1.09118e+08
df_memory                                                 288
file_size                                                1122
writer_memory_h                                      109.1 MB
reader_memory_h                                      109.1 MB
df_memory_h                                         288 Bytes
file_size_h                                            1.1 kB
repeats                                                     5
reads_the_same                                           True
dtype: object
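The entries ending in _h are human-readable versions of the byte counts. The package's own formatting code is not shown here, so the following is only a minimal sketch that reproduces the strings seen in the output above, assuming decimal (powers of 1000) units. The name humanize_bytes is hypothetical and not part of pandas_save_profiler.

```python
def humanize_bytes(n):
    """Hypothetical helper: format a byte count using decimal units."""
    if n < 1000:
        return f"{int(n)} Bytes"
    for unit in ("kB", "MB", "GB", "TB"):
        n /= 1000
        if n < 1000:
            return f"{n:.1f} {unit}"
    # Anything larger is reported in petabytes.
    return f"{n / 1000:.1f} PB"


# Byte counts taken from the series shown above.
print(humanize_bytes(288))        # df_memory
print(humanize_bytes(1122))       # file_size
print(humanize_bytes(1.09087e8))  # writer_memory
```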

Values in the series indicate:

  • The format used to persist the dataframe and the writing and reading options.
  • Writing and reading times in seconds.
  • Writing and reading memory increment.
  • Size of the dataframe in memory.
  • Size of the saved file.

Memory values are in bytes; a "humanized" version of each is also reported in the entries ending in _h. The save-and-reload cycle is repeated 5 times and the average values are returned. The reads_the_same flag indicates whether the reloaded dataframe is identical to the original or differs from it.
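To make the measurement concrete, here is a minimal sketch of a repeated save-and-reload cycle for the pickle format. It is not the package's implementation: the function profile_pickle_roundtrip is hypothetical, it uses only the standard library and pandas, and it leaves out the memory-increment measurement entirely.

```python
import os
import tempfile
import time

import pandas as pd


def profile_pickle_roundtrip(df, repeats=5):
    """Hypothetical helper: average write/read times for a pickle round trip."""
    write_times, read_times = [], []
    same = True
    file_size = 0
    for _ in range(repeats):
        # Create a temporary file path and close the handle so pandas can write to it.
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            start = time.perf_counter()
            df.to_pickle(path)
            write_times.append(time.perf_counter() - start)

            start = time.perf_counter()
            reloaded = pd.read_pickle(path)
            read_times.append(time.perf_counter() - start)

            file_size = os.path.getsize(path)
            # DataFrame.equals treats NaNs in the same position as equal.
            same = same and df.equals(reloaded)
        finally:
            os.remove(path)
    return pd.Series({
        "writer_time": sum(write_times) / repeats,
        "reader_time": sum(read_times) / repeats,
        "df_memory": int(df.memory_usage(deep=True).sum()),
        "file_size": file_size,
        "repeats": repeats,
        "reads_the_same": same,
    })


example = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4, 5, 6]})
result = profile_pickle_roundtrip(example)
print(result)
```

Averaging over several repeats smooths out one-off effects such as disk caching, which is presumably why the package reports averaged values.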

To compare several writing options, call save_profiler once for each of them and combine the resulting series into a single dataframe:

pd.DataFrame([
    data.save_profiler('to_csv'),
    data.save_profiler('to_pickle'),
    data.save_profiler('to_parquet'),
])

returns:

    format      writer        reader                          writer_args  \
0      csv      to_csv      read_csv  {'path_or_buf': '/tmp/tmpsedehjob'}   
1   pickle   to_pickle   read_pickle         {'path': '/tmp/tmp_vhue2q7'}   
2  parquet  to_parquet  read_parquet         {'path': '/tmp/tmp0zn8qsnk'}   

                                  reader_args  writer_time  reader_time  \
0  {'filepath_or_buffer': '/tmp/tmpsedehjob'}     0.031842     0.039830   
1  {'filepath_or_buffer': '/tmp/tmp_vhue2q7'}     0.025705     0.028469   
2                {'path': '/tmp/tmp0zn8qsnk'}     0.039009     0.052447   

   writer_memory  reader_memory  df_memory  file_size writer_memory_h  \
0    110149632.0    110599372.8        288        139        110.1 MB   
1    110813184.0    110813184.0        288       1122        110.8 MB   
2    116892467.2    118014771.2        288       3449        116.9 MB   

  reader_memory_h df_memory_h file_size_h  repeats  reads_the_same  
0        110.6 MB   288 Bytes   139 Bytes        5           False  
1        110.8 MB   288 Bytes      1.1 kB        5            True  
2        118.0 MB   288 Bytes      3.4 kB        5            True  
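Once the results are in a dataframe, ordinary pandas operations can be used to rank the formats. The sketch below rebuilds a few columns by hand from the comparison output above (so it runs without the package installed), keeps only the formats that reload without differences, and sorts them by combined write and read time. The total_time column is an illustrative addition, not something the package reports.

```python
import pandas as pd

# Values copied from the comparison output above.
results = pd.DataFrame({
    "format": ["csv", "pickle", "parquet"],
    "writer_time": [0.031842, 0.025705, 0.039009],
    "reader_time": [0.039830, 0.028469, 0.052447],
    "file_size": [139, 1122, 3449],
    "reads_the_same": [False, True, True],
})

# Keep only formats whose reloaded dataframe matches the original.
lossless = results[results["reads_the_same"]].copy()

# Rank the remaining formats by total round-trip time (illustrative metric).
lossless["total_time"] = lossless["writer_time"] + lossless["reader_time"]
ranked = lossless.sort_values("total_time").reset_index(drop=True)
print(ranked[["format", "total_time", "file_size"]])
```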


