
Tools to evaluate pandas performance when saving dataframes in different file formats.


Pandas Save Profiler

pandas_save_profiler helps you evaluate and compare the performance of different pandas read and write methods.

Install

pip install pandas-save-profiler

Usage

Load pandas and a dataframe you want to save.

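import pandas as pd
# pd.util.testing was deprecated in pandas 1.0 and removed in 2.0;
# on newer versions, use any dataframe of your own instead.
data = pd.util.testing.makeMissingDataframe()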

Import pandas_save_profiler and use it to measure how pandas performs when saving the dataframe as a pickle file:

import pandas_save_profiler
data.save_profiler('to_pickle')

The output is a pandas series:

format                                                 pickle
writer                                              to_pickle
reader                                            read_pickle
writer_args                      {'path': '/tmp/tmppk7nkivk'}
reader_args        {'filepath_or_buffer': '/tmp/tmppk7nkivk'}
writer_time                                         0.0798338
reader_time                                         0.0294895
writer_memory                                     1.09087e+08
reader_memory                                     1.09118e+08
df_memory                                                 288
file_size                                                1122
writer_memory_h                                      109.1 MB
reader_memory_h                                      109.1 MB
df_memory_h                                         288 Bytes
file_size_h                                            1.1 kB
repeats                                                     5
reads_the_same                                           True
dtype: object

Values in the series indicate:

  • The format used to persist the dataframe and the writing and reading options.
  • Writing and reading times in seconds.
  • Writing and reading memory increment.
  • Size of the dataframe in memory.
  • Size of the saved file.
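The timing and size measurements listed above can be approximated by hand. The following is a minimal sketch, not the package's actual implementation: the function name profile_pickle_roundtrip is hypothetical, it covers only the pickle format, and it leaves out the memory increment measurements, which would need an extra tool.

```python
import os
import tempfile
import time

import pandas as pd


def profile_pickle_roundtrip(df, repeats=5):
    """Average write/read times for a pickle round trip (illustrative only)."""
    write_times, read_times = [], []
    reads_the_same = True
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "frame.pkl")
        for _ in range(repeats):
            # Time the write.
            start = time.perf_counter()
            df.to_pickle(path)
            write_times.append(time.perf_counter() - start)

            # Time the read.
            start = time.perf_counter()
            restored = pd.read_pickle(path)
            read_times.append(time.perf_counter() - start)

            # DataFrame.equals treats NaNs in the same position as equal.
            reads_the_same = reads_the_same and restored.equals(df)
        file_size = os.path.getsize(path)
    return pd.Series({
        "writer_time": sum(write_times) / repeats,
        "reader_time": sum(read_times) / repeats,
        "df_memory": int(df.memory_usage(deep=True).sum()),
        "file_size": file_size,
        "repeats": repeats,
        "reads_the_same": reads_the_same,
    })


# A small frame with missing values, standing in for the example data.
frame = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4.0, 5.0, None]})
result = profile_pickle_roundtrip(frame)
print(result)
```

Times measured this way vary between runs and machines, which is why averaging over several repeats is useful.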

Memory values are in bytes; a "humanized" version of each is also reported in the entries whose names end in _h. The save-and-reload cycle is repeated 5 times and the average values are returned. The reads_the_same flag indicates whether the reloaded dataframe is exactly the same as the original or differs from it.
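The humanized values shown in the output are consistent with decimal units (1 kB = 1000 bytes). As a rough sketch under that assumption, a formatter along these lines reproduces them; humanize_bytes is a hypothetical helper written for illustration, not the function the package uses.

```python
def humanize_bytes(num_bytes):
    """Format a byte count with decimal (power-of-1000) units."""
    if num_bytes < 1000:
        return f"{int(num_bytes)} Bytes"
    value = float(num_bytes)
    for unit in ("kB", "MB", "GB", "TB"):
        value /= 1000
        if value < 1000:
            break
    return f"{value:.1f} {unit}"


# Values taken from the example output above.
print(humanize_bytes(288))        # 288 Bytes
print(humanize_bytes(1122))       # 1.1 kB
print(humanize_bytes(109087000))  # 109.1 MB
```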

To compare several writing options, call save_profiler for each of them and combine the results into a single results dataframe:

pd.DataFrame([
    data.save_profiler('to_csv'),
    data.save_profiler('to_pickle'),
    data.save_profiler('to_parquet'),
])

returns:

    format      writer        reader                          writer_args  \
0      csv      to_csv      read_csv  {'path_or_buf': '/tmp/tmpsedehjob'}   
1   pickle   to_pickle   read_pickle         {'path': '/tmp/tmp_vhue2q7'}   
2  parquet  to_parquet  read_parquet         {'path': '/tmp/tmp0zn8qsnk'}   

                                  reader_args  writer_time  reader_time  \
0  {'filepath_or_buffer': '/tmp/tmpsedehjob'}     0.031842     0.039830   
1  {'filepath_or_buffer': '/tmp/tmp_vhue2q7'}     0.025705     0.028469   
2                {'path': '/tmp/tmp0zn8qsnk'}     0.039009     0.052447   

   writer_memory  reader_memory  df_memory  file_size writer_memory_h  \
0    110149632.0    110599372.8        288        139        110.1 MB   
1    110813184.0    110813184.0        288       1122        110.8 MB   
2    116892467.2    118014771.2        288       3449        116.9 MB   

  reader_memory_h df_memory_h file_size_h  repeats  reads_the_same  
0        110.6 MB   288 Bytes   139 Bytes        5           False  
1        110.8 MB   288 Bytes      1.1 kB        5            True  
2        118.0 MB   288 Bytes      3.4 kB        5            True  
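In the table above, the CSV row reports reads_the_same as False. The source does not say why. One common cause, shown here only as an assumption about what might be happening, is that to_csv writes the index by default while a plain read_csv brings it back as an ordinary column. The small frame below is made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# to_csv() with no path returns the CSV text; the index is written by default.
csv_text = df.to_csv()

# A default read turns the saved index into an extra "Unnamed: 0" column.
naive = pd.read_csv(io.StringIO(csv_text))
print(list(naive.columns))  # ['Unnamed: 0', 'a', 'b']
print(naive.equals(df))     # False

# Telling the reader which column holds the index restores the original columns.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(restored.columns))  # ['a', 'b']
```

Text formats can also lose dtype information, so a False flag for CSV does not necessarily mean the values themselves changed.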
