
a utility for space efficient dataframes


pandas hug

sometimes you need to embrace your data and get a little, or a lot, more of it into memory.

your column data types are rarely space-efficient. most of the time this is because they were chosen by someone else, but sometimes it's just a hassle to find the most space-efficient types.

pandas-hug is here to help crush your data to fit in memory.

installation

pip install pandas-hug

usage

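import pandas as pd
import pandas_hug  # importing registers hug() on DataFrame and Series

# the series have different lengths, so the DataFrame aligns them on the index
# and pads the shorter ones with missing values
S = pd.Series([2**8])                          # 256, too big for 8 bits
A = pd.Series([f'a{i}' for i in range(100)])   # 100 unique strings
M = pd.Series([42])                            # fits in 8 bits
E = pd.Series(['a', 'b', 'c'] * 15)            # 45 values, only 3 distinct
df = pd.DataFrame({'S': S, 'A': A, 'M': M, 'E': E})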

df.info()

   <class 'pandas.core.frame.DataFrame'>
   RangeIndex: 100 entries, 0 to 99
   Data columns (total 4 columns):
    #   Column  Non-Null Count  Dtype
   ---  ------  --------------  -----
    0   S       1 non-null      float64
    1   A       100 non-null    object
    2   M       1 non-null      float64
    3   E       45 non-null     object
   dtypes: float64(2), object(2)
   memory usage: 3.2+ KB


df.convert_dtypes().hug().info()

   <class 'pandas.core.frame.DataFrame'>
   RangeIndex: 100 entries, 0 to 99
   Data columns (total 4 columns):
    #   Column  Non-Null Count  Dtype
   ---  ------  --------------  -----
    0   S       1 non-null      UInt16
    1   A       100 non-null    string
    2   M       1 non-null      UInt8
    3   E       45 non-null     category
   dtypes: UInt16(1), UInt8(1), category(1), string(1)
   memory usage: 1.6 KB

pandas-hug monkey-patches pandas.DataFrame and pandas.Series to add the hug() method.
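
to illustrate what monkey-patching means here, the following is a minimal sketch and not the pandas-hug implementation. the names shrink, _shrink_series and _shrink_frame are hypothetical, and the sketch only handles plain integer columns without missing values, using pd.to_numeric to pick the smallest dtype that fits.

```python
import pandas as pd


def _shrink_series(s: pd.Series) -> pd.Series:
    """hypothetical helper: downcast an integer series to the smallest integer dtype that fits."""
    if pd.api.types.is_integer_dtype(s.dtype):
        # use an unsigned type when there are no negative values
        kind = "unsigned" if (s >= 0).all() else "integer"
        return pd.to_numeric(s, downcast=kind)
    return s  # every other dtype is left untouched in this sketch


def _shrink_frame(df: pd.DataFrame) -> pd.DataFrame:
    """hypothetical helper: apply _shrink_series to each column."""
    return pd.DataFrame({name: _shrink_series(df[name]) for name in df.columns})


# the monkey-patch: attach the functions as methods on the pandas classes
pd.Series.shrink = _shrink_series
pd.DataFrame.shrink = _shrink_frame

small = pd.DataFrame({"x": [1, 2, 300], "y": [-1, 0, 1]}).shrink()
print(small.dtypes)
```

because the methods are attached at import time, any DataFrame or Series created afterwards picks them up without subclassing.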

you should call convert_dtypes() before hugging your data. it does useful things like converting whole-number float columns to nullable integers (pandas >=1.2.0, dec 2020) and object columns to string where appropriate.
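
a small sketch of what convert_dtypes() does on its own, using only pandas (1.2.0 or later assumed). the column names n and label are made up for the example.

```python
import pandas as pd

# missing values force the numeric column to float64 and the text column to object
raw = pd.DataFrame({
    "n": [256.0, None, 3.0],
    "label": ["a", None, "b"],
})

converted = raw.convert_dtypes()

print(raw.dtypes)        # dtypes before conversion
print(converted.dtypes)  # dtypes after conversion: nullable integer and string
```

the nullable dtypes keep the missing values while dropping the float representation, which gives hug() integer columns to work with.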


Built Distribution: pandas_hug-0.13.0-py3-none-any.whl (4.4 kB), uploaded for Python 3
