a utility for space-efficient dataframes

Project description

pandas hug

sometimes you need to embrace your data and get a little, or a lot, more of it into memory.

your column data types are rarely space-efficient. most of the time this is because they were chosen by someone else, but sometimes it's just a hassle to find the most space-efficient types.

pandas-hug is here to help crush your data to fit in memory.

installation

pip install pandas-hug

usage

import pandas as pd
import pandas_hug

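# the four series have different lengths; the DataFrame aligns them on
# the index, so shorter columns are padded with missing values
S = pd.Series([2**8])                            # 256 needs more than 8 bits
A = pd.Series([f'a{i}' for i in range(100)])     # 100 distinct strings
M = pd.Series([42])                              # fits in 8 bits
E = pd.Series(['a', 'b', 'c'] * 15)              # 45 values, 3 distinct
df = pd.DataFrame({'S': S, 'A': A, 'M': M, 'E': E})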

df.info()

   <class 'pandas.core.frame.DataFrame'>
   RangeIndex: 100 entries, 0 to 99
   Data columns (total 4 columns):
    #   Column  Non-Null Count  Dtype
   ---  ------  --------------  -----
    0   S       1 non-null      float64
    1   A       100 non-null    object
    2   M       1 non-null      float64
    3   E       45 non-null     object
   dtypes: float64(2), object(2)
   memory usage: 3.2+ KB


df.convert_dtypes().hug().info()

   <class 'pandas.core.frame.DataFrame'>
   RangeIndex: 100 entries, 0 to 99
   Data columns (total 4 columns):
    #   Column  Non-Null Count  Dtype
   ---  ------  --------------  -----
    0   S       1 non-null      UInt16
    1   A       100 non-null    string
    2   M       1 non-null      UInt8
    3   E       45 non-null     category
   dtypes: UInt16(1), UInt8(1), category(1), string(1)
   memory usage: 1.6 KB
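
the `+` in `memory usage: 3.2+ KB` means pandas did not count the python strings inside the object columns. passing `deep=True` to `memory_usage()` includes them. the sketch below does not use pandas-hug; it only compares an object column with its category equivalent, using a longer made-up column than `E` so the difference is clear.

```python
import pandas as pd

# illustrative data: 3000 values, 3 distinct, stored as python objects
labels = pd.Series(['a', 'b', 'c'] * 1000, dtype=object)

# deep=True also counts the memory held by the string objects themselves
bytes_as_object = labels.memory_usage(deep=True)
bytes_as_category = labels.astype('category').memory_usage(deep=True)

print(bytes_as_object, bytes_as_category)
```

a category column stores each distinct value once plus a small integer code per row, which is why low-cardinality columns such as `E` shrink so much.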

pandas-hug monkey-patches pandas.DataFrame and pandas.Series to add the hug() method.
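
monkey-patching here means assigning a new attribute on the pandas classes at import time. the sketch below shows the general pattern only and is not pandas-hug's implementation: `shrink`, `shrink_series` and `shrink_frame` are hypothetical names, and the downcasting is delegated to `pd.to_numeric`, which is an assumption made for illustration.

```python
import pandas as pd


def shrink_series(s: pd.Series) -> pd.Series:
    """hypothetical helper: downcast integer columns, leave others alone."""
    if pd.api.types.is_integer_dtype(s.dtype):
        # unsigned types only fit when no value is negative
        kind = "unsigned" if (s >= 0).all() else "integer"
        return pd.to_numeric(s, downcast=kind)
    return s


def shrink_frame(df: pd.DataFrame) -> pd.DataFrame:
    """apply shrink_series to every column (assumes unique column names)."""
    return pd.DataFrame(
        {name: shrink_series(col) for name, col in df.items()},
        index=df.index,
    )


# the monkey patch: after these two lines every Series and DataFrame
# in the process has a .shrink() method
pd.Series.shrink = shrink_series
pd.DataFrame.shrink = shrink_frame

frame = pd.DataFrame(
    {"small": [1, 2, 3], "signed": [-1, 0, 300], "text": ["a", "b", "c"]}
)
shrunk = frame.shrink()
print(shrunk.dtypes)
```

the patch is global, so it takes effect as a side effect of importing the module. that is why `import pandas_hug` in the usage example is enough to make `hug()` available.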

call convert_dtypes() before hugging your data. it converts float columns that hold only whole numbers to nullable integers (pandas >=1.2.0, dec 2020) and object columns to string where appropriate, which gives hug() better types to start from.
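
a minimal sketch of what convert_dtypes() does on its own, without pandas-hug. the column names `n` and `t` are made up for illustration, and the float to integer conversion assumes pandas >=1.2.0.

```python
import pandas as pd

raw = pd.DataFrame({
    # whole numbers stored as float64 because of the missing value
    "n": pd.Series([256.0, None]),
    # strings stored as generic python objects
    "t": pd.Series(["a", None], dtype=object),
})

converted = raw.convert_dtypes()
print(converted.dtypes)
```

the missing value in `n` is what forced float64 in the first place. the nullable `Int64` type can hold it as `<NA>`, so the column can later be narrowed to a smaller integer type.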
