a utility for space efficient dataframes
Project description
pandas hug
sometimes you need to embrace your data and get a little, or a lot, more of it into memory.
your column data types are rarely space efficient. most of the time this is because they were chosen by someone else, but sometimes it's just a hassle to find the most space efficient types.
pandas-hug is here to help crush your data to fit in memory.
installation
pip install pandas-hug
usage
import pandas as pd
import pandas_hug
S = pd.Series([2**8])
A = pd.Series([f'a{i}' for i in range(100)])
M = pd.Series([42])
E = pd.Series(['a', 'b', 'c'] * 15)
# the series have different lengths, so the shorter ones are padded with NaN
df = pd.DataFrame({'S': S, 'A': A, 'M': M, 'E': E})
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   S       1 non-null      float64
 1   A       100 non-null    object
 2   M       1 non-null      float64
 3   E       45 non-null     object
dtypes: float64(2), object(2)
memory usage: 3.2+ KB
df.convert_dtypes().hug().info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   S       1 non-null      UInt16
 1   A       100 non-null    string
 2   M       1 non-null      UInt8
 3   E       45 non-null     category
dtypes: UInt16(1), UInt8(1), category(1), string(1)
memory usage: 1.6 KB
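the + in "3.2+ KB" above means pandas did not count the python strings held by the object columns, so the real saving is usually larger than info() suggests. the sketch below measures it with memory_usage(deep=True). it only uses plain pandas (astype), not pandas-hug, and the column names and target dtypes are chosen by hand for illustration.

```python
import pandas as pd

# illustrative frame: default dtypes are int64 and object
df = pd.DataFrame({'n': [42] * 100, 'label': ['a', 'b', 'c', 'd'] * 25})

# deep=True also counts the python string objects behind object columns
before = df.memory_usage(deep=True).sum()

# hand-picked smaller dtypes; a tool like pandas-hug aims to pick these for you
smaller = df.astype({'n': 'uint8', 'label': 'category'})
after = smaller.memory_usage(deep=True).sum()

print(f'before: {before} bytes, after: {after} bytes')
```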
pandas-hug monkey-patches pandas.DataFrame and pandas.Series to add the hug() method.
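for readers unfamiliar with monkey-patching, here is a rough sketch of the general technique. this is not pandas-hug's actual code: the method name shrink and the helper functions are hypothetical, and the downcasting rule (pd.to_numeric with downcast) is a simple stand-in that only handles integer columns.

```python
import pandas as pd


def shrink_series(series):
    """hypothetical stand-in: downcast integer series to the smallest fitting dtype."""
    if pd.api.types.is_integer_dtype(series.dtype):
        # unsigned types are only safe when there are no negative values
        kind = 'unsigned' if (series >= 0).all() else 'integer'
        return pd.to_numeric(series, downcast=kind)
    return series  # leave every other dtype untouched


def shrink_frame(frame):
    """apply shrink_series to each column, keeping the original index."""
    return pd.DataFrame({col: shrink_series(frame[col]) for col in frame.columns})


# the monkey-patch: attach the functions as methods on the pandas classes
pd.Series.shrink = shrink_series
pd.DataFrame.shrink = shrink_frame

df = pd.DataFrame({'x': [1, 2, 300], 'y': [-1, 0, 1]})
print(df.shrink().dtypes)
```

after the two assignments, every Series and DataFrame in the process gains a shrink() method, which is the same mechanism that makes df.hug() available once pandas_hug has been imported.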
you should call convert_dtypes() before hugging your data. this does useful things like converting float to int (pandas >=1.2.0, dec 2020) and object to string where appropriate.
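a small illustration of what convert_dtypes() does on its own, using only plain pandas. the series values are made up for the example, and the exact name of the resulting string dtype may vary between pandas versions.

```python
import pandas as pd

# whole numbers stored as float64 because of the missing values
floats = pd.Series([256.0, None, None])
as_int = floats.convert_dtypes()
print(floats.dtype, '->', as_int.dtype)  # float64 -> Int64

# strings stored in a generic object column
objects = pd.Series(['a', 'b', None], dtype=object)
as_str = objects.convert_dtypes()
print(objects.dtype, '->', as_str.dtype)
```

the nullable Int64 result keeps the missing values as <NA> instead of forcing the column back to float, which is what lets a later step pick a small integer type.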