owid-repack-py

Pack Pandas DataFrames into smaller, more memory-efficient types.

Overview

When you load data into Pandas, it will use standard types by default:

  • object for strings
  • int64 for integers
  • float64 for floating point numbers

However, many datasets can be stored far more compactly than these defaults allow. A more compact representation lowers memory usage and produces smaller binary files on disk in formats such as Feather and Parquet.
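To make the saving concrete, here is a minimal sketch that uses only pandas and NumPy, not this library. The series name `ages` and its contents are made up for illustration. It stores the same small integers once as int64 and once as uint8 and compares the memory used:

```python
import numpy as np
import pandas as pd

# Small integer values stored with pandas' default 64-bit integer type.
# (The dtype is set explicitly so the example behaves the same on every platform.)
ages = pd.Series(np.arange(100_000) % 100, dtype="int64")

# The same values fit in an unsigned 8-bit integer (range 0..255).
ages_small = ages.astype("uint8")

# Bytes used by the values alone, ignoring the index.
default_bytes = ages.memory_usage(index=False)
small_bytes = ages_small.memory_usage(index=False)
print(default_bytes, small_bytes)  # 800000 100000

# No information is lost by the narrower type.
assert (ages_small.astype("int64") == ages).all()
```

The values take one byte each instead of eight, so the column is eight times smaller.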

This library does just one thing: it shrinks your data frames to use smaller types.

Installing

pip install owid-repack

Usage

The owid.repack module exposes two functions, repack_series() and repack_frame().

repack_series() detects the smallest type that can accurately represent the existing data in the series.

In [1]: import pandas as pd
   ...: from owid import repack

In [2]: pd.Series([1, 2, 3])
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: repack.repack_series(pd.Series([1.5, 2, 3]))
Out[3]:
0    1.5
1    2.0
2    3.0
dtype: float32

In [4]: repack.repack_series(pd.Series([1, None, 3]))
Out[4]:
0       1
1    <NA>
2       3
dtype: UInt8

In [5]: repack.repack_series(pd.Series([-1, None, 3]))
Out[5]:
0      -1
1    <NA>
2       3
dtype: Int8
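The idea of picking the smallest type that fits can be sketched for the integer case as follows. This is an illustration only, not the library's implementation: smallest_int_dtype() and CANDIDATES are hypothetical names, and the choice of Int8 for all-missing data is an assumption modelled on the 0.1.2 release note.

```python
import numpy as np
import pandas as pd

# Candidate nullable integer dtypes, ordered from narrowest to widest.
# The unsigned type comes first at each width, so non-negative data prefers it.
CANDIDATES = ["UInt8", "Int8", "UInt16", "Int16", "UInt32", "Int32", "UInt64", "Int64"]


def smallest_int_dtype(s: pd.Series) -> str:
    """Hypothetical helper: name the narrowest nullable integer dtype that holds s."""
    values = s.dropna()
    if values.empty:
        # Assumption: treat all-missing data as Int8.
        return "Int8"
    if not (values == values.round()).all():
        raise ValueError("series contains non-integer values")
    lo, hi = values.min(), values.max()
    for name in CANDIDATES:
        info = np.iinfo(name.lower())  # e.g. "UInt8" -> bounds of numpy uint8
        if info.min <= lo and hi <= info.max:
            return name
    raise ValueError("values do not fit in a 64-bit integer")


s = pd.Series([-1, None, 3])  # pandas stores this as float64 with a NaN
packed = s.astype(smallest_int_dtype(s))
print(packed.dtype)  # Int8
```

Nullable dtypes (capitalised names such as Int8) are used here because they can hold missing values, which plain NumPy integer types cannot.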

The repack_frame() function applies the same process to every column in your DataFrame and returns a new DataFrame.
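As a rough picture of what a column-by-column pass looks like, the sketch below uses pandas' own pd.to_numeric(..., downcast=...) rather than this library. shrink_numeric_columns() is a hypothetical stand-in, not the code behind repack_frame(), and it handles only plain integer and float columns.

```python
import pandas as pd


def shrink_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: downcast each numeric column, leave the rest alone."""
    out = {}
    for name, col in df.items():
        if pd.api.types.is_integer_dtype(col):
            # Prefer an unsigned type when the column has no negative values.
            kind = "unsigned" if (col >= 0).all() else "signed"
            out[name] = pd.to_numeric(col, downcast=kind)
        elif pd.api.types.is_float_dtype(col):
            out[name] = pd.to_numeric(col, downcast="float")
        else:
            out[name] = col
    # Build a new frame; the input frame is not modified.
    return pd.DataFrame(out, index=df.index)


df = pd.DataFrame(
    {
        "year": [1990, 2000, 2010],
        "delta": [-1, 0, 1],
        "share": [0.5, 0.25, 0.125],
        "name": ["a", "b", "c"],
    }
)
small = shrink_numeric_columns(df)
print(small.dtypes)
```

In this sketch the year column becomes uint16, delta becomes int8, and share becomes float32, while the original df keeps its dtypes.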

Releases

  • 0.1.3:
    • Improve performance on float dtypes
  • 0.1.2:
    • Shrink columns with all NaNs to Int8
  • 0.1.1:
    • Fix Python support in package metadata to support 3.8.1 onwards
  • 0.1.0:
    • Migrate first version from owid-catalog-py repo
