Pack Pandas data frames into smaller, more memory-efficient data types.

Project description

owid-repack-py

Pack Pandas DataFrames into smaller, more memory-efficient types.

Overview

When you load data into Pandas, it will use standard types by default:

  • object for strings
  • int64 for integers
  • float64 for floating point numbers

However, many datasets have a much more compact representation that Pandas could use instead, for example a one-byte unsigned integer type when every value lies between 0 and 255. A more compact representation lowers memory usage and produces smaller binary files on disk when using formats such as Feather and Parquet.
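To make the saving concrete, here is a minimal illustration that uses only pandas (it does not call owid.repack). It builds columns with the default types and then stores the same values in narrower types. The integer column shrinks eightfold, from 8 bytes to 1 byte per value. `category` is one compact option pandas itself offers for repeated strings; this page does not say which type owid.repack picks for strings, so treat that half as an assumption-free pandas demo rather than a description of the library.

```python
import pandas as pd

n = 1_000

# Built from plain Python values, so pandas picks its default types
ints = pd.Series([i % 100 for i in range(n)])   # int64
labels = pd.Series(["low", "high"] * (n // 2))  # object (the string default in pandas 1.x/2.x)

# The same values stored in more compact types
small_ints = ints.astype("uint8")       # every value is 0..99, so one byte suffices
cat_labels = labels.astype("category")  # two distinct strings, stored as small integer codes


def nbytes(s: pd.Series) -> int:
    # deep=True also counts the Python string objects behind object columns
    return int(s.memory_usage(index=False, deep=True))


print("int64:", nbytes(ints), "bytes, uint8:", nbytes(small_ints), "bytes")
print("object:", nbytes(labels), "bytes, category:", nbytes(cat_labels), "bytes")
```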

This library does just one thing: it shrinks your data frames to use smaller types.

Installing

pip install owid-repack

Usage

The owid.repack module exposes two functions, repack_series() and repack_frame().

repack_series() detects the smallest type that can represent the existing data in a series without loss, and returns the series converted to that type.

In [1]: import pandas as pd
   ...: from owid import repack

In [2]: pd.Series([1, 2, 3])
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: repack.repack_series(pd.Series([1.5, 2, 3]))
Out[3]:
0    1.5
1    2.0
2    3.0
dtype: float32

In [4]: repack.repack_series(pd.Series([1, None, 3]))
Out[4]:
0       1
1    <NA>
2       3
dtype: UInt8

In [5]: repack.repack_series(pd.Series([-1, None, 3]))
Out[5]:
0      -1
1    <NA>
2       3
dtype: Int8
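Judging from the outputs above, the detection amounts to a range check: find the narrowest integer type whose bounds cover every non-missing value, switching to a Pandas nullable type ("UInt8", "Int8", ...) when values are missing, and keep floats as float32 only when nothing is lost. The sketch below expresses that idea with NumPy's `np.iinfo`. `smallest_int_dtype` and `fits_float32` are hypothetical names, and this is not owid.repack's actual implementation.

```python
import numpy as np
import pandas as pd

SIGNED = ["int8", "int16", "int32", "int64"]
UNSIGNED = ["uint8", "uint16", "uint32", "uint64"]


def smallest_int_dtype(s: pd.Series) -> str:
    """Hypothetical helper (not part of owid.repack): narrowest integer dtype for s.

    Missing values force a Pandas nullable type, since NumPy integer
    dtypes cannot hold NaN.
    """
    values = s.dropna()
    if values.empty:
        raise ValueError("series has no non-missing values")
    if not (values == values.round()).all():
        raise ValueError("series contains non-integer values")
    lo, hi = values.min(), values.max()
    # Negative values need a signed type; otherwise unsigned gives more headroom
    for name in (SIGNED if lo < 0 else UNSIGNED):
        info = np.iinfo(name)
        if info.min <= lo and hi <= info.max:
            if s.isna().any():
                # "uint8" -> "UInt8", "int8" -> "Int8"
                return "UInt" + name[4:] if name.startswith("u") else "Int" + name[3:]
            return name
    raise ValueError("values do not fit in a 64-bit integer")


def fits_float32(s: pd.Series) -> bool:
    """Hypothetical helper: True if casting to float32 and back loses nothing."""
    values = s.dropna().astype("float64")
    return bool((values.astype("float32").astype("float64") == values).all())
```

For example, `pd.Series([-1, None, 3]).astype(smallest_int_dtype(pd.Series([-1, None, 3])))` yields an `Int8` series, in line with `Out[5]` above.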

repack_frame() applies the same conversion to every column of a DataFrame and returns a new DataFrame.
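A frame-level pass is essentially a loop over columns. The sketch below is hypothetical (`shrink_frame` and `shrink_column` are not owid.repack's API) and, to stay self-contained, uses Pandas' built-in `pd.to_numeric(..., downcast=...)` as the per-column step instead of repack_series(). That stand-in is rougher: it leaves string columns untouched and does not introduce nullable types.

```python
import pandas as pd


def shrink_column(s: pd.Series) -> pd.Series:
    """Hypothetical per-column step using Pandas' own downcasting."""
    if pd.api.types.is_integer_dtype(s):
        # Use unsigned types when no value is negative
        kind = "unsigned" if (s >= 0).all() else "integer"
        return pd.to_numeric(s, downcast=kind)
    if pd.api.types.is_float_dtype(s):
        return pd.to_numeric(s, downcast="float")
    return s  # non-numeric columns are passed through unchanged


def shrink_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame with each column shrunk; the input is not modified."""
    out = df.copy()
    for col in out.columns:
        # Assigning a whole column replaces it, so its dtype can change
        out[col] = shrink_column(out[col])
    return out


df = pd.DataFrame({"a": [1, 2, 3], "b": [-1, 0, 5], "c": [1.5, 2.0, 3.0], "d": ["x", "y", "z"]})
print(shrink_frame(df).dtypes)
```

Copying first keeps the original frame intact, matching the "returns a new DataFrame" behaviour described above.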

Releases

  • 0.1.1:
    • Fix the Python version constraint in the package metadata so that 3.8.1 onwards is supported
  • 0.1.0:
    • Migrate first version from owid-catalog-py repo

Download files

Source Distribution

owid_repack-0.1.1.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

owid_repack-0.1.1-py3-none-any.whl (4.4 kB)

Uploaded Python 3

File details

Details for the file owid_repack-0.1.1.tar.gz.

File metadata

  • Download URL: owid_repack-0.1.1.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.1 CPython/3.10.7 Darwin/22.2.0

File hashes

Hashes for owid_repack-0.1.1.tar.gz
Algorithm Hash digest
SHA256 a062f2f775f92a50bc0d974c035846a3764778a3c6422642731dd7a005db646b
MD5 31a192e5d1c76516a6973657f68f1974
BLAKE2b-256 9d739a1d457830de6e7b97b128be09022bd7a37ff82cbc3979889f5186ef8e02

File details

Details for the file owid_repack-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: owid_repack-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 4.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.1 CPython/3.10.7 Darwin/22.2.0

File hashes

Hashes for owid_repack-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 837d8b1b029ec6d065c75c171598bfd363c8fdff2a01ccbe8b5c0fec91b831af
MD5 732efbe6b4cb49990509ff9c57ce9075
BLAKE2b-256 b62ef511ac92ae49c9534388041188b1027eecf8debb1378865589afa57cbbcc
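The SHA256 digests above can be checked locally with Python's standard `hashlib` module before installing. This is a minimal sketch; it assumes the two files were downloaded into the current directory under their published names, and `sha256_of` and `verify_download` are illustrative names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above
EXPECTED_SHA256 = {
    "owid_repack-0.1.1.tar.gz": "a062f2f775f92a50bc0d974c035846a3764778a3c6422642731dd7a005db646b",
    "owid_repack-0.1.1-py3-none-any.whl": "837d8b1b029ec6d065c75c171598bfd363c8fdff2a01ccbe8b5c0fec91b831af",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so the whole file never has to sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    for name in EXPECTED_SHA256:
        path = Path(name)  # assumption: file sits in the current directory
        if path.exists():
            print(name, "OK" if verify_download(path) else "MISMATCH")
        else:
            print(name, "not found")
```

pip's hash-checking mode performs the same comparison during installation when a requirements file lists `owid-repack==0.1.1 --hash=sha256:<digest>`, though it then requires hashes for every dependency as well.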