Pack Pandas data frames into smaller, more memory-efficient data types.

Project description

owid-repack-py

Pack Pandas DataFrames into smaller, more memory-efficient types.

Overview

When you load data into Pandas, it will use standard types by default:

  • object for strings
  • int64 for integers
  • float64 for floating point numbers

However, for many datasets there is a much more compact representation that Pandas could be using for that data. Using a more compact representation leads to lower memory usage, and smaller binary files on disk when using formats such as Feather and Parquet.
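As a rough illustration of the saving, the sketch below uses plain Pandas (not this library) to store the same 1,000 integers as int64 and as uint16. The variable names are made up for the example, and the target dtype is chosen by hand here.

```python
import pandas as pd

# 1,000 small integers stored with the default 64-bit integer dtype.
default = pd.Series(range(1000), dtype="int64")

# The same values fit in an unsigned 16-bit integer (range 0 to 65,535).
compact = default.astype("uint16")

# Bytes used by the values alone, ignoring the index.
default_bytes = default.memory_usage(index=False)  # 8 bytes per value
compact_bytes = compact.memory_usage(index=False)  # 2 bytes per value

print(default_bytes, compact_bytes)

# No information is lost: converting back gives the original series.
assert default.equals(compact.astype("int64"))
```

Here the compact series needs a quarter of the memory while holding exactly the same values.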

This library does just one thing: it shrinks your data frames to use smaller types.

Installing

pip install owid-repack

Usage

The owid.repack module exposes two functions, repack_series() and repack_frame().

repack_series() will detect the smallest type that can accurately fit the existing data in the series.

In [1]: import pandas as pd
   ...: from owid import repack

In [2]: pd.Series([1, 2, 3])
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: repack.repack_series(pd.Series([1.5, 2, 3]))
Out[3]:
0    1.5
1    2.0
2    3.0
dtype: float32

In [4]: repack.repack_series(pd.Series([1, None, 3]))
Out[4]:
0       1
1    <NA>
2       3
dtype: UInt8

In [5]: repack.repack_series(pd.Series([-1, None, 3]))
Out[5]:
0      -1
1    <NA>
2       3
dtype: Int8

The repack_frame() function applies the same repacking to every column in your DataFrame and returns a new DataFrame.
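To make the idea of "the smallest type that fits" concrete for integer data, here is a minimal sketch of a range check in pure Python. The helper smallest_int_dtype() and its candidate ordering are hypothetical and are not taken from owid-repack's source; the sketch only reproduces the UInt8 and Int8 outcomes shown in the examples above.

```python
from typing import Iterable, Optional

# Candidate nullable integer dtype names with their inclusive value ranges,
# ordered by width, unsigned before signed. (Assumed ordering for this sketch.)
_CANDIDATES = [
    ("UInt8", 0, 2**8 - 1),
    ("Int8", -(2**7), 2**7 - 1),
    ("UInt16", 0, 2**16 - 1),
    ("Int16", -(2**15), 2**15 - 1),
    ("UInt32", 0, 2**32 - 1),
    ("Int32", -(2**31), 2**31 - 1),
    ("UInt64", 0, 2**64 - 1),
    ("Int64", -(2**63), 2**63 - 1),
]


def smallest_int_dtype(values: Iterable[Optional[int]]) -> str:
    """Return the name of the narrowest dtype whose range holds all values.

    None entries stand for missing data and are ignored.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("need at least one non-missing value")
    lo, hi = min(present), max(present)
    for name, type_min, type_max in _CANDIDATES:
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("values do not fit in any 64-bit integer type")


print(smallest_int_dtype([1, None, 3]))   # UInt8
print(smallest_int_dtype([-1, None, 3]))  # Int8
```

A negative value rules out every unsigned type, which is why the second call falls through to Int8.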

Releases

  • 0.1.0:
    • Migrate first version from owid-catalog-py repo
