A robust data type optimizer for pandas DataFrames

DataTypes Optimizer

A simple Python library to optimize the data types of a Pandas DataFrame, reducing memory usage.

How it works

The optimize_dtypes function in this library helps you reduce the memory footprint of your Pandas DataFrames. It works by downcasting numeric columns (both integers and floats) to their smallest possible data type that can still hold the data without any loss of precision.

For example, a column of integers whose maximum value is 100 is typically stored as int64 by default in Pandas. This function converts it to int8, which uses one byte per value instead of eight.
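The choice of target type comes down to a range check. The following is a minimal sketch of that check in plain Python, not the library's own code: `smallest_signed_int` and `SIGNED_INT_RANGES` are hypothetical names, and the sketch assumes only signed types are considered, since the description does not say whether unsigned types are used.

```python
# Inclusive value ranges of the fixed-width signed integer types, narrowest first.
SIGNED_INT_RANGES = [
    ("int8", -2**7, 2**7 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("int32", -2**31, 2**31 - 1),
    ("int64", -2**63, 2**63 - 1),
]


def smallest_signed_int(lo: int, hi: int) -> str:
    """Return the narrowest signed integer type whose range covers [lo, hi]."""
    for name, type_min, type_max in SIGNED_INT_RANGES:
        if type_min <= lo and hi <= type_max:
            return name
    raise OverflowError(f"range [{lo}, {hi}] does not fit in int64")


print(smallest_signed_int(1, 100))  # int8
print(smallest_signed_int(1, 200))  # int16, because 200 exceeds the int8 maximum of 127
```

The second call shows why the column's actual minimum and maximum matter: a single value above 127 is enough to require the next wider type.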

The library currently optimizes:

  • Integer columns
  • Float columns

It skips the following data types as they generally do not benefit from this type of downcasting:

  • Object (string)
  • Boolean
  • Categorical
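The rules above can be approximated with standard pandas calls. This is a sketch under assumptions, not the implementation of optimize_dtypes, which is not shown here: `downcast_sketch` is a hypothetical name, and the exact float32 round-trip check is one possible reading of "without any loss of precision".

```python
import numpy as np
import pandas as pd


def downcast_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for optimize_dtypes; returns a downcast copy."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if pd.api.types.is_integer_dtype(col.dtype):
            # Narrowest signed integer type that holds every value.
            out[name] = pd.to_numeric(col, downcast="integer")
        elif pd.api.types.is_float_dtype(col.dtype):
            # Assumption: accept float32 only if every value survives the round trip.
            narrowed = col.astype(np.float32)
            round_trip = narrowed.astype(np.float64).to_numpy()
            if np.array_equal(round_trip, col.to_numpy(), equal_nan=True):
                out[name] = narrowed
        # Object, boolean and categorical columns match neither branch
        # and are left unchanged.
    return out


sample = pd.DataFrame({
    "integers": [1, 2, 50, 100],
    "floats": [1.0, 2.5, 3.5, 4.5],
    "strings": ["a", "b", "c", "d"],
    "flags": [True, False, True, False],
})
print(downcast_sketch(sample).dtypes)
```

A value such as 0.1 has no exact float32 representation, so a column containing it stays float64 under this sketch. The real library may apply a different rule.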

Usage

Here is a simple example of how to use the optimize_dtypes function:

import pandas as pd
import numpy as np
from datatypesoptimizer.dataOptimizer import optimize_dtypes

# Create a sample DataFrame
data = {
    'integers': [1, 2, 50, 100],
    'floats': [1.0, 2.5, 3.5, 4.5],
    'strings': ['a', 'b', 'c', 'd']
}
df = pd.DataFrame(data)

print("Original DataFrame memory usage:")
print(df.memory_usage(deep=True))
print("\nOriginal dtypes:")
print(df.dtypes)

# Optimize the DataFrame
optimized_df = optimize_dtypes(df)

print("\nOptimized DataFrame memory usage:")
print(optimized_df.memory_usage(deep=True))
print("\nOptimized dtypes:")
print(optimized_df.dtypes)

Example Output

Original DataFrame memory usage:
Index       132
integers     32
floats       32
strings     244
dtype: int64

Original dtypes:
integers     int64
floats      float64
strings      object
dtype: object

Optimized DataFrame memory usage:
Index       132
integers      4
floats       16
strings     244
dtype: int64

Optimized dtypes:
integers      int8
floats      float32
strings      object
dtype: object

The output shows the integers column dropping from 32 bytes to 4 and the floats column shrinking as well, while the strings column is left untouched.
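To quantify the saving per column, the two memory_usage results can be placed side by side. The helper below is a small sketch: `memory_report` is a hypothetical name, and `astype` stands in for optimize_dtypes so the snippet runs without the library installed.

```python
import pandas as pd


def memory_report(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Per-column bytes before and after optimization, plus bytes saved."""
    report = pd.DataFrame({
        "before": before.memory_usage(deep=True),
        "after": after.memory_usage(deep=True),
    })
    report["saved"] = report["before"] - report["after"]
    return report


df = pd.DataFrame({"integers": [1, 2, 50, 100], "floats": [1.0, 2.5, 3.5, 4.5]})
# Stand-in for optimize_dtypes(df): cast to the narrower types directly.
smaller = df.astype({"integers": "int8", "floats": "float32"})
report = memory_report(df, smaller)
print(report)
```

With four rows, int64 to int8 saves 28 bytes (32 down to 4) and float64 to float32 saves 16 bytes (32 down to 16). The index is not affected.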

Installation

To use this library, clone the repository and import the optimize_dtypes function from the datatypesoptimizer.dataOptimizer module.

Download files

Source Distribution

datatypesoptimizer-0.1.2.tar.gz (9.0 kB)

Uploaded Source

Built Distribution

datatypesoptimizer-0.1.2-py3-none-any.whl (3.3 kB)

Uploaded Python 3

File details

Details for the file datatypesoptimizer-0.1.2.tar.gz.

File metadata

  • Download URL: datatypesoptimizer-0.1.2.tar.gz
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.10

File hashes

Hashes for datatypesoptimizer-0.1.2.tar.gz
  • SHA256: 07b1649fd4417857439efd59a2c292cf3d5209df50e55b5f3be793b052827861
  • MD5: b13ddee9332df2ea7f2f1010de55dc5a
  • BLAKE2b-256: 340dbf2f6aaf58d8070a6a00e553037c121663d02b063324649dda22a6734c46

File details

Details for the file datatypesoptimizer-0.1.2-py3-none-any.whl.

File hashes

Hashes for datatypesoptimizer-0.1.2-py3-none-any.whl
  • SHA256: d27224ed710b804c38ab435134314964082d24773231e047e0096023ff544259
  • MD5: 400bef79dcfee4719ec661b5f21c2a09
  • BLAKE2b-256: ae7d6cc8ecf5e4505b2b5ff016e379c4cd50a5bfa6c52b698e08ab1e490afced
