A Python package to compress pandas DataFrames, akin to Stata's `compress` command

df-compress

A Python package to compress pandas DataFrames, akin to Stata's compress command. It may prove particularly helpful to those dealing with large datasets.

Installation

You can install df-compress by running the following command:

pip install df_compress

How to use

After installing the package, import the function:

from df_compress import compress

Example

The following is a reproducible example of df-compress usage:

from df_compress import compress
import pandas as pd
import numpy as np

# Build a 200-row sample DataFrame
df = pd.DataFrame(columns=["Year", "State", "Value", "Int_value"])
df.Year = np.random.randint(low=2000, high=2023, size=200).astype(str)  # years stored as strings
df.State = np.random.choice(['RJ', 'SP', 'ES', 'MT'], size=200)
df.Value = np.random.rand(200)  # 1-D array of floats in [0, 1)
df.Int_value = df.Value * 10 // 1  # whole numbers 0-9, stored as float64

# Modifies df in place, so no reassignment is needed
compress(df, show_conversions=True)

This prints the conversions performed and the memory saved:

Initial memory usage: 0.02 MB
Final memory usage: 0.00 MB
Memory reduced by: 0.02 MB (91.3%)

Variable type conversions:
   column    from       to  memory saved (MB)
     Year  object    int16           0.009727
    State  object category           0.009178
    Value float64  float32           0.000763
Int_value float64     int8           0.001335
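
The integer targets in the table are consistent with picking the narrowest signed integer type whose range covers the column's values. The sketch below illustrates that idea in plain Python. `smallest_int_dtype` and `INT_RANGES` are hypothetical names written for this explanation, not part of df-compress, and the package's actual selection logic may differ.

```python
# Hypothetical helper for illustration only; not part of df-compress.
# Signed integer ranges, narrowest first.
INT_RANGES = [
    ("int8", -2**7, 2**7 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("int32", -2**31, 2**31 - 1),
    ("int64", -2**63, 2**63 - 1),
]


def smallest_int_dtype(values):
    """Return the name of the narrowest signed integer type that holds all values."""
    lo, hi = min(values), max(values)
    for name, type_min, type_max in INT_RANGES:
        if type_min <= lo and hi <= type_max:
            return name
    raise OverflowError("values do not fit in a 64-bit signed integer")


years = list(range(2000, 2023))  # same value range as the Year column above
digits = list(range(0, 10))      # same value range as the Int_value column above

print(smallest_int_dtype(years))   # int16
print(smallest_int_dtype(digits))  # int8
```

Years up to 2022 exceed the int8 maximum of 127 but fit in int16, while the values 0 to 9 fit in int8, which matches the conversions reported above.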

Optional Parameters

The function has three optional parameters (arguments):

  • convert_strings (bool): whether to attempt to parse object columns as numbers
    • defaults to True
  • numeric_threshold (float): the proportion of valid numeric entries a string column needs in order to be converted to numeric
    • defaults to 0.999
  • show_conversions (bool): whether to report the changes made column by column
    • defaults to False
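
To make the meaning of numeric_threshold concrete, here is a minimal plain-Python sketch of a threshold check on a list of strings. The helpers `numeric_share` and `should_convert` are hypothetical and written only for this explanation. It also assumes that an entry counts as numeric when `float()` accepts it and that the comparison is "greater than or equal". df-compress may parse values, treat missing entries, or compare against the threshold differently.

```python
# Hypothetical helpers for illustration only; not part of df-compress.
def numeric_share(values):
    """Return the fraction of entries that float() can parse."""
    if not values:
        return 0.0
    valid = 0
    for value in values:
        try:
            float(value)
            valid += 1
        except (TypeError, ValueError):
            pass  # entry is not a valid number
    return valid / len(values)


def should_convert(values, numeric_threshold=0.999):
    """Return True when the share of numeric entries reaches the threshold (assumed >=)."""
    return numeric_share(values) >= numeric_threshold


clean = ["2001", "2002", "2003", "2004"]
dirty = ["2001", "2002", "n/a", "2004"]

print(should_convert(clean))                         # True
print(should_convert(dirty))                         # False
print(should_convert(dirty, numeric_threshold=0.7))  # True
```

With the package itself, a lower threshold would be passed as a keyword argument, for example compress(df, numeric_threshold=0.9).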
