A Python package to compress pandas DataFrames, akin to Stata's `compress` command

df-compress

A Python package to compress pandas DataFrames, akin to Stata's compress command. It may prove particularly helpful to those dealing with large datasets.

Installation

You can install df-compress by running the following command:

pip install df_compress

How to use

After installing the package, import the function:

from df_compress import compress

Example

The following is a reproducible example of df-compress usage:

from df_compress import compress
import pandas as pd
import numpy as np

df = pd.DataFrame(columns=["Year","State","Value","Int_value"])
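# Years stored as strings, so they start out as an object column
df["Year"] = np.random.randint(low=2000, high=2023, size=200).astype(str)
df["State"] = np.random.choice(["RJ", "SP", "ES", "MT"], size=200)
# 1-D array of floats in [0, 1), one value per row
df["Value"] = np.random.rand(200)
# Whole-number floats (0.0 to 9.0), still stored as float64
df["Int_value"] = df["Value"] * 10 // 1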

compress(df, show_conversions=True)  # modifies df in place; no reassignment needed

This prints the type conversions and the memory saved:

Initial memory usage: 0.02 MB
Final memory usage: 0.00 MB
Memory reduced by: 0.02 MB (91.3%)

Variable type conversions:
   column    from       to  memory saved (MB)
     Year  object    int16           0.009727
    State  object category           0.009178
    Value float64  float32           0.000763
Int_value float64     int8           0.001335
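
To make the report above easier to interpret, the sketch below reproduces the same kinds of conversions by hand with plain pandas and measures memory with `DataFrame.memory_usage(deep=True)`. It is only an illustration of what such conversions look like, not df-compress's actual implementation; the variable names (`manual`, `before`, `after`) and the seeded generator are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Seeded generator so the illustration is reproducible (an assumption of this sketch)
rng = np.random.default_rng(0)

df = pd.DataFrame({
    "Year": rng.integers(2000, 2023, size=200).astype(str),
    "State": rng.choice(["RJ", "SP", "ES", "MT"], size=200),
    "Value": rng.random(200),
})
df["Int_value"] = df["Value"] * 10 // 1

# deep=True also counts the Python string objects held by text columns
before = df.memory_usage(deep=True).sum()

# Hand-written conversions on a copy, mirroring the report above
manual = df.copy()
# Numeric strings -> smallest integer type that holds every value
manual["Year"] = pd.to_numeric(manual["Year"].astype("int64"), downcast="integer")
# Few distinct labels -> category
manual["State"] = manual["State"].astype("category")
# Halve the float width (this loses some precision)
manual["Value"] = manual["Value"].astype("float32")
# Whole-number floats -> smallest integer type
manual["Int_value"] = pd.to_numeric(manual["Int_value"], downcast="integer")

after = manual.memory_usage(deep=True).sum()

print(manual.dtypes)
print(f"Memory reduced by {100 * (before - after) / before:.1f}%")
```

Note that the float64 to float32 step is lossy, so it is worth checking whether the reduced precision is acceptable for your data.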

Optional Parameters

The function has three optional parameters (arguments):

  • convert_strings (bool): Whether to attempt to parse object columns as numbers
    • defaults to True
  • numeric_threshold (float): Indicates the proportion of valid numeric entries needed to convert a string to numeric
    • defaults to 0.999
  • show_conversions (bool): Whether to report the changes made column by column
    • defaults to False
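
The idea behind `numeric_threshold` can be sketched with plain pandas: parse a string column, measure what share of its entries are valid numbers, and convert only if that share reaches the threshold. The helpers `numeric_share` and `maybe_to_numeric` below are hypothetical names written for this illustration, and measuring the share over non-missing entries is an assumption; the package may compute it differently.

```python
import pandas as pd


def numeric_share(col: pd.Series) -> float:
    """Share of non-missing entries that parse as numbers (hypothetical helper)."""
    non_missing = col.dropna()
    if non_missing.empty:
        return 0.0
    # errors="coerce" turns unparseable entries into NaN instead of raising
    parsed = pd.to_numeric(non_missing, errors="coerce")
    return float(parsed.notna().mean())


def maybe_to_numeric(col: pd.Series, threshold: float = 0.999) -> pd.Series:
    """Convert to numeric only when enough entries are valid numbers."""
    if numeric_share(col) >= threshold:
        return pd.to_numeric(col, errors="coerce")
    return col


clean = pd.Series(["2001", "2002", "2003", "2004"])
messy = pd.Series(["2001", "n/a", "2003", "unknown"])

print(numeric_share(clean), maybe_to_numeric(clean).dtype)
print(numeric_share(messy), maybe_to_numeric(messy).dtype)
```

With a threshold close to 1, a column is converted only when almost every entry is a valid number, which protects columns that merely contain a few numeric-looking strings.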
