A Python package to compress pandas DataFrames, akin to Stata's `compress` command

df-compress

A Python package to compress pandas DataFrames, akin to Stata's compress command. It reduces memory usage by converting columns to smaller data types, which may prove particularly helpful when you are working with large datasets.

How to use

After installing the package, import the function as follows:

from df_compress import compress

Example

The following is a reproducible example of df-compress usage:

from df_compress import compress
import pandas as pd
import numpy as np

# Build a 200-row sample: years stored as strings, state codes, and float values
df = pd.DataFrame({
    "Year": np.random.randint(low=2000, high=2023, size=200).astype(str),
    "State": np.random.choice(["RJ", "SP", "ES", "MT"], size=200),
    "Value": np.random.rand(200),
})

df = compress(df, show_conversions=True)

This prints the type conversions applied and the memory saved:

Initial memory usage: 0.02 MB
Final memory usage: 0.00 MB
Memory reduced by: 0.02 MB (91.5%)

Variable type conversions:
column    from       to  memory saved (MB)
  Year  object    int16           0.009727
 State  object category           0.009178
 Value float64  float32           0.000763
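
The kind of conversions reported above can be approximated with plain pandas calls. The following is a minimal sketch of the general technique, not df-compress's actual implementation: the helper name `downcast_sketch` is hypothetical, and the rule of converting strings to `category` when fewer than half of the values are unique is an arbitrary assumption made for illustration.

```python
import numpy as np
import pandas as pd


def downcast_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: shrink dtypes column by column (illustration only)."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            # Pick the smallest integer type that holds every value
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_float_dtype(s):
            # float64 -> float32 where the values allow it
            out[col] = pd.to_numeric(s, downcast="float")
        elif pd.api.types.is_string_dtype(s):
            # Unparseable strings become NaN instead of raising
            parsed = pd.to_numeric(s, errors="coerce")
            if parsed.notna().all():
                out[col] = pd.to_numeric(parsed, downcast="integer")
            elif s.nunique() < 0.5 * len(s):
                # Assumed cutoff: few distinct values -> categorical
                out[col] = s.astype("category")
    return out


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Year": rng.integers(2000, 2023, size=200).astype(str),
    "State": rng.choice(["RJ", "SP", "ES", "MT"], size=200),
    "Value": rng.random(200),
})
small = downcast_sketch(df)

before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(small.dtypes)
print(f"Memory: {before} bytes -> {after} bytes")
```

Measuring with `memory_usage(deep=True)` matters here because the default shallow measurement does not count the Python string objects held by object columns.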

Optional Parameters

The function has three optional parameters (arguments):

  • convert_strings (bool): Whether to attempt to parse object columns as numbers
    • defaults to True
  • numeric_threshold (float): Indicates the proportion of valid numeric entries needed to convert a string to numeric
    • defaults to 0.999
  • show_conversions (bool): Whether to report the changes made column by column
    • defaults to False
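
To make the role of numeric_threshold concrete, here is a small plain-Python sketch of a threshold rule of this kind. It is an illustration under stated assumptions, not the package's code: the helpers `numeric_share` and `should_convert` are hypothetical, and whether df-compress compares with `>=` or `>` is not documented above, so `>=` is assumed.

```python
def numeric_share(values):
    """Hypothetical helper: fraction of entries that parse as a number."""
    def parses(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    values = list(values)
    if not values:
        return 0.0
    return sum(parses(v) for v in values) / len(values)


def should_convert(values, numeric_threshold=0.999):
    """Assumed rule: convert only if enough entries are valid numbers."""
    return numeric_share(values) >= numeric_threshold


clean = [str(n) for n in range(1000)]          # every entry is numeric
dirty = clean[:-2] + ["n/a", "unknown"]        # 2 of 1000 entries are not

print(should_convert(clean))
print(should_convert(dirty))
print(should_convert(dirty, numeric_threshold=0.9))
```

Under this reading, the default of 0.999 tolerates at most about one stray non-numeric entry per thousand rows, and lowering the threshold makes the conversion more permissive.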

Download files

Source Distribution

df_compress-0.4.2.tar.gz (3.9 kB)

Uploaded Source

Built Distribution

df_compress-0.4.2-py3-none-any.whl (4.5 kB)

Uploaded Python 3

File details

Details for the file df_compress-0.4.2.tar.gz.

File metadata

  • Download URL: df_compress-0.4.2.tar.gz
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for df_compress-0.4.2.tar.gz
Algorithm Hash digest
SHA256 436ddb5390fd4adba765d5a1b2d0e3de02504a988cca4e5b54fbc681ef18a2d9
MD5 797bf83972ad20cc657cdba265ab453f
BLAKE2b-256 a6105e0ec65c15b181734bba9975e62b7cd2b655f6e55e09aeb65332e0ea1908

File details

Details for the file df_compress-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: df_compress-0.4.2-py3-none-any.whl
  • Size: 4.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for df_compress-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 71f2b0bba6b78fe1e2ccd36ec5bd189a14c4fb8a0e8bd6b3b26ca237d9971fd2
MD5 1047ec97a0bc295e41dcfa724da9ee31
BLAKE2b-256 65a37d747846aba5ed5f834c1a0af972205f60c0fa5d528f4dd6ef8d5976c4d1
