A python package to compress pandas DataFrames akin to Stata's `compress` command
df-compress
A Python package to compress pandas DataFrames, akin to Stata's compress command. It may prove particularly helpful to those dealing with large datasets.
Installation
You can install df-compress by running the following command:
pip install df_compress
How to use
After installing the package, use the following import:
from df_compress import compress
Example
The following reproducible example shows how to use df-compress:
from df_compress import compress
import pandas as pd
import numpy as np

# Build a 200-row sample DataFrame
df = pd.DataFrame(columns=["Year", "State", "Value", "Int_value"])
df["Year"] = np.random.randint(low=2000, high=2023, size=200).astype(str)  # years stored as strings
df["State"] = np.random.choice(["RJ", "SP", "ES", "MT"], size=200)  # low-cardinality strings
df["Value"] = np.random.rand(200)  # 1-D array of floats in [0, 1)
df["Int_value"] = df["Value"] * 10 // 1  # whole numbers stored as floats

# compress() modifies the original DataFrame, so there is no need to reassign it
compress(df, show_conversions=True)
This prints the conversions made and the memory saved:
Initial memory usage: 0.02 MB
Final memory usage: 0.00 MB
Memory reduced by: 0.02 MB (91.5%)
Variable type conversions:
column  from     to        memory saved (MB)
Year    object   int16     0.009727
State   object   category  0.009178
Value   float64  float32   0.000763
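To make the reported conversions concrete, the sketch below applies the same three dtype changes by hand using plain pandas and compares memory usage before and after. It is not the package's internal code, only an illustration of what those conversions do; the variable names (`manual`, `before`, `after`) and the fixed random seed are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the example above (seeded for repeatability)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Year": rng.integers(2000, 2023, size=200).astype(str),
    "State": rng.choice(["RJ", "SP", "ES", "MT"], size=200),
    "Value": rng.random(200),
})

# deep=True also counts the memory held by the string values
before = df.memory_usage(deep=True).sum()

# Apply by hand the conversions listed in the report
manual = df.copy()
manual["Year"] = pd.to_numeric(manual["Year"], downcast="integer")  # strings -> smallest fitting integer type
manual["State"] = manual["State"].astype("category")                # repeated strings -> category codes
manual["Value"] = manual["Value"].astype("float32")                 # halves the size, loses some precision

after = manual.memory_usage(deep=True).sum()
print(f"{before} bytes -> {after} bytes")
print(manual.dtypes)
```

Note that the float64 to float32 step trades precision for memory, so it is worth checking whether that is acceptable for your data.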
Optional Parameters
The function has three optional parameters (arguments):
- convert_strings (bool): whether to attempt to parse object columns as numbers. Defaults to True.
- numeric_threshold (float): the proportion of valid numeric entries needed to convert a string column to numeric. Defaults to 0.999.
- show_conversions (bool): whether to report the changes made column by column. Defaults to False.
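As a rough illustration of how a threshold such as numeric_threshold could work, the sketch below converts a string column only when a large enough share of its entries parses as numbers. The helper `maybe_to_numeric` is hypothetical and is not part of df-compress; in particular, computing the share over non-missing entries is an assumption, and the package may define it differently.

```python
import pandas as pd


def maybe_to_numeric(series: pd.Series, numeric_threshold: float = 0.999) -> pd.Series:
    """Hypothetical helper: convert to numeric only if enough entries parse."""
    # Entries that cannot be parsed become NaN instead of raising an error
    parsed = pd.to_numeric(series, errors="coerce")

    non_missing = series.notna().sum()
    if non_missing == 0:
        return series  # nothing to judge, leave the column unchanged

    # Share of originally non-missing entries that parsed as numbers (assumed definition)
    valid_share = parsed.notna().sum() / non_missing
    return parsed if valid_share >= numeric_threshold else series


mostly_numeric = pd.Series(["1", "2", "3", "n/a"])  # 3 of 4 entries are numeric

strict = maybe_to_numeric(mostly_numeric)                          # 0.75 < 0.999, kept as strings
lenient = maybe_to_numeric(mostly_numeric, numeric_threshold=0.7)  # 0.75 >= 0.7, converted
print(strict.dtype, lenient.dtype)
```

Under this reading, a high default such as 0.999 keeps a column as text unless almost every entry is numeric, which avoids silently turning stray labels into missing values.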
File details
Details for the file df_compress-0.6.0.tar.gz.
File metadata
- Download URL: df_compress-0.6.0.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d1372569079ade01715557bf6e20b53563bd6833d52f07e41ae582a59ca2570 |
| MD5 | 8074e4ad4607fe2dbc69929049b4ab00 |
| BLAKE2b-256 | ae66e45452ab1061654ad06af833c744a8aa70ecb6360f33b46b888a89769970 |
File details
Details for the file df_compress-0.6.0-py3-none-any.whl.
File metadata
- Download URL: df_compress-0.6.0-py3-none-any.whl
- Upload date:
- Size: 4.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ae78bfdb99ec7728b9c2650194c951acd01ab7930d0c80d2e5e148349fb3c7d7 |
| MD5 | c43e4ba87a14f18a39553ccf65e0c394 |
| BLAKE2b-256 | 0399dfc8c6beeb83f7e446fb2a368faf1bc0b2d84167b1bb39336d22c643edc6 |