
Compresses files in memory and replaces the original with a .gz file when there is no space left on the drive.


cmemgzip

cmemgzip v.0.4.4

A Python 3 Open Source utility created by Carles Mateo.

http://blog.carlesmateo.com/cmemgzip

cmemgzip is made for those occasions when we are on a server whose drives are full, there is no disk space left, and we don't want to delete core files, dumps or logs. cmemgzip reads the file in binary mode, keeps it completely in memory, compresses it from memory, ensures it has write permission on the folder (by creating an empty file), deletes the original file, and then writes the compressed file from memory. It can also load the file in blocks and compress those blocks, at the cost of a small loss of compression efficiency. For that, the parameter -m=XXM or -m=YYG is used. Refer to the PDF manual for more details.

The resulting file can later be decompressed with gzip/gunzip or reviewed with zcat.
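The sequence described above can be sketched in a few lines of Python. This is only an illustration of the idea using the standard gzip module, not cmemgzip's actual source; the function name compress_in_memory_and_replace is hypothetical, and the error handling a real tool needs (for example, the write failing after the original is deleted) is left out.

```python
import gzip
import os


def compress_in_memory_and_replace(path):
    """Replace `path` with `path + '.gz'`, holding all data in RAM meanwhile."""
    gz_path = path + ".gz"

    # 1. Read the whole file in binary mode.
    with open(path, "rb") as f:
        original = f.read()

    # 2. Compress entirely in memory; nothing is written to disk yet.
    compressed = gzip.compress(original)
    if len(compressed) >= len(original):
        return None  # no gain, so the original is left untouched

    # 3. Check write permission on the folder by creating an empty file.
    #    Mode "xb" fails instead of overwriting an existing .gz file.
    with open(gz_path, "xb"):
        pass

    # 4. Delete the original to free space, then write the compressed bytes.
    os.remove(path)
    with open(gz_path, "wb") as f:
        f.write(compressed)
    return gz_path
```

The order of steps 3 and 4 is the important part: the original is only removed once the data is safely compressed in memory and the folder is known to be writable.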

The default mode: Allocate the whole file in memory

For cmemgzip to do its job, your server or instance must have enough free memory to hold the entire file plus its compressed version.

For example, in order to cmemgzip a 2.7GB core dump file, you will need:

2.7GB for the original file + 270MB for the compressed file, about 2.97GB in total, so allow approximately 3.1GB of free RAM.
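The arithmetic behind that figure can be written out as a tiny helper. This is a rough estimate only: the function name default_mode_memory is hypothetical, decimal units are assumed (1 GB = 10^9 bytes), and it ignores the overhead of the Python interpreter itself, which is one reason to leave some margin above the raw sum.

```python
GB = 10**9
MB = 10**6


def default_mode_memory(original_bytes, compressed_bytes):
    """Raw RAM needed in default mode: original data plus compressed copy."""
    return original_bytes + compressed_bytes


needed = default_mode_memory(2_700 * MB, 270 * MB)
print(f"{needed / GB:.2f} GB")  # prints "2.97 GB"
```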

The Block mode: Use a chunk size

In block mode you specify how many megabytes or gigabytes are read from the file at a time. Each block is compressed in memory, and then the next block is loaded.

For example, in order to compress a log file of 2 GB in size using a small amount of memory, you can run:

cmemgzip -m=100M myfile.log

This loads the file in blocks of 100MB and compresses them in memory. For a 2GB log file that results in 200MB once compressed, using blocks of 100MB, the memory requirement for cmemgzip would be around 300MB. However, you can specify a block size of 10MB, and then the memory required will be only around 220MB. It depends on how well the data compresses. As a general rule for logs, the bigger the block size, the better the savings in disk space.
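One way to picture block mode is the sketch below. It assumes each block is compressed as an independent gzip member and the members are concatenated, which gzip/gunzip and Python's gzip.decompress accept as a single stream; whether cmemgzip does exactly this internally is an assumption, and compress_in_blocks is a hypothetical name. Because each block is compressed without knowledge of the others, repeated patterns across blocks are not shared, which is where the small loss of efficiency comes from.

```python
import gzip
import io


def compress_in_blocks(src, block_size):
    """Read binary file object `src` block by block; return gzip bytes."""
    out = io.BytesIO()
    while True:
        block = src.read(block_size)
        if not block:
            break
        # Each block becomes its own gzip member; members simply concatenate.
        out.write(gzip.compress(block))
    return out.getvalue()
```

At any moment only one uncompressed block plus the compressed output so far is held in memory, which matches the "block size + compressed size" estimate given above.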

Compressing multiple files

Just provide a mask with * instead of a file name.

For instance:

cmemgzip /var/log/*

Risks

With great power comes great responsibility. Like every tool that works with files, this one must be used very carefully. If many processes are competing to write to the drive, they may quickly fill the space recovered by deleting the original file and make it impossible to write the compressed version. As of version 0.2, in that (extreme) situation cmemgzip asks for another destination to store the compressed file. This should not happen unless the server is under extreme load. With logs or core dumps the compression ratio is so high, and the space gain so large (from a 2.7GB uncompressed core dump to 268MB compressed), that this is very unlikely. Use it wisely, at your own risk.

Files avoided

cmemgzip will check that the files to compress are at least 100 bytes in size, and will cancel the process if the compressed version is bigger than the original file (typically when you attempt to compress an already compressed file).

It will also avoid deleting the original file if the compressed version is equal or bigger in size.

It will also skip files whose names end in .gz .gzip .zip .bzip .bzip2 .rar .xz
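The selection rules above, combined with a * mask, could look roughly like this. It is a sketch under assumptions, not cmemgzip's own code: the names should_compress and select_files are hypothetical, and the suffix match is taken to be case-sensitive.

```python
import glob
import os

SKIPPED_SUFFIXES = (".gz", ".gzip", ".zip", ".bzip", ".bzip2", ".rar", ".xz")
MIN_SIZE_BYTES = 100


def should_compress(path):
    """Apply the documented rules: skip compressed suffixes and tiny files."""
    if path.endswith(SKIPPED_SUFFIXES):
        return False
    return os.path.isfile(path) and os.path.getsize(path) >= MIN_SIZE_BYTES


def select_files(pattern):
    """Expand a mask such as /var/log/* and keep only eligible files."""
    return [p for p in sorted(glob.glob(pattern)) if should_compress(p)]
```

The check on the compressed size (equal or bigger than the original) can only happen after compression, so it is not part of this pre-filter.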

Installation

Install from PIP for Python 3:

pip3 install cmemgzip

Here is the page for the PIP package: https://pypi.org/project/cmemgzip/

If you don't have pip on your system, you can install it on Ubuntu servers with:

apt install python3-pip

Cloning from the repository:

git clone https://gitlab.com/carles.mateo/cmemgzip.git

Release notes:

Version 0.4.1 has been tested with Ubuntu and Windows 10 64-bit. The previous version, 0.4, was tested with Ubuntu, Windows 10 Professional, Mac OS X, and Ubuntu 20.04 LTS on a Raspberry Pi 4.

Version 0.4.1 autodetects Windows and disables colors.

Be careful not to use cmemgzip on log files that a program keeps open through a fd (File Descriptor), as deleting the original log file will not return the space to the filesystem. This is typically the case with some web servers. You should stop the web server first, or deal with the fds.
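Before compressing a log, it can help to check whether any process still has it open. The sketch below is one possible way to do that on Linux by scanning /proc; it is not part of cmemgzip, the name pids_holding_open is hypothetical, and without root it only sees processes you are allowed to inspect. On systems without /proc it returns an empty list, which means "unknown", not "safe".

```python
import os


def pids_holding_open(path):
    """Return PIDs that have `path` open, read from /proc (Linux only)."""
    target = os.path.realpath(path)
    holders = []
    if not os.path.isdir("/proc"):
        return holders  # no /proc here (e.g. Windows, macOS): cannot tell
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                # Each entry is a symlink to the file the descriptor points at.
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    holders.append(int(pid))
                    break
            except OSError:
                continue  # descriptor was closed while we were looking
    return holders
```

If the list is not empty, stop or reconfigure those processes first, since the space of a deleted file is only released once every descriptor to it is closed.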

