FaPyc
A Python wrapper for the FAPEC data compressor. (C) DAPCOM Data Services S.L. - https://www.dapcom.es
The free decompression-only library is included; it has some limitations, such as the maximum number of threads and the recovery of corrupted files. Only a 'dummy' compression library is provided: you can get a free evaluation license at https://www.dapcom.es/get-fapec/ to test the compressor. For full licenses, please contact us at fapec@dapcom.es
Usage
There are 4 main execution modes, the last of which is not yet available in this wrapper:
- File: When invoking Fapyc or Unfapyc on a filename, it will (de)compress it directly into another file.
- Buffer: You can load the whole file to (de)compress into, e.g., a byte array and then invoke Fapyc/Unfapyc, which leaves the result in the output buffer. Be careful with very large files, since both the input and the output are held in memory.
- File-to-buffer decompression: You can directly decompress a file (without having to load it beforehand) and leave its decompressed output in a buffer, which you can use afterwards.
- Chunk: FAPEC internally works in 'chunks' of data, of up to 384MB each, which allows a huge file to be (de)compressed progressively while keeping memory usage under control. For now, this feature is only available in the FAPEC CLI and C API, not yet in Fapyc/Unfapyc.
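The idea behind chunked operation can be illustrated without FAPEC itself. The following is only a sketch of the general pattern: it uses zlib from the Python standard library as a stand-in compressor, and the function names and the 1 MB default chunk size are assumptions made for this illustration. It does not show the FAPEC CLI or C API.

```python
import io
import zlib


def compress_in_chunks(src, dst, chunk_size=1024 * 1024):
    """Stream src to dst through zlib, reading one input chunk at a time.

    zlib is a stand-in here; this is not the FAPEC chunk interface.
    Returns the number of uncompressed bytes read.
    """
    compressor = zlib.compressobj()
    total_in = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        total_in += len(chunk)
        dst.write(compressor.compress(chunk))
    # Emit whatever the compressor is still holding internally
    dst.write(compressor.flush())
    return total_in


def decompress_in_chunks(src, dst, chunk_size=1024 * 1024):
    """Inverse of compress_in_chunks, again reading one chunk at a time."""
    decompressor = zlib.decompressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decompressor.decompress(chunk))
    dst.write(decompressor.flush())
```

Both functions accept any binary file-like objects, so they work with files opened in "rb"/"wb" mode as well as with io.BytesIO buffers.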
Examples
Compress and decompress a file
In this example we use the kmall option of FAPEC, suitable for this kind of geomaritime data files from Kongsberg Maritime:
from fapyc import Fapyc, Unfapyc
filename = input("Path to KMALL file: ")
print("Preparing to compress %s" % (filename))
# Here we invoke FAPEC to directly run on files,
# so the memory usage will be much smaller (just 5MB or so)
# although it won't allow us to directly access the
# (de)compressed buffers.
f = Fapyc(filename, chunksize = 2048576, blen = 512)
f.compress_kmall()
print("Preparing to decompress %s" % (filename + ".fapec"))
uf = Unfapyc(filename + ".fapec")
uf.decompress(output=filename+".dec")
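Since file mode does not expose the (de)compressed buffers, one way to check that the round trip was lossless is to compare digests of the original and the restored file. The following is a minimal sketch using only the standard library; the helper names file_sha256 and files_match are hypothetical and not part of fapyc.

```python
import hashlib


def file_sha256(path, block_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """True if both files have identical contents (by SHA-256 digest)."""
    return file_sha256(path_a) == file_sha256(path_b)
```

With the file names from the example above, the check would be files_match(filename, filename + ".dec"). Reading in blocks keeps memory usage low, in line with the file mode itself.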
Compress and decompress a buffer
In this example we use the tab option of FAPEC, which typically outperforms gzip and bzip2 on tabulated text data:
from fapyc import Fapyc, Unfapyc
filename = input("Path to file: ")
# Beware - this loads the whole file into memory
with open(filename, "rb") as file:
    data = file.read()
f = Fapyc(buffer = data)
# Invoke our tabulated-text compression algorithm
# indicating a comma separator
f.compress_tabtxt(sep1=',')
print("Ratio =", round(float(len(data))/len(f.outputBuffer), 4))
# Now we decompress the buffer
uf = Unfapyc(buffer = f.outputBuffer)
uf.decompress()
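In buffer mode, the original, compressed and restored data are all available in memory, so the ratio and a lossless check can be computed together. The helper below is a hypothetical sketch, not part of fapyc; it assumes that all three arguments are bytes-like objects.

```python
def roundtrip_report(original, compressed, restored):
    """Summarise a compress/decompress round trip on in-memory buffers.

    Hypothetical helper: all three arguments are assumed to be bytes-like.
    """
    if len(compressed) == 0:
        # Avoid a division by zero if the compressor produced no output
        raise ValueError("compressed buffer is empty")
    return {
        "ratio": round(len(original) / len(compressed), 4),
        "lossless": bytes(restored) == bytes(original),
    }
```

With the names from the example above, this would be called as roundtrip_report(data, f.outputBuffer, uf.outputBuffer).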
Decompress a file into a buffer, and do some operations on it
Here we provide a quite specific use case, based on the Gaia (E)DR3 bulk catalogue (which is publicly available as FAPEC-compressed CSVs). In this example, we decompress one of the files, load its CSV-formatted contents with Pandas, and plot a sky map and a G-magnitude histogram, first for all sources and then for only those that fulfil some filtering conditions.
from fapyc import Unfapyc
from io import BytesIO
import pandas as pd
import matplotlib.pyplot as plt
# Read the FAPEC file into a buffer
filename = input("Path to CSV-FAPEC file: ")
# Beware - we load the whole file into memory
with open(filename, "rb") as file:
    data = file.read()
# Decompress the buffer
uf = Unfapyc(buffer = data)
uf.decompress()
# Regenerate the CSV from the bytes buffer
df = pd.read_csv(BytesIO(uf.outputBuffer), comment="#")
print("Info from the full CSV:")
print(df.info())
# Prepare some nice histograms for all data
plt.subplot(2,2,1)
plt.title("Full CSV: skymap (%d sources)" % df.shape[0])
plt.xlabel("RA")
plt.ylabel("DEC")
plt.hist2d(df.ra, df.dec, bins=(100, 100), cmap=plt.cm.jet)
plt.colorbar()
plt.subplot(2,2,2)
plt.title("Full CSV: G dist")
plt.xlabel("G magnitude")
plt.ylabel("Counts")
plt.yscale("log")
plt.hist(df.phot_g_mean_mag, bins=(50))
# Now let's repeat, but doing the histogram from only the values that fulfil
# some conditions on some of the CSV fields
iter_csv = pd.read_csv(BytesIO(uf.outputBuffer), comment="#", iterator=True, chunksize=1000)
df = pd.concat((x.query("ra_error < 0.1 & dec_error < 0.1 & ruwe > 0 & ruwe < 5") for x in iter_csv))
print("Info from the filtered CSV:")
print(df.info())
plt.subplot(2,2,3)
plt.title("Filtered CSV: skymap (%d sources)" % df.shape[0])
plt.xlabel("RA")
plt.ylabel("DEC")
plt.hist2d(df.ra, df.dec, bins=(100, 100), cmap=plt.cm.jet)
plt.colorbar()
plt.subplot(2,2,4)
plt.title("Filtered CSV: G dist")
plt.xlabel("G magnitude")
plt.ylabel("Counts")
plt.yscale("log")
plt.hist(df.phot_g_mean_mag, bins=(50))
plt.show()