FaPyc

A Python wrapper for the FAPEC data compressor. (C) DAPCOM Data Services S.L. - https://www.dapcom.es

This package includes the free decompression-only library, which has some limitations, such as the maximum number of threads and the recovery of corrupted files. Only a 'dummy' compression library is provided: to test the compressor, you can get a free evaluation license at https://www.dapcom.es/get-fapec/. For full licenses, please contact us at fapec@dapcom.es.

Usage

There are four execution modes, three of which are currently available from Python:

  • File: When you invoke Fapyc or Unfapyc on a filename, it (de)compresses that file directly into another file.
  • Buffer: You can load the whole file to (de)compress into, e.g., a byte array, and then invoke Fapyc/Unfapyc, which leaves the result in the output buffer. Be careful with large files, since both the input and the output are held in memory.
  • File-to-buffer decompression: You can decompress a file directly (without having to load it beforehand) and leave its decompressed output in a buffer, which you can use afterwards.
  • Chunk: FAPEC internally works in 'chunks' of data, of up to 384MB each, which allows a huge file to be (de)compressed progressively while keeping memory usage under control. For now, this feature is only available in the FAPEC CLI and C API, not yet in Fapyc/Unfapyc.
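
The choice between file mode and buffer mode mostly comes down to how much memory the input would occupy. As a minimal sketch, the helper below picks a mode from the file size. Note that `choose_mode` and its 256 MB default threshold are hypothetical, chosen only for illustration; they are not part of Fapyc or FAPEC.

```python
import os

def choose_mode(path, buffer_limit_bytes=256 * 1024 * 1024):
    """Return "buffer" for small files and "file" for large ones.

    Hypothetical helper: the threshold is an arbitrary illustration,
    not a FAPEC or Fapyc setting.
    """
    size = os.path.getsize(path)
    return "buffer" if size <= buffer_limit_bytes else "file"
```

A caller could then decide whether to read the file into memory or to pass the filename directly, as done in the examples below.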

Examples

Compress and decompress a file

In this example we use the kmall option of FAPEC, which is suited to KMALL geomaritime data files from Kongsberg Maritime:

from fapyc import Fapyc, Unfapyc

filename = input("Path to KMALL file: ")

print("Preparing to compress %s" % (filename))
# Here we invoke FAPEC to directly run on files,
# so the memory usage will be much smaller (just 5MB or so)
# although it won't allow us to directly access the
# (de)compressed buffers.
f = Fapyc(filename, chunksize = 2048576, blen = 512)
f.compress_kmall()

print("Preparing to decompress %s" % (filename + ".fapec"))
uf = Unfapyc(filename + ".fapec")
uf.decompress(output=filename+".dec")
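
After a file round trip like the one above, it can be useful to confirm that the decompressed file matches the original. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of_file` and `files_match` are our own, not part of Fapyc.

```python
import hashlib

def sha256_of_file(path, block_size=1024 * 1024):
    """Hash a file in fixed-size blocks so memory use stays small."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def files_match(path_a, path_b):
    """Return True when both files have identical contents."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

With the names used in the example above, the check would be `files_match(filename, filename + ".dec")`.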

Compress and decompress a buffer

In this example we use the tab option of FAPEC, which typically outperforms gzip and bzip2 on tabulated text data:

from fapyc import Fapyc, Unfapyc

filename = input("Path to file: ")
# Beware - this loads the whole file into memory
with open(filename, "rb") as file:
    data = file.read()
f = Fapyc(buffer = data)
# Invoke our tabulated-text compression algorithm
# indicating a comma separator
f.compress_tabtxt(sep1=',')
print("Ratio =", round(float(len(data))/len(f.outputBuffer), 4))

# Now we decompress the buffer
uf = Unfapyc(buffer = f.outputBuffer)
uf.decompress()
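
To put the ratio printed above in context, the same input can be compressed with gzip and bzip2 from the Python standard library. This is only a sketch of how such a baseline could be computed: it does not call FAPEC, the `baseline_ratios` helper is our own name, and the sample data is synthetic.

```python
import bz2
import gzip

def baseline_ratios(data):
    """Return original-size / compressed-size ratios for gzip and bzip2."""
    return {
        "gzip": len(data) / len(gzip.compress(data)),
        "bzip2": len(data) / len(bz2.compress(data)),
    }

# Small synthetic comma-separated sample, for illustration only
sample = b"".join(b"%d,%d,%d\n" % (i, i * 2, i % 7) for i in range(5000))
ratios = baseline_ratios(sample)
print(ratios)
```

In the buffer example, `baseline_ratios(data)` would give the figures to compare against the FAPEC ratio for the same file.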

Decompress a file into a buffer, and do some operations on it

This is a fairly specific use case, based on the Gaia (E)DR3 bulk catalogue, which is publicly available as FAPEC-compressed CSV files. We read one of those files into memory, decompress it, load the resulting CSV with Pandas, apply some filtering conditions, and plot sky maps and G-magnitude histograms for both the full and the filtered data.

from fapyc import Unfapyc
from io import BytesIO
import pandas as pd
import matplotlib.pyplot as plt

# Read the FAPEC file into a buffer
filename = input("Path to CSV-FAPEC file: ")
# Beware - we load the whole file into memory
with open(filename, "rb") as file:
    data = file.read()

# Decompress the buffer
uf = Unfapyc(buffer = data)
uf.decompress()

# Regenerate the CSV from the bytes buffer
df = pd.read_csv(BytesIO(uf.outputBuffer), comment="#")
print("Info from the full CSV:")
print(df.info())
# Prepare some nice histograms for all data
plt.subplot(2,2,1)
plt.title("Full CSV: skymap (%d sources)" % df.shape[0])
plt.xlabel("RA")
plt.ylabel("DEC")
plt.hist2d(df.ra, df.dec, bins=(100, 100), cmap=plt.cm.jet)
plt.colorbar()
plt.subplot(2,2,2)
plt.title("Full CSV: G dist")
plt.xlabel("G magnitude")
plt.ylabel("Counts")
plt.yscale("log")
plt.hist(df.phot_g_mean_mag, bins=50)

# Now let's repeat, but doing the histogram from only the values that fulfil
# some conditions on some of the CSV fields
iter_csv = pd.read_csv(BytesIO(uf.outputBuffer), comment="#", iterator=True, chunksize=1000)
df = pd.concat((x.query("ra_error < 0.1 & dec_error < 0.1 & ruwe > 0 & ruwe < 5") for x in iter_csv))
print("Info from the filtered CSV:")
print(df.info())
plt.subplot(2,2,3)
plt.title("Filtered CSV: skymap (%d sources)" % df.shape[0])
plt.xlabel("RA")
plt.ylabel("DEC")
plt.hist2d(df.ra, df.dec, bins=(100, 100), cmap=plt.cm.jet)
plt.colorbar()
plt.subplot(2,2,4)
plt.title("Filtered CSV: G dist")
plt.xlabel("G magnitude")
plt.ylabel("Counts")
plt.yscale("log")
plt.hist(df.phot_g_mean_mag, bins=50)

plt.show()
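
Where Pandas is not available, a similar row filter can be sketched with the standard `csv` module. The example below is an assumption-laden illustration, not part of Fapyc: `filter_rows` is a hypothetical helper, it only skips full-line `#` comments, it applies just one of the conditions used above, and the input buffer is synthetic rather than real Gaia data.

```python
import csv
import io

def filter_rows(csv_bytes, max_ra_error=0.1):
    """Yield rows from a CSV bytes buffer whose ra_error is below a limit.

    Lines starting with '#' are skipped (full-line comments only).
    """
    text = io.TextIOWrapper(io.BytesIO(csv_bytes), encoding="utf-8", newline="")
    lines = (line for line in text if not line.startswith("#"))
    for row in csv.DictReader(lines):
        if float(row["ra_error"]) < max_ra_error:
            yield row

# Tiny synthetic buffer, for illustration only (not real Gaia data)
demo = b"# header comment\nra,dec,ra_error\n10.0,-5.0,0.05\n20.0,3.0,0.50\n"
kept = list(filter_rows(demo))
print(len(kept))
```

In the example above, the bytes in `uf.outputBuffer` would take the place of `demo`.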


Built Distributions

Wheels for version 0.2.0 (manylinux, glibc 2.17+, x86-64):

  • fapyc-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (779.8 kB): CPython 3.11
  • fapyc-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (759.6 kB): CPython 3.10
  • fapyc-0.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (770.8 kB): CPython 3.9
  • fapyc-0.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (772.3 kB): CPython 3.8
  • fapyc-0.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (748.2 kB): CPython 3.7m
  • fapyc-0.2.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (748.3 kB): CPython 3.6m
