QFC - Quantized Fourier Compression of Timeseries Data with Application to Electrophysiology

Overview

As datasets for extracellular electrophysiology grow larger, efficient methods for compressing multi-channel time series data become crucial. Lossless methods are desirable because they perfectly preserve the original signal, but their compression ratios usually range only from 2x to 4x. What is needed are ratios on the order of 10-30x, which leads us to consider lossy methods.

Here, we implement a simple lossy compression method, inspired by the Discrete Cosine Transform (DCT) and the quantization steps of JPEG compression for images. The method comprises the following steps:

  • Compute the Discrete Fourier Transform (DFT) of the time series data along the time axis.
  • Quantize the Fourier coefficients to achieve a target entropy (the entropy determines the theoretically achievable compression ratio). This is done by multiplying by a normalization factor and then rounding to the nearest integer.
  • Compress the reduced-entropy quantized Fourier coefficients using zlib or zstd (other methods could be used instead).
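A minimal NumPy sketch of the quantization step, and of how the entropy of the quantized coefficients relates to the achievable compression ratio. It assumes an orthonormal real FFT (`np.fft.rfft` with `norm="ortho"`) along the time axis and a single scalar normalization factor; `quantize_fourier` and `entropy_bits` are hypothetical helpers for illustration, not part of qfc.

```python
import numpy as np


def quantize_fourier(x, scale):
    """Hypothetical helper: quantize the Fourier coefficients of x.

    x has shape (num_samples, num_channels); the DFT is taken along axis 0.
    """
    # Real-input FFT along the time axis; norm="ortho" makes it orthonormal
    X = np.fft.rfft(x, axis=0, norm="ortho")
    # Multiply by the normalization factor, then round to the nearest integer
    re = np.round(X.real * scale).astype(np.int32)
    im = np.round(X.imag * scale).astype(np.int32)
    return re, im


def entropy_bits(values):
    """Empirical Shannon entropy, in bits per value, of an integer array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


rng = np.random.default_rng(0)
x = rng.normal(scale=50, size=(30000, 4))  # synthetic white noise, 4 channels
re, im = quantize_fourier(x, scale=0.1)
h = entropy_bits(np.concatenate([re.ravel(), im.ravel()]))
# rfft of N real samples gives N//2 + 1 complex bins, i.e. about N real values,
# so h is approximately the entropy per original sample.
print(f"entropy: {h:.2f} bits/value; ideal ratio vs. int16: {16 / h:.1f}x")
```

Dividing 16 bits by `h` gives a rough estimate of the ratio an ideal entropy coder could reach on int16 input; a general-purpose coder such as zlib will typically land somewhat below it.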

To decompress:

  • Decompress the quantized Fourier coefficients.
  • Divide by the normalization factor.
  • Compute the Inverse Discrete Fourier Transform (IDFT) to obtain the reconstructed time series data.
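Putting the compression and decompression steps together, here is a hedged round-trip sketch. The `encode`/`decode` functions, the int16 storage of the quantized coefficients, and the fixed `scale` value are assumptions made for illustration; the actual `QFCCodec` may work differently (the usage example, for instance, passes a `segment_length` that this sketch ignores).

```python
import zlib

import numpy as np


def encode(x, scale):
    """Hypothetical QFC-style encoder for an array of shape (samples, channels)."""
    # DFT along the time axis (orthonormal real FFT)
    X = np.fft.rfft(x, axis=0, norm="ortho")
    # Multiply by the normalization factor and round; assumes values fit in int16
    q = np.round(np.stack([X.real, X.imag]) * scale).astype(np.int16)
    # Lossless stage: zlib here; zstd would be the other option named above
    return zlib.compress(q.tobytes(), 6), q.shape


def decode(blob, shape, scale, num_samples):
    """Inverse of encode(): decompress, divide by the scale, inverse DFT."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int16).reshape(shape)
    X = (q[0] + 1j * q[1]) / scale
    # n=num_samples restores the original length (needed when it is odd)
    return np.fft.irfft(X, n=num_samples, axis=0, norm="ortho")


rng = np.random.default_rng(1)
x = rng.normal(scale=50, size=(20000, 8)).astype(np.int16)
scale = 0.1
blob, shape = encode(x, scale)
y = decode(blob, shape, scale, x.shape[0])
ratio = x.nbytes / len(blob)
resid_std = float(np.std(x - y))
print(f"compression ratio: {ratio:.2f}, residual std: {resid_std:.2f}")
```

White noise is the least favorable input for this scheme, so the ratio here is modest; the only loss comes from the rounding step.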

This method is particularly well suited to bandpass-filtered data: the Fourier coefficients outside the passband are already close to zero, so most of them quantize to exactly zero, which yields an especially low entropy of the quantized signal.
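The effect of bandpass filtering on entropy can be checked directly. This sketch compares the entropy of quantized Fourier coefficients for white noise and for the same noise after an idealized 300-6000 Hz bandpass (out-of-band bins set to zero). The band edges, the synthetic data, the scale factor, and the `quantized_entropy` helper are assumptions for illustration.

```python
import numpy as np


def quantized_entropy(x, scale):
    """Entropy (bits/value) of rounded, scaled rfft coefficients of x (axis 0 = time)."""
    X = np.fft.rfft(x, axis=0, norm="ortho")
    q = np.round(np.concatenate([X.real.ravel(), X.imag.ravel()]) * scale)
    # Cast to integers so that 0.0 and -0.0 count as the same symbol
    _, counts = np.unique(q.astype(np.int64), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


fs = 30000  # sampling frequency in Hz
rng = np.random.default_rng(2)
raw = rng.normal(scale=50, size=(fs, 4))  # 1 s of white noise, 4 channels

# Idealized 300-6000 Hz bandpass: zero every Fourier bin outside the band
freqs = np.fft.rfftfreq(raw.shape[0], d=1 / fs)
in_band = (freqs >= 300) & (freqs <= 6000)
F = np.fft.rfft(raw, axis=0)
F[~in_band] = 0
filtered = np.fft.irfft(F, n=raw.shape[0], axis=0)

h_raw = quantized_entropy(raw, scale=0.1)
h_filtered = quantized_entropy(filtered, scale=0.1)
print(f"raw: {h_raw:.2f} bits/value, bandpass-filtered: {h_filtered:.2f} bits/value")
```

Real filters leave small out-of-band coefficients rather than exact zeros, but with a reasonable scale factor most of them still round to zero.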

For a comparison of various lossy and lossless compression schemes, see Compression strategies for large-scale electrophysiology data, Buccino et al.

Installation

pip install qfc

Example usage

# See examples/example1.py

from matplotlib import pyplot as plt
import numpy as np
from qfc import qfc_estimate_quant_scale_factor
from qfc.codecs import QFCCodec


def main():
    sampling_frequency = 30000
    duration = 2
    num_channels = 10
    num_samples = int(sampling_frequency * duration)
    y = np.random.randn(num_samples, num_channels) * 50
    y = lowpass_filter(y, sampling_frequency, 6000)
    y = np.ascontiguousarray(y)  # compressor requires C-order arrays
    y = y.astype(np.int16)
    target_residual_stdev = 5

    ############################################################
    quant_scale_factor = qfc_estimate_quant_scale_factor(
        y,
        target_residual_stdev=target_residual_stdev
    )
    codec = QFCCodec(
        quant_scale_factor=quant_scale_factor,
        dtype="int16",
        segment_length=10000,
        compression_method="zstd",
        zstd_level=3
    )
    compressed_bytes = codec.encode(y)
    y_reconstructed = codec.decode(compressed_bytes)
    ############################################################

    y_resid = y - y_reconstructed
    original_size = y.nbytes
    compressed_size = len(compressed_bytes)
    compression_ratio = original_size / compressed_size
    print(f"Original size: {original_size} bytes")
    print(f"Compressed size: {compressed_size} bytes")
    print(f"Actual compression ratio: {compression_ratio}")
    print(f'Target residual std. dev.: {target_residual_stdev:.2f}')
    print(f'Actual Std. dev. of residual: {np.std(y_resid):.2f}')

    xgrid = np.arange(y.shape[0]) / sampling_frequency
    ch = 3  # select a channel to plot
    n = 1000  # number of samples to plot
    plt.figure()
    plt.plot(xgrid[:n], y[:n, ch], label="Original")
    plt.plot(xgrid[:n], y_reconstructed[:n, ch], label="Decompressed")
    plt.plot(xgrid[:n], y_resid[:n, ch], label="Residual")
    plt.xlabel("Time (s)")
    plt.title(f'QFC compression ratio: {compression_ratio:.2f}')
    plt.legend()
    plt.show()


def lowpass_filter(input_array, sampling_frequency, cutoff_frequency):
    F = np.fft.fft(input_array, axis=0)
    N = input_array.shape[0]
    freqs = np.fft.fftfreq(N, d=1 / sampling_frequency)
    sigma = cutoff_frequency / 3
    window = np.exp(-np.square(freqs) / (2 * sigma**2))
    F_filtered = F * window[:, None]
    filtered_array = np.fft.ifft(F_filtered, axis=0)
    return np.real(filtered_array)


if __name__ == "__main__":
    main()
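The example above obtains the scale factor from `qfc_estimate_quant_scale_factor`, whose internals are not described here. As a hedged back-of-the-envelope alternative, a closed-form factor can be derived for an orthonormal real FFT: `scale_for_target` and `fourier_roundtrip` below are hypothetical helpers, and the formula assumes rounding errors are uniformly distributed, which holds when the coefficients span many quantization steps.

```python
import numpy as np


def scale_for_target(target_residual_stdev):
    """Hypothetical closed-form scale factor (not qfc's estimator).

    Rounding c * s leaves an error roughly uniform on [-0.5, 0.5] / s, i.e.
    variance 1 / (12 s^2) for each real and each imaginary part. With an
    orthonormal transform, Parseval's theorem maps coefficient errors to the
    time domain. Each of the ~N/2 interior rfft bins stands for two bins
    (k and N - k) of the full DFT and has two rounded parts, so the
    per-sample residual variance is about
    (N/2) * 2 * 2 / (12 s^2) / N = 1 / (6 s^2).
    """
    return 1.0 / (target_residual_stdev * np.sqrt(6.0))


def fourier_roundtrip(x, scale):
    """Quantize the rfft coefficients of x (axis 0 = time) and reconstruct."""
    X = np.fft.rfft(x, axis=0, norm="ortho")
    Xq = (np.round(X.real * scale) + 1j * np.round(X.imag * scale)) / scale
    return np.fft.irfft(Xq, n=x.shape[0], axis=0, norm="ortho")


rng = np.random.default_rng(3)
x = rng.normal(scale=50, size=(30000, 10))  # white noise: uniform-error regime
target = 5.0
s = scale_for_target(target)
resid_std = float(np.std(x - fourier_roundtrip(x, s)))
print(f"scale factor: {s:.4f}, residual std: {resid_std:.2f} (target {target})")
```

For bandpass-filtered data many coefficients round to zero with an error smaller than the uniform model predicts, so this formula tends to overshoot and the residual lands below the target. A data-driven estimate, which `qfc_estimate_quant_scale_factor` presumably provides since it receives `y`, can trade that slack for a higher compression ratio.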

Zarr example

See examples/zarr_example.py

Benchmarks

I have put together some preliminary systematic benchmarks on real and synthetic data. See ./benchmarks and ./benchmarks/results.

These results show that:

  • Quantizing in the Fourier domain (QFC) is substantially better than quantizing in the time domain (call it QTC), both for real data and for bandpass-filtered data.
  • The compression ratio is much higher for bandpass-filtered data than for unfiltered raw data.
  • For the lossless stage of the method, zstd outperforms zlib on all three measured factors: compression ratio, compression speed, and decompression speed.
  • As expected, the compression ratio depends heavily on the target residual standard deviation.
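To illustrate the first point on a toy scale, the sketch below compares time-domain quantization (QTC) with Fourier-domain quantization (QFC) on synthetic noise passed through an idealized 300-6000 Hz bandpass filter, using zlib for the lossless stage in both cases. The helpers `sketch_qtc` and `sketch_qfc`, the step and scale formulas, and the synthetic data are assumptions for illustration; this is not the code in ./benchmarks, and what it prints is not a benchmark result.

```python
import zlib

import numpy as np


def sketch_qtc(x, target):
    """Time-domain quantization: step = target * sqrt(12) gives residual std ~ target."""
    step = target * np.sqrt(12.0)
    q = np.round(x / step).astype(np.int16)
    return zlib.compress(q.tobytes(), 6), q * step


def sketch_qfc(x, target):
    """Fourier-domain quantization with scale = 1 / (target * sqrt(6))."""
    s = 1.0 / (target * np.sqrt(6.0))
    X = np.fft.rfft(x, axis=0, norm="ortho")
    q = np.round(np.stack([X.real, X.imag]) * s).astype(np.int16)
    recon = np.fft.irfft((q[0] + 1j * q[1]) / s, n=x.shape[0], axis=0, norm="ortho")
    return zlib.compress(q.tobytes(), 6), recon


fs = 30000  # sampling frequency in Hz
rng = np.random.default_rng(4)
raw = rng.normal(scale=50, size=(fs, 4))
freqs = np.fft.rfftfreq(fs, d=1 / fs)
F = np.fft.rfft(raw, axis=0)
F[(freqs < 300) | (freqs > 6000)] = 0  # idealized 300-6000 Hz bandpass
x = np.fft.irfft(F, n=fs, axis=0)

target = 5.0
qtc_blob, qtc_recon = sketch_qtc(x, target)
qfc_blob, qfc_recon = sketch_qfc(x, target)
for name, blob, recon in [("QTC", qtc_blob, qtc_recon), ("QFC", qfc_blob, qfc_recon)]:
    ratio = x.size * 2 / len(blob)  # relative to int16 storage
    print(f"{name}: ratio vs. int16 {ratio:.1f}x, residual std {np.std(x - recon):.2f}")
```

The out-of-band coefficients quantize to long runs of zeros that zlib stores almost for free, while the time-domain samples of the same signal stay spread over several quantization levels. Raising `target` in either helper increases the ratio at the cost of a larger residual.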

License

This code is provided under the Apache License, Version 2.0.

Author

Jeremy Magland, Center for Computational Mathematics, Flatiron Institute

Download files

Source Distribution

qfc-0.3.6.tar.gz (12.1 kB)

Uploaded Source

Built Distribution

qfc-0.3.6-py3-none-any.whl (17.9 kB)

Uploaded Python 3

File details

Details for the file qfc-0.3.6.tar.gz.

File metadata

  • Download URL: qfc-0.3.6.tar.gz
  • Upload date:
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.9.18 Linux/6.5.6-76060506-generic

File hashes

Hashes for qfc-0.3.6.tar.gz
Algorithm Hash digest
SHA256 78ca115b9f208a68de9451c851889061f5147efa25f48ae6fbf8d541dc75ce47
MD5 43153bbc7fca05e084f9771981f13a7d
BLAKE2b-256 6537ea0fea0cab9312d9a4e033cccd7b525684034c00d64d080ba6b1c6c24347


File details

Details for the file qfc-0.3.6-py3-none-any.whl.

File metadata

  • Download URL: qfc-0.3.6-py3-none-any.whl
  • Upload date:
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.9.18 Linux/6.5.6-76060506-generic

File hashes

Hashes for qfc-0.3.6-py3-none-any.whl
Algorithm Hash digest
SHA256 8dc896892bc31badb74dea6cf4958c6f7439b4f6373f13f99bc1da41bdb4a041
MD5 875b1a26a1e1b3ffd62fb1b22bef6ee2
BLAKE2b-256 1ce8baa11c86f21450b2080c9ed011cd8fafc47a1ec054345eea5d8d4accc71d

