Skip to main content

A tool for compressing MS/MS data

Project description

MS/MS Data Compression Package

Description

This Python package is designed for efficient compression of Mass Spectrometry (MS/MS) data. It is based on the MassComp algorithm, which is described in the following paper: https://doi.org/10.1186/s12859-019-2962-7

Version

0.2.0

Features

  • Delta and Hex Encoding: Efficiently encodes m/z values and intensities to optimize the compression.
  • Brotli Compression: Utilizes Brotli, a high-performance compression algorithm, offering superior compression ratios and speeds compared to gzip.

Installation

To install the MS/MS Data Compression package, run:

pip install msms-compression

Usage

The package includes the following main compressor classes:

  • SpectrumCompressorUrl: Utilizes URL-safe Base64 encoding.
  • SpectrumCompressor: Uses Base85 encoding.
  • Note: The m/z values must be sorted in ascending order before compression, and contain only positive values.

Example:

from msms_compression import SpectrumCompressorF32

# Sample data
mz_values, intensity_values = [100.0, 101.0, 102.0], [10.0, 20.0, 30.0]

# Initialize the compressor
compressor = SpectrumCompressorF32()

# Compress data
compressed_data = compressor.compress(mz_values, intensity_values)
print("Compressed Data:", compressed_data)

# Decompress data
decompressed_mz, decompressed_intensity = compressor.decompress(compressed_data)
assert decompressed_mz == mz_values
assert decompressed_intensity == intensity_values

Compression Strategy Comparison

strategy Compression Ratio Compression Ratio Rank URL Compression Ratio URL Compression Ratio Rank Compression Time Compression Time Rank Decompression Time Decompression Time Rank
SpectrumCompressorLossy 5.952 1 5.023 2 0.030 5 0.008 4
SpectrumCompressorUrlLossy 5.579 2 6.926 1 0.030 4 0.007 1
SpectrumCompressor 3.890 3 3.299 6 0.053 7 0.010 6
SpectrumCompressorUrl 3.646 4 4.528 3 0.051 6 0.008 3
SpectrumCompressorGzip 3.148 5 2.658 7 0.023 2 0.009 5
SpectrumCompressorUrlGzip 2.951 6 3.665 4 0.022 1 0.007 2
SpectrumCompressorUrlLzstring 2.800 7 3.418 5 0.026 3 0.097 7
scan strategy original_size compressed_size url_encoded_size compression_ratio url_compression_ratio compressed_time decompressed_time
0 SpectrumCompressor 56124 14428 21139 3.88993623509842 3.299068073229576 0.053049564361572266 0.009985208511352539
0 SpectrumCompressorUrl 56124 15392 15401 3.646309771309771 4.528212453736771 0.051015615463256836 0.00789642333984375
0 SpectrumCompressorGzip 56124 17829 26236 3.1479050984351336 2.6581414849824667 0.02299976348876953 0.009003639221191406
0 SpectrumCompressorUrlGzip 56124 19020 19029 2.950788643533123 3.664879920121919 0.021996021270751953 0.007005453109741211
0 SpectrumCompressorUrlLzstring 56124 20041 20402 2.8004590589291953 3.418243309479463 0.026098012924194336 0.09739089012145996
0 SpectrumCompressorLossy 56124 9429 13884 5.952274896595609 5.022976087582829 0.030114173889160156 0.007976055145263672
0 SpectrumCompressorUrlLossy 56124 10060 10069 5.578926441351888 6.9261098420895815 0.030014991760253906 0.006910562515258789

The method compresses intensity values into two-character hexadecimal strings, offering 256 unique representations. This is a lossy approach, effectively reducing data size. Meanwhile, m/z values are compressed losslessly using delta encoding, maintaining their exact accuracy.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

msms_compression-0.3.0.tar.gz (8.8 kB view details)

Uploaded Source

Built Distribution

msms_compression-0.3.0-py3-none-any.whl (7.9 kB view details)

Uploaded Python 3

File details

Details for the file msms_compression-0.3.0.tar.gz.

File metadata

  • Download URL: msms_compression-0.3.0.tar.gz
  • Upload date:
  • Size: 8.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for msms_compression-0.3.0.tar.gz
Algorithm Hash digest
SHA256 8c4f02371bc41c6cb80dd6f0fc928031724860b3b5ac117cfd58b4f5259e6526
MD5 d1d915049dfc34182579a63c7bf70fc1
BLAKE2b-256 8ea94e0aee5c1f9649f68ba40fac7a90b6b310f9b2dc9f5a3d639f98f6db3c88

See more details on using hashes here.

File details

Details for the file msms_compression-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for msms_compression-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e29470d28c44e5082208230a3e1e26772ad855441495cef33468ada2353e2efc
MD5 4a561b8a6d8a7c5e662a20e58b633568
BLAKE2b-256 4965ee2aa4da5ded4bb308f9af8368395b2874decd237d370cdbeb8eddb8d8ba

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page