A tool for compressing MS/MS data
MS/MS Data Compression Package
Description
This Python package compresses tandem mass spectrometry (MS/MS) data efficiently. It is based on the MassComp algorithm, described in the following paper: https://doi.org/10.1186/s12859-019-2962-7
Version
0.2.0
Features
- Delta and Hex Encoding: Efficiently encodes m/z values and intensities to optimize the compression.
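- Brotli Compression: Uses Brotli as the compression stage. In the benchmark further down, the default compressors reach higher compression ratios than the Gzip variants (3.890 vs. 3.148), but take roughly twice as long to compress (0.053 vs. 0.023).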
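To make the delta and hex encoding idea concrete, here is a minimal sketch using only the standard library. It is not the package's implementation: the function names, the `decimals` parameter, and the comma-separated layout are assumptions chosen for illustration. Sorted m/z values are scaled to integers, each value is replaced by its difference from the previous one, and the small differences are written as short hexadecimal tokens.

```python
def delta_hex_encode(mz_values, decimals=4):
    """Scale m/z values to integers, delta-encode them, and write each delta as hex.

    Hypothetical helper for illustration; `decimals` sets the fixed precision kept.
    """
    if not mz_values:
        return ""
    scale = 10 ** decimals
    ints = [round(mz * scale) for mz in mz_values]
    # First token is the absolute value, the rest are differences to the previous value.
    deltas = [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]
    return ",".join(format(d, "x") for d in deltas)


def delta_hex_decode(encoded, decimals=4):
    """Invert delta_hex_encode by summing the deltas back up."""
    if not encoded:
        return []
    scale = 10 ** decimals
    values = []
    running_total = 0
    for token in encoded.split(","):
        running_total += int(token, 16)
        values.append(running_total / scale)
    return values


encoded = delta_hex_encode([100.0, 101.0, 102.0])
print(encoded)  # f4240,2710,2710
print(delta_hex_decode(encoded))
```

Because sorted input produces small, often repeated deltas, the resulting string tends to compress better in a general-purpose compressor than the raw values would.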
Installation
To install the MS/MS Data Compression package, run:
pip install msms-compression
Usage
The package includes the following main compressor classes:
- SpectrumCompressorUrl: Uses URL-safe Base64 encoding.
- SpectrumCompressor: Uses Base85 encoding.

Note: The m/z values must be sorted in ascending order before compression and must contain only positive values.
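The difference between the two text encodings can be seen with the standard library alone. The sketch below is an illustration, not the package's code: the payload is made up, and `zlib` stands in for the compression stage so the example has no third-party dependencies. Base85 produces the shorter string, but it contains characters that must be percent-encoded in a URL, whereas URL-safe Base64 uses only letters, digits, `-`, `_`, and `=` padding.

```python
import base64
import zlib
from urllib.parse import quote

# Made-up payload; zlib is only a stand-in for the real compression stage.
payload = zlib.compress(b"100.0,101.0,102.0;10,20,30" * 20)

url_safe = base64.urlsafe_b64encode(payload).decode("ascii")
base85 = base64.b85encode(payload).decode("ascii")

print("raw bytes:        ", len(payload))
print("URL-safe Base64:  ", len(url_safe))
print("Base85:           ", len(base85))
# Percent-encoding leaves the Base64 alphabet alone (apart from '=' padding),
# but can expand Base85 considerably.
print("Base85 in a URL:  ", len(quote(base85, safe="")))
print("Base64 in a URL:  ", len(quote(url_safe, safe="=")))

# Both encodings are reversible.
assert base64.urlsafe_b64decode(url_safe) == payload
assert base64.b85decode(base85) == payload
```

This matches the pattern in the comparison table: the Base85 variants have the smaller `compressed_size`, while the URL-safe variants have the smaller `url_encoded_size`.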
Example:
from msms_compression import SpectrumCompressorF32
# Sample data
mz_values, intensity_values = [100.0, 101.0, 102.0], [10.0, 20.0, 30.0]
# Initialize the compressor
compressor = SpectrumCompressorF32()
# Compress data
compressed_data = compressor.compress(mz_values, intensity_values)
print("Compressed Data:", compressed_data)
# Decompress data
decompressed_mz, decompressed_intensity = compressor.decompress(compressed_data)
assert decompressed_mz == mz_values
assert decompressed_intensity == intensity_values
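Since the compressors expect m/z values that are positive and sorted in ascending order, it can help to check and sort the peaks before calling `compress`. The helpers below are hypothetical and not part of the package; they also assume that equal neighbouring m/z values are acceptable, which the package may or may not allow.

```python
import math


def check_mz_values(mz_values):
    """Raise ValueError unless all m/z values are finite, positive, and ascending.

    Hypothetical pre-check; equal neighbouring values are allowed here by assumption.
    """
    previous = 0.0
    for index, mz in enumerate(mz_values):
        if not math.isfinite(mz) or mz <= 0:
            raise ValueError(f"m/z value at index {index} is not positive: {mz!r}")
        if mz < previous:
            raise ValueError(f"m/z values are not ascending at index {index}")
        previous = mz


def sort_peaks(mz_values, intensity_values):
    """Sort peaks by m/z while keeping each intensity paired with its m/z value."""
    if len(mz_values) != len(intensity_values):
        raise ValueError("m/z and intensity lists must have the same length")
    pairs = sorted(zip(mz_values, intensity_values), key=lambda pair: pair[0])
    return [mz for mz, _ in pairs], [intensity for _, intensity in pairs]


mz_sorted, intensity_sorted = sort_peaks([102.0, 100.0, 101.0], [30.0, 10.0, 20.0])
check_mz_values(mz_sorted)  # passes silently
print(mz_sorted, intensity_sorted)
```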
Compression Strategy Comparison
strategy | Compression Ratio | Compression Ratio Rank | URL Compression Ratio | URL Compression Ratio Rank | Compression Time | Compression Time Rank | Decompression Time | Decompression Time Rank |
---|---|---|---|---|---|---|---|---|
SpectrumCompressorLossy | 5.952 | 1 | 5.023 | 2 | 0.030 | 5 | 0.008 | 4 |
SpectrumCompressorUrlLossy | 5.579 | 2 | 6.926 | 1 | 0.030 | 4 | 0.007 | 1 |
SpectrumCompressor | 3.890 | 3 | 3.299 | 6 | 0.053 | 7 | 0.010 | 6 |
SpectrumCompressorUrl | 3.646 | 4 | 4.528 | 3 | 0.051 | 6 | 0.008 | 3 |
SpectrumCompressorGzip | 3.148 | 5 | 2.658 | 7 | 0.023 | 2 | 0.009 | 5 |
SpectrumCompressorUrlGzip | 2.951 | 6 | 3.665 | 4 | 0.022 | 1 | 0.007 | 2 |
SpectrumCompressorUrlLzstring | 2.800 | 7 | 3.418 | 5 | 0.026 | 3 | 0.097 | 7 |
scan | strategy | original_size | compressed_size | url_encoded_size | compression_ratio | url_compression_ratio | compressed_time | decompressed_time |
---|---|---|---|---|---|---|---|---|
0 | SpectrumCompressor | 56124 | 14428 | 21139 | 3.88993623509842 | 3.299068073229576 | 0.053049564361572266 | 0.009985208511352539 |
0 | SpectrumCompressorUrl | 56124 | 15392 | 15401 | 3.646309771309771 | 4.528212453736771 | 0.051015615463256836 | 0.00789642333984375 |
0 | SpectrumCompressorGzip | 56124 | 17829 | 26236 | 3.1479050984351336 | 2.6581414849824667 | 0.02299976348876953 | 0.009003639221191406 |
0 | SpectrumCompressorUrlGzip | 56124 | 19020 | 19029 | 2.950788643533123 | 3.664879920121919 | 0.021996021270751953 | 0.007005453109741211 |
0 | SpectrumCompressorUrlLzstring | 56124 | 20041 | 20402 | 2.8004590589291953 | 3.418243309479463 | 0.026098012924194336 | 0.09739089012145996 |
0 | SpectrumCompressorLossy | 56124 | 9429 | 13884 | 5.952274896595609 | 5.022976087582829 | 0.030114173889160156 | 0.007976055145263672 |
0 | SpectrumCompressorUrlLossy | 56124 | 10060 | 10069 | 5.578926441351888 | 6.9261098420895815 | 0.030014991760253906 | 0.006910562515258789 |
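As a quick check of how the two tables relate, the `compression_ratio` column can be reproduced as `original_size / compressed_size`, and the summary table shows the same numbers rounded to three decimals. The snippet copies a few rows from the table above. The `url_compression_ratio` column is not reproduced here: it does not equal `original_size / url_encoded_size`, and its numerator is not listed in the table.

```python
# (original_size, compressed_size) copied from the per-scan table above.
rows = {
    "SpectrumCompressor": (56124, 14428),
    "SpectrumCompressorUrl": (56124, 15392),
    "SpectrumCompressorLossy": (56124, 9429),
}

ratios = {
    name: original / compressed
    for name, (original, compressed) in rows.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```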
In the lossy strategies, each intensity value is encoded as a two-character hexadecimal string, which allows 256 distinct levels. This quantization is lossy, but it reduces the data size considerably. The m/z values are still compressed losslessly using delta encoding, so they keep their exact values.
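The following sketch shows one way such a two-character hexadecimal quantization could work. It is an assumption-laden illustration, not the package's actual scheme: scaling linearly against the maximum intensity, the function names, and returning the maximum alongside the string are all choices made here for the example.

```python
def quantize_intensities(intensities):
    """Map non-negative intensities onto 256 levels and write two hex chars each.

    Hypothetical scheme: levels are linear in intensity relative to the maximum.
    Returns the hex string and the maximum needed to undo the scaling.
    """
    peak = max(intensities)
    if peak <= 0:
        raise ValueError("at least one intensity must be positive")
    levels = [min(255, max(0, round(255 * value / peak))) for value in intensities]
    return "".join(format(level, "02x") for level in levels), peak


def dequantize_intensities(hex_string, peak):
    """Approximate the original intensities from the two-character hex levels."""
    return [
        int(hex_string[i:i + 2], 16) * peak / 255
        for i in range(0, len(hex_string), 2)
    ]


hex_string, peak = quantize_intensities([1.0, 2.0, 1000.0])
print(hex_string)  # 0001ff
print(dequantize_intensities(hex_string, peak))
```

With this linear scheme the reconstruction error per value is at most half a level, that is `peak / 510`, so small peaks next to a very large one lose most of their detail. That is the trade-off behind the higher compression ratios of the lossy strategies.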