Functions to determine polymer groups within datasets using Kendrick mass defect and mass remainder analysis.
Project description
'cwest-polymer' Polymer Analysis Package
This Python package is used for reading, analyzing, and interpreting polymer species in mass spectrometry data using the fractional mass remainder (fMR), a generalized Kendrick mass defect (KMD) algorithm. Polymer groups are classified efficiently by combining cluster analysis with a circular distance metric.
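To make the two ideas above concrete, here is a minimal sketch. It assumes fMR is the mass remainder m mod R divided by the repeat unit mass R, so every value lies on [0, 1); the function names are hypothetical and not part of cwest-polymer's API. Because this scale wraps around at 1, values such as 0.995 and 0.003 are neighbours, which is why a circular distance rather than a plain difference is needed. Classic KMD is included for comparison.

```python
# Hypothetical helpers for illustration; not part of cwest-polymer's API.
PEG = 44.02621474784  # monoisotopic mass of the C2H4O repeat unit (u)


def fractional_mass_remainder(mass: float, repeat_unit_mass: float) -> float:
    """Mass remainder m mod R, scaled by R so the result lies on [0, 1) (assumed definition)."""
    return (mass % repeat_unit_mass) / repeat_unit_mass


def circular_distance(a: float, b: float) -> float:
    """Distance between two fractional values on a circle of circumference 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def kendrick_mass_defect(mass: float, repeat_unit_mass: float) -> float:
    """Classic KMD with the nominal repeat unit mass as base (one common sign convention)."""
    kendrick_mass = mass * round(repeat_unit_mass) / repeat_unit_mass
    return round(kendrick_mass) - kendrick_mass


# Two members of the same PEG series (five repeat units apart) share fMR and KMD ...
a = fractional_mass_remainder(1000.0, PEG)
b = fractional_mass_remainder(1000.0 + 5 * PEG, PEG)
print(circular_distance(a, b) < 1e-9)  # True
# ... and fMR values on either side of the 1 -> 0 wrap are close neighbours
print(round(circular_distance(0.995, 0.003), 3))  # 0.008
```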
cwest-polymer (Welsh for "polymer quest") uses the data science Python package piblin (Welsh for
"pipeline"), which can comprehensively capture analytical data from a variety
of sources along with their metadata. This package shows basic implementations of file readers and transforms for
polymer analysis, but it can be extended to include more complex data processing and analysis pipelines. Further examples
of piblin implementations can be found in hermes rheo, a rheological data analysis package.
More details on these concepts can be found in the reprint below, which also includes literature references on KMD for further background.
ASMS 2025 Poster Reprint: Improvements and Analysis of KMD
Fractional Mass Remainder (fMR) Transforms and Clustering Results
Polymer analysis was performed on theoretical data imported from .csv files. The figures were then generated by applying the data transformation pipelines described below:
Theoretical PEG Cluster Analysis: data folder here
PEG mass values were calculated from the theoretical repeat unit and arbitrary end groups, then transformed with the scripts shown below and colored by group.
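A theoretical series like this can be generated as the end-group mass plus n repeat units. The sketch below assumes neutral monoisotopic masses and two illustrative end-group pairs (hydroxyl/hydrogen and methoxy/hydroxyl); these are hypothetical choices and may differ from those used for the linked data.

```python
# Hypothetical illustration; end groups are arbitrary, as in the example data.
# Monoisotopic element masses (u)
H = 1.00782503207
C = 12.0
O = 15.99491461956

PEG_REPEAT = 2 * C + 4 * H + O  # C2H4O repeat unit, about 44.0262 u


def peg_series(end_group_mass: float, n_min: int, n_max: int) -> list:
    """Neutral masses end_group_mass + n * R for degrees of polymerization n_min..n_max."""
    return [end_group_mass + n * PEG_REPEAT for n in range(n_min, n_max + 1)]


hydroxyl = peg_series(2 * H + O, 5, 20)     # H-[C2H4O]n-OH, end groups sum to H2O
methoxy = peg_series(C + 4 * H + O, 5, 20)  # CH3-[C2H4O]n-OH, end groups sum to CH4O
```

Within each list, neighbouring masses differ by exactly one repeat unit, so they share one fMR value; the two lists differ by CH2 in end-group mass and therefore fall into separate groups.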
Lipids Clustered by Alkanes (CH2) / Alkenes (C2H2): data folder here
Fatty acid (FA) mass values were calculated from their molecular formulas, then transformed with the scripts shown below and colored by group.
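Fatty acids show why two repeat units give different groupings: acids that differ by CH2 share an fMR relative to CH2, while adding C2H2 corresponds to two more carbons plus one more double bond. The sketch assumes free fatty acids with formula C(n)H(2n-2db)O2 and uses a simple tolerance grouping on the fMR circle in place of the package's clustering; all names are hypothetical.

```python
# Hypothetical illustration; not the package's clustering.
H, C, O = 1.00782503207, 12.0, 15.99491461956  # monoisotopic masses (u)
CH2 = C + 2 * H       # "alkanes" repeat unit
C2H2 = 2 * C + 2 * H  # "alkenes" repeat unit


def fatty_acid_mass(carbons: int, double_bonds: int) -> float:
    """Monoisotopic mass of a free fatty acid C(n) H(2n - 2db) O2."""
    return carbons * C + (2 * carbons - 2 * double_bonds) * H + 2 * O


def group_by_fmr(masses: dict, repeat: float, tol: float = 1e-6) -> list:
    """Group labels whose fMR values lie within tol of each other on the [0, 1) circle."""
    groups = []  # list of (representative fMR, [labels])
    for label, mass in masses.items():
        fmr = (mass % repeat) / repeat
        for rep, labels in groups:
            d = abs(fmr - rep) % 1.0
            if min(d, 1.0 - d) < tol:  # circular distance
                labels.append(label)
                break
        else:
            groups.append((fmr, [label]))
    return groups


lipids = {f"FA {n}:{db}": fatty_acid_mass(n, db) for n in range(14, 23, 2) for db in range(3)}
by_ch2 = group_by_fmr(lipids, CH2)    # one group per number of double bonds
by_c2h2 = group_by_fmr(lipids, C2H2)  # one group per value of (carbons - 2 * double_bonds)
```

For example, FA 16:0 and FA 18:1 land in the same C2H2 group but in different CH2 groups.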
Example of Network Graph + Spectrum Result
An example of the generated interactive plots can be downloaded here
Package Parameters
Default repeat units defined in the package parameters:
DEFAULT_REPEAT_UNITS = {
"PEG": "C2 H4 O",
"PPG": "C3 H6 O",
"PTHF": "C4 H8 O",
"PET": "C10 H8 O4",
"PE": "C2 H4",
"PP": "C3 H6",
"Perfluoro": "C F2",
"PDMS": "C2 H6 Si O",
"BPA": "C18 H20 O3",
"Acrylamide": "C3 H5 N O",
"Acrylic acid": "C3 H4 O2",
"Nylon 6 6": "C12 H22 N2 O2",
}
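The package parses these formula strings with molmass. As a self-contained illustration of what that step produces, the sketch below computes monoisotopic masses for the space-separated notation used above. The parser and its small element table are assumptions for illustration, not the package's code.

```python
import re

# Hypothetical parser for illustration; cwest-polymer itself relies on molmass.
# Monoisotopic masses (u) for the elements used in DEFAULT_REPEAT_UNITS
MONOISOTOPIC = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "F": 18.998403163,
    "Si": 27.9769265325,
}


def formula_mass(formula: str) -> float:
    """Monoisotopic mass of a space-separated formula such as 'C2 H4 O'."""
    total = 0.0
    for token in formula.split():
        match = re.fullmatch(r"([A-Z][a-z]?)(\d*)", token)  # element symbol + optional count
        if match is None:
            raise ValueError(f"cannot parse token {token!r}")
        element, count = match.groups()
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total
```

For instance, formula_mass("C2 H4 O") gives the PEG repeat unit at about 44.0262 u and formula_mass("C2 H6 Si O") the PDMS unit at about 74.0188 u.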
Transforms can be created with different repeat units, given either as formulas (parsed by molmass) or as plain float mass values.
The repeat_units parameter can be a list or a dictionary of repeat units for the transformation
pipeline. Its values can be any combination of formulas (str) and mass values (float); dictionaries
additionally allow custom repeat unit labels.
The fractional parameter defaults to 1 for singly charged ions but can be a list of integers for multiply
charged species. The default_list parameter adds all repeat units from DEFAULT_REPEAT_UNITS to the supplied values.
from cwest_polymer import transforms
fmr_transform1 = transforms.FractionalMRTransform.create(repeat_units=['C1 H2 O3'], fractional=1, default_list=True)
fmr_transform2 = transforms.FractionalMRTransform.create(repeat_units=[123.45, 67.89], fractional=[1,2,3], default_list=False, kmd=True)
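The README does not spell out how the fractional integers are applied. Assuming each integer z divides the repeat unit mass (as in fractional base unit KMD), the sketch below shows why this matters: a doubly charged series is spaced by R/2 on the m/z axis, so its fMR stays constant only with base R/2. The helper names and the example series are hypothetical.

```python
# Hypothetical illustration of charge-scaled repeat units; not the package's code.
PROTON = 1.00727646688  # proton mass (u)
PEG = 44.02621474784    # C2H4O repeat unit (u)


def mz_values(neutral_masses: list, z: int) -> list:
    """m/z of [M + zH]z+ ions for each neutral mass."""
    return [(m + z * PROTON) / z for m in neutral_masses]


def fmr(value: float, base: float) -> float:
    """Fractional mass remainder of value with respect to base, on [0, 1)."""
    return (value % base) / base


neutral = [18.0106 + n * PEG for n in range(20, 30)]  # illustrative H-[C2H4O]n-OH series
doubly_charged = mz_values(neutral, z=2)

# On the m/z axis a doubly charged series is spaced by R / 2, not R ...
spacing = doubly_charged[1] - doubly_charged[0]
# ... so its fMR only stays constant when the base is divided by the charge
fmr_full_base = [fmr(x, PEG) for x in doubly_charged]      # alternates between two values
fmr_half_base = [fmr(x, PEG / 2) for x in doubly_charged]  # constant
```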
The following column headers can be detected in a given spreadsheet (.csv and .xlsx). Headers are read by the cwest_polymer.fmr_filereaders.fmr_mass_spreadsheet_reader.MassSpreadsheetReader class; custom column fields can be used to modify the reader, and an example of this is shown here.
ACCEPTED_COLUMN_HEADERS = ['mass', 'mz', 'm/z', 'rt', 'retention time', 'abundance', 'intensity', 'area', 'x_pos', 'y_pos']
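Header detection amounts to matching spreadsheet column names against this list. The README does not say whether matching ignores case or whitespace; the hypothetical helper below assumes it does, and is not the reader's actual implementation.

```python
# Hypothetical helper for illustration; not MassSpreadsheetReader's code.
ACCEPTED_COLUMN_HEADERS = ['mass', 'mz', 'm/z', 'rt', 'retention time', 'abundance',
                           'intensity', 'area', 'x_pos', 'y_pos']


def match_headers(columns: list) -> dict:
    """Map spreadsheet column names to accepted headers, ignoring case and surrounding spaces.

    Columns that match no accepted header are left out of the result.
    """
    accepted = set(ACCEPTED_COLUMN_HEADERS)
    matched = {}
    for column in columns:
        key = column.strip().lower()
        if key in accepted:
            matched[column] = key
    return matched
```

With pandas, the resulting mapping could be passed to DataFrame.rename(columns=...) to normalize a sheet's headers.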
Implementing fractional mass-remainder (fMR) polymer detection algorithm
Python imports for file reading, transform setup, and the subsequent data transforms (cwest_polymer builds on piblin)
from cwest_polymer import MassSpreadsheetReader, transforms
from cwest_polymer import fmr_parameters as p
from pathlib import Path
import os
import pandas as pd
import numpy as np
Set parameters
spreadsheet_path = r"PATH/TO/DATA/FOLDER"
result_path = r"PATH/TO/RESULT/FOLDER"
ppm_tol = 10
mz_tol = 0.005
min_samples = 3
repeat_units = {
'alkanes': 'C H2',
'alkenes': 'C2 H2'
}
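The two tolerances above can be related to the fMR scale with simple arithmetic: a mass tolerance of Δm corresponds to Δm / R on the [0, 1) circle. How ClusterTransform combines mz_tol and ppm_tol is not stated here; the sketch assumes the larger of the two applies, and the helper names are hypothetical.

```python
# Hypothetical helpers for illustration; the combination rule is an assumption.
def absolute_tolerance(mz: float, ppm_tol: float, mz_tol: float) -> float:
    """Tolerance in Da: the larger of the ppm-derived and the fixed tolerance."""
    return max(mz * ppm_tol * 1e-6, mz_tol)


def fmr_tolerance(mz: float, repeat_unit_mass: float, ppm_tol: float, mz_tol: float) -> float:
    """The same tolerance expressed as a distance on the [0, 1) fMR circle."""
    return absolute_tolerance(mz, ppm_tol, mz_tol) / repeat_unit_mass


CH2 = 14.01565006414  # 'alkanes' repeat unit (u)
# At m/z 1000 the 10 ppm term (0.01 Da) dominates; at m/z 200 the fixed 0.005 Da does
high_mass_tol = absolute_tolerance(1000.0, 10, 0.005)
low_mass_tol = absolute_tolerance(200.0, 10, 0.005)
```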
Read the data directory with the spreadsheet file reader (.csv and .xlsx files)
data = MassSpreadsheetReader().data_from_filepath(filepath=spreadsheet_path)
Create transform classes to calculate fMR values and determine polymer clusters
# transform to fMR
calc_fmr = transforms.FractionalMRTransform.create(repeat_units=repeat_units, fractional_values=1, default_list=False)
# cluster based on fMR
cluster_data = transforms.ClusterTransform.create(mz_tol=mz_tol, ppm_tol=ppm_tol, min_samples=min_samples)
# filter by group size
cluster_filter = transforms.FilterByClusterSize.create(min_samples=min_samples)
# create transformation pipeline (thus the name piblin)
pipeline = calc_fmr + cluster_data + cluster_filter
Apply transforms to data
# run pipeline on data
fmr_clusters = pipeline(data)
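The clustering step is not described in detail here. A min_samples parameter suggests a density-based method such as DBSCAN, but that is an assumption. As a simpler stand-in, the sketch below sorts fMR values, splits them at gaps larger than eps, joins the groups on either side of the 1 → 0 wrap, and drops groups smaller than min_samples. It mirrors the ClusterTransform + FilterByClusterSize steps in spirit only; the function is hypothetical.

```python
# Hypothetical gap-based circular grouping; not the package's ClusterTransform.
def cluster_circular(values: list, eps: float, min_samples: int) -> list:
    """Group indices of fMR values in [0, 1) whose sorted neighbours are within eps."""
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] <= eps:
            groups[-1].append(cur)  # close enough: extend the current group
        else:
            groups.append([cur])    # gap larger than eps: start a new group
    # merge the last and first groups if they meet across the 1 -> 0 wrap
    if len(groups) > 1 and (values[order[0]] + 1.0 - values[order[-1]]) <= eps:
        groups[0] = groups.pop() + groups[0]
    # drop small groups, like filtering by cluster size
    return [g for g in groups if len(g) >= min_samples]


values = [0.100, 0.101, 0.102, 0.500, 0.998, 0.999, 0.0005, 0.700]
clusters = cluster_circular(values, eps=0.005, min_samples=3)
# clusters -> [[4, 5, 6], [0, 1, 2]]; the 0.998/0.999/0.0005 group spans the wrap-around
```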
Polymer Network Graph
Download an example of the generated interactive plots
# imports
from cwest_polymer import PolyGraph
# run pipeline on data -> polymer groupings
pipeline = calc_fmr + cluster_data
result = pipeline(data)
# create polymer network graph object and add measurements
fmr_graph = PolyGraph()
fmr_graph.add_measurements(result.measurements)
# generate interactive plotly figure
fig1 = fmr_graph.plot_graph_with_plotly()
fig2 = fmr_graph.plot_spectrum_with_plotly()
# save interactive graph as html file
fig1.write_html(os.path.join(result_path, 'fmr_graph.html'))
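PolyGraph's internals are not documented in this README. One plausible way to build such a graph, assumed here for illustration, is to connect two peaks when their masses differ by one repeat unit within a tolerance; chains of edges then trace a polymer series. The function below is hypothetical.

```python
# Hypothetical edge construction for a polymer network graph; not PolyGraph's code.
def repeat_unit_edges(masses: list, repeat_unit_mass: float, tol: float) -> list:
    """Directed pairs (i, j) where masses[j] - masses[i] is one repeat unit within tol."""
    edges = []
    for i, m_i in enumerate(masses):
        for j, m_j in enumerate(masses):
            if abs((m_j - m_i) - repeat_unit_mass) <= tol:
                edges.append((i, j))
    return edges


peaks = [1000.0, 1044.0262, 1088.0524, 1500.0]
edges = repeat_unit_edges(peaks, repeat_unit_mass=44.02621474784, tol=0.005)
# edges -> [(0, 1), (1, 2)]: peaks 0, 1 and 2 form one chain, peak 3 stays isolated
```

The pairwise scan is quadratic in the number of peaks, which is fine for small peak lists; a sorted search would scale better for large spectra.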
Creating results files
- Export .csv results
- Generate .png fMR plots
results = fmr_clusters.split_by_condition_name('file_name')
for result in results:
    for measurement in result.measurements:
        dataset = measurement.datasets[0]
        # skip repeat units with no clustered points
        if dataset.number_of_points() == 0:
            continue
        # label outputs by source file (assumes 'file_name' is stored in the measurement conditions)
        name = Path(str(measurement.conditions['file_name'])).stem
        ru_name = measurement.details['repeat_unit_information'][0]
        # export data to .csv file
        df = pd.DataFrame(np.array(dataset.data_arrays).T, columns=dataset.data_array_names)
        df.to_csv(os.path.join(result_path, f'result_{name}_{ru_name}_filtered.csv'))
        # generate fMR plot figure
        fig, _ = measurement.visualize()
        fig.savefig(os.path.join(result_path, f'plot_{name}_{ru_name}.png'), dpi=1000, bbox_inches='tight')
Calculate Polymer parameters: Mn, Mw, End-groups, etc.
# import
import pandas as pd
# Create transform
calculate_polymers = transforms.CalculatePolymerGroups.create()
# apply and create dataframe of results
result = calculate_polymers(data)
df = pd.DataFrame.from_dict(result, orient='index')
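The output fields of CalculatePolymerGroups are not listed here, but the quantities named in the heading have standard definitions: Mn = ΣN_iM_i / ΣN_i, Mw = ΣN_iM_i² / ΣN_iM_i, and dispersity = Mw / Mn. The sketch below applies them to one series, assuming peak intensity approximates the number of chains N_i (ionization and detection bias make this only an estimate). The helper names are hypothetical.

```python
# Hypothetical helpers using the standard Mn / Mw definitions; not the package's code.
def molecular_weight_averages(masses: list, intensities: list) -> dict:
    """Mn, Mw and dispersity, using peak intensity as a stand-in for the number of chains N_i."""
    sum_n = sum(intensities)
    sum_nm = sum(n * m for n, m in zip(intensities, masses))
    sum_nm2 = sum(n * m * m for n, m in zip(intensities, masses))
    mn = sum_nm / sum_n    # Mn = sum(N_i M_i) / sum(N_i)
    mw = sum_nm2 / sum_nm  # Mw = sum(N_i M_i^2) / sum(N_i M_i)
    return {"Mn": mn, "Mw": mw, "PDI": mw / mn}


def end_group_remainder(mass: float, repeat_unit_mass: float) -> float:
    """m mod R: equals the end-group (plus adduct) mass only if that total is smaller than R."""
    return mass % repeat_unit_mass


stats = molecular_weight_averages([1000.0, 2000.0], [1.0, 1.0])
# stats["Mn"] == 1500.0 and stats["Mw"] is about 1666.67
```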
Download files
Source Distribution: cwest_polymer-1.0.1.tar.gz
Built Distribution: cwest_polymer-1.0.1-py3-none-any.whl
File details
Details for the file cwest_polymer-1.0.1.tar.gz.
File metadata
- Download URL: cwest_polymer-1.0.1.tar.gz
- Upload date:
- Size: 23.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 02010f185c08a215ed232976400b3aff7bd5e36d62fbf51afac98ceddae1b105 |
| MD5 | 4b1a2f019627574ef2648dcb9ef0e532 |
| BLAKE2b-256 | 328d57cdac1968870d300b2df6f9f638916455894ddbe8168a7fd647e16f0ea4 |
File details
Details for the file cwest_polymer-1.0.1-py3-none-any.whl.
File metadata
- Download URL: cwest_polymer-1.0.1-py3-none-any.whl
- Upload date:
- Size: 23.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5076b57ee58f80652da7831d459ed78d624acdb9acf5756ca47f985ecdae6ae5 |
| MD5 | b7d097b56dd13f7ab13eacb605b82295 |
| BLAKE2b-256 | 8bce34d5c4bc92276510abe6b98c75bfa0b2d24ab8b147dcecc78f5c0644057f |