Download and parse curated HDX-MS datasets
HDXMS Datasets
Welcome to the HDXMS datasets repository.
The hdxms-datasets package provides tools for handling HDX-MS datasets.
The package offers the following features:
- Defining datasets and their experimental metadata
- Verification of datasets and metadata
- Loading datasets from local or remote databases
- Conversion of datasets from various formats (e.g., DynamX, HDExaminer) to a standardized format
- Propagation of standard deviations from replicates to fractional relative uptake values
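The last feature can be illustrated with standard first-order error propagation. As a minimal sketch, assuming fractional uptake is the ratio of measured uptake to the fully deuterated control uptake (the package's exact formula may differ), the relative variances add in quadrature:

```python
import math


def fractional_uptake_with_sd(uptake, uptake_sd, fd_uptake, fd_uptake_sd):
    """Fractional uptake f = u / u_fd with first-order error propagation:

    sigma_f = |f| * sqrt((sigma_u / u)**2 + (sigma_fd / u_fd)**2)
    """
    f = uptake / fd_uptake
    f_sd = abs(f) * math.sqrt(
        (uptake_sd / uptake) ** 2 + (fd_uptake_sd / fd_uptake) ** 2
    )
    return f, f_sd


# 3.0 +/- 0.1 Da uptake against a 6.0 +/- 0.2 Da fully deuterated control
f, f_sd = fractional_uptake_with_sd(3.0, 0.1, 6.0, 0.2)
# f = 0.5, f_sd ≈ 0.024
```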
An open database of HDX-MS datasets is hosted at HDXMS DataBase.
An example front end, instaGibbs, demonstrates real-time estimation of HDX-MS ΔG values.
Installation
```shell
pip install hdxms-datasets
```
Example Usage
Loading datasets
```python
from hdxms_datasets import DataBase

db = DataBase('path/to/local_db')
dataset = db.get_dataset('HDX_D9096080')

# Protein identifier information
print(dataset.protein_identifiers.uniprot_entry_name)
#> 'SECB_ECOLI'

# Access HDX states
print([state.name for state in dataset.states])
#> ['Tetramer', 'Dimer']

# Get the sequence of the first state
state = dataset.states[0]
print(state.protein_state.sequence)
#> 'MSEQNNTEMTFQIQRIYT...'

# Load the first set of peptides for this state
peptides = state.peptides[0]

# Access peptide information
print(peptides.deuteration_type, peptides.pH, peptides.temperature)
#> DeuterationType.partially_deuterated 8.0 303.15

# Load the peptide table as a standardized narwhals DataFrame
df = peptides.load(
    convert=True,    # convert column header names to the open HDX standard
    aggregate=True,  # aggregate centroids / uptake values across replicates
)
print(df.columns)
#> ['start', 'end', 'sequence', 'state', 'exposure', 'centroid_mz', 'rt', 'rt_sd', 'uptake', ...
```
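With the standardized columns in hand, downstream analysis is straightforward. As a generic sketch in plain Python (not a package API), per-residue peptide coverage can be counted from the `start` and `end` columns, assuming inclusive residue intervals:

```python
from collections import Counter


def residue_coverage(starts, ends):
    """Count how many peptides cover each residue,
    assuming inclusive [start, end] residue intervals."""
    coverage = Counter()
    for start, end in zip(starts, ends):
        for resi in range(start, end + 1):
            coverage[resi] += 1
    return coverage


# Three overlapping peptides: 1-4, 3-6, 5-8
cov = residue_coverage([1, 3, 5], [4, 6, 8])
# residues 3-6 are covered by two peptides, residues 1-2 and 7-8 by one
```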
Define and process datasets
```python
from pathlib import Path

from hdxms_datasets import (
    DeuterationType,
    PeptideFormat,
    Peptides,
    ProteinState,
    compute_uptake_metrics,
    merge_peptides,
    verify_sequence,
)

data_dir = Path("path/to/data")  # directory containing the peptide data files

# Define the protein state
protein_state = ProteinState(
    sequence="MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDWQPEVKLDLDTASSQLADDVYEVVLRVTVTASLGEETAFLCEVQQGGIFSIAGIEGTQMAHCLGAYCPNILFPYARECITSMVSRGTFPQLNLAPVNFDALFMNYLQQQAGEGTEEHQDA",
    n_term=1,
    c_term=155,
    oligomeric_state=4,
)

# Define the partially deuterated peptides for the SecB state
pd_peptides = Peptides(
    # path to the data file
    data_file=data_dir / "ecSecB_apo.csv",
    # specify the data format
    data_format=PeptideFormat.DynamX_v3_state,
    # specify the deuteration type (partially, fully or not deuterated)
    deuteration_type=DeuterationType.partially_deuterated,
    filters={
        "State": "SecB WT apo",
        # Optionally filter by exposure; leave out to include all exposures
        "Exposure": [0.167, 0.5, 1.0, 10.0, 100.000008],
    },
    # pH as read, without corrections
    pH=8.0,
    # temperature of the exchange buffer (K)
    temperature=303.15,
    # deuterium percentage of the exchange buffer
    d_percentage=90.0,
)

# Check for differences between the protein state sequence and the peptide sequences
mismatches = verify_sequence(pd_peptides.load(), protein_state.sequence, n_term=protein_state.n_term)
print(mismatches)
#> []  # sequences match
```
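A minimal sketch of what such a check does (not the package's implementation): map each peptide onto the full sequence using its start position and report residues that disagree. The function name and tuple layout here are illustrative only.

```python
def find_mismatches(peptides, sequence, n_term=1):
    """Compare peptide sequences against a reference sequence.

    peptides: iterable of (start, peptide_sequence) with `start` numbered
    from `n_term`. Returns a list of (residue_number, expected, found).
    """
    mismatches = []
    for start, pep_seq in peptides:
        offset = start - n_term  # 0-based index into `sequence`
        for i, aa in enumerate(pep_seq):
            if sequence[offset + i] != aa:
                mismatches.append((start + i, sequence[offset + i], aa))
    return mismatches


assert find_mismatches([(1, "MSEQ"), (3, "EQNN")], "MSEQNNT") == []
assert find_mismatches([(1, "MSAQ")], "MSEQNNT") == [(3, "E", "A")]
```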
```python
# Define the fully deuterated control peptides for the SecB state
fd_peptides = Peptides(
    data_file=data_dir / "ecSecB_apo.csv",
    data_format=PeptideFormat.DynamX_v3_state,
    deuteration_type=DeuterationType.fully_deuterated,
    filters={
        "State": "Full deuteration control",
        "Exposure": 0.167,
    },
)

# Merge both peptide sets into a single dataframe
merged = merge_peptides([pd_peptides, fd_peptides])
print(merged.columns)
#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd']

# Compute uptake metrics for the merged peptides; this computes uptake from
# centroid mass if not present, as well as fractional uptake
processed = compute_uptake_metrics(merged)
print(processed.columns)
#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd', 'fractional_uptake', 'fractional_uptake_sd']
```
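The uptake computation itself can be sketched as follows, assuming uptake is the deuterated centroid mass minus the non-deuterated centroid mass and fractional uptake normalizes by the fully deuterated control; the package's exact definitions (e.g. any d_percentage or back-exchange corrections) may differ, and this function is illustrative only.

```python
def compute_uptake(centroid_mass, centroid_mass_nd, centroid_mass_fd):
    """Deuterium uptake from centroid masses (Da).

    centroid_mass:    partially deuterated centroid mass
    centroid_mass_nd: non-deuterated (t=0) centroid mass
    centroid_mass_fd: fully deuterated control centroid mass
    """
    uptake = centroid_mass - centroid_mass_nd
    fd_uptake = centroid_mass_fd - centroid_mass_nd
    fractional_uptake = uptake / fd_uptake
    return uptake, fd_uptake, fractional_uptake


uptake, fd_uptake, fu = compute_uptake(1002.5, 1000.0, 1005.0)
# uptake = 2.5 Da, fd_uptake = 5.0 Da, fractional_uptake = 0.5
```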
Download files
Source Distribution
Built Distribution
File details
Details for the file hdxms_datasets-0.3.2.tar.gz.
File metadata
- Download URL: hdxms_datasets-0.3.2.tar.gz
- Upload date:
- Size: 6.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `34944a520fe49b0c8583fd26fcfd36a48e5fe9000c161f5b6f116ad93970897a` |
| MD5 | `6cc70efe1c325364ae50545c59737a91` |
| BLAKE2b-256 | `e4a5c75b711a6f889b67e96cc698c2669a8588591d3245dff23078fff455cb9e` |
File details
Details for the file hdxms_datasets-0.3.2-py3-none-any.whl.
File metadata
- Download URL: hdxms_datasets-0.3.2-py3-none-any.whl
- Upload date:
- Size: 81.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `01f9b32771bc8c1aec5308c8a3df765cd24326f722dca2c6b8f88d9ec9a69451` |
| MD5 | `b7d3a4c0867b80bd4d39846a68b6219e` |
| BLAKE2b-256 | `5609a0bbbb7f4b23903d41265d19ef361264085815d3dabf759cd0a1a1254992` |