Download and parse curated HDX-MS datasets

Project description

HDXMS Datasets

Welcome to the HDXMS datasets repository.

The hdxms-datasets package provides tools for handling HDX-MS datasets.

The package offers the following features:

  • Defining datasets and their experimental metadata
  • Verification of datasets and metadata
  • Loading datasets from local or remote databases
  • Conversion of datasets from various formats (e.g., DynamX, HDExaminer) to a standardized format
  • Propagation of standard deviations from replicates to fractional relative uptake values

A database of open HDX-MS datasets is available at HDXMS DataBase.

An example front-end, instaGibbs, is available, featuring real-time estimation of HDX-MS ΔG values.

Installation

pip install hdxms-datasets

Example Usage

Loading datasets

from hdxms_datasets import DataBase

db = DataBase('path/to/local_db')
dataset = db.get_dataset('HDX_D9096080')

# Protein identifier information
print(dataset.protein_identifiers.uniprot_entry_name)
#> 'SECB_ECOLI'

# Access HDX states 
print([state.name for state in dataset.states])
#> ['Tetramer', 'Dimer']

# Get the sequence of the first state
state = dataset.states[0]
print(state.protein_state.sequence)
#> 'MSEQNNTEMTFQIQRIYT...'

# Select the first peptide set of this state
peptides = state.peptides[0]

# Access peptide information
print(peptides.deuteration_type, peptides.pH, peptides.temperature)
#> DeuterationType.partially_deuterated 8.0 303.15

# Load the peptide table as standardized narwhals DataFrame
df = peptides.load(
    convert=True,  # convert column header names to the Open HDX standard
    aggregate=True, # aggregate centroids / uptake values across replicates
)

print(df.columns)
#> ['start', 'end', 'sequence', 'state', 'exposure', 'centroid_mz', 'rt', 'rt_sd', 'uptake', ... 
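
Since the returned table is a narwhals DataFrame, it works with any supported backend. As a minimal sketch (assuming the standard narwhals expression API; the concrete backend depends on your installation), you can filter it and convert to the underlying native frame:

import narwhals as nw

# filter the standardized table with a narwhals expression
short_exposures = df.filter(nw.col("exposure") <= 1.0)

# convert to the underlying native dataframe (e.g. pandas or polars) for plotting
native_df = short_exposures.to_native()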

Define and process datasets

from pathlib import Path

from hdxms_datasets import (
    DeuterationType,
    PeptideFormat,
    Peptides,
    ProteinState,
    compute_uptake_metrics,
    merge_peptides,
    verify_sequence,
)

# directory containing the raw data files (adjust to your setup)
data_dir = Path("path/to/data")

# Define the protein state
protein_state = ProteinState(
    sequence="MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDWQPEVKLDLDTASSQLADDVYEVVLRVTVTASLGEETAFLCEVQQGGIFSIAGIEGTQMAHCLGAYCPNILFPYARECITSMVSRGTFPQLNLAPVNFDALFMNYLQQQAGEGTEEHQDA",
    n_term=1,
    c_term=155,
    oligomeric_state=4,
)
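
# Illustrative sanity check (assumes n_term/c_term are inclusive residue numbers):
# the declared termini should span the full sequence length
assert len(protein_state.sequence) == protein_state.c_term - protein_state.n_term + 1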

# Define the partially deuterated peptides for the SecB state
pd_peptides = Peptides(
    # path to the data file
    data_file=data_dir / "ecSecB_apo.csv",
    # specify the data format
    data_format=PeptideFormat.DynamX_v3_state,
    # specify the deuteration type (partially, fully or not deuterated)
    deuteration_type=DeuterationType.partially_deuterated,
    filters={
        "State": "SecB WT apo",
        # Optionally filter by exposure, leave out to include all exposures
        "Exposure": [0.167, 0.5, 1.0, 10.0, 100.000008],
    },
    # pH read without corrections
    pH=8.0,
    # temperature of the exchange buffer
    temperature=303.15,
    # deuterium percentage of the exchange buffer
    d_percentage=90.0,
)

# check for differences between the protein state sequence and the peptide sequences
mismatches = verify_sequence(pd_peptides.load(), protein_state.sequence, n_term=protein_state.n_term)
print(mismatches)
#> [] # sequences match

# Define the fully deuterated peptides for the SecB state
fd_peptides = Peptides(
    data_file=data_dir / "ecSecB_apo.csv",
    data_format=PeptideFormat.DynamX_v3_state,
    deuteration_type=DeuterationType.fully_deuterated,
    filters={
        "State": "Full deuteration control",
        "Exposure": 0.167,
    },
)

# merge both peptide sets into a single dataframe
merged = merge_peptides([pd_peptides, fd_peptides])
print(merged.columns)
#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd']

# compute uptake metrics for the merged peptides
# this function computes uptake from centroid mass if not present
# as well as fractional uptake
processed = compute_uptake_metrics(merged)
print(processed.columns)
#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd', 'fractional_uptake', 'fractional_uptake_sd']
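
The fractional uptake SD mentioned above follows from standard first-order error propagation for a ratio. The sketch below illustrates the calculation, assuming independent errors between the partially and fully deuterated measurements (the package's internal implementation may differ in detail):

import math

def fractional_uptake_sd(uptake, uptake_sd, fd_uptake, fd_uptake_sd):
    # fractional uptake f = uptake / fd_uptake; for independent errors,
    # the relative variances add: (sd_f / f)**2 = (sd_u / u)**2 + (sd_fd / fd)**2
    f = uptake / fd_uptake
    rel_var = (uptake_sd / uptake) ** 2 + (fd_uptake_sd / fd_uptake) ** 2
    return f * math.sqrt(rel_var)

print(fractional_uptake_sd(2.5, 0.1, 5.0, 0.2))
#> ~0.0283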

Download files

Download the file for your platform.

Source Distribution

hdxms_datasets-0.3.3.tar.gz (6.9 MB)

Built Distribution

hdxms_datasets-0.3.3-py3-none-any.whl (82.0 kB)

File details

Details for the file hdxms_datasets-0.3.3.tar.gz.

File metadata

  • Download URL: hdxms_datasets-0.3.3.tar.gz
  • Size: 6.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hdxms_datasets-0.3.3.tar.gz:

  • SHA256: d5b0dfa9c79615e8ef06a3e9567530d6978f13b9bc73be8cb76d98970e014440
  • MD5: 8a827dae869d9a0d9b628c8313c2968e
  • BLAKE2b-256: 07a17ea1a9f7e9069e87a11e3265b2774cdf859654cdb9a1f64d8a6da91e8c90

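To verify a downloaded archive against the SHA256 digest above, a minimal sketch using only the standard library:

import hashlib

# compute the SHA256 digest of the downloaded archive and compare to the published value
with open("hdxms_datasets-0.3.3.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == "d5b0dfa9c79615e8ef06a3e9567530d6978f13b9bc73be8cb76d98970e014440"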

File details

Details for the file hdxms_datasets-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: hdxms_datasets-0.3.3-py3-none-any.whl
  • Size: 82.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hdxms_datasets-0.3.3-py3-none-any.whl:

  • SHA256: 9fa7e8eb13df9d84a6dc499a93a1f5505004f8a13f769bb7538bae3b31d963d4
  • MD5: 150e66d80d291e156cad63934223336e
  • BLAKE2b-256: b1536db92c0722b7bf449721772c58a37bf45994d228e16d7497d8cce05d63b4

