Skip to main content

Download and parse curated HDX-MS datasets

Project description

HDXMS Datasets

Welcome to the HDXMS datasets repository.

The hdxms-datasets package provides tools handling HDX-MS datasets.

The package offers the following features:

  • Defining datasets and their experimental metadata
  • Verification of datasets and metadata
  • Loading datasets from local or remote database
  • Conversion of datasets from various formats (e.g., DynamX, HDExaminer) to a standardized format
  • Propagation of standard deviations from replicates to fractional relative uptake values

A database for open HDX datasets is set up at HDXMS DataBase

There is an example front-end available featuring real-time estimation of HDX-MS ΔG values called instaGibbs

Installation

pip install hdxms-datasets

Example Usage

Loading datasets

from hdxms_datasets import DataBase

db = DataBase('path/to/local_db')
dataset = db.get_dataset('HDX_D9096080')

# Protein identifier information
print(dataset.protein_identifiers.uniprot_entry_name)
#> 'SECB_ECOLI'

# Access HDX states 
print([state.name for state in dataset.states])
#> ['Tetramer', 'Dimer']

# Get the sequence of the first state
state = dataset.states[0]
print(state.protein_state.sequence)
#> 'MSEQNNTEMTFQIQRIYT...'

# Load peptides
peptides = state.peptides[0]

# Access peptide information
print(peptides.deuteration_type, peptides.pH, peptides.temperature)
#> DeuterationType.partially_deuterated 8.0 303.15

# Load the peptide table as standardized narwhals DataFrame
df = peptides.load(
    convert=True,  # convert column header names to open hdx stanard
    aggregate=True, # aggregate centroids / uptake values across replicates
)

print(df.columns)
#> ['start', 'end', 'sequence', 'state', 'exposure', 'centroid_mz', 'rt', 'rt_sd', 'uptake', ... 

Define and process datasets

from hdxms_datasets import ProteinState, Peptides, verify_sequence, merge_peptides, compute_uptake_metrics

# Define the protein state
protein_state = ProteinState(
    sequence="MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDWQPEVKLDLDTASSQLADDVYEVVLRVTVTASLGEETAFLCEVQQGGIFSIAGIEGTQMAHCLGAYCPNILFPYARECITSMVSRGTFPQLNLAPVNFDALFMNYLQQQAGEGTEEHQDA",
    n_term=1,
    c_term=155,
    oligomeric_state=4,
)

# Define the partially deuterated peptides for the SecB state
pd_peptides = Peptides(
    # path to the data file
    data_file=data_dir / "ecSecB_apo.csv",
    # specify the data format
    data_format=PeptideFormat.DynamX_v3_state,
    # specify the deuteration type (partially, fully or not deuterated)
    deuteration_type=DeuterationType.partially_deuterated,
    filters={
        "State": "SecB WT apo",
        # Optionally filter by exposure, leave out to include all exposures
        "Exposure": [0.167, 0.5, 1.0, 10.0, 100.000008],
    },
    # pH read without corrections
    pH=8.0,
    # temperature of the exchange buffer
    temperature=303.15,
    # deuterium percentage of the exchange buffer
    d_percentage=90.0,
)

# check for difference between the protein state sequence and the peptide sequences
mismatches = verify_sequence(pd_peptides.load(), protein_state.sequence, n_term=protein_state.n_term)
print(mismatches)
#> [] # sequences match

# Define the fully deuterated peptides for the SecB state
fd_peptides = Peptides(
    data_file=data_dir / "ecSecB_apo.csv",
    data_format=PeptideFormat.DynamX_v3_state,
    deuteration_type=DeuterationType.fully_deuterated,
    filters={
        "State": "Full deuteration control",
        "Exposure": 0.167,
    },
)

# merge both peptides together in a single dataframe
merged = merge_peptides([pd_peptides, fd_peptides])
print(merged.columns)
#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd']

# compute uptake metrics for the merged peptides
# this function computes uptake from centroid mass if not present
# as well as fractional uptake
processed = compute_uptake_metrics(merged)
print(processed.columns)
#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd', 'fractional_uptake', 'fractional_uptake_sd']

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hdxms_datasets-0.3.2.tar.gz (6.9 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hdxms_datasets-0.3.2-py3-none-any.whl (81.9 kB view details)

Uploaded Python 3

File details

Details for the file hdxms_datasets-0.3.2.tar.gz.

File metadata

  • Download URL: hdxms_datasets-0.3.2.tar.gz
  • Upload date:
  • Size: 6.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hdxms_datasets-0.3.2.tar.gz
Algorithm Hash digest
SHA256 34944a520fe49b0c8583fd26fcfd36a48e5fe9000c161f5b6f116ad93970897a
MD5 6cc70efe1c325364ae50545c59737a91
BLAKE2b-256 e4a5c75b711a6f889b67e96cc698c2669a8588591d3245dff23078fff455cb9e

See more details on using hashes here.

File details

Details for the file hdxms_datasets-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: hdxms_datasets-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 81.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hdxms_datasets-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 01f9b32771bc8c1aec5308c8a3df765cd24326f722dca2c6b8f88d9ec9a69451
MD5 b7d3a4c0867b80bd4d39846a68b6219e
BLAKE2b-256 5609a0bbbb7f4b23903d41265d19ef361264085815d3dabf759cd0a1a1254992

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page