
HDXMS Datasets

Download and parse curated HDX-MS datasets.

Welcome to the HDXMS datasets repository.

The hdxms-datasets package provides tools for handling HDX-MS datasets.

The package offers the following features:

  • Defining datasets and their experimental metadata
  • Verification of datasets and metadata
  • Loading datasets from local or remote databases
  • Conversion of datasets from various formats (e.g., DynamX, HDExaminer) to a standardized format
  • Propagation of standard deviations from replicates to fractional relative uptake values
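To give a flavor of what format conversion involves, here is a minimal sketch of renaming vendor-specific column headers to a common scheme. The mapping and helper below are illustrative only, not the package's actual conversion tables:

```python
# Illustrative mapping from DynamX-style state-data headers to a
# common lowercase scheme (hypothetical subset, for demonstration only)
DYNAMX_TO_STANDARD = {
    "Start": "start",
    "End": "end",
    "Sequence": "sequence",
    "Exposure": "exposure",
    "Uptake": "uptake",
}

def convert_row(row, mapping=DYNAMX_TO_STANDARD):
    """Rename the keys of one record; unmapped keys are kept as-is."""
    return {mapping.get(key, key): value for key, value in row.items()}

print(convert_row({"Start": 1, "End": 9, "Uptake": 2.17}))
```

A real converter additionally has to reconcile units and compute missing columns, but the core of the task is a rename like this applied to every row.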

A database of open HDX datasets is available at HDXMS DataBase.

An example front end, instaGibbs, demonstrates real-time estimation of HDX-MS ΔG values.

Installation

pip install hdxms-datasets

Example Usage

Loading datasets

from hdxms_datasets import DataBase

db = DataBase('path/to/local_db')
dataset = db.get_dataset('HDX_D9096080')

# Protein identifier information
print(dataset.protein_identifiers.uniprot_entry_name)
#> 'SECB_ECOLI'

# Access HDX states 
print([state.name for state in dataset.states])
#> ['Tetramer', 'Dimer']

# Get the sequence of the first state
state = dataset.states[0]
print(state.protein_state.sequence)
#> 'MSEQNNTEMTFQIQRIYT...'

# Load peptides
peptides = state.peptides[0]

# Access peptide information
print(peptides.deuteration_type, peptides.pH, peptides.temperature)
#> DeuterationType.partially_deuterated 8.0 303.15

# Load the peptide table as standardized narwhals DataFrame
df = peptides.load(
    convert=True,  # convert column header names to the open HDX standard
    aggregate=True, # aggregate centroids / uptake values across replicates
)

print(df.columns)
#> ['start', 'end', 'sequence', 'state', 'exposure', 'centroid_mz', 'rt', 'rt_sd', 'uptake', ...]
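With `aggregate=False` the table keeps one row per replicate; with `aggregate=True` replicate measurements are collapsed into a mean and a spread. A minimal sketch of that idea in plain Python (the helper name here is ours, not the package's):

```python
from statistics import mean, stdev

def aggregate_replicates(uptake_values):
    """Collapse replicate uptake measurements into a mean and a
    sample standard deviation (illustrative helper)."""
    return mean(uptake_values), stdev(uptake_values)

# Three replicate uptake values (in Da) for one peptide at one exposure:
u, u_sd = aggregate_replicates([2.10, 2.25, 2.16])
print(round(u, 3), round(u_sd, 4))
```

The aggregated standard deviation is what later feeds into the `uptake_sd` and `fractional_uptake_sd` columns.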

Define and process datasets

from pathlib import Path

from hdxms_datasets import (
    DeuterationType,
    PeptideFormat,
    Peptides,
    ProteinState,
    compute_uptake_metrics,
    merge_peptides,
    verify_sequence,
)

# directory containing the peptide data files
data_dir = Path("path/to/data")

# Define the protein state
protein_state = ProteinState(
    sequence="MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDWQPEVKLDLDTASSQLADDVYEVVLRVTVTASLGEETAFLCEVQQGGIFSIAGIEGTQMAHCLGAYCPNILFPYARECITSMVSRGTFPQLNLAPVNFDALFMNYLQQQAGEGTEEHQDA",
    n_term=1,
    c_term=155,
    oligomeric_state=4,
)

# Define the partially deuterated peptides for the SecB state
pd_peptides = Peptides(
    # path to the data file
    data_file=data_dir / "ecSecB_apo.csv",
    # specify the data format
    data_format=PeptideFormat.DynamX_v3_state,
    # specify the deuteration type (partially, fully or not deuterated)
    deuteration_type=DeuterationType.partially_deuterated,
    filters={
        "State": "SecB WT apo",
        # Optionally filter by exposure, leave out to include all exposures
        "Exposure": [0.167, 0.5, 1.0, 10.0, 100.000008],
    },
    # pH read without corrections
    pH=8.0,
    # temperature of the exchange buffer
    temperature=303.15,
    # deuterium percentage of the exchange buffer
    d_percentage=90.0,
)

# check for difference between the protein state sequence and the peptide sequences
mismatches = verify_sequence(pd_peptides.load(), protein_state.sequence, n_term=protein_state.n_term)
print(mismatches)
#> [] # sequences match

# Define the fully deuterated peptides for the SecB state
fd_peptides = Peptides(
    data_file=data_dir / "ecSecB_apo.csv",
    data_format=PeptideFormat.DynamX_v3_state,
    deuteration_type=DeuterationType.fully_deuterated,
    filters={
        "State": "Full deuteration control",
        "Exposure": 0.167,
    },
)

# merge both peptide sets into a single dataframe
merged = merge_peptides([pd_peptides, fd_peptides])
print(merged.columns)
#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd']

# compute uptake metrics for the merged peptides; this derives
# uptake from centroid mass where it is not already present,
# as well as fractional uptake and its standard deviation
processed = compute_uptake_metrics(merged)
print(processed.columns)
#> ['start', 'end', 'sequence', ... 'uptake', 'uptake_sd', 'fd_uptake', 'fd_uptake_sd', 'fractional_uptake', 'fractional_uptake_sd']
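The `fractional_uptake_sd` column suggests that uncertainties are propagated into the ratio f = uptake / fd_uptake. A sketch using standard first-order error propagation for a ratio, which may or may not be the exact formula the package applies:

```python
from math import sqrt

def fractional_uptake(u, u_sd, fd, fd_sd):
    """Fractional uptake f = u / fd, with first-order (Gaussian)
    error propagation for a ratio of two uncertain quantities."""
    f = u / fd
    f_sd = f * sqrt((u_sd / u) ** 2 + (fd_sd / fd) ** 2)
    return f, f_sd

# e.g. 2.17 ± 0.08 Da uptake against a 6.30 ± 0.12 Da FD control
f, f_sd = fractional_uptake(2.17, 0.08, 6.30, 0.12)
print(round(f, 4), round(f_sd, 4))
```

Note that the relative uncertainty of the fully deuterated control enters every peptide's fractional uptake, which is why a well-measured FD control matters.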

