
Helper utilities for SDMC ad-hoc data processing requests.

Project description

sdmc tools

This package contains a collection of functions designed for the standard cleaning and processing of assay data by SDMC before the data is shared with stats.

These include:

  • methods and functions for standardizing a dataset and merging on LDMS
  • methods for pulling LDMS data from Delphi
  • command line tools for creating and compiling a data dictionary (.xlsx) and documentation (.md + .html)

Installation

If you would like to use the access_ldms module, you will need to first run the following:

sudo apt update
sudo apt install libpq-dev

This installs the libpq C library that psycopg requires. If you do not need the access_ldms methods, you can skip this step; note, however, that importing sdmc_tools.access_ldms without libpq installed will raise errors.
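
If you want your code to stay importable either way, a minimal sketch (your own wrapper, not something the package provides) for guarding that import:

try:
    import sdmc_tools.access_ldms as access_ldms
except ImportError:
    # psycopg could not find libpq; skip LDMS access and use the other modules
    access_ldms = None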

After doing this, the package can be installed using pip: pip install sdmc-tools.

  • Python >= 3.8 is required; these functions might break with earlier Python versions.
  • The following packages are dependencies:
    • docutils
    • pandas
    • numpy
    • PyYAML
    • typing
    • datetime
    • openpyxl
    • xlsxwriter
    • psycopg

Usage

Pulling LDMS data


Python functions for connecting to Delphi and pulling LDMS data.

You will need to save a config file to the filepath ~/.config/sdmc-tools/config.yaml. Do NOT add this to a git repo, as it will contain a plain-text password. Populate the file with:


username: 'MY_DELPHI/HUTCH_USERNAME'
password: 'MY_DELPHI_PW'
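
A minimal sketch (your own check, not part of the package) for confirming the config file parses and contains both keys:

import pathlib
import yaml

config_path = pathlib.Path.home() / ".config" / "sdmc-tools" / "config.yaml"
config = yaml.safe_load(config_path.read_text())
assert {'username', 'password'} <= config.keys(), "config.yaml is missing credentials"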

The available methods include:

pull_one_protocol:

import sdmc_tools.access_ldms as access_ldms

ldms_hvtn = access_ldms.pull_one_protocol('hvtn', 130)
ldms_covpn = access_ldms.pull_one_protocol('covpn', 3008)

pull_multiple_protocols:

import sdmc_tools.access_ldms as access_ldms

ldms_hvtn_130_140 = access_ldms.pull_multiple_protocols('hvtn', [130, 140])
ldms_covpn_3008_5001 = access_ldms.pull_multiple_protocols('covpn', [3008, 5001])

ldms_hvtn = access_ldms.pull_multiple_protocols('hvtn', 'all') # pull ldms for all hvtn protocols. this will take longer.

Data processing


Python functions and constants for data processing / prep.

The primary function is standard_processing:

import sdmc_tools.process as sdmc

outputs = sdmc.standard_processing(
    input_data = input_data,
    input_data_path="/path/to/input_data.xlsx", 
    guspec_col='guspec', 
    network='hvtn', 
    metadata_dict=hand_appended_metadata, 
    ldms=ldms 
)

To see the function signature and documentation, you can run sdmc.standard_processing? in an IPython or Jupyter session (or help(sdmc.standard_processing) in a plain Python interpreter). Given input_data, the function does the following:

  • merges the ldms data onto the input and renames columns with standard labels
  • adds a spectype column
  • adds a drawdt column; drops drawdm, drawdd, drawdy
  • for each (key, value) pair in the metadata dict, creates a column named key filled with the constant value
  • standardizes the 'ptid' and 'protocol' columns to be int-formatted strings
  • merges on columns pertaining to sdmc processing
  • rearranges columns into a standardized order
  • converts column names from "From This" to "to_this" format (see the sketch below)
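
A minimal sketch of the last two behaviors above (the metadata columns and the column-name conversion), using made-up column names; this is an illustration, not the package's actual implementation:

import pandas as pd

df = pd.DataFrame({'Sample ID': ['A1', 'A2'], 'Result Value': [0.1, 0.2]})
metadata = {'network': 'HVTN', 'specrole': 'Sample'}

# add a constant-valued column for each (key, value) pair in the metadata dict
for key, value in metadata.items():
    df[key] = value

# convert column names "From This" -> "to_this"
df.columns = [col.strip().lower().replace(' ', '_') for col in df.columns]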

See https://github.com/beatrixh/sdmc-tools/blob/main/src/sdmc_tools/constants.py for the list of available constants.

A usage example is included below.

import pandas as pd
import sdmc_tools.process as sdmc # this contains the main data processing utilities
import sdmc_tools.access_ldms as access_ldms

ldms = access_ldms.pull_one_protocol('hvtn', 302)

ldms

(screenshot of the ldms dataframe)

input_data

(screenshot of the input_data dataframe)

hand_appended_metadata = {
    'network': 'HVTN',
    'upload_lab_id': 'N4',
    'assay_lab_name': 'Name of Lab Here',
    'instrument': 'SpectraMax',
    'assay_type': 'Neutralizing Antibody (NAb)',
    'specrole': 'Sample',
}

outputs = sdmc.standard_processing(
    input_data = input_data, #a pandas dataframe containing input data
    input_data_path="/path/to/input_data.xlsx", #the path to the original input data
    guspec_col='guspec', #the name of the column containing guspecs within the input data
    network='hvtn', #the relevant network ('hvtn' or 'covpn')
    metadata_dict=hand_appended_metadata, #a dictionary of additional data to append as columns
    ldms=ldms #a pandas dataframe containing the ldms columns we want to merge from
)

outputs

(screenshots of the outputs dataframe)

Data dictionary creation


This is a command line tool; it creates a data dictionary for a set of processed outputs.

gen-data-dict takes two positional arguments:

  • the filepath where the outputs are stored,
  • and the desired name of the resulting data dict.
gen-data-dict /path/to/outputs.txt name_of_dictionary.xlsx

If the dictionary does not already exist in the directory where the outputs live, it will then create

  • an xlsx sheet in the same directory as the outputs, with a row for each variable in the outputs, and corresponding definitions for the standard vars. The variables unique to the specific outputs will need to be hand-edited.
  • a .txt log in the same directory with notes about any non-standard variables that have been included, or any standard variables that have been omitted.

If a dictionary of the given name already exists, it will be updated to reflect the variables in the output sheet, and the log will note the diff.
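
As an illustration of the kind of file gen-data-dict produces (not its actual implementation), a minimal sketch that writes one row per output variable, with definitions left blank for hand-editing:

import pandas as pd

outputs = pd.read_csv('/path/to/outputs.txt', sep='\t')  # the processed outputs
data_dict = pd.DataFrame({
    'variable': outputs.columns,
    'definition': '',  # standard vars get definitions; output-specific vars are hand-edited
})
data_dict.to_excel('name_of_dictionary.xlsx', index=False)  # requires openpyxl or xlsxwriter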

README creation


This is a command line tool; given a set of processed outputs, it creates a .md file documenting how the outputs were created, and a corresponding .html compiled from that .md.

gen-readme takes one positional argument:

  • the filepath to the paths.yaml from which it pulls the input and output filepaths
gen-readme /path/to/paths.yaml
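
A hypothetical paths.yaml, for illustration only; the key names below are assumptions, not the schema gen-readme actually expects, so defer to the tool's own documentation for the exact format:

# hypothetical keys, for illustration only
input_data_path: /path/to/input_data.xlsx
output_dir: /path/to/outputs/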

It will then create

  • a markdown file describing how the outputs were created, including notes on where the inputs are saved. Note that it assumes the processing was standard, so the file will need to be corrected for any nonstandard processing. It searches the output directory for the processed data outputs, a pivot summary of the samples, and the processing code; if it doesn't find these, it will not include notes on them in the markdown.
  • an html file created via compiling the above markdown

regen-readme takes two positional arguments:

  • a filepath to the markdown to compile
  • the filepath to the data dictionary it should pull in. Eg., /path/to/data_dict.xlsx.
regen-readme /path/to/my_markdown.md /path/to/data_dict.xlsx

It will then compile the markdown into an html file of the same name, in the same directory. If such an html file already exists, it will be overwritten.

Project details


Download files

Download the file for your platform.

Source Distribution

sdmc_tools-0.0.6.tar.gz (160.3 kB)

Uploaded Source

Built Distribution


sdmc_tools-0.0.6-py3-none-any.whl (163.3 kB)

Uploaded Python 3

File details

Details for the file sdmc_tools-0.0.6.tar.gz.

File metadata

  • Download URL: sdmc_tools-0.0.6.tar.gz
  • Upload date:
  • Size: 160.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for sdmc_tools-0.0.6.tar.gz

  • SHA256: 0b072dd5a37b1d949d23f90fabbe4c7fc2ab7ce0c7a23d4184bc9ae9ecc7cfcd
  • MD5: cade668ce545dba7663f11db94e87742
  • BLAKE2b-256: 740748a09d8271b1e122ce0d02e06f853bce7a57b70734a92475b241ad474dfa


File details

Details for the file sdmc_tools-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: sdmc_tools-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 163.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for sdmc_tools-0.0.6-py3-none-any.whl

  • SHA256: 16859a24c4ad03cbb978518666946cfeccba3f4505f926a724db87e203d21620
  • MD5: 9834bc40cb35dc82804f7a1b6e3d66c0
  • BLAKE2b-256: 68504878e3213f1783b156a297d52d0d9b4ec2698afae1cfc467d30f89fc890c

