Skip to main content

The 'omopsurvey' Python package transforms standardized response codes from the OMOP CDM survey variables into numeric values and simplifies data preparation by providing functionalities for mapping and converting response codes, as well as handling missing data, facilitating easier and more reliable data analysis.

Project description

The 'omopsurvey' Python package offers a comprehensive solution for transforming standardized response codes from the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) survey variables into numeric values, streamlining the data preparation process. By automating the mapping and conversion of response codes, as well as the handling of missing data, it makes data analysis more accessible and reliable. The package provides a range of functions designed to help researchers and data analysts efficiently work through OMOP CDM survey data, ensuring accurate mappings of responses and effective management of data discrepancies. This package is a helpful tool for those working with OMOP CDM survey data, offering a path to more profound and accurate analyses by dramatically lowering the burden of data preprocessing.

response_set.py

map_answers(input_data): Maps survey responses to corresponding numeric and text values, and handles survey responses that fall outside the predefined cases.

Parameters:

  • input_data: DataFrame containing survey responses with columns question_concept_id and answer_concept_id.

Returns: The modified DataFrame with added columns answer_numeric and answer_text containing the mapped values.

create_dummies(input_data): Transforms survey data to include dummy variables for questions that allow multiple answers. Each response is converted into a separate row with a unique identifier, defined as '{question_concept_id}_{answer_concept_id}'. The function then removes the original rows to prevent data redundancy.

Parameters:

  • input_data: DataFrame containing survey responses with columns question_concept_id and answer_concept_id.

Returns: A new DataFrame with additional rows for select-all-that-apply questions, properly formatted for further analysis.

scale(input_data, variables, scale_name): Calculates a composite score for each participant based on specified variables. This function supports handling missing values and can calculate the score as either the sum or the mean of the selected variables. The results include diagnostic print statements about the calculation.

Parameters:

  • input_data: DataFrame containing the data to be scored.
  • variables: List of columns to be included in the score calculation.
  • score_name: Name of the column where the score will be stored.
  • na: Boolean indicating whether to include participants with missing data in the calculations (na = True or na = False).
  • method: Specifies the method for score calculation (method = 'sum' or method = 'mean').

Returns: The original DataFrame with an additional column containing the calculated scores for each participant.

pivot_data.py

pivot(input_data, file_name): Pivots a dataset to structure numeric survey responses in a wide format. The function checks if the specified file exists; if not, it prints an error message and returns. It reads the data from the file into a DataFrame, then pivots this DataFrame so that each row represents a respondent and each column represents a question, with cells containing the numeric answers. The column names are prefixed with 'q' to denote question IDs. The resulting pivot table is saved to a CSV file and then uploaded to a cloud storage bucket using Google Cloud's gsutil tool.

Parameters:

  • input_data: A pandas DataFrame containing the columns person_id, question_concept_id, and answer_numeric.
  • file_name: Optional; the name of the file to which the pivot table will be saved. Defaults to 'pivot_n.csv'.

pivot_text(input_data, file_name): Similar to pivot_data_numeric, but pivots text responses instead. The resulting pivot table is saved to a CSV file and then uploaded to a cloud storage bucket using Google Cloud's gsutil tool.

Parameters:

  • input_data: A pandas DataFrame containing the columns person_id, question_concept_id, and answer_text.
  • file_name: Optional; the name of the file to which the pivot table will be saved. Defaults to 'pivot_t.csv'.

recode_missing.py

recode(input_data): Processes input data (either in file format or as a pandas DataFrame) to handle missing values according to a specified list of codes. It replaces a predefined set of numeric codes with pandas NA values to standardize the representation of missing data across the dataset.

Parameters:

  • input_data: This can be either a path to a data file (CSV, TXT, or Excel) or a pandas DataFrame. The function adapts its behavior based on the type of input provided.

Returns: A pandas DataFrame with the missing values recoded as pandas NA.

codebooks.py

codebook(df): Processes input data to generate a structured HTML codebook, which includes a detailed listing of questions and responses formatted neatly. The function first cleans and deduplicates the data, then iteratively builds the formatted data, and finally writes it to an HTML file that is linked for download.

Parameters:

  • input_data: DataFrame to be processed into a codebook.

Returns: The HTML-formatted codebook and an IPython display link to the generated HTML file.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

omopsurvey-0.0.8.tar.gz (87.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

omopsurvey-0.0.8-py3-none-any.whl (92.0 kB view details)

Uploaded Python 3

File details

Details for the file omopsurvey-0.0.8.tar.gz.

File metadata

  • Download URL: omopsurvey-0.0.8.tar.gz
  • Upload date:
  • Size: 87.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.0 Darwin/23.4.0

File hashes

Hashes for omopsurvey-0.0.8.tar.gz
Algorithm Hash digest
SHA256 976d776609a18b1ec2eb09407d3b034e803e7b04fb69302f750df74e14ba06e1
MD5 f0187556a1b1cdff7557c5b5483f8fec
BLAKE2b-256 51e319a35a5f286d9c5b4692317c0aa5fdf778fcab911cab8ed3d5a88321970b

See more details on using hashes here.

File details

Details for the file omopsurvey-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: omopsurvey-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 92.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.0 Darwin/23.4.0

File hashes

Hashes for omopsurvey-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 8eee2b5ba4abfc2105408db26ccfb9d56299ccedbe89f3245b6f58bdb13bebff
MD5 e62ace36292c0de05e9afe7b131221c4
BLAKE2b-256 b32e5d70481388e6f1bdf03828fb6d25f805e8b73438d48ea79fefab8b12c5b3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page