An ensemble of functions for analysing UKBB records on DNA Nexus.

UKBB Health Care Records

This repository contains an ensemble of functions for analysing the UKBB records on DNA Nexus.

Available Functions

  • read_GP: Reads data from the GP clinical records. It takes a list of read3 or read2 codes and returns one line per matching record, including eid, date, value, and read code.

  • read_OPCS: Reads OPCS codes from the operation records. It takes a list of OPCS codes and returns eid, opdate, and code for each matching record.

  • read_ICD10: Reads data from the HES records using ICD10. It performs an inner join on the diagnosis and returns the eid, code, in date, out date, and whether it was a primary or secondary diagnosis.

  • read_ICD9: Reads data from the HES records using ICD9. It performs an inner join on the diagnosis, but returns no dates because the UKBB HES records hold no diagnosis dates for ICD9.

  • read_selfreport_illness: Reads data from the UK Biobank's non-cancer self-reported illness codes. It takes a list of codes from https://biobank.ctsu.ox.ac.uk/crystal/coding.cgi?id=6 and returns a list of matching IDs.
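As a rough illustration of what "an inner join on the diagnosis" means for the HES functions above, here is a minimal pandas sketch on toy data. The table names (hesin, hesin_diag), the column names, and the primary/secondary coding are assumptions made for this example; it is not the package's own implementation.

```python
import pandas as pd

# Toy stand-ins for the HES episode and diagnosis tables.
# Table and column names are assumptions for illustration only.
hesin = pd.DataFrame({
    "eid": [1, 1, 2],
    "ins_index": [0, 1, 0],
    "epistart": ["2010-01-05", "2012-03-01", "2011-07-20"],
    "epiend": ["2010-01-09", "2012-03-02", "2011-07-25"],
})
hesin_diag = pd.DataFrame({
    "eid": [1, 1, 2, 3],
    "ins_index": [0, 1, 0, 0],
    "diag_icd10": ["E109", "I10", "E119", "E109"],
    "level": [1, 2, 1, 1],  # assumed coding: 1 = primary, 2 = secondary
})

# Keep only the diagnosis rows whose code is in the requested list.
codes = ["E109", "E119"]
matches = hesin_diag[hesin_diag["diag_icd10"].isin(codes)]

# Inner join: a diagnosis row survives only if a matching episode exists,
# so eid 3 (no episode row in this toy data) is dropped.
joined = matches.merge(hesin, on=["eid", "ins_index"], how="inner")
print(joined[["eid", "diag_icd10", "epistart", "epiend", "level"]])
```

Because the join is inner, records present in only one of the two tables do not appear in the result.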

How to Use

Setup

To set up the environment for running the Python scripts, you need to have Python installed along with the necessary packages. You can install the required packages using pip:

pip install pandas numpy scipy matplotlib seaborn statsmodels polars pyarrow fastparquet

Then, in a Python session, download the module and import its functions:

import subprocess
subprocess.run("curl https://raw.githubusercontent.com/Surajram112/UKBB_py/main/UKBB_Health_Records_New_Project.py > UKBB_Health_Records_New_Project.py", shell=True, check=True)
from UKBB_Health_Records_New_Project import *
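
If an import fails, it can help to check which of the packages from the pip command are actually importable in the current environment. The helper below, missing_packages, is a hypothetical name introduced for this sketch and is not part of the package; it uses only the standard library.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# The package list mirrors the pip install command in this section.
required = ["pandas", "numpy", "scipy", "matplotlib", "seaborn",
            "statsmodels", "polars", "pyarrow", "fastparquet"]

missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```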

Loading data into your UK Biobank Project

project_folder = 't1diabetes'
load_save_data(project_folder)

The project folder is where the data will be imported to. Set project_folder to the name you want to give the project you are going to be working on.

Extracting Healthcare Records

You can use the functions provided to extract healthcare records. For example, to extract ICD10 records, you can run:

ICD10_codes = ['E10', 'E11']
ICD10_records = read_ICD10(ICD10_codes, project_folder)

This will return a DataFrame, ICD10_records, which contains all HES records that match either E10 (Insulin-dependent diabetes mellitus) or E11 (Non-insulin-dependent diabetes mellitus). This can also be run on sub-codes, e.g. E11.3, for Diabetic Retinopathy.
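
To make the relationship between a parent code such as E11 and a sub-code such as E11.3 concrete, here is a small sketch of prefix matching. It is only an illustration under stated assumptions: the helpers normalise and matches_any_code are hypothetical, the sample codes are made up, and the way read_ICD10 actually matches codes may differ.

```python
def normalise(code):
    """Strip the dot and upper-case the code so 'e11.3' and 'E113' compare equal."""
    return code.replace(".", "").upper()


def matches_any_code(recorded, wanted):
    """True if the recorded code equals, or is a sub-code of, any wanted code."""
    rec = normalise(recorded)
    return any(rec.startswith(normalise(w)) for w in wanted)


# Made-up recorded codes, some written with a dot and some without.
recorded_codes = ["E10", "E113", "E11.9", "I10", "E149"]

# Parent codes match themselves and all of their sub-codes.
selected = [c for c in recorded_codes if matches_any_code(c, ["E10", "E11"])]
print(selected)

# A sub-code query is narrower: E11.3 matches E113 but not E119.
print(matches_any_code("E113", ["E11.3"]), matches_any_code("E119", ["E11.3"]))
```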

Combining Healthcare Sources

Many phenotypes can be defined in a variety of ways. For example, Frozen Shoulder can be defined by ICD10 code M75.0, by GP codes N210., XE1FL, and XE1Hm, or by OPCS4 code W78.1.

The function first_occurence takes ICD10, GP, and OPCS codes and returns the first date on which the phenotype appears, along with the source in which it first appears. Running

frozen_shoulder = first_occurence(project_folder, ICD10='M75.0', GP=["N210.", "XE1FL", "XE1Hm"], OPCS='W78.1')

will return a DataFrame with three columns: the id, the date of the first frozen shoulder record, and the source in which that record appeared. For this phenotype, the cancer registry does not need to be queried, so that input is left empty ('').
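
The idea of taking the earliest record across several sources can be sketched in pandas as follows. The toy frames, the column names (eid, date, source), and the dates are invented for illustration; this is one plausible way to compute a first occurrence, not necessarily how first_occurence is implemented.

```python
import pandas as pd

# Invented per-source matches: one row per matching record.
icd10 = pd.DataFrame({"eid": [1, 2], "date": ["2012-05-01", "2009-02-10"]})
gp = pd.DataFrame({"eid": [1, 3], "date": ["2011-11-20", "2015-06-30"]})
opcs = pd.DataFrame({"eid": [2], "date": ["2010-01-15"]})

# Stack the sources, tagging each row with where it came from.
combined = pd.concat(
    [icd10.assign(source="ICD10"), gp.assign(source="GP"), opcs.assign(source="OPCS")],
    ignore_index=True,
)
combined["date"] = pd.to_datetime(combined["date"])

# Sort by participant and date, then keep the earliest row per participant.
first = (
    combined.sort_values(["eid", "date"])
    .drop_duplicates(subset="eid", keep="first")
    .reset_index(drop=True)
)
print(first)
```

Each participant ends up with a single row holding the earliest date and the source that supplied it.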

Longitudinal Primary Care Records

read_GP preserves the value from the GP records and can be used for longitudinal analysis. Using the read3 code 22K.. for BMI, you can run read_GP(['22K..'], project_folder) and it will return all BMI recordings in the GP records.

These are longitudinal and have the date in event_dt and the actual BMI value in value1, value2, or value3.
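
As a sketch of how such longitudinal output might be summarised, the example below builds a toy table with the event_dt and value1 columns mentioned above and computes, per participant, the number of BMI readings and the first and last values. The data are invented, and storing value1 as text (with blanks for missing readings) is an assumption of this example.

```python
import pandas as pd

# Invented GP records; value1 is assumed to arrive as text, possibly blank.
gp_records = pd.DataFrame({
    "eid": [1, 1, 1, 2, 2],
    "event_dt": ["2005-03-01", "2010-06-15", "2008-01-10", "2012-09-09", "2011-02-02"],
    "value1": ["27.5", "30.1", "28.4", "", "22.0"],
})

# Parse dates and convert the values to numbers; blanks become NaN.
gp_records["event_dt"] = pd.to_datetime(gp_records["event_dt"])
gp_records["bmi"] = pd.to_numeric(gp_records["value1"], errors="coerce")

# Drop unusable readings and order each participant's records in time.
bmi = gp_records.dropna(subset=["bmi"]).sort_values(["eid", "event_dt"])

# Per participant: number of readings, earliest value, latest value.
summary = bmi.groupby("eid")["bmi"].agg(n="count", first="first", last="last")
print(summary)
```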

Working on

  • read_cancer: Reads data from the Cancer Registry data using ICD10. It returns the eid, date, and cancer type.

  • read_selfreport_cancer: Reads data from the UK Biobank's cancer self-reported illness codes. It takes a list of codes from https://biobank.ctsu.ox.ac.uk/crystal/coding.cgi?id=3 and returns a list of matching IDs.

  • first_occurence: Takes a list of ICD10, read3, OPCS, and cancer ICD10 codes and returns the date and source of the first occurrence of disease. It does not use ICD9, because the dates are not present in these records.

Example Usage

Below is an example usage of the main script:

import subprocess
subprocess.run("curl https://raw.githubusercontent.com/Surajram112/UKBB_py/main/UKBB_Health_Records_New_Project.py > UKBB_Health_Records_New_Project.py", shell=True, check=True)
from UKBB_Health_Records_New_Project import *
project_folder = 'test'
load_save_data(project_folder)

# Define read functions and other functionality here
GP_codes = ['XE2eD', '22K..']
GP_records = read_GP(GP_codes, project_folder)
