UK Biobank data processing library

DOI: 10.5281/zenodo.1997626

ukbparse is a Python library for pre-processing UK Biobank data.

Installation

Install ukbparse via pip:

pip install ukbparse

Or from conda-forge:

conda install -c conda-forge ukbparse

Comprehensive documentation does not yet exist.

Introductory notebook

The ukbparse_demo command will start a Jupyter Notebook which introduces the main features provided by ukbparse. To run it, you need to install a few additional dependencies:

pip install ukbparse[demo]

You can then start the demo by running ukbparse_demo.

Usage

General usage is as follows:

ukbparse [options] output.tsv input1.tsv input2.tsv

You can get information on all of the options by typing ukbparse --help.

Options can be specified on the command line, and/or stored in a configuration file. For example, the options in the following command line:

ukbparse \
  --overwrite \
  --import_all \
  --log_file log.txt \
  --icd10_map_file icd_codes.tsv \
  --category 10 \
  --category 11 \
  output.tsv input1.tsv input2.tsv

could be stored in a configuration file, config.txt:

overwrite
import_all
log_file       log.txt
icd10_map_file icd_codes.tsv
category       10
category       11

and then executed as follows:

ukbparse -cfg config.txt output.tsv input1.tsv input2.tsv
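
The correspondence between the two forms can be sketched in Python. This is only an illustration, not part of ukbparse: the function name options_to_config is hypothetical, and it assumes, based solely on the example above, that each configuration line is the long option name without its leading dashes, followed by whitespace and the value (if any).

```python
def options_to_config(options):
    """Convert a list of long command-line options into config-file text.

    Assumption (inferred from the example above, not from the ukbparse
    source): each config line is the option name without the leading
    "--", optionally followed by whitespace and a single value.
    """
    lines = []
    i = 0
    while i < len(options):
        token = options[i]
        if not token.startswith("--"):
            raise ValueError(f"expected a long option, got {token!r}")
        name = token[2:]
        has_value = i + 1 < len(options) and not options[i + 1].startswith("--")
        if has_value:
            # Option with a value, e.g. "--log_file log.txt"
            lines.append(f"{name} {options[i + 1]}")
            i += 2
        else:
            # Flag without a value, e.g. "--overwrite"
            lines.append(name)
            i += 1
    return "\n".join(lines) + "\n"


options = [
    "--overwrite",
    "--import_all",
    "--log_file", "log.txt",
    "--icd10_map_file", "icd_codes.tsv",
    "--category", "10",
    "--category", "11",
]
print(options_to_config(options), end="")
```

Positional arguments (the output and input files) are deliberately left out, since in the example they stay on the command line.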

Customising

ukbparse contains a large number of built-in rules which have been written specifically to pre-process UK Biobank data variables. These rules are stored in the following files:

  • ukbparse/data/variables.tsv: Cleaning rules for individual variables

  • ukbparse/data/datacodings.tsv: Cleaning rules for data codings

  • ukbparse/data/types.tsv: Cleaning rules for specific types

  • ukbparse/data/processing.tsv: Processing steps

You can customise or replace these files as you see fit. You can also pass your own versions of these files to ukbparse via the --variable_file, --datacoding_file, --type_file and --processing_file command-line options respectively.
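
As a rough sketch, a command line that passes custom rule files could be assembled from Python as shown below. The helper build_ukbparse_command and the file names my_variables.tsv and my_processing.tsv are hypothetical; only the four option names come from the text above. The sketch builds and prints the command without running it.

```python
import shlex


def build_ukbparse_command(output, inputs, variable_file=None,
                           datacoding_file=None, type_file=None,
                           processing_file=None):
    """Assemble a ukbparse argument list with optional custom rule files.

    Hypothetical helper: it only builds the argument list and does not
    check that ukbparse is installed or that the files exist.
    """
    command = ["ukbparse"]
    custom_files = [
        ("--variable_file", variable_file),
        ("--datacoding_file", datacoding_file),
        ("--type_file", type_file),
        ("--processing_file", processing_file),
    ]
    for flag, path in custom_files:
        # Only add the option when a custom file was supplied, so the
        # built-in rules are used for everything else.
        if path is not None:
            command += [flag, str(path)]
    return command + [str(output)] + [str(p) for p in inputs]


command = build_ukbparse_command(
    "output.tsv",
    ["input1.tsv", "input2.tsv"],
    variable_file="my_variables.tsv",      # hypothetical file name
    processing_file="my_processing.tsv",   # hypothetical file name
)
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run if you want to launch ukbparse from a script.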

The variables.tsv file defines all of the variables that ukbparse is aware of. If your UK Biobank data set contains variables which are not listed in this file, you may wish to generate your own version. You can do so by following these steps:

  1. Use the ukbconv utility (available through the UK Biobank Data Showcase) to generate an HTML file describing all of the variables in your data set, and the data codings used by them.

  2. Use the ukbparse_htmlparse command to convert this HTML file into variable and data coding “base” files, which contain only the metadata for each variable/data coding.

  3. Code up your custom cleaning rules for each variable and data coding, in the same format as can be seen in the ukbparse/data/ directory. For data codings, create these files:

    • datacodings_navalues.tsv: Contains NA value replacement rules

    • datacodings_recoding.tsv: Contains categorical recoding rules

    And for variables, create these files:

    • variables_navalues.tsv: Contains NA value replacement rules

    • variables_recoding.tsv: Contains categorical recoding rules

    • variables_clean.tsv: Contains variable-specific cleaning functions

    • variables_parentvalues.tsv: Contains child value replacement rules

  4. Use the ukbparse_join command to generate the final variable and data coding tables from your base files, e.g.:

    ukbparse_join final_variables_table.tsv \
                  variables_base.tsv \
                  variables_navalues.tsv \
                  variables_recoding.tsv \
                  variables_parentvalues.tsv \
                  variables_clean.tsv
    ukbparse_join final_datacodings.tsv \
                  datacodings_base.tsv \
                  datacodings_navalues.tsv \
                  datacodings_recoding.tsv
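
Before running step 4, it may help to check that every rule file is in place. The following is a minimal sketch, not part of ukbparse: the function names missing_rule_files and join_commands are hypothetical, the file names are the ones used in the example above, and the argument order simply mirrors that example (whether ukbparse_join depends on the order is not stated here).

```python
from pathlib import Path

# File names as they appear in the ukbparse_join example above.
VARIABLE_FILES = [
    "variables_base.tsv",
    "variables_navalues.tsv",
    "variables_recoding.tsv",
    "variables_parentvalues.tsv",
    "variables_clean.tsv",
]
DATACODING_FILES = [
    "datacodings_base.tsv",
    "datacodings_navalues.tsv",
    "datacodings_recoding.tsv",
]


def missing_rule_files(directory):
    """Return the expected rule files that are absent from directory."""
    directory = Path(directory)
    return [name for name in VARIABLE_FILES + DATACODING_FILES
            if not (directory / name).is_file()]


def join_commands(directory):
    """Build the two ukbparse_join argument lists (without running them)."""
    directory = Path(directory)
    return [
        ["ukbparse_join", "final_variables_table.tsv"]
        + [str(directory / name) for name in VARIABLE_FILES],
        ["ukbparse_join", "final_datacodings.tsv"]
        + [str(directory / name) for name in DATACODING_FILES],
    ]


missing = missing_rule_files(".")
if missing:
    print("Missing rule files:", ", ".join(missing))
else:
    for command in join_commands("."):
        print(" ".join(command))
```

Each argument list returned by join_commands can be passed to subprocess.run once all of the files exist.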

Tests

To run the test suite, you need to install some additional dependencies:

pip install ukbparse[test]

Then you can run the test suite using pytest:

pytest

Citing

If you would like to cite ukbparse, please refer to its Zenodo page.
