ukbparse

UK Biobank data processing library


``ukbparse`` - the UK BioBank data parser
=========================================

.. image:: https://img.shields.io/pypi/v/ukbparse.svg
   :target: https://pypi.python.org/pypi/ukbparse/

.. image:: https://anaconda.org/conda-forge/ukbparse/badges/version.svg
   :target: https://anaconda.org/conda-forge/ukbparse

.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1997626.svg
   :target: https://doi.org/10.5281/zenodo.1997626

.. image:: https://git.fmrib.ox.ac.uk/fsl/ukbparse/badges/master/coverage.svg
   :target: https://git.fmrib.ox.ac.uk/fsl/ukbparse/commits/master/

``ukbparse`` is a Python library for pre-processing of UK BioBank data.

``ukbparse`` is developed at the Wellcome Centre for Integrative
Neuroimaging (WIN@FMRIB), University of Oxford. ``ukbparse`` is in no way
endorsed, sanctioned, or validated by the `UK BioBank
<https://www.ukbiobank.ac.uk/>`_.

``ukbparse`` comes bundled with metadata about the variables present in UK
BioBank data sets. This metadata can be obtained from the `UK BioBank
online data showcase <https://biobank.ctsu.ox.ac.uk/showcase/index.cgi>`_.

Installation
------------

Install ``ukbparse`` via pip::

    pip install ukbparse

Or from ``conda-forge``::

    conda install -c conda-forge ukbparse

Comprehensive documentation does not yet exist.

Introductory notebook
---------------------

The ``ukbparse_demo`` command will start a Jupyter Notebook which introduces
the main features provided by ``ukbparse``. To run it, you need to install a
few additional dependencies::

    pip install ukbparse[demo]

You can then start the demo by running ``ukbparse_demo``.

.. note:: The introductory notebook uses ``bash``, so it is unlikely to work
          on Windows.

Usage
-----

General usage is as follows::

    ukbparse [options] output.tsv input1.tsv input2.tsv

You can get information on all of the options by typing ``ukbparse --help``.

Options can be specified on the command line, and/or stored in a configuration
file. For example, the options in the following command line::

    ukbparse \
      --overwrite \
      --import_all \
      --log_file log.txt \
      --icd10_map_file icd_codes.tsv \
      --category 10 \
      --category 11 \
      output.tsv input1.tsv input2.tsv

could be stored in a configuration file called ``config.txt``::

    overwrite
    import_all
    log_file log.txt
    icd10_map_file icd_codes.tsv
    category 10
    category 11

And then executed as follows::

    ukbparse -cfg config.txt output.tsv input1.tsv input2.tsv
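
If you generate configuration files from a script, the mapping shown above is
mechanical: each ``--name value`` pair on the command line becomes a
``name value`` line in the file, and a flag such as ``--overwrite`` becomes a
bare ``name`` line. The Python sketch below illustrates that mapping. The
``options_to_config`` and ``options_to_argv`` helpers are hypothetical and
are not part of ``ukbparse``:

.. code-block:: python

    # Hypothetical helpers (not part of ukbparse) that render one list of
    # options either as config-file lines or as command-line arguments.

    def options_to_config(options):
        """Render (name, value) pairs as config-file lines.

        A value of None marks a flag (e.g. "overwrite") that takes no
        argument. Repeated names (e.g. "category") produce repeated lines.
        """
        lines = []
        for name, value in options:
            lines.append(name if value is None else f"{name} {value}")
        return "\n".join(lines) + "\n"


    def options_to_argv(options):
        """Render the same (name, value) pairs as --name value arguments."""
        argv = []
        for name, value in options:
            argv.append(f"--{name}")
            if value is not None:
                argv.append(str(value))
        return argv


    # The options from the example above.
    options = [
        ("overwrite", None),
        ("import_all", None),
        ("log_file", "log.txt"),
        ("icd10_map_file", "icd_codes.tsv"),
        ("category", 10),
        ("category", 11),
    ]

    print(options_to_config(options), end="")

Writing the output of ``options_to_config`` to ``config.txt`` should give the
same file as the example above, while ``options_to_argv`` reproduces the
equivalent command-line options.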

Customising
-----------

``ukbparse`` contains a large number of built-in rules which have been
specifically written to pre-process UK BioBank data variables. These rules are
stored in the following files:

* ``ukbparse/data/variables.tsv``: Cleaning rules for individual variables
* ``ukbparse/data/datacodings.tsv``: Cleaning rules for data codings
* ``ukbparse/data/types.tsv``: Cleaning rules for specific types
* ``ukbparse/data/processing.tsv``: Processing steps

You can customise or replace these files as you see fit. You can also pass
your own versions of these files to ``ukbparse`` via the ``--variable_file``,
``--datacoding_file``, ``--type_file`` and ``--processing_file`` command-line
options respectively.
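
When scripting runs that use customised rule files, it can help to build the
argument list programmatically. The sketch below is a hypothetical helper
(``build_ukbparse_command`` is not part of ``ukbparse``). It uses only the
four command-line options named above, and simply omits any rule file you do
not supply, on the assumption that ``ukbparse`` then falls back to its
bundled table:

.. code-block:: python

    # Hypothetical helper (not part of ukbparse): assemble a ukbparse command
    # line that overrides some or all of the bundled rule files.

    def build_ukbparse_command(output, inputs, variable_file=None,
                               datacoding_file=None, type_file=None,
                               processing_file=None, extra_args=()):
        """Return the ukbparse command as a list of strings.

        Rule files left as None are not passed at all (assumption: ukbparse
        then uses the corresponding file from ukbparse/data/).
        """
        overrides = {
            "--variable_file": variable_file,
            "--datacoding_file": datacoding_file,
            "--type_file": type_file,
            "--processing_file": processing_file,
        }
        cmd = ["ukbparse", *extra_args]
        for flag, path in overrides.items():
            if path is not None:
                cmd += [flag, str(path)]
        # Positional arguments: the output file, then all input files.
        cmd += [str(output), *map(str, inputs)]
        return cmd

For example, ``build_ukbparse_command("output.tsv", ["input1.tsv"],
variable_file="my_variables.tsv")`` produces a list that can be passed to
``subprocess.run(cmd, check=True)``.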

The ``variables.tsv`` file defines all of the variables that ``ukbparse`` is
aware of. If your UK BioBank data set contains variables which are not listed
in this file, you may wish to generate your own version of it by following
these steps:

1. Use the ``ukbconv`` utility (available through the `BioBank Data showcase
   <http://biobank.ctsu.ox.ac.uk/showcase/>`_) to generate an HTML file
   describing all of the variables in your data set, and the data codings
   used by them.

2. Use the ``ukbparse_htmlparse`` command to convert this HTML file into
   variable and data coding "base" files, which contain only the metadata
   for each variable/data coding.

3. Code up your custom cleaning rules for each variable and data coding,
   using the same format as the files in the ``ukbparse/data/`` directory.
   For data codings, create these files:

   * ``datacodings_navalues.tsv``: Contains NA value replacement rules
   * ``datacodings_recoding.tsv``: Contains categorical recoding rules

   And for variables, create these files:

   * ``variables_navalues.tsv``: Contains NA value replacement rules
   * ``variables_recoding.tsv``: Contains categorical recoding rules
   * ``variables_clean.tsv``: Contains variable-specific cleaning functions
   * ``variables_parentvalues.tsv``: Contains child value replacement rules

4. Use the ``ukbparse_join`` command to generate the final variable and data
   coding tables from your base files, e.g.::

       ukbparse_join final_variables_table.tsv \
           variables_base.tsv \
           variables_navalues.tsv \
           variables_recoding.tsv \
           variables_parentvalues.tsv \
           variables_clean.tsv

       ukbparse_join final_datacodings.tsv \
           datacodings_base.tsv \
           datacodings_navalues.tsv \
           datacodings_recoding.tsv
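
As a rough mental model only, joining a base file with its rule files can be
thought of as a left join on a shared identifier column: every row of the base
file is kept, and the rule columns are filled in where a rule file has an
entry for that row. The sketch below assumes every file is a tab-separated
table with a header row and a key column named ``ID``. Both the
``join_tables`` function and that column name are assumptions for
illustration; this is not the ``ukbparse_join`` implementation:

.. code-block:: python

    import csv


    def join_tables(output_file, base_file, *rule_files, key="ID"):
        """Left-join tab-separated rule tables onto a base table.

        Illustration only: assumes a header row in every file and a shared
        key column (hypothetically named "ID"). Base rows keep their order;
        rule columns are left empty where a rule file has no matching row.
        """
        with open(base_file, newline="") as f:
            reader = csv.DictReader(f, delimiter="\t")
            columns = list(reader.fieldnames)
            rows = list(reader)

        for rule_file in rule_files:
            with open(rule_file, newline="") as f:
                reader = csv.DictReader(f, delimiter="\t")
                # Only add columns that the joined table does not have yet.
                new_cols = [c for c in reader.fieldnames
                            if c != key and c not in columns]
                rules = {row[key]: row for row in reader}
            columns += new_cols
            for row in rows:
                match = rules.get(row[key], {})
                for col in new_cols:
                    row[col] = match.get(col, "")

        with open(output_file, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t",
                                    lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)

The argument order mirrors the ``ukbparse_join`` examples above: the output
file first, then the base file, then any number of rule files.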

Tests
-----

To run the test suite, you need to install some additional dependencies::

    pip install ukbparse[test]

Then you can run the test suite using ``pytest``::

    pytest

Citing
------

If you would like to cite ``ukbparse``, please refer to its `Zenodo page
<https://zenodo.org/record/2203808#.XBDJ-xP7RE4>`_.
