UK Biobank data processing library
Project description
``ukbparse`` - the UK BioBank data parser
=========================================
.. image:: https://img.shields.io/pypi/v/ukbparse.svg
:target: https://pypi.python.org/pypi/ukbparse/
.. image:: https://anaconda.org/conda-forge/ukbparse/badges/version.svg
:target: https://anaconda.org/conda-forge/ukbparse
.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1997626.svg
:target: https://doi.org/10.5281/zenodo.1997626
.. image:: https://git.fmrib.ox.ac.uk/fsl/ukbparse/badges/master/coverage.svg
:target: https://git.fmrib.ox.ac.uk/fsl/ukbparse/commits/master/
``ukbparse`` is a Python library for pre-processing of UK BioBank data.
``ukbparse`` is developed at the Wellcome Centre for Integrative
Neuroimaging (WIN@FMRIB), University of Oxford. ``ukbparse`` is in no way
endorsed, sanctioned, or validated by the :ref:`UK BioBank
<https://www.ukbiobank.ac.uk/>`_.
``ukbparse`` comes bundled with metadata about the variables present in UK
BioBank data sets. This metadata can be obtained from the :ref:`UK BioBank
online data showcase <https://biobank.ctsu.ox.ac.uk/showcase/index.cgi>`_
Installation
------------
Install ``ukbparse`` via pip::
pip install ukbparse
Or from ``conda-forge``::
conda install -c conda-forge ukbparse
Comprehensive documentation does not yet exist.
Introductory notebook
---------------------
The ``ukbparse_demo`` command will start a Jupyter Notebook which introduces
the main features provided by ``ukbparse``. To run it, you need to install a
few additional dependencies::
pip install ukbparse[demo]
You can then start the demo by running ``ukbparse_demo``.
.. note:: The introductory notebook uses ``bash``, so is unlikely to work on
Windows.
Usage
-----
General usage is as follows::
ukbparse [options] output.tsv input1.tsv input2.tsv
You can get information on all of the options by typing ``ukbparse --help``.
Options can be specified on the command line, and/or stored in a configuration
file. For example, the options in the following command line::
ukbparse \
--overwrite \
--import_all \
--log_file log.txt \
--icd10_map_file icd_codes.tsv \
--category 10 \
--category 11 \
output.tsv input1.tsv input2.tsv
Could be stored in a configuration file ``config.txt``::
overwrite
import_all
log_file log.txt
icd10_map_file icd_codes.tsv
category 10
category 11
And then executed as follows::
ukbparse -cfg config.txt output.tsv input1.tsv input2.tsv
Customising
-----------
``ukbparse`` contains a large number of built-in rules which have been
specifically written to pre-process UK BioBank data variables. These rules are
stored in the following files:
* ``ukbparse/data/variables.tsv``: Cleaning rules for individual variables
* ``ukbparse/data/datacodings.tsv``: Cleaning rules for data codings
* ``ukbparse/data/types.tsv``: Cleaning rules for specific types
* ``ukbparse/data/processing.tsv``: Processing steps
You can customise or replace these files as you see fit. You can also pass
your own versions of these files to ``ukbparse`` via the ``--variable_file``,
``--datacoding_file``, ``--type_file`` and ``--processing_file`` command-line
options respectively.
The ``variables.tsv`` file defines all of the variables that ``ukbparse`` is
aware of. If your UK BioBank data set contains variables which are not listed
in this file, you may wish to generate your own version - you can do so
by following these steps:
1. Use the ``ukbconv`` utility (available through the `BioBank Data showcase
<http://biobank.ctsu.ox.ac.uk/showcase/>`_) to generate a HTML file
describing all of the variables in your data set, and data codings used by
them.
2. Use the ``ukbparse_htmlparse`` command to convert this ``html`` file into
variable and data coding "base" files, which just contain the meta-data
for each variable/data coding.
3. Code up your custom cleaning rules for each variable and data coding, in
the same format as can be seen in the ``ukbparse/data/`` directory. For
data codings, create these flies:
* ``datacodings_navalues.tsv``: contains NA value replacement rules
* ``datacodings_recoding.tsv``: contains categorical recoding rules
And for variables, create these files:
* ``variables_navalues.tsv``: Contains NA value replacement rules
* ``variables_recoding.tsv``: Contains categorical recoding rules
* ``variables_clean.tsv``: Contains variable-specific cleaning functions
* ``variables_parentvalues.tsv``: Contains child value replacement rules.
4. Use the ``ukbparse_join`` command to generate the final variable and data
coding tables from your base files, e.g.::
ukbparse_join final_variables_table.tsv \
variables_base.tsv \
variables_navalues.tsv \
variables_recoding.tsv \
variables_parentvalues.tsv \
variables_clean.tsv
ukbparse_join final_datacodings.tsv \
datacodings_base.tsv \
datacodings_navalues.tsv \
datacodings_recoding.tsv
Tests
-----
To run the test suite, you need to install some additional dependencies::
pip install ukbparse[test]
Then you can run the test suite using ``pytest``::
pytest
Citing
------
If you would like to cite ``ukbparse``, please refer to its `Zenodo page
<https://zenodo.org/record/2203808#.XBDJ-xP7RE4>`_.
=========================================
.. image:: https://img.shields.io/pypi/v/ukbparse.svg
:target: https://pypi.python.org/pypi/ukbparse/
.. image:: https://anaconda.org/conda-forge/ukbparse/badges/version.svg
:target: https://anaconda.org/conda-forge/ukbparse
.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1997626.svg
:target: https://doi.org/10.5281/zenodo.1997626
.. image:: https://git.fmrib.ox.ac.uk/fsl/ukbparse/badges/master/coverage.svg
:target: https://git.fmrib.ox.ac.uk/fsl/ukbparse/commits/master/
``ukbparse`` is a Python library for pre-processing of UK BioBank data.
``ukbparse`` is developed at the Wellcome Centre for Integrative
Neuroimaging (WIN@FMRIB), University of Oxford. ``ukbparse`` is in no way
endorsed, sanctioned, or validated by the :ref:`UK BioBank
<https://www.ukbiobank.ac.uk/>`_.
``ukbparse`` comes bundled with metadata about the variables present in UK
BioBank data sets. This metadata can be obtained from the :ref:`UK BioBank
online data showcase <https://biobank.ctsu.ox.ac.uk/showcase/index.cgi>`_
Installation
------------
Install ``ukbparse`` via pip::
pip install ukbparse
Or from ``conda-forge``::
conda install -c conda-forge ukbparse
Comprehensive documentation does not yet exist.
Introductory notebook
---------------------
The ``ukbparse_demo`` command will start a Jupyter Notebook which introduces
the main features provided by ``ukbparse``. To run it, you need to install a
few additional dependencies::
pip install ukbparse[demo]
You can then start the demo by running ``ukbparse_demo``.
.. note:: The introductory notebook uses ``bash``, so is unlikely to work on
Windows.
Usage
-----
General usage is as follows::
ukbparse [options] output.tsv input1.tsv input2.tsv
You can get information on all of the options by typing ``ukbparse --help``.
Options can be specified on the command line, and/or stored in a configuration
file. For example, the options in the following command line::
ukbparse \
--overwrite \
--import_all \
--log_file log.txt \
--icd10_map_file icd_codes.tsv \
--category 10 \
--category 11 \
output.tsv input1.tsv input2.tsv
Could be stored in a configuration file ``config.txt``::
overwrite
import_all
log_file log.txt
icd10_map_file icd_codes.tsv
category 10
category 11
And then executed as follows::
ukbparse -cfg config.txt output.tsv input1.tsv input2.tsv
Customising
-----------
``ukbparse`` contains a large number of built-in rules which have been
specifically written to pre-process UK BioBank data variables. These rules are
stored in the following files:
* ``ukbparse/data/variables.tsv``: Cleaning rules for individual variables
* ``ukbparse/data/datacodings.tsv``: Cleaning rules for data codings
* ``ukbparse/data/types.tsv``: Cleaning rules for specific types
* ``ukbparse/data/processing.tsv``: Processing steps
You can customise or replace these files as you see fit. You can also pass
your own versions of these files to ``ukbparse`` via the ``--variable_file``,
``--datacoding_file``, ``--type_file`` and ``--processing_file`` command-line
options respectively.
The ``variables.tsv`` file defines all of the variables that ``ukbparse`` is
aware of. If your UK BioBank data set contains variables which are not listed
in this file, you may wish to generate your own version - you can do so
by following these steps:
1. Use the ``ukbconv`` utility (available through the `BioBank Data showcase
<http://biobank.ctsu.ox.ac.uk/showcase/>`_) to generate a HTML file
describing all of the variables in your data set, and data codings used by
them.
2. Use the ``ukbparse_htmlparse`` command to convert this ``html`` file into
variable and data coding "base" files, which just contain the meta-data
for each variable/data coding.
3. Code up your custom cleaning rules for each variable and data coding, in
the same format as can be seen in the ``ukbparse/data/`` directory. For
data codings, create these flies:
* ``datacodings_navalues.tsv``: contains NA value replacement rules
* ``datacodings_recoding.tsv``: contains categorical recoding rules
And for variables, create these files:
* ``variables_navalues.tsv``: Contains NA value replacement rules
* ``variables_recoding.tsv``: Contains categorical recoding rules
* ``variables_clean.tsv``: Contains variable-specific cleaning functions
* ``variables_parentvalues.tsv``: Contains child value replacement rules.
4. Use the ``ukbparse_join`` command to generate the final variable and data
coding tables from your base files, e.g.::
ukbparse_join final_variables_table.tsv \
variables_base.tsv \
variables_navalues.tsv \
variables_recoding.tsv \
variables_parentvalues.tsv \
variables_clean.tsv
ukbparse_join final_datacodings.tsv \
datacodings_base.tsv \
datacodings_navalues.tsv \
datacodings_recoding.tsv
Tests
-----
To run the test suite, you need to install some additional dependencies::
pip install ukbparse[test]
Then you can run the test suite using ``pytest``::
pytest
Citing
------
If you would like to cite ``ukbparse``, please refer to its `Zenodo page
<https://zenodo.org/record/2203808#.XBDJ-xP7RE4>`_.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
ukbparse-0.15.1.tar.gz
(1.6 MB
view hashes)
Built Distribution
Close
Hashes for ukbparse-0.15.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6761b0c51e605f92e24efec3ae4a5a5b203731fe92ee364fe40d4eb588743a3b |
|
MD5 | daa2f602c140844644cd22093ed350e4 |
|
BLAKE2b-256 | db344cd951a03f181b431d59dc909888bf8547ad0cb71107725f552689e7e33e |