UK Biobank data processing library
``ukbparse`` - the UK BioBank data parser
=========================================
.. image:: https://img.shields.io/pypi/v/ukbparse.svg
   :target: https://pypi.python.org/pypi/ukbparse/

.. image:: https://anaconda.org/conda-forge/ukbparse/badges/version.svg
   :target: https://anaconda.org/conda-forge/ukbparse

.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.1997626.svg
   :target: https://doi.org/10.5281/zenodo.1997626

.. image:: https://git.fmrib.ox.ac.uk/fsl/ukbparse/badges/master/coverage.svg
   :target: https://git.fmrib.ox.ac.uk/fsl/ukbparse/commits/master/

``ukbparse`` is a Python library for pre-processing of UK BioBank data.
``ukbparse`` is developed at the Wellcome Centre for Integrative
Neuroimaging (WIN@FMRIB), University of Oxford. ``ukbparse`` is in no way
endorsed, sanctioned, or validated by the `UK BioBank
<https://www.ukbiobank.ac.uk/>`_.
``ukbparse`` comes bundled with metadata about the variables present in UK
BioBank data sets. This metadata can be obtained from the `UK BioBank
online data showcase <https://biobank.ctsu.ox.ac.uk/showcase/index.cgi>`_.
Installation
------------
Install ``ukbparse`` via pip::

    pip install ukbparse

Or from ``conda-forge``::

    conda install -c conda-forge ukbparse

Comprehensive documentation does not yet exist.
Introductory notebook
---------------------
The ``ukbparse_demo`` command will start a Jupyter Notebook which introduces
the main features provided by ``ukbparse``. To run it, you need to install a
few additional dependencies::

    pip install ukbparse[demo]

You can then start the demo by running ``ukbparse_demo``.
.. note:: The introductory notebook uses ``bash``, so is unlikely to work on
          Windows.
Usage
-----
General usage is as follows::

    ukbparse [options] output.tsv input1.tsv input2.tsv

You can get information on all of the options by typing ``ukbparse --help``.
Options can be specified on the command line, and/or stored in a configuration
file. For example, the options in the following command line::

    ukbparse \
      --overwrite \
      --import_all \
      --log_file log.txt \
      --icd10_map_file icd_codes.tsv \
      --category 10 \
      --category 11 \
      output.tsv input1.tsv input2.tsv

could be stored in a configuration file ``config.txt``::

    overwrite
    import_all
    log_file log.txt
    icd10_map_file icd_codes.tsv
    category 10
    category 11

and then executed as follows::

    ukbparse -cfg config.txt output.tsv input1.tsv input2.tsv

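Judging from the example above, the two forms map onto each other mechanically: each option loses its leading ``--`` and is written on its own line, followed by its arguments. The following is a minimal sketch of that translation, assuming the convention holds for every long-form option. The ``args_to_config`` helper is hypothetical and is not part of ``ukbparse``.

```python
def args_to_config(args):
    """Convert long-form command-line options into config-file lines.

    Hypothetical helper, not part of ukbparse. It assumes every option
    starts with ``--`` and that the tokens following it are its arguments.
    """
    lines = []
    for token in args:
        if token.startswith("--"):
            # Start a new config line, dropping the leading dashes.
            lines.append([token[2:]])
        elif lines:
            # Attach the argument to the most recent option.
            lines[-1].append(token)
        else:
            raise ValueError(f"argument {token!r} appears before any option")
    return [" ".join(parts) for parts in lines]


# The options from the command line shown above (positional files omitted).
options = [
    "--overwrite",
    "--import_all",
    "--log_file", "log.txt",
    "--icd10_map_file", "icd_codes.tsv",
    "--category", "10",
    "--category", "11",
]

# Print the text that would be saved as config.txt.
print("\n".join(args_to_config(options)))
```

The output and input files are still given on the command line, as in the ``-cfg`` invocation above, so they are left out of the option list.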
Customising
-----------
``ukbparse`` contains a large number of built-in rules which have been
specifically written to pre-process UK BioBank data variables. These rules are
stored in the following files:
* ``ukbparse/data/variables_*.tsv``: Cleaning rules for individual variables
* ``ukbparse/data/datacodings_*.tsv``: Cleaning rules for data codings
* ``ukbparse/data/types.tsv``: Cleaning rules for specific types
* ``ukbparse/data/processing.tsv``: Processing steps
You can customise or replace these files as you see fit. You can also pass
your own versions of these files to ``ukbparse`` via the ``--variable_file``,
``--datacoding_file``, ``--type_file`` and ``--processing_file`` command-line
options respectively. ``ukbparse`` will load all variable and datacoding files,
and merge them into a single table which contains the cleaning rules for each
variable.
Finally, you can use the ``--no_builtins`` option to bypass all of the
built-in cleaning and processing rules.
Output
------
The main output of ``ukbparse`` is a plain-text tab-delimited [*]_ file which
contains the input data, after cleaning and processing, potentially with
some columns removed, and new columns added.
If you used the ``--non_numeric_file`` option, the main output file will only
contain the numeric columns; non-numeric columns will be saved to a separate
file.
You can use any tool of your choice to load this output file, such as Python,
MATLAB, or Excel. It is also possible to pass the output back into
``ukbparse``.
.. [*] You can change the delimiter via the ``--tsv_sep`` / ``-ts`` option.
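If you are loading the output in Python, the standard library ``csv`` module is enough for a first look. The sketch below is illustrative only: the ``load_tsv`` and ``numeric_columns`` helpers are hypothetical, and the code assumes that the first row holds the column names and that missing values appear as empty fields.

```python
import csv


def load_tsv(path, delimiter="\t"):
    """Read a delimited text file and return (column_names, rows).

    Each row is a dict keyed by column name. Values are left as strings.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        rows = list(reader)
        return reader.fieldnames, rows


def numeric_columns(columns, rows):
    """Return names of columns whose non-empty values all parse as floats.

    Empty fields are treated as missing and skipped (an assumption about
    the file contents), so a wholly empty column also counts as numeric.
    """
    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    return [
        name for name in columns
        if all(is_number(row[name]) for row in rows
               if row[name] not in ("", None))
    ]
```

For example, ``columns, rows = load_tsv("out.tsv")`` followed by ``numeric_columns(columns, rows)`` lists the columns that can safely be converted to numbers. If you changed the delimiter, pass the same character through the ``delimiter`` argument.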
Loading output into MATLAB
^^^^^^^^^^^^^^^^^^^^^^^^^^
.. |readtable| replace:: ``readtable``
.. _readtable: https://uk.mathworks.com/help/matlab/ref/readtable.html
.. |table| replace:: ``table``
.. _table: https://uk.mathworks.com/help/matlab/ref/table.html
If you are using MATLAB, you have several options for loading the ``ukbparse``
output. The best option is |readtable|_, which will load column names, and
will handle both non-numeric data and missing values. Use ``readtable`` like
so::

    data = readtable('out.tsv', 'FileType', 'text');

The ``readtable`` function returns a |table|_ object, which stores each column
as a separate vector (or cell-array for non-numeric columns). If you are only
interested in numeric columns, you can retrieve them as an array like this::

    data = data(:, vartype('numeric')).Variables;

Tests
-----
To run the test suite, you need to install some additional dependencies::

    pip install ukbparse[test]

Then you can run the test suite using ``pytest``::

    pytest

Citing
------
If you would like to cite ``ukbparse``, please refer to its `Zenodo page
<https://doi.org/10.5281/zenodo.1997626>`_.