Data Warehouse Client

This package provides access to the e-Science Central data warehouse that can be used to store, access and analyse data collected in scientific studies, including for healthcare applications. The primary aim of the warehouse was to create a general system that enables users to explore data collected in a variety of forms. This might include data collected through questionnaires, data collected from sensors, and features extracted from the analysis of sensor data (e.g. activity levels derived from raw accelerometer data). Researchers might wish to slice, dice, visualise, analyse and explore this data in different ways, e.g. all results for one participant, all results for one type of measure in a study, changes in measurements over time. Others may wish to build models that can then be used in applications that make predictions about future values.

Traditionally, data collected in studies has been stored in a collection of files, often with metadata encoded in the filenames. This makes it difficult and time-consuming for researchers to explore, interpret and analyse the data. The data warehouse exploits modern database technology to vastly simplify this effort. In doing this we have drawn heavily on best practice for data warehouse design. However, there is more variety in the types of healthcare data to be stored than there is in a typical warehouse, and so we have been forced to deviate from a conventional data warehouse in some aspects of the design.
There are four guiding principles behind the design:

  1. The data warehouse must be able to store any type of data collected in a study without modifying the schema. This means that when new types of data are collected in studies (e.g. from a new questionnaire, a new data analysis program, or a new sensor) they can be stored in the warehouse without any changes to its design. This has three main advantages: firstly, it enables us to fix and optimise the schema for the tables in which the data is stored; secondly, it means that applications and tools (e.g. for analysis and visualisation) built on the warehouse do not have to be updated when new types of data are added; thirdly, a single, multi-tenant database server can support many studies. This reduces the overall costs, the start-up time for a new study, and the overheads of managing the warehouse.
  2. Descriptive information about the types of measurement is stored in the warehouse so that tools or humans can interpret the data stored there.
  3. The design is optimised for query performance. In several cases, this has led to denormalization (duplication of data) to reduce the need for expensive joins.
  4. It must support a security regime to restrict each user’s access to the data collected in studies.

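As a rough illustration of the first two principles, the sketch below uses Python's built-in sqlite3 module with a deliberately simplified, hypothetical two-table layout. The table names, column names and helper functions are assumptions made for this example and are not the warehouse's actual schema or this package's API. It shows how a new kind of measurement can be added as a row of data rather than as a schema change, and how the descriptive information about each measurement type is stored alongside the values.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory database with a fixed, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Descriptive information about each kind of measurement (principle 2).
        CREATE TABLE measurement_type (
            id          INTEGER PRIMARY KEY,
            description TEXT NOT NULL,
            units       TEXT
        );
        -- One generic table holds every measurement, whatever its type (principle 1).
        CREATE TABLE measurement (
            id               INTEGER PRIMARY KEY,
            study            INTEGER NOT NULL,
            participant      INTEGER NOT NULL,
            measurement_type INTEGER NOT NULL REFERENCES measurement_type(id),
            taken_at         TEXT NOT NULL,
            reading          REAL
        );
        """
    )
    return conn


def add_measurement_type(conn, description, units):
    """Register a new kind of measurement; no table is altered."""
    cur = conn.execute(
        "INSERT INTO measurement_type (description, units) VALUES (?, ?)",
        (description, units),
    )
    return cur.lastrowid


def record(conn, study, participant, measurement_type, taken_at, reading):
    """Store one measurement of any registered type."""
    conn.execute(
        "INSERT INTO measurement "
        "(study, participant, measurement_type, taken_at, reading) "
        "VALUES (?, ?, ?, ?, ?)",
        (study, participant, measurement_type, taken_at, reading),
    )


def results_for_participant(conn, study, participant):
    """Return all results for one participant, labelled with their type."""
    return conn.execute(
        "SELECT t.description, t.units, m.taken_at, m.reading "
        "FROM measurement AS m "
        "JOIN measurement_type AS t ON t.id = m.measurement_type "
        "WHERE m.study = ? AND m.participant = ? "
        "ORDER BY m.taken_at, m.id",
        (study, participant),
    ).fetchall()
```

Because a measurement type is just a row, a tool written against `results_for_participant` keeps working when a study starts collecting data from a new sensor. The sketch joins the two tables for brevity; a design optimised for query performance, as described in the third principle, might instead duplicate some of this information to avoid the join.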
For more information see: P. Watson and H. Hiden, "The e-Science Central Study Data Platform" 2022 IEEE 18th International Conference on e-Science (e-Science), Salt Lake City, UT, USA, 2022, pp. 55-64, doi: 10.1109/eScience55777.2022.00020. https://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=KQJg3lwAAAAJ&sortby=pubdate&citation_for_view=KQJg3lwAAAAJ:z0_F5_TITjQC

For more documentation see A Data Warehouse for Storing and Analysing Study Data.

Running Instructions

To install from PyPI, run:

pip install data-warehouse-client

In the directory from which your executable is run, create a db-credentials.json file containing the database credentials (substituting all <VARS>):

{"user": "<USER>", "pass": "<PASSWORD>", "IP": "<IP>", "port": <PORT>}
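Before connecting, it can help to check that the credentials file is well formed. The following is a minimal sketch using only the Python standard library; `load_credentials` and `REQUIRED_KEYS` are hypothetical names for this example and are not part of the data-warehouse-client API. It assumes the four keys shown in the template above, and that the port is an unquoted JSON number, as the template suggests.

```python
import json
from pathlib import Path

# Keys taken from the db-credentials.json template shown above.
REQUIRED_KEYS = {"user", "pass", "IP", "port"}


def load_credentials(path="db-credentials.json"):
    """Read the credentials file and check that it has the expected shape.

    Hypothetical helper for illustration only; not part of the package.
    """
    with Path(path).open(encoding="utf-8") as handle:
        creds = json.load(handle)

    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise ValueError(f"{path} is missing keys: {sorted(missing)}")

    # The template leaves <PORT> unquoted, so it should parse as an integer.
    if not isinstance(creds["port"], int):
        raise ValueError("port must be a number, not a quoted string")

    return creds
```

Calling `load_credentials()` with no argument reads db-credentials.json from the current working directory, which matches the location described above.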
