Wearable health data to NumPy
mHealthData
Features
- Read health data exports from Fitbit, Samsung Health, and Apple Healthkit
- Read all .xml, .csv, .json files into pandas DataFrames
- Fix time zone inconsistencies and convert to local time
- Device-centric approach - output numpy arrays for selected wearable devices
- Resample to per-minute or per-day numpy arrays:
  - (N days x 1440 minutes) for steps, sleep, and bpm
  - (N days) for weight, rhr, and hrv
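To make the (N days x 1440 minutes) layout concrete, here is a minimal sketch of how timestamped records could be converted to local time and binned into such an array. It does not use mhealthdata and is not the library's implementation: the records, the fixed UTC+2 offset, and the helper name to_minute_grid are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

import numpy as np

# Hypothetical records (UTC timestamp, step count); not real export data.
records = [
    (datetime(2022, 3, 1, 7, 30, tzinfo=timezone.utc), 40),
    (datetime(2022, 3, 1, 7, 30, 30, tzinfo=timezone.utc), 25),
    (datetime(2022, 3, 1, 23, 59, tzinfo=timezone.utc), 10),
    (datetime(2022, 3, 2, 12, 0, tzinfo=timezone.utc), 90),
]

# Assumed fixed local offset (UTC+2), chosen only for this example.
local_tz = timezone(timedelta(hours=2))


def to_minute_grid(records, tz):
    """Sum record values into a (N days x 1440 minutes) array in local time."""
    local = [(t.astimezone(tz), v) for t, v in records]
    first_day = min(t.toordinal() for t, _ in local)
    last_day = max(t.toordinal() for t, _ in local)
    grid = np.zeros((last_day - first_day + 1, 1440), dtype=np.int64)
    for t, v in local:
        # Row = day offset from the first day, column = minute of the day.
        grid[t.toordinal() - first_day, t.hour * 60 + t.minute] += v
    idate = np.arange(first_day, last_day + 1)  # ordinal day of each row
    return idate, grid


idate, steps = to_minute_grid(records, local_tz)
print(steps.shape)            # two local days, 1440 minutes each
print(steps[0, 9 * 60 + 30])  # steps recorded at 09:30 local time on day one
```

Note that the record stamped 23:59 UTC lands on the following local day, which is why converting to local time before binning matters.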
Installation
pip install mhealthdata
Quick start
Assume we have a Fitbit data export .zip downloaded to the folder /Users/username/Downloads/wearable_data/ and unzipped into a subfolder /Users/username/Downloads/wearable_data/User/ with lots of .xml, .csv, .json files and sub-folders inside.
Load data:
import mhealthdata
path = '/Users/username/Downloads/wearable_data/User/'
wdata = mhealthdata.FitbitLoader(path)
- Use SHealthLoader() for loading a Samsung Health export
- Use HealthkitLoader() for loading an Apple Health export
Show the loaded dataframes as well as the steps records dataframe:
print(wdata.dataframes)
print(wdata.df['steps'].head())
Data Analysis and Visualization
Convert data to numpy arrays:
data = wdata.get_device_data()
- By default mhealthdata truncates steps/min, heart rate, and weight in [kg] to the physically meaningful range 0 - 255.
- See the valid device list: wdata.devices
- Get numpy arrays for a specific device, e.g. data = wdata.get_device_data('iPhone')
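The truncation to 0 - 255 can be pictured as a simple clip. The sketch below uses np.clip on made-up values; how mhealthdata performs the truncation internally is not stated here, so treat this as an illustration of the effect only. The conversion to uint8 is an extra observation (the range happens to fit an unsigned 8-bit integer), not a claim about the library's storage type.

```python
import numpy as np

# Made-up heart rate readings, including out-of-range values.
raw_bpm = np.array([-5, 0, 62, 180, 300])

# Clip to the range named above: values below 0 become 0, above 255 become 255.
clipped = np.clip(raw_bpm, 0, 255)

# Optional: the clipped range fits an unsigned 8-bit integer.
compact = clipped.astype(np.uint8)

print(clipped)
print(compact.dtype)
```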
Date range:
from datetime import datetime
idates = data['idate'] # ordinal days (January 1st of year 1 - is day 1)
dates = [datetime.fromordinal(d) for d in idates]
print(f'Date range {dates[0]} - {dates[-1]}')
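Going the other way, from a calendar date to a row index, can be handy when picking a day to inspect. The following is a small sketch on a stand-in for data['idate']; the helper name day_index and the sample dates are hypothetical.

```python
from datetime import date

import numpy as np

# Stand-in for data['idate']: made-up ordinals for 2022-03-07 .. 2022-03-13.
idates = np.arange(date(2022, 3, 7).toordinal(), date(2022, 3, 14).toordinal())


def day_index(idates, day):
    """Return the row index of `day`, or None if it is not in the array."""
    hits = np.flatnonzero(idates == day.toordinal())
    return int(hits[0]) if hits.size else None


i = day_index(idates, date(2022, 3, 9))
print(i, date.fromordinal(int(idates[i])))
```

Searching for the ordinal instead of subtracting the first ordinal avoids assuming that the rows are consecutive days.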
Plot one day of data:
import matplotlib.pyplot as plt
i = 9 # Let us plot day 10 (numbering starts with zero)
plt.figure(facecolor='white')
plt.title(f'Date {dates[i]}')
for dname in ['steps', 'sleep', 'bpm']:
    plt.plot(data[dname][i], label=dname)
plt.legend()
plt.show()
- Zero values indicate missing data (for steps and sleep, zero can also mean not walking and not sleeping)
- By default mhealthdata pads data with zeros to match full weeks (Monday through Sunday), so some days at the beginning and at the end may be empty
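Because the arrays are padded to full Monday-through-Sunday weeks, a per-day array can be reshaped into weeks and averaged while skipping the zero (missing) days. A minimal sketch on made-up resting heart rate values, assuming the array length is a multiple of 7 as the padding described above implies:

```python
import numpy as np

# Made-up daily resting heart rate for two padded weeks (Mon..Sun); 0 = missing.
rhr = np.array([0, 0, 60, 62, 61, 0, 63,
                64, 0, 0, 62, 60, 61, 0], dtype=float)

weeks = rhr.reshape(-1, 7)       # (N weeks x 7 days)
valid = weeks > 0                # True where a value was recorded
counts = valid.sum(axis=1)       # number of recorded days per week
sums = np.where(valid, weeks, 0.0).sum(axis=1)

# Mean over recorded days only; weeks with no data stay NaN.
weekly_mean = np.full(len(weeks), np.nan)
np.divide(sums, counts, out=weekly_mean, where=counts > 0)

print(weekly_mean)
```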
Data correlations:
from scipy.stats import pearsonr
x = data['rhr']
y = data['weight']
# IMPORTANT: zero values indicate missing data and should be disregarded
mask = (x > 0) & (y > 0)
r, p = pearsonr(x[mask], y[mask])
print(f'Correlation {r:.2f}, P-value {p:.2g}')
- Missing data are a definite problem in wearable data analysis
- A study by Pyrkov T.V. et al., Nat Comms 12, 2765 (2021) shows high consistency of recovery rates in quite different biological signals: physical activity measured by consumer wearable devices and laboratory blood cell counts. The typical recovery time is 1-2 weeks. The finding suggests it may be safe to use averaging windows or to impute data gaps several days long (though both affect noise and correlation and should therefore be used with caution).
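As one way to apply an averaging window while respecting the zero-means-missing convention, the sketch below computes a centered moving average that skips zero entries. The function name masked_moving_average, the window length, and the sample weights are assumptions for illustration; this is not part of mhealthdata, and the caution above about noise and correlation still applies.

```python
import numpy as np


def masked_moving_average(x, window=7):
    """Centered moving average over `window` days that skips zero entries."""
    x = np.asarray(x, dtype=float)
    valid = (x > 0).astype(float)
    kernel = np.ones(window)
    # Sum of recorded values and count of recorded days in each window.
    sums = np.convolve(x * valid, kernel, mode='same')
    counts = np.convolve(valid, kernel, mode='same')
    out = np.full(x.shape, np.nan)  # windows without any data stay NaN
    np.divide(sums, counts, out=out, where=counts > 0.5)
    return out


# Made-up daily weights in kg; 0 = missing.
weight = np.array([80.0, 0, 0, 79.0, 0, 78.0, 0, 0, 0, 77.0])
smooth = masked_moving_average(weight, window=3)
print(smooth)
```

Days whose window contains no recorded value come out as NaN instead of zero, so they cannot be mistaken for real measurements in later statistics.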
Documentation