
This library provides functions to analyze food logging data.

Project description

TREETS


Install

pip install treets

Example for a quick data analysis on phased studies.

import treets.core as treets
import pandas as pd

Take a brief look at the food logging dataset and the reference information sheet.

treets.file_loader('data/col_test_data/yrt*').head(2)
|   | Unnamed: 0 | original_logtime | desc_text | food_type | PID |
|---|---|---|---|---|---|
| 0 | 0 | 2021-05-12 02:30:00 +0000 | Milk | b | yrt1999 |
| 1 | 1 | 2021-05-12 02:45:00 +0000 | Some Medication | m | yrt1999 |
pd.read_excel('data/col_test_data/toy_data_17May2021.xlsx').head(2)
|   | mCC_ID | Participant_Study_ID | Study Phase | Intervention group (TRE or HABIT) | Start_Day | End_day | Eating_Window_Start | Eating_Window_End |
|---|---|---|---|---|---|---|---|---|
| 0 | yrt1999 | 2 | S-REM | TRE | 2021-05-12 | 2021-05-14 | 00:00:00 | 23:59:00 |
| 1 | yrt1999 | 2 | T3-INT | TRE | 2021-05-15 | 2021-05-18 | 08:00:00 | 18:00:00 |

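Call the summarize_data_with_experiment_phases() function to build a table of logging and adherence statistics for each participant and study phase. The function also prints, per participant, the days with no logged items and the bad logging, bad window, and non-adherent days.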

df = treets.summarize_data_with_experiment_phases(
    treets.file_loader('data/col_test_data/yrt*'),
    pd.read_excel('data/col_test_data/toy_data_17May2021.xlsx'),
)
Participant yrt1999 didn't log any food items in the following day(s):
2021-05-18
Participant yrt2000 didn't log any food items in the following day(s):
2021-05-12
2021-05-13
2021-05-14
2021-05-15
2021-05-16
2021-05-17
2021-05-18
Participant yrt1999 have bad logging day(s) in the following day(s):
2021-05-12
2021-05-15
Participant yrt1999 have bad window day(s) in the following day(s):
2021-05-15
2021-05-17
Participant yrt1999 have non adherent day(s) in the following day(s):
2021-05-12
2021-05-15
2021-05-17
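The messages above list, for each participant, calendar days inside a study phase with no logged items. Below is a minimal sketch of that check, assuming each phase runs from Start_Day through End_day inclusive and that log timestamps have already been reduced to dates. days_without_logs is a hypothetical helper for illustration, not part of the treets API. The example uses yrt1999's T3-INT phase (2021-05-15 to 2021-05-18), where the only reported missing day is 2021-05-18.

```python
from datetime import date, timedelta


def days_without_logs(start, end, logged_dates):
    """Return the dates in [start, end] that have no logged entries.

    Hypothetical helper, not a treets function. Assumes the phase is
    inclusive of both start and end.
    """
    logged = set(logged_dates)
    n_days = (end - start).days + 1  # inclusive of both phase boundaries
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in all_days if d not in logged]


# yrt1999, T3-INT phase, with logs on the 15th, 16th and 17th
missing = days_without_logs(
    date(2021, 5, 15),
    date(2021, 5, 18),
    [date(2021, 5, 15), date(2021, 5, 16), date(2021, 5, 17)],
)
for d in missing:
    print(d)  # prints 2021-05-18
```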
df
|   | mCC_ID | Participant_Study_ID | Study Phase | Intervention group (TRE or HABIT) | Start_Day | End_day | Eating_Window_Start | Eating_Window_End | phase_duration | caloric_entries_num | ... | logging_day_counts | %_logging_day_counts | good_logging_days | %_good_logging_days | good_window_days | %_good_window_days | outside_window_days | %_outside_window_days | adherent_days | %_adherent_days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | yrt1999 | 2 | S-REM | TRE | 2021-05-12 | 2021-05-14 | 00:00:00 | 23:59:00 | 3 days | 7 | ... | 3 | 100.0% | 2.0 | 66.67% | 3.0 | 100.0% | 0.0 | 0.0% | 2.0 | 66.67% |
| 1 | yrt1999 | 2 | T3-INT | TRE | 2021-05-15 | 2021-05-18 | 08:00:00 | 18:00:00 | 4 days | 8 | ... | 3 | 75.0% | 2.0 | 50.0% | 1.0 | 25.0% | 2.0 | 50.0% | 1.0 | 25.0% |
| 2 | yrt2000 | 3 | T3-INT | TRE | 2021-05-12 | 2021-05-14 | 08:00:00 | 16:00:00 | 3 days | 0 | ... | 0 | 0.0% | 0.0 | 0.0% | 0.0 | 0.0% | 0.0 | 0.0% | 0.0 | 0.0% |
| 3 | yrt2000 | 3 | T3-INT | TRE | 2021-05-15 | 2021-05-18 | 08:00:00 | 16:00:00 | 4 days | 0 | ... | 0 | 0.0% | 0.0 | 0.0% | 0.0 | 0.0% | 0.0 | 0.0% | 0.0 | 0.0% |
| 4 | yrt2001 | 4 | T12-A | TRE | NaT | NaT | NaN | NaN | NaT | 0 | ... | 0 | nan% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |

5 rows × 32 columns
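The printed day lists and the table agree with a simple set-based reading of the adherence columns: a day appears to count as adherent when it has logs and is neither a bad logging day nor a bad window day, and each %_ column appears to be the day count divided by the phase length. The sketch below reproduces the T3-INT row for yrt1999 under those assumptions. It takes the bad logging and bad window days as given (copied from the printed output), since their exact definitions are not shown here; pct is a hypothetical helper.

```python
def pct(count, total_days):
    """Format a day count as a percentage string like the %_ columns.

    Hypothetical helper; assumes the denominator is the phase length in days.
    """
    return f"{round(count / total_days * 100, 2)}%"


# yrt1999, T3-INT phase: 4 days (2021-05-15 to 2021-05-18)
phase_days = {"2021-05-15", "2021-05-16", "2021-05-17", "2021-05-18"}
logging_days = {"2021-05-15", "2021-05-16", "2021-05-17"}
bad_logging_days = {"2021-05-15"}                 # from the printed output
bad_window_days = {"2021-05-15", "2021-05-17"}    # from the printed output

good_logging_days = logging_days - bad_logging_days
good_window_days = logging_days - bad_window_days
# Assumption: adherent = logged, not a bad logging day, not a bad window day
adherent_days = logging_days - bad_logging_days - bad_window_days

n = len(phase_days)
print(pct(len(logging_days), n))       # 75.0%
print(pct(len(good_logging_days), n))  # 50.0%
print(pct(len(good_window_days), n))   # 25.0%
print(pct(len(adherent_days), n))      # 25.0%
```

With the same sets, len(logging_days & bad_window_days) gives 2, matching the outside_window_days value of 2.0 for this row.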

Look at the statistics in the first row of the resulting table.

df.iloc[0]
mCC_ID                                           yrt1999
Participant_Study_ID                                   2
Study Phase                                        S-REM
Intervention group (TRE or HABIT)                    TRE
Start_Day                            2021-05-12 00:00:00
End_day                              2021-05-14 00:00:00
Eating_Window_Start                             00:00:00
Eating_Window_End                               23:59:00
phase_duration                           3 days 00:00:00
caloric_entries_num                                    7
medication_num                                         0
water_num                                              0
first_cal_avg                                   5.916667
first_cal_std                                   2.240722
last_cal_avg                                   19.666667
last_cal_std                                   12.933323
mean_daily_eating_window                           13.75
std_daily_eating_window                        11.986972
earliest_entry                                       4.5
2.5%                                              4.5375
97.5%                                            27.5625
duration mid 95%                                  23.025
logging_day_counts                                     3
%_logging_day_counts                              100.0%
good_logging_days                                    2.0
%_good_logging_days                               66.67%
good_window_days                                     3.0
%_good_window_days                                100.0%
outside_window_days                                  0.0
%_outside_window_days                               0.0%
adherent_days                                        2.0
%_adherent_days                                   66.67%
Name: 0, dtype: object
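The 2.5%, 97.5% and duration mid 95% fields are related arithmetically: 27.5625 - 4.5375 = 23.025, so duration mid 95% is the spread between the two percentiles. A minimal sketch of that computation, assuming the percentiles are taken over the phase's entry float times with linear interpolation (the pandas default); mid_95_window and the sample times are hypothetical.

```python
import statistics


def mid_95_window(float_times):
    """Return the 2.5th and 97.5th percentiles and the span between them.

    Hypothetical helper. n=40 yields cut points at 2.5%, 5%, ..., 97.5%;
    method="inclusive" interpolates linearly between sorted data points,
    which matches the default interpolation of pandas .quantile().
    """
    cuts = statistics.quantiles(float_times, n=40, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return low, high, high - low


# hypothetical float times (hours since midnight) for one participant
times = [4.5, 7.0, 8.25, 12.5, 13.0, 18.75, 19.5, 27.75]
low, high, span = mid_95_window(times)
print(round(low, 4), round(high, 4), round(span, 4))
```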

Example for a quick data analysis on non-phased studies.

Take a look at the original dataset.

df = treets.file_loader('data/test_food_details.csv')
df.head(2)
|   | Unnamed: 0 | ID | unique_code | research_info_id | desc_text | food_type | original_logtime | foodimage_file_name |
|---|---|---|---|---|---|---|---|---|
| 0 | 1340147 | 7572733 | alqt14018795225 | 150 | Water | w | 2017-12-08 17:30:00+00:00 | NaN |
| 1 | 1340148 | 411111 | alqt14018795225 | 150 | Coffee White | b | 2017-12-09 00:01:00+00:00 | NaN |

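Preprocess the data to create the features needed for further analysis, such as the float time (time of day in hours) and the week count since the first week. Judging by the output below, the last argument (4) appears to set the hour at which a new day starts: the entry logged at 00:01 on 2017-12-09 is assigned the date 2017-12-08 and a float time of 24.016667.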

df = treets.load_food_data(df,'unique_code', 'original_logtime',4)
df.head(2)
|   | Unnamed: 0 | ID | unique_code | research_info_id | desc_text | food_type | original_logtime | date | float_time | time | week_from_start | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1340147 | 7572733 | alqt14018795225 | 150 | Water | w | 2017-12-08 17:30:00+00:00 | 2017-12-08 | 17.500000 | 17:30:00 | 1 | 2017 |
| 1 | 1340148 | 411111 | alqt14018795225 | 150 | Coffee White | b | 2017-12-09 00:01:00+00:00 | 2017-12-08 | 24.016667 | 00:01:00 | 1 | 2017 |
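The sample output suggests how date and float_time are derived: the entry at 00:01 on 2017-12-09 gets the date 2017-12-08 and a float time of 24.016667, so entries before a cutoff hour (on this reading, the 4 passed to load_food_data) are treated as late-night eating from the previous day. A minimal standard-library sketch of that mapping plus a 1-based week counter; both helpers are hypothetical and the week rule is an assumption.

```python
from datetime import datetime, timedelta


def float_time_and_date(ts, day_start_hour=4):
    """Map a timestamp to (hours as a float, logical date).

    Hypothetical helper. Entries before `day_start_hour` are treated as a
    late-night continuation of the previous day, so 24 is added to their
    float time and their date is shifted back by one day.
    """
    hours = ts.hour + ts.minute / 60 + ts.second / 3600
    if ts.hour < day_start_hour:
        return hours + 24, (ts - timedelta(days=1)).date()
    return hours, ts.date()


def week_from_start(d, first_date):
    """Hypothetical 1-based week index counted from the first logged date."""
    return (d - first_date).days // 7 + 1


ft0, d0 = float_time_and_date(datetime(2017, 12, 8, 17, 30))
ft1, d1 = float_time_and_date(datetime(2017, 12, 9, 0, 1))
print(ft0, d0)                  # 17.5 2017-12-08
print(round(ft1, 6), d1)        # 24.016667 2017-12-08
print(week_from_start(d1, d0))  # 1
```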

Call the summarize_data() function to build a table of logging statistics for each participant.

df = treets.summarize_data(df, 'unique_code', 'float_time', 'date')
df.head(2)
|   | unique_code | num_days | num_total_items | num_f_n_b | num_medications | num_water | first_cal_avg | first_cal_std | last_cal_avg | last_cal_std | eating_win_avg | eating_win_std | good_logging_count | first_cal variation (90%-10%) | last_cal variation (90%-10%) | 2.5% | 95% | duration mid 95% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | alqt1148284857 | 13 | 149 | 96 | 19 | 34 | 7.821795 | 6.710717 | 23.485897 | 4.869082 | 15.664103 | 8.231201 | 146 | 2.966667 | 9.666667 | 4.535000 | 26.813333 | 22.636667 |
| 1 | alqt14018795225 | 64 | 488 | 484 | 3 | 1 | 7.525781 | 5.434563 | 25.858594 | 3.374839 | 18.332813 | 6.603913 | 484 | 13.450000 | 3.100000 | 4.183333 | 27.438333 | 23.416667 |
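The count columns add up: 96 + 19 + 34 = 149 and 484 + 3 + 1 = 488, so num_total_items appears to be food and beverage items plus medications plus water. A minimal sketch of that tally from the food_type codes; the code meanings ('f' food, 'b' beverage, 'm' medication, 'w' water) are inferred from the sample rows, and count_items is hypothetical.

```python
from collections import Counter


def count_items(food_types):
    """Count log entries by category from single-letter food_type codes.

    Hypothetical helper. Assumed codes, inferred from the sample rows:
    'f' food, 'b' beverage, 'm' medication, 'w' water.
    """
    c = Counter(food_types)
    return {
        "num_total_items": sum(c.values()),
        "num_f_n_b": c["f"] + c["b"],
        "num_medications": c["m"],
        "num_water": c["w"],
    }


counts = count_items(["b", "m", "f", "w", "b"])
print(counts)
# {'num_total_items': 5, 'num_f_n_b': 3, 'num_medications': 1, 'num_water': 1}
```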

Look at the statistics in the first row of the resulting table.

df.iloc[0]
unique_code                      alqt1148284857
num_days                                     13
num_total_items                             149
num_f_n_b                                    96
num_medications                              19
num_water                                    34
first_cal_avg                          7.821795
first_cal_std                          6.710717
last_cal_avg                          23.485897
last_cal_std                           4.869082
eating_win_avg                        15.664103
eating_win_std                         8.231201
good_logging_count                          146
first_cal variation (90%-10%)          2.966667
last_cal variation (90%-10%)           9.666667
2.5%                                      4.535
95%                                   26.813333
duration mid 95%                      22.636667
Name: 0, dtype: object
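The timing columns are consistent with each other: first_cal_avg + eating_win_avg is approximately last_cal_avg (7.821795 + 15.664103 = 23.485898), which is what you get when the eating window is computed per day as last caloric time minus first caloric time. A minimal sketch of these per-day statistics, assuming water and medication entries are excluded beforehand and that the 90%-10% variation is the spread between the 90th and 10th percentiles of the daily first caloric times; daily_window_stats and the sample entries are hypothetical.

```python
import statistics
from collections import defaultdict


def daily_window_stats(entries):
    """Summarize first/last caloric times and eating windows across days.

    Hypothetical helper. `entries` is an iterable of (date, float_time)
    pairs for caloric items only (water and medication already removed).
    Needs at least two days for the standard deviation and percentiles.
    """
    by_day = defaultdict(list)
    for day, t in entries:
        by_day[day].append(t)
    firsts = [min(ts) for ts in by_day.values()]
    lasts = [max(ts) for ts in by_day.values()]
    windows = [hi - lo for lo, hi in zip(firsts, lasts)]
    # n=10 yields decile cut points; the first is 10%, the last is 90%
    deciles = statistics.quantiles(firsts, n=10, method="inclusive")
    return {
        "first_cal_avg": statistics.mean(firsts),
        "first_cal_std": statistics.stdev(firsts),  # sample std (ddof=1)
        "last_cal_avg": statistics.mean(lasts),
        "eating_win_avg": statistics.mean(windows),
        "first_cal variation (90%-10%)": deciles[-1] - deciles[0],
    }


# hypothetical caloric entries over three days
entries = [
    ("2021-05-12", 7.0), ("2021-05-12", 12.0), ("2021-05-12", 20.0),
    ("2021-05-13", 8.0), ("2021-05-13", 21.0),
    ("2021-05-14", 9.0), ("2021-05-14", 22.5),
]
stats = daily_window_stats(entries)
for name, value in stats.items():
    print(name, round(value, 6))
```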

Clean text in food logs

# import the dataset
df = treets.file_loader('data/col_test_data/yrt*')
df.head(3)
|   | Unnamed: 0 | original_logtime | desc_text | food_type | PID |
|---|---|---|---|---|---|
| 0 | 0 | 2021-05-12 02:30:00 +0000 | Milk | b | yrt1999 |
| 1 | 1 | 2021-05-12 02:45:00 +0000 | Some Medication | m | yrt1999 |
| 2 | 2 | 2021-05-12 04:45:00 +0000 | bacon egg | f | yrt1999 |
treets.clean_loggings(df, 'desc_text', 'PID').head(3)
|   | PID | desc_text | cleaned |
|---|---|---|---|
| 0 | yrt1999 | Milk | [milk] |
| 1 | yrt1999 | Some Medication | [medication] |
| 2 | yrt1999 | bacon egg | [bacon, egg] |

The words are lowercased, modifiers are removed (second row: "Some Medication" becomes [medication]), and entries are split into individual items (third row: "bacon egg" becomes [bacon, egg]).
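One simple way to get the behavior described above is to lowercase the text, tokenize it, and drop words found in a modifier list. This is a sketch, not the treets implementation: the MODIFIERS set and clean_entry are hypothetical, and splitting on words will also break up multi-word foods such as "peanut butter", which a real cleaner would need to handle.

```python
import re

# Hypothetical modifier list; the list used by clean_loggings() is not shown here.
MODIFIERS = {"some", "a", "an", "the", "little", "bit", "of"}


def clean_entry(text):
    """Lowercase a log entry, drop modifier words, and split it into items."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in MODIFIERS]


for raw in ["Milk", "Some Medication", "bacon egg"]:
    print(raw, "->", clean_entry(raw))
# Milk -> ['milk']
# Some Medication -> ['medication']
# bacon egg -> ['bacon', 'egg']
```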

Visualizations

# import the dataset
df = treets.file_loader('data/test_food_details.csv')
df.head(2)
|   | Unnamed: 0 | ID | unique_code | research_info_id | desc_text | food_type | original_logtime | foodimage_file_name |
|---|---|---|---|---|---|---|---|---|
| 0 | 1340147 | 7572733 | alqt14018795225 | 150 | Water | w | 2017-12-08 17:30:00+00:00 | NaN |
| 1 | 1340148 | 411111 | alqt14018795225 | 150 | Coffee White | b | 2017-12-09 00:01:00+00:00 | NaN |

Make a scatter plot of each participant's breakfast time (the first caloric entry of the day) with error bars.

# create required features for function first_cal_mean_with_error_bar()
df['original_logtime'] = pd.to_datetime(df['original_logtime'])
df['local_time'] = treets.find_float_time(df, 'original_logtime')
df['date'] = treets.find_date(df, 'original_logtime')

# call the function
treets.first_cal_mean_with_error_bar(df,'unique_code', 'date', 'local_time')

Use swarmplot() to visualize each participant's distribution of eating times.

treets.swarmplot(df, 50, 'unique_code', 'date', 'local_time')



Download files

Source distribution

treets-1.0.5.tar.gz (30.9 kB)

Built distribution

treets-1.0.5-py3-none-any.whl (25.0 kB, Python 3)
