Skip to main content

A production-quality Python package for continuous glucose monitor (CGM) data analysis and ML-ready feature extraction.

Project description

GlycoSignal

PyPI version Python versions Tests License: MIT

GlycoSignal

CGM analysis for Python. Compute any glycemic metric individually, or generate ML-ready feature matrices from sliding windows.


Installation

pip install GlycoSignal

Optional extras:

pip install "GlycoSignal[dev]"     # pytest, black, ruff, mypy
pip install "GlycoSignal[report]"  # HTML reports (adds Jinja2)

Input Data Format

GlycoSignal reads CSV files. Your CSV needs at minimum a timestamp column and a glucose column.

Column Type Required Description
Timestamp datetime Yes Reading timestamp (any format pandas can parse)
Glucose float Yes Glucose value in mg/dL
subject string For multi-subject files Subject or patient identifier

Column names are auto-detected. You do not need to rename your columns before loading. GlycoSignal recognizes these common names (case-insensitive):

  • Timestamp: Timestamp, time, datetime, date_time, date
  • Glucose: Glucose, Glucose Value (mg/dL), gl, sgv, glucose_mg_dl, bg, blood_glucose
  • Subject: subject, id, ptid, patient_id, subjectid

If your column names are not recognized, pass them explicitly:

df = glycosignal.load_csv("data.csv", timestamp_col="time_utc", glucose_col="bg_mg_dl")

Example CSV (single subject):

Timestamp,Glucose
2024-01-15 08:00:00,123
2024-01-15 08:05:00,121
2024-01-15 08:10:00,125

Example CSV (multiple subjects in one file):

Timestamp,Glucose,subject
2024-01-15 08:00:00,123,P001
2024-01-15 08:00:00,135,P002
2024-01-15 08:05:00,121,P001
2024-01-15 08:05:00,140,P002

Working with multiple subjects

If your data has multiple subjects in one file, use load_cgm_file and specify which column identifies each subject:

from glycosignal import io

df = io.load_cgm_file("all_subjects.csv", subject_col="ptid")

If each subject is in a separate CSV file inside a folder, use load_cgm_folder. It auto-derives a subject column from each filename:

df = io.load_cgm_folder("data/subjects/")
# Adds 'subject' and 'filename' columns automatically

When building sliding windows or feature maps from multi-subject data, use group_col to process each subject independently:

from glycosignal import windows, features

result = windows.create_sliding_windows(df, window_hours=24, group_col="subject")
X = features.build_feature_map(result.windows)
# X contains rows for all subjects, with a 'subject' column preserved

Unit conversion

GlycoSignal expects glucose in mg/dL. If your data is in mmol/L, convert first:

from glycosignal import preprocessing

df = preprocessing.convert_units(df, from_unit="mmol/L", to_unit="mg/dL")

Device-specific loaders

For Dexcom and FreeStyle Libre exports (which have non-standard headers), use the dedicated loaders:

df = io.load_dexcom("dexcom_export.csv")     # skips Dexcom header row
df = io.load_libre("libre_export.csv")       # skips Libre 2-row header

What you can do in 30 seconds

import glycosignal

df = glycosignal.load_csv("examples/sample_cgm.csv")  # included in repo
df = glycosignal.clean_cgm(df)

# One metric
print(glycosignal.mean_glucose(df))        # 128.4
print(glycosignal.time_in_range_percent(df, low=70, high=180))  # 81.2

# Full feature matrix (20+ features, one row per 24h window)
result = glycosignal.create_sliding_windows(df, window_hours=24)
X = glycosignal.build_feature_map(result.windows)
print(X.shape)  # (7, 23)

Core Workflows

A. Compute glycemic metrics

Every metric is a standalone function. Call them on any cleaned DataFrame or pass a pre-prepared object for efficiency.

from glycosignal import metrics

metrics.mean_glucose(df)                           # 128.4
metrics.cv(df)                                     # 26.6 (%)
metrics.time_in_range_percent(df, low=70, high=180)  # 81.2
metrics.lbgi(df)                                   # 0.8
metrics.mage(df)                                   # 45.3
metrics.gri(df)                                    # 12.4

Group them when you need everything at once:

metrics.basic_stats(df)
# {'mean': 128.4, 'median': 126.0, 'min': 62.0, 'max': 248.0, 'q1': 102.0, 'q3': 155.0}

metrics.variability_metrics(df)
# {'sd': 34.2, 'cv': 26.6, 'j_index': 26.7, 'mage': 45.8}

metrics.risk_indices(df)
# {'lbgi': 0.8, 'hbgi': 3.2, 'adrr': 14.1, 'gri': 12.4}

metrics.summary_dict(df)   # all of the above combined

Performance tip: Call prepare() once when computing many features on the same data:

from glycosignal.schemas import prepare

p = prepare(df)
metrics.mean_glucose(p)
metrics.cv(p)
metrics.lbgi(p)

Callable metric functions

These are called directly from the metrics module and accept parameters. All accept a DataFrame or PreparedCGMData.

Feature Description Computation
Basic stats
mean_glucose(data) Mean BGL μ = (1/N) Σ Xᵢ
median_glucose(data) Median BGL Middle value of sorted readings
min_glucose(data) Minimum BGL Min(X₁, ..., Xₙ)
max_glucose(data) Maximum BGL Max(X₁, ..., Xₙ)
q1_glucose(data) First quartile of BGL Q1 = Percentile(X, 25)
q3_glucose(data) Third quartile of BGL Q3 = Percentile(X, 75)
Variability
sd(data) Standard deviation of BGL σ = √(Σ(Xᵢ - μ)² / N)
cv(data) Coefficient of variation CV = (σ / μ) × 100
j_index(data) J-index (glycemic variability) J = 0.001 × (μ + σ)²
mage(data) Mean Amplitude of Glucose Excursions Mean of alternating peak-nadir amplitudes exceeding σ
conga24(data) Continuous Overall Net Glycemic Action SD of {G(t) - G(t - 24h)} for all matched pairs
Time-in-range (parametric)
time_in_range_minutes(data, low, high) BGL time inside [low, high] TIR = Δt × Σ(low ≤ BGL(t) ≤ high)
time_in_range_percent(data, low, high) Percent time inside [low, high] TIR% = (TIR / T) × 100
time_below_range_minutes(data, threshold) BGL time below threshold TBR = Δt × Σ(BGL(t) ≤ threshold)
time_below_range_percent(data, threshold) Percent time below threshold TBR% = (TBR / T) × 100
time_above_range_minutes(data, threshold) BGL time above threshold TAR = Δt × Σ(BGL(t) ≥ threshold)
time_above_range_percent(data, threshold) Percent time above threshold TAR% = (TAR / T) × 100
time_outside_range_minutes(data, low, high) BGL time outside [low, high] TOR = Δt × Σ(BGL(t) < low or BGL(t) > high)
time_outside_range_percent(data, low, high) Percent time outside [low, high] TOR% = (TOR / T) × 100
Risk indices
lbgi(data) Low Blood Glucose Index LBGI = (1/N) Σ rl(Xᵢ); f(X) = ln(X)^1.084 - 5.381; rl = 22.77 × f² if f ≤ 0
hbgi(data) High Blood Glucose Index HBGI = (1/N) Σ rh(Xᵢ); rh = 22.77 × f² if f > 0
adrr(data) Average Daily Risk Range ADRR = Max(rl) + Max(rh)
gri(data) Glucose Risk Index GRI = 3.0×%TBR₅₄ + 2.4×%TBR₇₀ + 1.6×%TAR₂₅₀ + 0.8×%TAR₁₈₀, capped at 100
Excursions
mean_glucose_excursion(data) Mean BGL outside mean ± SD Mean of Xᵢ where Xᵢ < μ - σ or Xᵢ > μ + σ
mean_glucose_normal(data) Mean BGL inside mean ± SD Mean of Xᵢ where μ - σ ≤ Xᵢ ≤ μ + σ
Peak counts
count_peaks(data, threshold) Episodes above threshold Count of rising-edge crossings above threshold
count_peaks_in_range(data, lower, upper) Episodes entering [lower, upper] Count of rising-edge entries into [lower, upper]

Notation: N = readings, Xᵢ = glucose value, μ = mean, σ = SD, Δt = interval between readings, T = total monitoring time.


Registry feature names

These are the 32 fixed-parameter feature names used in build_feature_map(feature_names=[...]). They are pre-wired to specific clinical thresholds and require no parameters.

glycosignal.list_features()           # all 32 names
glycosignal.list_features(category="time_in_range")  # filtered by category
Feature name Category Description
mean_glucose basic_stats Mean glucose (mg/dL)
median_glucose basic_stats Median glucose (mg/dL)
min_glucose basic_stats Minimum glucose (mg/dL)
max_glucose basic_stats Maximum glucose (mg/dL)
q1_glucose basic_stats 25th percentile glucose (mg/dL)
q3_glucose basic_stats 75th percentile glucose (mg/dL)
sd variability Standard deviation of glucose (mg/dL)
cv variability Coefficient of variation (%)
j_index variability J-index: 0.001 × (mean + SD)²
mage variability Mean Amplitude of Glucose Excursions (mg/dL)
conga24 variability SD of glucose differences 24h apart (mg/dL)
tir_70_180_min time_in_range Minutes in target range 70–180 mg/dL
tir_70_180_pct time_in_range Percent time in target range 70–180 mg/dL
tir_70_140_min time_in_range Minutes in tight range 70–140 mg/dL
tir_70_140_pct time_in_range Percent time in tight range 70–140 mg/dL
tbr_70_min time_in_range Minutes below 70 mg/dL (level 1 hypoglycemia)
tbr_70_pct time_in_range Percent time below 70 mg/dL
tbr_54_min time_in_range Minutes below 54 mg/dL (level 2 hypoglycemia)
tbr_54_pct time_in_range Percent time below 54 mg/dL
tar_180_min time_in_range Minutes above 180 mg/dL (level 1 hyperglycemia)
tar_180_pct time_in_range Percent time above 180 mg/dL
tar_250_min time_in_range Minutes above 250 mg/dL (level 2 hyperglycemia)
tar_250_pct time_in_range Percent time above 250 mg/dL
lbgi risk Low Blood Glucose Index
hbgi risk High Blood Glucose Index
adrr risk Average Daily Risk Range
gri risk Glucose Risk Index (Klonoff et al. 2023)
mean_glucose_excursion excursion Mean of readings outside mean ± 1SD (mg/dL)
mean_glucose_normal excursion Mean of readings inside mean ± 1SD (mg/dL)
peaks_above_140 peak Excursion episodes above 140 mg/dL (count)
peaks_above_180 peak Excursion episodes above 180 mg/dL (count)
peaks_above_250 peak Severe hyperglycemic episodes above 250 mg/dL (count)

B. Build ML feature matrices

The recommended pipeline: clean data, create windows, extract features, train.

import glycosignal
from glycosignal import windows, features
from sklearn.ensemble import RandomForestClassifier

df = glycosignal.load_csv("cgm.csv")
df = glycosignal.clean_cgm(df)

result = windows.create_sliding_windows(df, window_hours=24, overlap_hours=0)
X = features.build_feature_map(result.windows)
# X has one row per window, 20+ feature columns

feature_cols = [c for c in X.columns if c not in ("window_id", "subject", "date")]
clf = RandomForestClassifier()
clf.fit(X[feature_cols], y)

Select a specific subset of features:

X = features.build_feature_map(
    result.windows,
    feature_names=["mean_glucose", "cv", "tir_70_180_pct", "mage", "lbgi"],
)

Build a feature vector for a single window:

features.build_feature_vector(window_df, feature_names=["mean_glucose", "cv"])
# {'mean_glucose': 128.4, 'cv': 26.6}

Build a table from a list of DataFrames (one per subject):

features.build_feature_table(
    [df_s01, df_s02, df_s03],
    record_ids=["S01", "S02", "S03"],
)

Conceptual Model

CSV / DataFrame
    -> io.load_csv() / load_cgm_folder()
    -> preprocessing.standardize_columns() + clean_cgm()
    -> metrics.*()              # individual features, any time
    -> windows.create_sliding_windows()
    -> features.build_feature_map()
    -> ML model
  • Timestamps and glucose values are the only required columns.
  • Preprocessing is explicit: nothing is cleaned silently.
  • Metrics accept a raw DataFrame or a pre-prepared object interchangeably.
  • Windows output long-format rows with a window_id column.
  • Feature maps are plain DataFrames, ready for scikit-learn or any other tool.
  • Registry is inspectable: every feature has a name, description, and category.

Feature System

GlycoSignal has 32 built-in features organized by category. You can call each individually, use grouped helpers, or let the registry compute them in bulk.

import glycosignal

glycosignal.list_features()
# ['adrr', 'conga24', 'cv', 'gri', 'hbgi', 'j_index', 'lbgi', 'mage', ...]

glycosignal.list_features(category="risk")
# ['adrr', 'gri', 'hbgi', 'lbgi']

glycosignal.get_feature_metadata()
# DataFrame: name | description | category | output_type

glycosignal.get_feature("gri").description
# 'Glucose Risk Index (Klonoff et al. 2023)'

To register a custom feature:

from glycosignal.registry import DEFAULT_REGISTRY

DEFAULT_REGISTRY.register(
    name="my_metric",
    func=my_function,
    description="Custom metric",
    category="variability",
)

Additional Capabilities

Data loading

Loads CSVs with automatic column detection. Supports per-subject folders, multi-subject files, Dexcom, and Libre exports.

from glycosignal import io

df = io.load_csv("data.csv")
df = io.load_csv("data.csv", timestamp_col="time_utc", glucose_col="bg_mg_dl")
df = io.load_cgm_folder("data/subjects/")     # adds filename + subject columns
df = io.load_cgm_file("all.csv", subject_col="ptid")
df = io.load_dexcom("dexcom_export.csv")
df = io.load_libre("libre_export.csv")

Preprocessing

All functions return cleaned copies. Nothing is modified in place.

from glycosignal import preprocessing

df = preprocessing.standardize_columns(df)          # rename "gl", "time", etc.
df = preprocessing.clean_cgm(df)                    # drop NaN, sort, enforce positive
report = preprocessing.validate_cgm(df)             # structured quality report
gaps = preprocessing.detect_gaps(df)                # DataFrame of gap intervals
df = preprocessing.resample_cgm(df, freq="5min")    # regular grid
df = preprocessing.interpolate_cgm(df, method="pchip", max_gap_points=12)
df = preprocessing.convert_units(df, from_unit="mmol/L", to_unit="mg/dL")

Event detection

Returns a DataFrame of episodes with start_time, end_time, duration_minutes, and event_type.

from glycosignal import detect

detect.detect_hypoglycemia(df, threshold=70, min_duration_minutes=15)
detect.detect_hyperglycemia(df, threshold=180, min_duration_minutes=15)
detect.detect_nocturnal_events(df, start_hour=0, end_hour=6)
detect.detect_postprandial_excursions(df, rise_threshold=50)

Plotting

All functions return (fig, ax) and never call plt.show().

from glycosignal import plotting

fig, ax = plotting.plot_glucose_timeseries(df, subject="P001")
fig, ax = plotting.plot_daily_overlay(df)
fig, ax = plotting.plot_agp(df)
fig, ax = plotting.plot_histogram(df)

Reporting

Generates a self-contained HTML report with summary metrics, TIR, risk indices, and embedded plots.

from glycosignal import report

report.generate_summary_report(df, output_path="cgm_report.html")

CLI

glycosignal summary data.csv
glycosignal windows data.csv --window-hours 24 --overlap-hours 0 --output windows.csv
glycosignal features windows.csv --output features.csv
glycosignal features windows.csv --features mean_glucose,cv,lbgi,gri
glycosignal report data.csv --output report.html
glycosignal list-features
glycosignal list-features --category risk

Migration from Script Version

Old script New location Key change
glycosignal.py glycosignal.metrics + glycosignal.schemas Lowercase names; uppercase aliases (LBGI, MAGE, TIR) still work
cgm_sliding_window.py (loaders) glycosignal.io Column renamed to Glucose from Glucose Value (mg/dL)
cgm_sliding_window.py (windowing) glycosignal.windows Output is long-format; use pivot_windows_wide() for the old format
cgm_feature_map.py create_feature_map() glycosignal.features.build_feature_map_wide() Long-format preferred via build_feature_map()
cgm_feature_map.py FEATURES list glycosignal.registry.DEFAULT_REGISTRY Structured registry with metadata

Metric calls are backward-compatible:

# Old
import glycosignal as gs
gs.mean_glucose(df)

# New (identical result)
import glycosignal
glycosignal.mean_glucose(df)

Feature map migration:

# Old
from cgm_feature_map import create_feature_map
X = create_feature_map(windows_df)

# New (wide-format still works)
from glycosignal.features import build_feature_map_wide
X = build_feature_map_wide(windows_df)

# New preferred (long-format pipeline)
from glycosignal import windows, features
result = windows.create_sliding_windows(df)
X = features.build_feature_map(result.windows)

Folder loading migration:

# Old
from cgm_sliding_window import load_cgm_folder
df = load_cgm_folder("Data/Processed")

# New
from glycosignal import io, preprocessing
df = io.load_cgm_folder("Data/Processed")
df = preprocessing.standardize_columns(df)
df = preprocessing.clean_cgm(df)

Development

git clone https://github.com/glycosignal/glycosignal
cd glycosignal
pip install -e ".[dev]"
pytest

License

MIT. Copyright (c) 2024 Jiafeng Song. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

glycosignal-0.1.0.tar.gz (83.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

glycosignal-0.1.0-py3-none-any.whl (55.1 kB view details)

Uploaded Python 3

File details

Details for the file glycosignal-0.1.0.tar.gz.

File metadata

  • Download URL: glycosignal-0.1.0.tar.gz
  • Upload date:
  • Size: 83.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for glycosignal-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a1864356e37dbaa664e0c6df059270b04d7f5d56ea9fc806ca24df56cbc1268a
MD5 a0ded38b789c0a1e3938fff02aa61b75
BLAKE2b-256 bc17cab187e3bae32801ebb012b6f60d52c7daddb276add1511ebd632a86d9fa

See more details on using hashes here.

File details

Details for the file glycosignal-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: glycosignal-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 55.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for glycosignal-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b1115401a576cbe152edca7beb42ae9a1e759e825c742e8fa43bdce34784df06
MD5 ab32e02e1f038d139dcfb6ebe0e89807
BLAKE2b-256 d10c9c119541a2bf7130a33c333e8503f5dea9c737dbf5e665b240c83b97eeb8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page