gRPC and REST endpoint for glucose prediction inference
Project description
GluRPC
Production-ready REST API server for real-time glucose prediction using the Gluformer transformer model.
Table of Contents
- Overview
- Features
- Installation
- Quick Start
- Configuration
- API Reference
- Data Requirements
- Deployment Scenarios
- Project Structure
- Performance
- Testing
- Logging
- Troubleshooting
- Contributing
Overview
GluRPC is a high-performance FastAPI service that processes continuous glucose monitoring (CGM) data and provides blood glucose predictions using the Gluformer model. The service handles multiple CGM device formats (Dexcom, FreeStyle Libre), performs quality checks, and generates visual predictions with uncertainty quantification through Monte Carlo Dropout.
Features
- ๐ Multi-format CGM Support: Auto-detects and parses Dexcom, FreeStyle Libre, and unified CSV formats
- ๐ง Transformer-based Predictions: Uses pre-trained Gluformer models from HuggingFace
- ๐ Uncertainty Quantification: Monte Carlo dropout for prediction confidence intervals (10 samples)
- โก Intelligent Caching:
- SHA256-based content-addressable storage
- Two-tier caching (memory + disk persistence)
- Superset matching for efficient reuse
- Configurable cache size (default: 128 datasets)
- ๐ Quality Assurance: Comprehensive data quality checks with detailed warnings
- ๐ Interactive Visualizations: Plotly-based prediction plots with fan charts for uncertainty
- โ๏ธ Background Processing: Async workers with priority-based task scheduling
- ๐ Optional API Key Authentication: Secure endpoint access control
- ๐ Detailed Logging: Timestamped logs with full pipeline traceability
- ๐ GPU Acceleration: Multi-GPU support with model pooling
Installation
Requirements
- Python 3.11+
uvpackage manager- (Optional) CUDA-compatible GPU for faster inference
Setup
# Clone repository
cd gluRPC
# Install dependencies using uv
uv sync
# For development (includes pytest and other dev tools)
uv sync --extra dev
Quick Start
1. Start the Server
Basic startup (production):
uv run uvicorn glurpc.app:app --host 0.0.0.0 --port 8000
With reload (development):
uv run uvicorn glurpc.app:app --host 0.0.0.0 --port 8000 --reload
Using the entry point:
uv run glurpc-server
2. Verify Installation
Check the health endpoint:
curl http://localhost:8000/health
Expected response:
{
"status": "ok",
"cache_size": 0,
"models_initialized": true,
"queue_length": 2,
"avg_fulfillment_time_ms": 0.0,
"vmem_usage_mb": 3584.2,
"device": "cuda",
"total_requests_processed": 0,
"total_errors": 0
}
3. Interactive API Documentation
Once running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
4. Quick Test
Convert a sample file:
curl -X POST "http://localhost:8000/convert_to_unified" \
-F "file=@your_cgm_data.csv"
Generate a quick plot:
# Base64 encode your CSV
CSV_BASE64=$(cat your_unified_data.csv | base64 -w 0)
# Get prediction plot
curl -X POST "http://localhost:8000/quick_plot" \
-H "Content-Type: application/json" \
-d "{\"csv_base64\": \"$CSV_BASE64\"}" | jq -r '.plot_base64' | base64 -d > prediction.png
Configuration
Environment Variables
All configuration options can be set via environment variables:
Cache Configuration
# Maximum number of datasets to cache (default: 128)
export MAX_CACHE_SIZE=128
# Enable/disable cache persistence to disk (default: True)
export ENABLE_CACHE_PERSISTENCE=True
Security Configuration
# Enable/disable API key authentication (default: False)
export ENABLE_API_KEYS=False
# If enabled, create api_keys_list file with one key per line:
# echo "your-secret-api-key-1" > api_keys_list
# echo "your-secret-api-key-2" >> api_keys_list
Data Processing Configuration
# Minimum data duration in minutes (default: 540 = 9 hours)
# Must be >= 540 (model requirement: 96 input points + 12 output points at 5min intervals)
export MINIMUM_DURATION_MINUTES=540
# Maximum wanted duration in minutes (default: 1080 = 18 hours)
# Larger datasets provide more prediction samples
export MAXIMUM_WANTED_DURATION=1080
Model and Inference Configuration
# Number of model copies per GPU device (default: 2)
# Increase for higher throughput, decrease if running out of VRAM
export NUM_COPIES_PER_DEVICE=2
# Number of background workers for calculations (default: 4)
export BACKGROUND_WORKERS_COUNT=4
# Batch size for inference (default: 32)
# Larger = faster but more memory
export BATCH_SIZE=32
# Number of Monte Carlo Dropout samples (default: 10)
# More samples = better uncertainty estimates but slower
export NUM_SAMPLES=10
Configuration File
Alternatively, create a .env file in the project root:
# .env
MAX_CACHE_SIZE=128
ENABLE_CACHE_PERSISTENCE=True
ENABLE_API_KEYS=True
NUM_COPIES_PER_DEVICE=2
BACKGROUND_WORKERS_COUNT=4
BATCH_SIZE=32
NUM_SAMPLES=10
API Reference
Authentication
Protected endpoints require an API key when authentication is enabled (ENABLE_API_KEYS=True):
curl -X POST "http://localhost:8000/process_unified" \
-H "X-API-Key: your-api-key-here" \
-H "Content-Type: application/json" \
-d '{"csv_base64": "..."}'
Protected Endpoints: /process_unified, /draw_a_plot, /quick_plot, /cache_management
Public Endpoints: /convert_to_unified, /health
1. Convert to Unified Format
Endpoint: POST /convert_to_unified
Authentication: None (public)
Content-Type: multipart/form-data
Converts any supported CGM format (Dexcom, FreeStyle Libre, etc.) to the standardized Unified format.
Request:
curl -X POST "http://localhost:8000/convert_to_unified" \
-F "file=@dexcom_export.csv"
Response:
{
"csv_content": "sequence_id,event_type,quality,datetime,glucose,notes,transmitter_id,transmitter_time\n1,EGV,OK,2025-12-01T08:00:00,120.0,,,",
"error": null
}
Supported Formats:
- Dexcom G6/G7 standard export
- FreeStyle Libre AGP reports
- Unified CSV format (pass-through)
2. Process and Cache Dataset
Endpoint: POST /process_unified
Authentication: Required (if enabled)
Content-Type: application/json
Processes a CSV file, performs quality checks, creates inference dataset, and caches it for future plot requests. Returns a unique handle for the dataset.
Request:
{
"csv_base64": "c2VxdWVuY2VfaWQsZXZlbnRfdHlwZS4uLg==",
"force_calculate": false
}
Parameters:
csv_base64(string, required): Base64-encoded unified CSV contentforce_calculate(boolean, optional): Iftrue, bypasses cache and forces reprocessing. Default:false
Response:
{
"handle": "0742f5d8d69da1a6f05a0ad493072ab5af4e7c212474acc54c43f89460662e80",
"warnings": {
"flags": 0,
"has_warnings": false,
"messages": []
},
"error": null
}
Cache Behavior:
- Direct Cache Hit: Returns existing handle immediately (~10ms)
- Superset Match: If a larger dataset covering the same time range exists, returns that handle
- Cache Miss: Processes data, enqueues background inference, returns handle (~1-3s)
Warning Flags:
| Flag | Description |
|---|---|
TOO_SHORT |
Insufficient data duration for predictions |
CALIBRATION |
Sensor calibration events detected |
QUALITY |
Data quality issues (gaps, noise) |
IMPUTATION |
Gaps filled via interpolation |
OUT_OF_RANGE |
Glucose values outside normal range (40-400 mg/dL) |
TIME_DUPLICATES |
Duplicate timestamps detected and resolved |
Example with curl:
CSV_BASE64=$(cat unified_data.csv | base64 -w 0)
curl -X POST "http://localhost:8000/process_unified" \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d "{\"csv_base64\": \"$CSV_BASE64\", \"force_calculate\": false}"
3. Generate Prediction Plot
Endpoint: POST /draw_a_plot
Authentication: Required (if enabled)
Content-Type: application/json
Generates a prediction plot for a specific sample in a cached dataset. Returns a PNG image.
Request:
{
"handle": "0742f5d8d69da1a6f05a0ad493072ab5af4e7c212474acc54c43f89460662e80",
"index": 0
}
Parameters:
handle(string, required): Dataset handle from/process_unifiedindex(integer, required): Sample index in the dataset0= Last sample (most recent)-1= Second-to-last sample-(N-1)= First sample (where N is dataset length)
Response: PNG image (binary, image/png)
Plot Components:
- Blue line: Historical glucose values (last 12 points = 1 hour) + actual future values (next 12 points = 1 hour)
- Red line: Median predicted glucose (next 12 points = 1 hour)
- Blue gradient fan charts: Prediction uncertainty distribution from Monte Carlo Dropout (10 samples)
- Darker = earlier predictions
- Lighter = later predictions
- Width indicates uncertainty
Timing:
- If plot already calculated: ~100-500ms
- If inference needed: ~5-15 seconds (first request for a dataset)
- Subsequent requests for same index: instant (cached)
Example:
curl -X POST "http://localhost:8000/draw_a_plot" \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d '{"handle":"0742f5d8d69da1a6f05a0ad493072ab5af4e7c212474acc54c43f89460662e80","index":0}' \
--output prediction.png
Error Responses:
404 Not Found: Handle doesn't exist or has been evicted from cache400 Bad Request: Index out of range500 Internal Server Error: Calculation or rendering failed
4. Quick Plot (One-Shot)
Endpoint: POST /quick_plot
Authentication: Required (if enabled)
Content-Type: application/json
Processes data and immediately returns a base64-encoded plot for the last available sample. Combines /process_unified and /draw_a_plot in a single request.
Request:
{
"csv_base64": "c2VxdWVuY2VfaWQsZXZlbnRfdHlwZS4uLg==",
"force_calculate": false
}
Parameters:
csv_base64(string, required): Base64-encoded unified CSV contentforce_calculate(boolean, optional): Bypass cache iftrue. Default:false
Response:
{
"plot_base64": "iVBORw0KGgoAAAANSUhEUgAAA+gAAAJYCAYAAADxHswlAAAgAElEQVR4nOzdd5wV1f...",
"warnings": {
"flags": 0,
"has_warnings": false,
"messages": []
},
"error": null
}
Use Case: One-off predictions without needing to manage handles. Ideal for:
- Testing
- Simple integrations
- Single-use predictions
Timing:
- First request: ~6-18 seconds (processing + inference + calculation + rendering)
- Cached request: ~100-500ms (cache hit + rendering)
Example:
CSV_BASE64=$(cat unified_data.csv | base64 -w 0)
RESPONSE=$(curl -X POST "http://localhost:8000/quick_plot" \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d "{\"csv_base64\": \"$CSV_BASE64\"}")
# Extract base64 plot and decode to file
echo $RESPONSE | jq -r '.plot_base64' | base64 -d > prediction.png
5. Cache Management
Endpoint: POST /cache_management
Authentication: Required (if enabled)
Content-Type: Query parameters
Manage the cache: flush, get info, delete specific handles, save/load from disk.
Actions:
Flush (Clear All)
curl -X POST "http://localhost:8000/cache_management?action=flush" \
-H "X-API-Key: your-key"
Response:
{
"success": true,
"message": "Cache flushed successfully",
"cache_size": 0,
"persisted_count": 0,
"items_affected": null
}
Get Cache Info
curl -X POST "http://localhost:8000/cache_management?action=info" \
-H "X-API-Key: your-key"
Response:
{
"success": true,
"message": "Cache info retrieved",
"cache_size": 42,
"persisted_count": 42,
"items_affected": null
}
Delete Specific Handle
curl -X POST "http://localhost:8000/cache_management?action=delete&handle=0742f5d8d69da1a6..." \
-H "X-API-Key: your-key"
Response:
{
"success": true,
"message": "Handle 0742f5d8... deleted successfully",
"cache_size": 41,
"persisted_count": 41,
"items_affected": 1
}
Save to Disk
# Save all in-memory cache
curl -X POST "http://localhost:8000/cache_management?action=save" \
-H "X-API-Key: your-key"
# Save specific handle
curl -X POST "http://localhost:8000/cache_management?action=save&handle=0742f5d8..." \
-H "X-API-Key: your-key"
Load from Disk
curl -X POST "http://localhost:8000/cache_management?action=load&handle=0742f5d8..." \
-H "X-API-Key: your-key"
Parameters:
action(string, required): Operation to perform (flush,info,delete,save,load)handle(string, optional): Required fordeleteandload, optional forsave
6. Health Check
Endpoint: GET /health
Authentication: None (public)
Returns comprehensive server status, cache metrics, and performance statistics.
Request:
curl http://localhost:8000/health
Response:
{
"status": "ok",
"cache_size": 42,
"models_initialized": true,
"queue_length": 2,
"avg_fulfillment_time_ms": 123.45,
"vmem_usage_mb": 3584.2,
"device": "cuda",
"total_requests_processed": 1234,
"total_errors": 5
}
Fields:
status: Service status (ok,degraded,error)cache_size: Number of datasets currently cached (memory + disk)models_initialized: Whether ML models are loaded and readyqueue_length: Number of models available in poolavg_fulfillment_time_ms: Average time to acquire a model from poolvmem_usage_mb: GPU VRAM usage in MB (0 if CPU)device: Inference device (cpu,cuda,cuda:0, etc.)total_requests_processed: Request counter since startuptotal_errors: Error counter since startup
Use Case:
- Health checks for load balancers
- Monitoring dashboards
- Service readiness probes
Data Requirements
Input Format
The service accepts CSV files from:
- Dexcom G6/G7: Standard export format
- FreeStyle Libre: AGP reports
- Unified Format: Custom standardized schema
Unified CSV Schema
sequence_id,event_type,quality,datetime,glucose,notes,transmitter_id,transmitter_time
1,EGV,OK,2025-12-01T08:00:00,120.0,,,
1,EGV,OK,2025-12-01T08:05:00,125.0,,,
Required columns:
sequence_id: Integer identifier for continuous sequencesevent_type: Event type (e.g., "EGV" for estimated glucose value)quality: Data quality indicatordatetime: ISO 8601 timestampglucose: Glucose value in mg/dL
Minimum Data Requirements
- Duration: At least 540 minutes (9 hours) of continuous data
- Interval: 5-minute sampling (automatically interpolated if needed)
- Prediction Window:
- Input: 96 points (8 hours of history at 5min intervals)
- Output: 12 points (1 hour prediction at 5min intervals)
Data Processing Pipeline
- Format Detection: Auto-detect input format
- Parsing: Convert to unified format
- Gap Interpolation: Fill gaps up to 15 minutes
- Timestamp Synchronization: Align to 5-minute intervals
- Quality Validation: Check duration and data quality
- Feature Engineering: Extract temporal features (hour, day, etc.)
- Segmentation: Split into continuous sequences
- Scaling: Standardize values
- Dataset Creation: Create Darts TimeSeries dataset
Deployment
Development / Local Testing
For local development and testing:
# Simple startup with auto-reload
uv run uvicorn glurpc.app:app --reload
# Or with explicit parameters
uv run uvicorn glurpc.app:app \
--host 127.0.0.1 \
--port 8000 \
--reload \
--log-level debug
Configuration: Default settings, no authentication required.
Production Deployment
For production deployment scenarios (systemd, Docker, Kubernetes, AWS, multi-GPU), see the deployment/ directory.
Available deployment configurations:
- Single-server with systemd and Nginx
- Docker and Docker Compose
- Kubernetes with auto-scaling
- AWS ECS (Fargate)
- Multi-GPU server setup
โ ๏ธ Note: All deployment configurations are untested examples and should be customized and tested thoroughly before production use. See deployment/README.md for details
Project Structure
gluRPC/
โโโ src/glurpc/
โ โโโ app.py # FastAPI application & endpoints
โ โโโ core.py # Core business logic & orchestration
โ โโโ engine.py # Model management & background workers
โ โโโ logic.py # Data processing & ML inference
โ โโโ state.py # Cache & task coordination
โ โโโ schemas.py # Pydantic request/response models
โ โโโ config.py # Configuration & environment variables
โ โโโ data_classes.py # Domain data models
โโโ tests/
โ โโโ test_integration.py # Integration tests
โ โโโ test_integration_load.py # Load/stress tests
โโโ logs/ # Timestamped log files (auto-created)
โโโ cache_storage/ # Persistent cache (auto-created)
โโโ files/ # Generated plots (gitignored)
โโโ data/ # Sample data (gitignored)
โโโ pyproject.toml # Project dependencies
โโโ uv.lock # Dependency lock file
โโโ README.md # This file
โโโ HLA.md # High-Level Architecture document
Performance
Benchmarks
Hardware: NVIDIA RTX 4090 (24GB VRAM), AMD Ryzen 9 5950X
| Operation | First Request | Cached Request |
|---|---|---|
| Convert CSV | 50-200ms | N/A |
| Process & Cache | 1-3s | 10-50ms (cache hit) |
| Generate Plot | 5-15s | 100-500ms |
| Quick Plot | 6-18s | 100-500ms |
Throughput:
- With 2 GPUs (4 model copies): ~40-60 plots/minute
- Cache hit rate >80% in typical usage: ~200-300 plots/minute
Resource Usage
- Memory: ~8-12GB (2 model copies @ 2GB each + cache)
- VRAM: ~4-6GB per GPU (2 model copies)
- CPU: 2-4 cores recommended
- Disk: ~10MB per cached dataset
Cache Performance
- Memory Cache Hit: ~10ms
- Disk Cache Hit: ~50-100ms (load from Parquet)
- Cache Miss: ~1-3s (processing)
- Superset Match: ~50ms (metadata lookup + return)
Testing
Run All Tests
uv run pytest tests/
Run with Coverage
uv run pytest tests/ --cov=src/glurpc --cov-report=html
Run Specific Test
uv run pytest tests/test_integration.py::test_quick_plot
Load Testing
# Run load test with 10 concurrent requests
uv run pytest tests/test_integration_load.py::test_concurrent_quick_plot -v
Logging
Log Location
Logs are written to logs/glurpc_YYYYMMDD_HHMMSS.log with the following information:
- Data processing pipeline steps
- Dataset shapes at each transformation
- Scaler parameters (min/scale values)
- Model predictions statistics
- Cache operations
- Errors with full stack traces
Log Levels
- DEBUG: Detailed execution traces (disabled for
calclogger to reduce verbosity) - INFO: Request/response logging, pipeline steps
- WARNING: Data quality issues, cache misses
- ERROR: Failures with stack traces
Example Log Entries
2025-12-01 08:26:40,843 - glurpc - INFO - Action: process_and_cache started (force=False)
2025-12-01 08:26:40,873 - glurpc - INFO - Action: process_and_cache - generated handle=0742f5d8..., df_shape=(3889, 9)
2025-12-01 08:26:40,902 - glurpc.logic - INFO - Dataset creation successful: 3707 samples, warnings=0
2025-12-01 08:26:45,332 - glurpc - INFO - InfWorker 0: Running FULL inference for 0742f5d8... (3707 items)
2025-12-01 08:27:02,118 - glurpc - INFO - Action: generate_plot_from_handle completed - png_size=125432 bytes
Troubleshooting
Model Download Issues
If HuggingFace download fails:
# Set cache directory
export HF_HOME=/path/to/cache
# Or manually download and place in cache
huggingface-cli download Livia-Zaharia/gluformer_models \
gluformer_1samples_500epochs_10heads_32batch_geluactivation_livia_large_weights.pth
Memory Issues
If running out of memory:
# Reduce cache size
export MAX_CACHE_SIZE=32
# Reduce batch size
export BATCH_SIZE=16
# Reduce model copies
export NUM_COPIES_PER_DEVICE=1
CUDA Out of Memory
# Use smaller batch size
export BATCH_SIZE=16
# Reduce model copies per GPU
export NUM_COPIES_PER_DEVICE=1
# Or use CPU (slower)
# Models will auto-detect and use CPU if no GPU available
Plot Generation Failures
Ensure kaleido is properly installed:
uv add kaleido==0.2.1
API Key Issues
Check API keys file:
# File should exist and contain keys (one per line)
cat api_keys_list
# Ensure no extra whitespace or comments
sed '/^#/d; /^$/d' api_keys_list
Cache Persistence Issues
Check disk permissions:
# Ensure cache directory is writable
mkdir -p cache_storage
chmod 755 cache_storage
# Disable persistence if issues persist
export ENABLE_CACHE_PERSISTENCE=False
Contributing
- Fork the repository
- Create feature branch (
git checkout -b feature/amazing-feature) - Make changes and add tests
- Run tests (
uv run pytest) - Commit changes (
git commit -m 'Add amazing feature') - Push to branch (
git push origin feature/amazing-feature) - Open Pull Request
Development Guidelines
- Use type hints for all functions
- Add docstrings for public APIs
- Follow existing code style
- Add tests for new features
- Update documentation
License
See LICENSE file for details.
Citation
If you use this service in your research, please cite:
@software{glurpc2025,
title={GluRPC: REST API for Glucose Prediction},
author={GlucoseDAO Contributors},
year={2025},
url={https://github.com/glucosedao/gluRPC}
}
Support
- Issues: GitHub Issues
- Documentation:
- Contact: Project maintainers
Acknowledgments
- GlucoBench - Curated list of Continuous Glucose Monitoring datasets with prediction benchmarks
- Gluformer - Transformer-based model for glucose prediction
- CGM-Format - Library for parsing CGM data
- GlucoseDAO community for contributions and feedback
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file glurpc-0.3.0.tar.gz.
File metadata
- Download URL: glurpc-0.3.0.tar.gz
- Upload date:
- Size: 290.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1c2203e1abef302d450991a6c18e9b64cc78eeebf569a4065499cd2b59f05686
|
|
| MD5 |
e0b198581deebff6f3475e89c7af2f1c
|
|
| BLAKE2b-256 |
5170c3d1873893d838d7d41dc6031f0fc9d373b3b373497ea4324359f7b57182
|
File details
Details for the file glurpc-0.3.0-py3-none-any.whl.
File metadata
- Download URL: glurpc-0.3.0-py3-none-any.whl
- Upload date:
- Size: 48.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4e6f4ee8da086bd9e36a264334c8fc5bb4160075b73130ffd08cc65ff9485645
|
|
| MD5 |
a200bf5467525789b22a2dc1f15b0647
|
|
| BLAKE2b-256 |
5d9c097ab3c71dab0e545b5bcfdcb23d0b9e9a4f3fe03135983f3accff2c6f06
|