A tool for extracting data from Google Sheets and normalizing it into SQLite
Project description
๐ JJF Survey Analytics Platform
A comprehensive survey data management and analytics platform that reads data directly from Google Sheets into memory and provides powerful analytics dashboards with real-time data processing capabilities.
๐ฏ Project Intent
This platform is designed to:
- Load survey data directly from multiple Google Sheets sources
- Process data in-memory for fast access and analysis
- Analyze survey responses with statistical insights and visualizations
- Monitor response activity and respondent patterns
- Refresh automatically when the application restarts
- Provide a beautiful, responsive web interface for data exploration
โจ Key Features
- ๐ Survey Analytics Dashboard - Comprehensive statistics and visualizations
- โก In-Memory Processing - Fast data access with no database overhead
- ๐ Response Activity Monitoring - Track who responded and when
- ๐ฅ Health Check System - Monitor API connectivity and system health
- ๐ Authentication - Secure access with password protection
- ๐ฑ Responsive Design - Beautiful Tailwind CSS interface for all devices
- ๐ Production Ready - Deployable to Railway as lightweight application
- ๐ Single Source of Truth - Google Sheets as authoritative data source, in-memory processing
๐ Performance Features
- TTL-Based Report Caching (v1.2.6)
- Configurable cache TTL via
REPORT_CACHE_TTLenvironment variable - Default: 300 seconds (5 minutes)
- 2,400x performance improvement: 120 seconds โ 50ms (cached)
- Automatic cache invalidation on data changes or TTL expiration
- Configurable cache TTL via
๐ Data Integrity
- Calculation Checksums (v1.2.3)
- SHA-256 checksums for all dimension calculations
- Verifies calculation consistency across API responses
- Enables audit trail and debugging
๐ JSON API Endpoints (v1.2.4)
GET /api/reports/organizations- List all organizations with metadataGET /api/reports/organization/<name>/json- Full report with checksumsGET /api/reports/organization/<name>/verification- Verification data onlyGET /api/docs- Interactive API documentation with examples- CORS enabled for cross-origin requests
๐ Quick Start
Prerequisites
- Python 3.8 or higher
- pip (Python package manager)
- Internet connection (for Google Sheets access)
1. Install Dependencies
pip install -r requirements.txt
2. Start the Web Application
python app.py
5. Open in Browser
Navigate to: http://localhost:8080
Authentication:
- Local Development: No password required (disabled by default)
- Production: Set
REQUIRE_AUTH=trueandAPP_PASSWORD=your-password
๐ Project Structure
jjf-survey-analytics/
โโโ app.py # Main Flask web application
โโโ railway_app.py # Railway-specific deployment app
โโโ survey_analytics.py # Survey analytics engine
โโโ survey_normalizer.py # Data normalization service
โโโ auto_sync_service.py # Background auto-sync service
โโโ improved_extractor.py # Google Sheets data extractor
โโโ healthcheck.py # Health check entry point
โ
โโโ healthcheck/ # Health check system
โ โโโ api_validators.py # API key validation
โ โโโ dependency_checker.py # External dependency checks
โ โโโ e2e_tests.py # End-to-end tests
โ โโโ monitoring.py # Continuous monitoring
โ โโโ config_validator.py # Configuration validation
โ
โโโ templates/ # HTML templates
โ โโโ base.html # Base template with navigation
โ โโโ dashboard.html # Main dashboard
โ โโโ survey_analytics.html # Survey analytics dashboard
โ โโโ survey_dashboard.html # Survey overview
โ โโโ survey_responses.html # Response activity monitor
โ โโโ sync_dashboard.html # Auto-sync management
โ โโโ health_dashboard.html # Health check dashboard
โ โโโ spreadsheets.html # Spreadsheets listing
โ โโโ spreadsheet_detail.html # Individual spreadsheet view
โ โโโ jobs.html # Extraction jobs history
โ โโโ login.html # Authentication page
โ โโโ error.html # Error page
โ
โโโ hybrid_surveyor/ # Advanced CLI tool (optional)
โ โโโ src/ # Source code
โ โโโ tests/ # Test suite
โ โโโ docs/ # Additional documentation
โ
โโโ docs/ # Project documentation
โ โโโ PROGRESS.md # Development progress
โ โโโ work-logs/ # Work session logs
โ
โโโ tests/ # Test suite
โ โโโ unit/ # Unit tests
โ โโโ integration/ # Integration tests
โ
โโโ surveyor_data_improved.db # Raw spreadsheet data (SQLite)
โโโ survey_normalized.db # Normalized survey data (SQLite)
โโโ requirements.txt # Python dependencies
โโโ pyproject.toml # Project configuration
โโโ Makefile # Development commands
โโโ Procfile # Railway deployment config
โโโ railway.toml # Railway configuration
โโโ README.md # This file
๐ฏ Core Features
๐ Survey Analytics Dashboard (/surveys)
- Overview Statistics - Total surveys, responses, respondents, response rates
- Survey Breakdown - Performance by survey type and name
- Completion Statistics - Visual completion rates with progress bars
- Respondent Analysis - Browser, device, and response frequency patterns
- Beautiful Visualizations - Color-coded charts and progress indicators
๐ Detailed Analytics (/surveys/analytics)
- Question-Level Analysis - Response rates and answer distributions
- Statistical Insights - Numeric averages, boolean counts, unique answers
- Time Series Charts - Response trends over time
- Survey Filtering - Focus on specific surveys
- Export Capabilities - CSV download and API access
โฐ Response Activity Monitor (/surveys/responses)
- Timeline View - When and who responded with detailed logs
- Technology Analysis - Browser and device usage patterns
- Response Patterns - Frequency analysis and daily activity
- Real-time Updates - Auto-refresh for live monitoring
- Responsive Design - Works on all devices
๐ Auto-Sync Management (/sync)
- Intelligent Change Detection - Automatically finds new/updated data
- Service Management - Start/stop/configure sync service
- Real-time Monitoring - Live status and performance metrics
- Manual Triggers - Force immediate sync when needed
- Activity Logging - Detailed sync history and troubleshooting
๐ฅ Health Check System (/health/dashboard)
- API Key Validation - Verify Google Sheets API access
- Dependency Monitoring - Check external service availability
- End-to-End Tests - Validate complete data flow
- Configuration Validation - Ensure proper setup
- Continuous Monitoring - Background health checks
๐ Spreadsheets Management (/spreadsheets)
- Grid View - All imported spreadsheets
- Search and Filter - By title and type
- Type Categorization - Color-coded badges (Survey, Assessment, Inventory)
- Row Count - Last sync information
- Direct Links - To Google Sheets sources
โ๏ธ Job Monitoring (/jobs)
- Extraction Job History - Detailed progress tracking
- Success/Failure Rates - Error reporting
- Real-time Status Updates - For running jobs
- Job Duration - Performance metrics
๐จ User Interface
Design System
- Tailwind CSS for modern, responsive design
- Font Awesome icons for visual clarity
- Color-coded categories:
- ๐ต Survey - Blue theme
- ๐ข Assessment - Green theme
- ๐ฃ Inventory - Purple theme
- Mobile-first responsive design
Interactive Features
- Hover effects and smooth transitions
- Copy-to-clipboard functionality
- Modal dialogs for detailed views
- Auto-refresh for live data updates
๐๏ธ Database Architecture
Raw Data Database (surveyor_data_improved.db)
spreadsheets- Metadata about each Google Sheetraw_data- Actual spreadsheet data stored as JSONextraction_jobs- Job tracking and history
Normalized Survey Database (survey_normalized.db)
surveys- Survey metadata and configurationsurvey_questions- Normalized question definitionssurvey_responses- Individual response recordssurvey_answers- Detailed answer data with type parsingrespondents- Unique respondent trackingsync_tracking- Auto-sync history and statusnormalization_jobs- Process tracking and auditing
Key Features
- Relational Structure - Proper foreign key relationships
- Type Safety - Automatic type detection and parsing
- JSON Storage - Flexible data structure for raw data
- SHA256 Hashing - Deduplication and change detection
- Optimized Indexes - Fast queries on all search fields
- Data Integrity - Foreign key constraints throughout
๐๏ธ Architecture
MVC Pattern (v1.2.2+)
The application follows a Model-View-Controller architecture for clean separation of concerns:
- Model (
src/analytics/report_generator.py): All calculation logic and dimension scoring - View (
templates/): Pure presentation layer with no business logic - Controller (
app.py): Request handling, caching, and routing
Calculation Checksums (v1.2.3)
All dimension calculations include SHA-256 checksums for verification:
- Ensures calculation consistency across requests
- Enables result verification via API endpoints
- Provides audit trail for debugging and compliance
- Validates formula:
adjusted_score = max(0, min(5, base_score + total_modifier))
Caching Strategy (v1.2.6)
Multi-criteria cache validation ensures optimal performance while maintaining data freshness:
- Cache exists for organization
- Response count matches current data
- Cache age within TTL (configurable, default 5 minutes)
Cache invalidation triggers:
- New survey responses added
- Admin edits applied to reports
- TTL expiration (configurable via
REPORT_CACHE_TTL)
Performance impact:
- First request: ~120 seconds (with AI analysis)
- Cached requests: ~50ms (2,400x improvement)
- Cost reduction: Minimizes API calls for repeated report views
๐ Supported Google Sheets
The system currently supports 6 JJF Technology Assessment spreadsheets:
| Type | Count | Description |
|---|---|---|
| Survey | 2 | Survey questions and response collection |
| Assessment | 3 | Technology maturity assessments (CEO, Staff, Tech Lead) |
| Inventory | 1 | Software systems inventory |
๐ ๏ธ Development Setup
Requirements
- Python 3.8+ (Python 3.13 recommended)
- pip - Python package manager
- SQLite3 - Database (built into Python)
- Internet connection - For Google Sheets access
Installation
Option 1: Quick Install
# Install all dependencies
pip install -r requirements.txt
Option 2: Development Setup with Virtual Environment
# Create virtual environment
python -m venv venv
# Activate virtual environment
source venv/bin/activate # On Linux/Mac
# OR
venv\Scripts\activate # On Windows
# Install dependencies
pip install -r requirements.txt
Option 3: Using Make
# Set up development environment
make setup
# Activate virtual environment
source venv/bin/activate
# Install dependencies
make install
Initial Data Setup
# 1. Extract data from Google Sheets
python improved_extractor.py
# 2. Normalize survey data
python survey_normalizer.py --auto
# 3. (Optional) Initialize health checks
python healthcheck.py
Running the Application
# Start the web server
python app.py
# Access at http://localhost:8080
# No password required for local development
Environment Variables
Create a .env file in the project root:
# Application Configuration
PORT=8080
SECRET_KEY=your-secret-key-here
# Authentication (disabled by default for local development)
REQUIRE_AUTH=false # Set to 'true' for production
APP_PASSWORD=survey2025! # Only used when REQUIRE_AUTH=true
# Cache Configuration (v1.2.6+)
REPORT_CACHE_TTL=300 # Cache TTL in seconds (default: 300 = 5 minutes)
# Database (for Railway deployment)
DATABASE_URL=postgresql://... # Optional, uses SQLite if not set
# Logging
LOG_LEVEL=INFO
# Auto-Sync Configuration
AUTO_SYNC_INTERVAL=300 # seconds
Environment Variable Reference:
| Variable | Description | Default | Required |
|---|---|---|---|
PORT |
Web server port | 8080 |
No |
SECRET_KEY |
Flask session secret | Auto-generated | No |
REQUIRE_AUTH |
Enable password protection | false |
No |
APP_PASSWORD |
Login password | survey2025! |
No |
REPORT_CACHE_TTL |
Report cache TTL (seconds) | 300 |
No |
DATABASE_URL |
PostgreSQL URL (Railway) | SQLite fallback | No |
LOG_LEVEL |
Logging level | INFO |
No |
AUTO_SYNC_INTERVAL |
Auto-sync interval (seconds) | 300 |
No |
Version Management
The project includes a comprehensive Make-based version management system:
# Display current version information
make version
# Bump patch version (1.0.0 โ 1.0.1) - bug fixes
make version-patch
# Bump minor version (1.0.0 โ 1.1.0) - new features
make version-minor
# Bump major version (1.0.0 โ 2.0.0) - breaking changes
make version-major
# Update build metadata only (no version bump)
make version-build
Features:
- Semantic versioning (major.minor.patch)
- Automatic git metadata extraction (commit hash, branch name)
- Build date and build number tracking
- Version displayed at application startup
Workflow:
- Make changes to the codebase
- Run
make version-patch(or minor/major as appropriate) - Review changes:
git diff version.py - Commit:
git add version.py && git commit -m "chore: bump version" - Push:
git push origin main
๐ API Endpoints
Web Routes
GET /- Main dashboardGET /login- Authentication pageGET /logout- LogoutGET /spreadsheets- Spreadsheets listingGET /spreadsheet/<id>- Individual spreadsheet viewGET /jobs- Extraction jobs historyGET /surveys- Survey analytics dashboardGET /surveys/analytics- Detailed question analysisGET /surveys/responses- Response activity monitorGET /sync- Auto-sync management dashboardGET /health/dashboard- Health check dashboardGET /health/test- Run health checks
API Routes
Core API
GET /api/stats- Dashboard statistics (JSON)GET /api/spreadsheet/<id>/data- Spreadsheet data (JSON)GET /api/sync/status- Auto-sync service statusPOST /api/sync/start- Start auto-sync servicePOST /api/sync/stop- Stop auto-sync servicePOST /api/sync/force- Force immediate syncGET /api/survey/search- Search survey responsesGET /api/survey/<id>/export- Export survey data (CSV)GET /health/status- Health check status (JSON)POST /health/check- Run specific health checks
JSON Report API (v1.2.4+)
GET /api/reports/organizations- List all organizations with metadataGET /api/reports/organization/<name>/json- Full organization report with checksumsGET /api/reports/organization/<name>/verification- Verification data only (lightweight)GET /api/docs- Interactive API documentation with usage examples- Features: CORS enabled, calculation checksums included, cache metadata exposed
๐ Troubleshooting
Common Issues
-
Database not found
# Run the extractor first python improved_extractor.py # Then normalize the data python survey_normalizer.py --auto
-
Port already in use
# Change port via environment variable export PORT=8080 # Or kill existing process lsof -ti:5001 | xargs kill -9 # Mac/Linux
-
Google Sheets access denied
- Check if sheets are publicly accessible
- Verify URLs are correct in the extractor
- Check internet connection
- Review API key configuration
-
Authentication issues
# Disable authentication for testing export REQUIRE_AUTH=false # Or set custom password export APP_PASSWORD=your-password
-
Auto-sync not working
- Check sync dashboard at
/sync - Verify sync service is started
- Review logs for errors
- Ensure source data has changed
- Check sync dashboard at
-
Health checks failing
# Run health checks manually python healthcheck.py # Check specific components python healthcheck.py --api-only python healthcheck.py --deps-only
Debug Mode
The web application runs in debug mode by default in development:
- Auto-reload on code changes
- Detailed error messages in browser
- Interactive debugger for exceptions
- Verbose logging to console
To disable debug mode (production):
export RAILWAY_ENVIRONMENT=production
๐ Data Flow
โโโโโโโโโโโโโโโโโโโโโโโ
โ Google Sheets โ
โ (Source Data) โ
โโโโโโโโโโโโฌโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ improved_extractor โ
โ (Data Extraction) โ
โโโโโโโโโโโโฌโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ surveyor_data_ โ
โ improved.db โ
โ (Raw Data) โ
โโโโโโโโโโโโฌโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ survey_normalizer โ
โ (Normalization) โ
โโโโโโโโโโโโฌโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ survey_normalized โ
โ .db (Relational) โ
โโโโโโโโโโโโฌโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ Flask Application โ
โ (Web Interface) โ
โโโโโโโโโโโโฌโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโ
โ User Browser โ
โ (Dashboards) โ
โโโโโโโโโโโโโโโโโโโโโโโ
๐ Deployment
Local Development
# Start the application
python app.py
# Access at http://localhost:8080
# No password required
Railway Deployment
-
Connect Repository
- Link your GitHub repository to Railway
- Railway will auto-detect the Python project
-
Configure Environment Variables
APP_PASSWORD=your-secure-password SECRET_KEY=your-secret-key REQUIRE_AUTH=true
-
Deploy
- Railway will automatically build and deploy
- Health checks at
/health/status - PostgreSQL database automatically provisioned
-
Access
- Your app will be available at
https://your-app.railway.app
- Your app will be available at
See RAILWAY_DEPLOYMENT.md for detailed deployment instructions.
๐ Additional Documentation
- FINAL_IMPLEMENTATION_SUMMARY.md - Complete feature overview
- RAILWAY_DEPLOYMENT.md - Railway deployment guide
- AUTHENTICATION_CONFIG.md - Authentication setup
- AUTO_SYNC_IMPLEMENTATION.md - Auto-sync details
- HEALTHCHECK_README.md - Health check system
- hybrid_surveyor/README.md - Advanced CLI tool
๐งช Testing
# Run all tests
make test
# Run with coverage
make test-cov
# Run specific test suite
python -m pytest tests/unit -v
python -m pytest tests/integration -v
# Run health checks
python healthcheck.py
๐ฏ Use Cases
Survey Analysis
- Analyze response patterns across multiple surveys
- Track completion rates and respondent engagement
- Identify trends in survey responses over time
- Export data for external analysis
Data Management
- Centralized view of all survey data
- Automatic synchronization with Google Sheets
- Historical tracking of data changes
- Audit trail for all operations
Monitoring
- Real-time health checks of system components
- API key validation and dependency monitoring
- Response activity tracking
- System performance metrics
๐ Current Status
Production Ready Features
- โ 22 survey responses processed across 5 surveys
- โ 240 questions normalized with proper typing
- โ 585 answers analyzed with statistical insights
- โ 13 unique respondents tracked
- โ Auto-sync service with intelligent change detection
- โ Health check system with comprehensive monitoring
- โ Authentication with password protection
- โ Railway deployment ready with PostgreSQL support
- โ Responsive design for all devices
- โ REST API for programmatic access
Supported Survey Types
| Type | Count | Description |
|---|---|---|
| Survey | 2 | Survey questions and response collection |
| Assessment | 3 | Technology maturity assessments (CEO, Staff, Tech Lead) |
| Inventory | 1 | Software systems inventory |
๐ Access Points
Local Development:
- Main Application: http://localhost:8080
- Survey Analytics: http://localhost:8080/surveys
- Auto-Sync Dashboard: http://localhost:8080/sync
- Health Dashboard: http://localhost:8080/health/dashboard
Authentication:
- Local: No password required (disabled by default)
- Production: Set
REQUIRE_AUTH=trueand configureAPP_PASSWORD
๐ Support
For issues, questions, or contributions:
- Check the troubleshooting section
- Review the additional documentation
- Run health checks:
python healthcheck.py - Check application logs for detailed error messages
Built with โค๏ธ using Flask, SQLite, and Tailwind CSS
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file jjf_survey_analytics-1.5.3.tar.gz.
File metadata
- Download URL: jjf_survey_analytics-1.5.3.tar.gz
- Upload date:
- Size: 268.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
670ab16c5e89de0dd61ba9e179fd8d38f3f44559361ee511354096ccb584733d
|
|
| MD5 |
493893f06964955601add414ece94bb5
|
|
| BLAKE2b-256 |
965262230a9cd6000871cf16656ab77026090995c04bcf7dd83da1feb61aa862
|
File details
Details for the file jjf_survey_analytics-1.5.3-py3-none-any.whl.
File metadata
- Download URL: jjf_survey_analytics-1.5.3-py3-none-any.whl
- Upload date:
- Size: 252.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
eea8a530341cc8de45d8498c846d8baf9b61142522723581e4ba75e381ce831c
|
|
| MD5 |
77c8de8423086ca8bfffb102fc074cc3
|
|
| BLAKE2b-256 |
9477620fedd0e50e668be8df0e3556c812cafd447f9b19eee943506dc5bbe79c
|