Skip to main content

A lightweight, automated testing system for data pipelines and tools

Project description

Locaria Integrated Testing Framework

A lightweight, automated testing system for data pipelines and tools. Focuses on business-logic validation, data quality checks, and operational sanity tests rather than UI or cosmetic testing.

Features

  • Business Logic Validation - Test time splits sum to 100%, financial ratios are within bounds, etc.
  • Data Quality Checks - Schema validation, null checks, row count sanity, data freshness
  • Configurable Thresholds - Firestore-based configuration for easy threshold updates
  • Integrated Logging - Sheet Logger integration for persistent test result storage
  • Email Alerts - Real-time failure notifications via existing email manager API
  • Pipeline-Specific Tests - Custom business logic validation for different data domains

Quick Start

Basic Usage

from modules.integrated_tests import create_testkit, SchemaTests, DataQualityTests, FreshnessTests

# Initialize testing framework
testkit = create_testkit("locate_2_pulls", "daily_updates")

# Initialize test classes
schema_tests = SchemaTests(testkit)
data_quality_tests = DataQualityTests(testkit)
freshness_tests = FreshnessTests(testkit)

try:
    # Your data pipeline code
    df = extract_data()
    
    # Stage 1: Intake tests
    schema_tests.check_required_columns(df, ["employee_id", "date", "hours"])
    schema_tests.check_data_types(df, {"employee_id": "object", "hours": "float64"})
    
    # Stage 2: Transform tests
    df_transformed = transform_data(df)
    # Row count checks moved to RowCountTests
    row_count_tests.check_row_count_change(df_transformed, "table_name", "append")
    
    # Stage 3: Load tests
    load_to_bq(df_transformed, table="finance.time_splits")
    freshness_tests.check_data_freshness(df_transformed, "timestamp")
    
finally:
    # Always finalize the test run
    testkit.finalize_run()

Environment Setup

The framework automatically integrates with the existing locate_2_pulls configuration store. No additional environment variables are required for basic functionality.

Optional Environment Variables

For advanced usage or when running outside the locate_2_pulls environment:

# Email API configuration (fallback)
export EMAIL_API_URL="https://your-app.appspot.com/api/tools/send_email_direct"

# Sheet Logger configuration (fallback)
export TEST_LOGS_SPREADSHEET_ID="your-google-sheets-id"
export GOOGLE_CREDENTIALS_PATH="/path/to/credentials.json"

Automatic Configuration

The framework automatically uses:

  • Firestore Project: locaria-dev-config-store (from config store)
  • Sheet Logger: Existing sheet logger instance from config store
  • Email API: Tool URL from config store (tool_URL + api/tools/send_email_direct)

Test Classes

SchemaTests

Schema validation tests for data quality assurance:

  • check_required_columns() - Validate required columns exist
  • check_data_types() - Validate column data types
  • check_null_constraints() - Check for nulls in critical fields
  • check_unique_constraints() - Validate unique key constraints
  • check_column_values() - Check values within expected ranges or sets
  • check_schema_completeness() - Comprehensive schema validation

DataQualityTests

Data quality tests for common validation scenarios:

  • check_row_count_change() - Row count change check with Firestore history (append/truncate)
  • check_numeric_ranges() - Values within expected ranges
  • check_duplicate_records() - Detect duplicate entries
  • check_data_completeness() - Data completeness above threshold
  • check_date_ranges() - Date values within reasonable bounds

FreshnessTests

Data freshness tests for ensuring data is up-to-date:

  • check_data_freshness() - Verify data is up-to-date
  • check_timestamp_progression() - Timestamps moving forward
  • check_data_consistency() - Data frequency and gap validation
  • check_data_age_distribution() - Data age distribution analysis

Configuration

Configuration is stored in Firestore in the locaria-dev-config-store project (automatically detected from the existing config store) under the integrated_testing_config collection.

Default Configuration

{
  "thresholds": {
    "row_count_change": {
      "warn_percentage": 20,
      "fail_percentage": 50
    },
    "out_of_office_percentage": {
      "warn_threshold": 25,
      "fail_threshold": 35
    },
    "time_split_tolerance": {
      "precision": 0.01
    },
    "data_freshness": {
      "max_age_hours": 24,
      "warn_age_hours": 12
    }
  },
  "test_switches": {
    "enable_schema_validation": true,
    "enable_business_logic_checks": true,
    "enable_freshness_checks": true,
    "enable_row_count_validation": true
  },
  "email_alerts": {
    "failure_recipients": ["data_team@locaria.com"],
    "warning_recipients": ["data_team@locaria.com"],
    "digest_frequency": "daily"
  }
}

Managing Configuration

from modules.integrated_tests import ConfigManager

# Initialize config manager
config_manager = ConfigManager()

# Create default configuration for a repository
config_manager.create_default_config_for_repository("locate_2_pulls")

# Update thresholds
config_manager.update_thresholds(
    "locate_2_pulls",
    "row_count_change",
    {"warn_percentage": 15, "fail_percentage": 40}
)

# Update test switches
config_manager.update_test_switches(
    "locate_2_pulls",
    {"enable_schema_validation": False}
)

Test Severity Levels

  • FAIL - Stops pipeline execution, logs error, sends immediate email alert
  • WARN - Continues pipeline execution, logs warning, sends digest email
  • PASS - Test passed, logs success

Email Templates

The framework uses pre-configured email templates in the email manager:

  • Test Failure Alert - Immediate notification for FAIL results
  • Test Warning Digest - Grouped notification for WARN results

Acknowledgment System

The acknowledgment system prevents email spam by allowing users to acknowledge known issues, which mutes them for a configurable period. Both warnings and failures can be acknowledged and stored in Firestore.

How It Works

  1. Issue Detection: Tests detect issues and log them with acknowledgment metadata
  2. Firestore Storage: Both acknowledgeable warnings and failures are stored in Firestore during finalize_run()
  3. Email Filtering: Email system checks acknowledgment status before sending
  4. User Acknowledgment: Users can acknowledge issues through web interface
  5. Mute Period: Acknowledged issues are muted for 7 days (configurable)
  6. Automatic Expiry: Mute periods expire automatically and issues are archived

Firestore Structure

Collection: pipeline_acknowledgments
└── Document: {repo}%{pipeline}%{test_name}
    └── Subcollection: issues
        └── Document: {issue_key_simple}
            - acknowledged: bool
            - muted_until: timestamp (UTC)
            - status: "WARN" or "FAIL"
            - identifier: str
            - details: str
            - issue_first_occurrence: timestamp (UTC)
            - issue_last_occurrence: timestamp (UTC)
            - issue_owner: str
            - acknowledged_by / acknowledged_at / acknowledgment_reason
    └── Subcollection: archives
        └── Document: {issue_key_simple}
            - Archived issues (expired mutes or manually deleted)

Web Interface

The acknowledgment system includes a modern web interface accessible at /tools/acknowledgment-manager in the Analytics Hub.

Features

  • Real-time Filtering: Filter by repository, pipeline, test type, and issue status (Warning/Failure)
  • Issue Management: Acknowledge/unacknowledge issues with configurable mute periods
  • Bulk Operations: Select multiple issues and acknowledge or archive them in batch
  • Issue Details: View comprehensive issue information including first/last occurrence and ownership
  • Archive Management: Expired mutes are automatically archived; manual deletions are also archived

Examples

See the examples/ directory for complete pipeline implementations:

  • sample_pipeline.py - Complete pipeline with integrated testing
  • config_store_integration_example.py - Demonstrates automatic config store integration

Run the examples:

cd modules/integrated_tests/examples
python sample_pipeline.py
python config_store_integration_example.py

Architecture

integrated_tests/
├── __init__.py                 # Main module exports
├── main/
│   └── testkit.py             # Core framework and orchestration
├── utils/
│   └── config_manager.py      # Firestore configuration management
├── generic_tests/
│   ├── __init__.py
│   ├── schema_tests.py        # Schema validation tests
│   ├── data_quality_tests.py  # Data quality tests
│   └── freshness_tests.py     # Data freshness tests
├── pipeline_specific_tests/   # Business logic tests per domain
│   └── __init__.py
├── examples/
│   └── sample_pipeline.py     # Usage examples
└── README.md

Best Practices

Test Design

  • Focus on business logic and data quality
  • Use descriptive test names that explain the business rule
  • Test at multiple stages: intake, transform, load, post-load
  • Include both positive and negative test cases

Error Handling

  • Always use try/finally blocks to ensure test finalization
  • Handle missing data gracefully
  • Provide meaningful error messages
  • Log sufficient context for debugging

Performance

  • Batch test operations when possible
  • Use efficient pandas operations
  • Avoid unnecessary data copies
  • Cache configuration when appropriate

Configuration

  • Use Firestore for dynamic configuration
  • Provide sensible defaults
  • Document all thresholds and switches
  • Version control configuration changes

Troubleshooting

Common Issues

  1. Sheet Logger Not Working

    • Check TEST_LOGS_SPREADSHEET_ID environment variable
    • Verify Google credentials path
    • Ensure spreadsheet exists and is accessible
  2. Email Alerts Not Sending

    • Check EMAIL_API_URL environment variable
    • Verify email templates are configured in email manager
    • Check network connectivity
  3. Firestore Configuration Issues

    • Verify locaria-dev-config-store project access
    • Check collection and document permissions
    • Ensure configuration document exists
  4. Test Failures

    • Check test thresholds in Firestore
    • Verify data quality and schema
    • Review test logic and business rules

Debug Mode

Enable debug logging by setting the log level in configuration:

config_manager.update_repository_config(
    "locate_2_pulls",
    {"logging": {"log_level": "DEBUG"}}
)

Contributing

When adding new tests:

  1. Follow the existing naming conventions
  2. Include comprehensive error handling
  3. Add configuration options for thresholds
  4. Update documentation
  5. Add examples for new functionality

Support

For questions or issues, contact the Data Team at data_team@locaria.com.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

locaria_integrated_testing-1.0.11.tar.gz (45.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

locaria_integrated_testing-1.0.11-py3-none-any.whl (48.4 kB view details)

Uploaded Python 3

File details

Details for the file locaria_integrated_testing-1.0.11.tar.gz.

File metadata

File hashes

Hashes for locaria_integrated_testing-1.0.11.tar.gz
Algorithm Hash digest
SHA256 f81b9a62ff3137df0ea8b6553f372150d29a241e8f5b7fd900d0ecdd38f47586
MD5 b1f4b9bfa907cecdf51455434bdb4a5d
BLAKE2b-256 3d61579c4c44dcf4da8fa71225711aa2fc3e04165bcd0a22a35d017107ab9d2e

See more details on using hashes here.

File details

Details for the file locaria_integrated_testing-1.0.11-py3-none-any.whl.

File metadata

File hashes

Hashes for locaria_integrated_testing-1.0.11-py3-none-any.whl
Algorithm Hash digest
SHA256 71c2d11aef4c40fd001f20e7b4e96e565d1d07a84cf5f35bd8899eb5e5e4a36e
MD5 4bff37c440fe69595e5d9f1b18977af4
BLAKE2b-256 fcd6b1348cab13b917283708cec0b9386d18836b228bbe5ec80346681df66170

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page