
views-pipeline-core

A modular Python framework for end‑to‑end conflict forecasting: data ingestion, transformation, drift monitoring, model and ensemble management, evaluation, reconciliation, mapping, reporting, packaging, and artifact governance.

Acknowledgements

Views Funders


Table of Contents

  1. Conceptual Overview
  2. High‑Level Architecture
  3. Core Pipeline Stages
  4. Managers (Orchestration Layer)
  5. Modules (Functional Layer)
  6. Data Layer & Querysets
  7. Evaluation & Metrics
  8. Reconciliation (Hierarchical Consistency)
  9. Reporting & Mapping
  10. CLI & Argument System
  11. Configuration & Partitioning
  12. Package Management
  13. Logging & Monitoring
  14. Development Workflow
  15. Quick Start
  16. FAQ

1. Conceptual Overview

The pipeline transforms raw geo‑temporal data into validated, reconciled, and documented forecasts. Key features include:

  • Deterministic data preparation (queryset + transformation replay)
  • Strict naming & artifact conventions
  • Partition-aware evaluation (calibration/validation/forecasting)
  • Multi-model ensembling & hierarchical reconciliation
  • Automated HTML reporting and spatial visualization
  • Reproducible configuration merging and logging
  • Optional integration with Weights & Biases (WandB) and prediction store

2. High‑Level Architecture

         ┌────────────────────────────────────────┐
         │            ConfigurationManager        │
         │  (deployment + hyperparameters + meta) │
         └───────────────┬────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────┐
│                 ViewsDataLoader                  │
│  Queryset → Raw Fetch → Drift Check → Update     │
│  → Transformation Replay → Partition Slice       │
└───────────────┬──────────────────────────────────┘
                │  DataFrame (month_id, entity_id)
                ▼
      ┌─────────────────────────┐
      │     Model / Ensemble    │
      │  Training / Evaluation  │
      │  Forecasting / Reports  │
      └────────────┬────────────┘
                   │ Predictions
                   ▼
         ┌────────────────────────┐
         │ ReconciliationModule   │
         │ (Country ↔ Priogrid)   │
         └────────────┬───────────┘
                      │ Reconciled Predictions
                      ▼
         ┌───────────────────────────┐
         │ Reporting & Mapping       │
         │ HTML, Tables, Choropleths │
         └───────────────────────────┘

3. Core Pipeline Stages

Stage       Output                            Key Component
Data Fetch  Partitioned feature/target frame  ViewsDataLoader
Train       Artifact (model file)             ForecastingModelManager / EnsembleManager
Evaluate    Metrics + eval predictions        Evaluation logic
Forecast    Future horizon predictions        ForecastingModelManager
Reconcile   Grid ↔ country consistency        ReconciliationModule
Report      HTML summaries                    ReportModule + MappingModule
Package     Poetry-compliant project          PackageManager

4. Managers (Orchestration Layer)

Manager                   Purpose
ModelPathManager          Path + artifact resolution for a model
ModelManager              Abstract training/evaluation/forecast flow control
ForecastingModelManager   Concrete forecasting implementation scaffold
EnsemblePathManager       Paths for a multi-model ensemble
EnsembleManager           Aggregation + optional reconciliation
ExtractorPathManager      External raw data ingestion paths
ExtractorManager          Download → preprocess → save for external datasets
PostprocessorPathManager  Downstream transformation stage paths
PostprocessorManager      Read → transform → validate → save
PackageManager            Create/validate Poetry packages
ConfigurationManager      Merge + validate layered configuration

Each manager has accompanying documentation in its module directory.


5. Modules (Functional Layer)

Module               Role
dataloaders          Partition-aware data retrieval + drift detection + incremental update
transformations      Dataset transformation undo/management
reconciliation       Hierarchical grid ↔ country alignment
reports              Tailwind-styled HTML evaluation/forecast report generation
mapping              Static + interactive choropleth maps (matplotlib / Plotly)
logging              Central logging configuration injection
statistics           Forecast reconciliation math (proportional scaling)
wandb                Alerts, artifact logging, run lifecycle
model validation     Structural & logical integrity checks
ensemble validation  Structural & logical integrity checks

5.1 Intermediate Modules

Module   Role
cli      CLI parsing and validation
dataset  Spatio-temporal dataset handler with country- and priogrid-level support

6. Data Layer & Querysets

  • Querysets define feature/target extraction logic + transformation chains.
  • Incremental updates replace raw slices (GED / ACLED) and replay transformations (UpdateViewser).
  • MultiIndex structure: (month_id, entity_id) for spatio-temporal operations.
  • Data types normalized (float64 for numeric integrity).
  • Partitions defined via month ranges (train/test or forecast horizon).
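The (month_id, entity_id) MultiIndex and month-range partitions can be illustrated with plain pandas. This is a sketch only — the column name and partition boundaries below are illustrative, not the pipeline's actual queryset output:

```python
import pandas as pd

# Toy feature frame indexed by (month_id, entity_id), float64 throughout.
df = pd.DataFrame(
    {"ged_sb": [0.0, 2.0, 1.0, 0.0, 5.0, 3.0]},
    index=pd.MultiIndex.from_product(
        [[500, 501, 502], [1, 2]], names=["month_id", "entity_id"]
    ),
).astype("float64")

# A partition is simply a month range sliced from the first index level.
calibration = df.loc[500:501]       # train/test months
forecast_horizon = df.loc[502:502]  # future months

print(len(calibration), len(forecast_horizon))  # 4 2
```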

7. Evaluation & Metrics

Evaluation produces:

  • Step-wise metrics (per forecast horizon)
  • Month-wise metrics (temporal slices)
  • Time-series metrics (sequence performance trajectory)

The conflict type is auto-inferred from target tokens (sb / ns / os), and files are named per ADR conventions (artifact/output naming).
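The token-based inference can be sketched as a simple lookup (a hypothetical helper — the real parsing lives in the evaluation code):

```python
def infer_conflict_type(target: str) -> str:
    """Map a target column like 'ln_ged_sb' to its conflict-type label."""
    tokens = {"sb": "state-based", "ns": "non-state", "os": "one-sided"}
    for tok, label in tokens.items():
        if tok in target.split("_"):
            return label
    raise ValueError(f"No conflict-type token (sb/ns/os) in {target!r}")

print(infer_conflict_type("ln_ged_sb"))  # state-based
```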


8. Reconciliation (Hierarchical Consistency)

Ensures priogrid sums align with authoritative country totals while preserving the relative spatial pattern and exact zeros (zero inflation). Parallelizable across countries × time × targets. Integrated into ensembles or model forecast postprocessing.
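The proportional-scaling idea behind reconciliation fits in a few lines (illustrative only; the real module also handles indexing, parallelism, and edge cases):

```python
def reconcile(grid_values, country_total):
    """Rescale grid-cell forecasts so they sum to the country total,
    preserving the relative spatial pattern and exact zeros."""
    s = sum(grid_values)
    if s == 0:
        return list(grid_values)  # nothing to redistribute
    factor = country_total / s
    return [v * factor for v in grid_values]

cells = [0.0, 2.0, 6.0]       # priogrid forecasts, sum = 8
print(reconcile(cells, 4.0))  # [0.0, 1.0, 3.0] — zeros stay zero
```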


9. Reporting & Mapping

Component      Feature
ReportModule   Headings, paragraphs, Markdown, tables, images, grids
MappingModule  Country & priogrid choropleths (static + interactive animation)
Templates      Forecast + evaluation report skeletons
CSS            Tailwind subset embedded for portability

Reports embed:

  • Metrics tables
  • Key–value configuration summaries
  • Spatial animations (Plotly)
  • Artifact provenance (timestamps, versions)

10. CLI & Argument System

Dataclass-driven (ForecastingModelArgs):

  • Flags: --train, --evaluate, --forecast, --report, --sweep, --prediction_store, --monthly
  • Validation prevents illegal combinations (e.g., evaluate with forecasting run type).
  • Monthly shortcut auto-configures production cycle.
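The dataclass-driven validation can be sketched like this (the class name mirrors the text, but the fields and validate method shown here are a hypothetical reduction):

```python
from dataclasses import dataclass

@dataclass
class ForecastingModelArgs:
    run_type: str = "calibration"
    train: bool = False
    evaluate: bool = False
    forecast: bool = False

    def validate(self) -> None:
        # Illegal combination from the text: evaluate with a forecasting run.
        if self.run_type == "forecasting" and self.evaluate:
            raise ValueError("--evaluate is not valid with run_type=forecasting")

ForecastingModelArgs(run_type="calibration", evaluate=True).validate()  # ok
try:
    ForecastingModelArgs(run_type="forecasting", evaluate=True).validate()
except ValueError as e:
    print(e)
```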

11. Configuration & Partitioning

ConfigurationManager merges:

  1. Deployment
  2. Hyperparameters
  3. Meta
  4. Partition dictionary
  5. Runtime overrides (highest priority)

Forecast partitions are adjusted dynamically via override_timestep. Validation enforces structural integrity and target specification.
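The layered merge — later layers override earlier ones, runtime overrides last — is essentially a right-biased dictionary merge. A sketch, not ConfigurationManager's actual code:

```python
def merge_configs(*layers: dict) -> dict:
    """Merge config layers; later layers win on key conflicts."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged

deployment = {"env": "production", "steps": 36}
hyperparameters = {"n_estimators": 200}
runtime_overrides = {"steps": 12}  # highest priority

config = merge_configs(deployment, hyperparameters, runtime_overrides)
print(config["steps"])  # 12
```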


12. Package Management

PackageManager:

  • Validates naming (organization-prefix-*)
  • Creates Poetry skeleton (Python version constraint)
  • Adds dependencies (including views-pipeline-core)
  • Fetches latest release (tags or GitHub API)
  • Runs poetry check
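The naming check (organization-prefix-*) reduces to a pattern match. A hypothetical sketch — the actual PackageManager rules may differ:

```python
import re

def valid_package_name(name: str, prefix: str = "views") -> bool:
    """Accept names like 'views-pipeline-core': the organization prefix
    followed by one or more hyphen-separated lowercase segments."""
    return re.fullmatch(rf"{re.escape(prefix)}(-[a-z0-9]+)+", name) is not None

print(valid_package_name("views-pipeline-core"))  # True
print(valid_package_name("pipeline-core"))        # False
```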

13. Logging & Monitoring

  • YAML-driven configuration (handlers, levels, formatters).
  • Dedicated model/ensemble logging directories.
  • Standard separation: main log, error log.
  • WandB alerts for stage transitions, failures, reconciliation completeness.
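YAML-driven logging configuration typically ends in logging.config.dictConfig. A minimal sketch — the parsed YAML is shown inline as its equivalent dict, and the handler/formatter names are illustrative:

```python
import logging
import logging.config

# Equivalent of yaml.safe_load on a logging YAML file, shown inline.
LOGGING_CONFIG = {
    "version": 1,
    "formatters": {
        "plain": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "plain",
        }
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger("views_pipeline").info("pipeline stage started")
```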

14. Development Workflow

Task                  Command
Run model             ./run.sh --run_type calibration --train --evaluate --report --saved
Run ensemble          ./run.sh --ensemble hybrid_lynx --forecast --report
Update raw data       Use --update_viewser
Generate report only  Use --evaluate --report or --forecast --report

Refer to documentation/development_guidelines.md for coding standards and docstring_guidelines.md for formatting.


15. Quick Start

  1. Run build_model_scaffold.py or build_ensemble_scaffold.py found in the views-models repository.

  2. Update config_deployment.py, config_hyperparameters.py, config_queryset.py, config_meta.py.

  3. Run calibration:

    python main.py --run_type calibration --train --evaluate --report
    
  4. Run forecasting:

    python main.py --run_type forecasting --train --forecast --report
    
  5. View artifacts: models/<name>/artifacts/


16. FAQ

Question                                  Answer
Do I need WandB?                          Optional; disable notifications to run offline.
Can I reconcile single-model forecasts?   Yes; apply ReconciliationModule manually after the forecast stage.
How do I add a new transformation?        Register the callable in the transformation mapping and ensure replay compatibility.
Are forecasts stored transformed or raw?  Temporarily reversed to raw scale before saving (pending ADR finalization).
Can I aggregate probabilistic outputs?    Current ensemble aggregation expects scalars or single-element lists.
