Workflow engine for causal discovery and inference

causaliq-workflow

GitHub Actions-inspired workflow orchestration for causal discovery experiments within the CausalIQ ecosystem. Execute causal discovery workflows using familiar CI/CD patterns, with conservative execution and a comprehensive action framework.

Status

🚧 Active Development - This repository is currently in active development, which involves:

  • migrating functionality from the legacy monolithic discovery repo to support legacy experiments and analysis
  • ensuring CausalIQ development standards are met
  • adding new features to provide a comprehensive, open, causal discovery workflow

Features

Implemented Releases

  • Release v0.1.0 - Workflow Foundations: Plug-in actions, basic workflow and CLI support, 100% test coverage

  • Release v0.2.0 - Knowledge Workflows: Integrate with causaliq-knowledge generate_graph action and write results to workflow caches.

See Git commit history for detailed implementation progress

🛣️ Upcoming Releases

  • Release v0.3.0 - Analysis Workflows: Graph averaging and structural analysis workflows.
  • Release v0.4.0 - Enhanced Workflow: Dry and comparison runs, runtime estimation and processing summary
  • Release v0.5.0 - Discovery Workflows: Structure learning algorithms integrated

causaliq-core Integration

causaliq-workflow builds on causaliq-core for its action framework and caching infrastructure:

  • CausalIQActionProvider - Base class for all action providers
  • ActionInput/ActionResult - Type-safe action interfaces
  • ActionValidationError/ActionExecutionError - Exception handling
  • TokenCache/JsonCompressor - SQLite-based caching with JSON tokenisation
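
As an illustration of the SQLite-based caching idea in the last bullet, here is a minimal sketch using only the Python standard library. It does not use causaliq-core: the ResultCache class, the make_key helper and the table layout are hypothetical names chosen for this example, and no JSON tokenisation or compression is attempted.

```python
import json
import sqlite3


def make_key(params: dict) -> str:
    """Build a canonical JSON key so that dict ordering does not matter."""
    return json.dumps(params, sort_keys=True, separators=(",", ":"))


class ResultCache:
    """Hypothetical stand-in for a results cache; NOT the causaliq-core API."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def put(self, params: dict, result: dict) -> None:
        """Store a JSON-serialisable result under the parameter key."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO results (key, value) VALUES (?, ?)",
                (make_key(params), json.dumps(result)),
            )

    def get(self, params: dict):
        """Return the cached result, or None if nothing is stored."""
        row = self.conn.execute(
            "SELECT value FROM results WHERE key = ?", (make_key(params),)
        ).fetchone()
        return json.loads(row[0]) if row else None


cache = ResultCache()
params = {"network": "asia", "algorithm": "pc", "sample_size": "100"}
cache.put(params, {"status": "done"})  # placeholder result
print(cache.get(params))
```

Sorting the keys before serialising means the same parameter combination always maps to the same cache entry, whatever order the parameters were supplied in.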

Brief Example Usage

Example Workflow Definition, experiment.yml:

description: "Causal Discovery Experiment"
id: "experiment-001"
workflow_cache: "results/{{id}}_cache.db"  # All results stored here

matrix:
  network: ["asia", "cancer"]
  algorithm: ["pc", "ges"]
  sample_size: ["100", "1K"]

steps:
  - name: "Structure Learning"
    uses: "causaliq-discovery"
    with:
      algorithm: "{{algorithm}}"
      sample_size: "{{sample_size}}"
      dataset: "data/{{network}}"
      # Results cached with key: {network, algorithm, sample_size}
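
The matrix in the definition above describes 2 x 2 x 2 = 8 parameter combinations. The following sketch shows one way such a matrix could be expanded and the {{name}} placeholders filled in. It is an illustration only, not the package's implementation: expand_matrix and render are hypothetical helper names, and the regular expression is an assumed reading of the placeholder syntax.

```python
import itertools
import re

# Values copied from the example workflow definition.
matrix = {
    "network": ["asia", "cancer"],
    "algorithm": ["pc", "ges"],
    "sample_size": ["100", "1K"],
}

# The "with" block of the "Structure Learning" step.
step_with = {
    "algorithm": "{{algorithm}}",
    "sample_size": "{{sample_size}}",
    "dataset": "data/{{network}}",
}


def expand_matrix(matrix):
    """Yield one dict per combination of matrix values (Cartesian product)."""
    names = list(matrix)
    for values in itertools.product(*(matrix[name] for name in names)):
        yield dict(zip(names, values))


def render(template, variables):
    """Replace each {{name}} placeholder; unknown names are left untouched."""
    def substitute(match):
        return str(variables.get(match.group(1), match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# One rendered parameter set per matrix combination.
jobs = [
    {key: render(value, combo) for key, value in step_with.items()}
    for combo in expand_matrix(matrix)
]

print(len(jobs))  # 8
print(jobs[0])    # {'algorithm': 'pc', 'sample_size': '100', 'dataset': 'data/asia'}
```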

Execute with modes:

cqflow experiment.yml --mode=dry-run    # Validate and preview (default)
cqflow experiment.yml --mode=run        # Execute (skip if outputs exist)
cqflow experiment.yml --mode=compare    # Re-execute and compare outputs

cqflow is a short alias for causaliq-workflow; either command can be used.
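
The comments on the three modes can be read as a small decision rule for each step. The sketch below encodes that reading; plan_step and the returned labels are hypothetical and are not taken from the package itself.

```python
def plan_step(mode: str, outputs_exist: bool) -> str:
    """Decide what to do with one step, following the mode comments above."""
    if mode == "dry-run":
        # Validate and preview only; nothing is executed.
        return "preview"
    if mode == "run":
        # Conservative execution: do not redo work whose outputs exist.
        return "skip" if outputs_exist else "execute"
    if mode == "compare":
        # Always re-execute so new outputs can be compared with old ones.
        return "execute-and-compare"
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("dry-run", "run", "compare"):
    print(mode, plan_step(mode, outputs_exist=True))
```

Under this reading, only run mode depends on whether outputs already exist, which is what makes repeated runs of the same workflow cheap.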

Upcoming Key Innovations

🔄 Workflow Orchestration

  • Continuous Integration (CI)-style syntax: Workflow specifications written in the style of CI pipelines
  • Dask distributed computing: Scalable parallel processing
  • Dependency management: Automatic handling of data and processing dependencies
  • Error recovery: Robust handling of failures and restarts

📊 Experiment Management

  • Configuration management: YAML-based experiment specifications
  • Parameter sweeps: Systematic exploration of algorithm parameters
  • Version control: Git-based tracking of experiments and results
  • Reproducibility: Deterministic execution with seed management

Integration with CausalIQ Ecosystem

  • 🔍 CausalIQ Discovery is called by this package to perform structure learning.
  • 📊 CausalIQ Analysis is called by this package to perform results analysis and generate assets for research papers.
  • 🔮 CausalIQ Predict is called by this package to perform causal prediction.
  • 🔄 Zenodo Synchronisation is used by this package to download datasets and upload results.
  • 🧪 CausalIQ Papers are defined in terms of CausalIQ Workflows allowing the reproduction of experiments, results and published paper assets created by the CausalIQ ecosystem.

LLM Support

The following provides project-specific context for this repo which should be provided after the personal and ecosystem context:

tbc

Prerequisites

  • Python 3.9-3.13
  • Git
  • R with bnlearn (optional, for external integration)

Installation

git clone https://github.com/causaliq/causaliq-workflow.git
cd causaliq-workflow

# Set up development environment
scripts/setup-env.ps1 -Install
scripts/activate.ps1

Example workflows: docs/example_workflows.md

Research Context

This package supports research for a May 2026 paper on LLM integration for intelligent model averaging. The CI workflow architecture enables sophisticated experimental designs while maintaining familiar syntax for the research community.

Migration target: Existing workflows from the monolithic discovery repo by the end of 2026.

Quick Start

# to be completed

Getting started

Prerequisites

  • Git
  • Latest stable versions of Python 3.9, 3.10, 3.11 and 3.12

Clone the new repo locally and check that it works

Clone the causaliq-workflow repo locally as normal

git clone https://github.com/causaliq/causaliq-workflow.git

Set up the Python virtual environments and activate the default Python virtual environment. If you are using VSCode as your IDE, it may report that new Python environments are being created while scripts/setup-env runs. These messages can be safely ignored at this stage.

scripts/setup-env -Install
scripts/activate

Check that the causaliq-workflow CLI is working, check that all CI tests pass, and start up the local mkdocs webserver. There should be no errors reported in any of these.

causaliq-workflow --help
scripts/check_ci
mkdocs serve

Enter http://127.0.0.1:8000/ in a browser and check that the causaliq-workflow documentation is visible.

If all of the above works, this confirms that the code is working successfully on your system.

Documentation

Full API documentation is available at: http://127.0.0.1:8000/ (when running mkdocs serve)

Contributing

This repository is part of the CausalIQ ecosystem. For development setup:

  1. Clone the repository
  2. Run scripts/setup-env -Install to set up environments
  3. Run scripts/check_ci to verify all tests pass
  4. Start documentation server with mkdocs serve

Supported Python Versions: 3.9, 3.10, 3.11, 3.12, 3.13
Default Python Version: 3.11
License: MIT
