causaliq-data

This package provides data handling, statistical testing, and scoring infrastructure for causal discovery and Bayesian network operations.

Installation

Install from PyPI:

pip install causaliq-data
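
After installing, one way to confirm which version is present is to query the installed distribution metadata. The following is a minimal sketch using only the Python standard library; the helper name installed_version is illustrative and is not part of causaliq-data.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="causaliq-data"):
    """Return the installed version of a distribution, or None if absent.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the package itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```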

Status

🚧 Active Development - This repository is currently in active development, which involves:

  • migrating functionality from the legacy monolithic discovery repo
  • restructuring classes to reduce module size and improve maintainability and usability
  • ensuring CausalIQ development standards are met

Features

Currently implemented:

  • Release v0.1.0 - Foundation Data: CausalIQ-compliant Data provider interface and concrete implementations, with data stored internally as pandas DataFrames or NumPy 2D arrays.
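
To illustrate the idea of one provider interface with interchangeable pandas and NumPy storage, here is a minimal sketch. All class, property and method names below (DataProvider, PandasProvider, NumpyProvider, nodes, num_rows, values) are assumptions made for this example and are not the actual causaliq-data API.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class DataProvider(ABC):
    """Hypothetical provider interface (NOT the causaliq-data API)."""

    @property
    @abstractmethod
    def nodes(self):
        """Tuple of variable names."""

    @property
    @abstractmethod
    def num_rows(self):
        """Number of data rows."""

    @abstractmethod
    def values(self, nodes):
        """Return a 2D array holding the requested columns, in order."""


class PandasProvider(DataProvider):
    """Stores the data internally as a pandas DataFrame."""

    def __init__(self, df):
        self._df = df

    @property
    def nodes(self):
        return tuple(self._df.columns)

    @property
    def num_rows(self):
        return len(self._df)

    def values(self, nodes):
        return self._df[list(nodes)].to_numpy()


class NumpyProvider(DataProvider):
    """Stores the data internally as a NumPy 2D array."""

    def __init__(self, array, nodes):
        if array.ndim != 2 or array.shape[1] != len(nodes):
            raise ValueError("array must be 2D with one column per node")
        self._array = array
        self._nodes = tuple(nodes)
        self._index = {name: i for i, name in enumerate(self._nodes)}

    @property
    def nodes(self):
        return self._nodes

    @property
    def num_rows(self):
        return self._array.shape[0]

    def values(self, nodes):
        return self._array[:, [self._index[name] for name in nodes]]


# The same data behind either backend gives the same answers.
df = pd.DataFrame({"A": [0, 1, 1], "B": [1, 0, 1]})
for provider in (PandasProvider(df), NumpyProvider(df.to_numpy(), ["A", "B"])):
    print(type(provider).__name__, provider.nodes, provider.num_rows)
```

With a shared interface of this kind, calling code such as a score function can be written once against the abstract class and stay unaware of how the data is stored.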

Planned releases (supporting legacy functionality):

  • Release v0.2.0 - Score: Support for BIC and BDeu score functions
  • Release v0.3.0 - CI Tests: Support for conditional independence tests
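
For readers unfamiliar with BIC, the sketch below shows one common textbook form of the score for categorical data: the maximised log-likelihood of each node given its parents, minus 0.5 * ln(N) times the number of free parameters. It is an illustration only, not the planned causaliq-data implementation. The function names are hypothetical, natural logarithms are assumed, and the number of parent configurations is taken as the product of the observed cardinalities of the parents.

```python
import math
from collections import Counter


def bic_node_score(rows, node, parents=()):
    """BIC contribution of one categorical node given its parents.

    rows is a list of dicts mapping variable name to categorical value.
    Hypothetical helper; state counts are taken from the observed data.
    """
    parents = tuple(parents)
    n = len(rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    joint_counts = Counter(
        (tuple(r[p] for p in parents), r[node]) for r in rows
    )
    # Maximised log-likelihood: sum of N_jk * ln(N_jk / N_j).
    log_lik = sum(
        count * math.log(count / parent_counts[pa_values])
        for (pa_values, _), count in joint_counts.items()
    )
    node_states = len({r[node] for r in rows})
    parent_configs = math.prod(len({r[p] for r in rows}) for p in parents)
    free_params = (node_states - 1) * parent_configs
    return log_lik - 0.5 * math.log(n) * free_params


def bic_graph_score(rows, parents_of):
    """Sum node scores; parents_of maps each node to a tuple of parents."""
    return sum(
        bic_node_score(rows, node, pa) for node, pa in parents_of.items()
    )


# Toy data in which B always equals A.
rows = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
print(bic_graph_score(rows, {"A": (), "B": ("A",)}))  # graph A -> B
print(bic_graph_score(rows, {"A": (), "B": ()}))      # empty graph
```

Because B is fully determined by A in the toy data, the graph containing the edge A -> B scores higher than the empty graph despite its larger complexity penalty.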

Upcoming Key Innovations

🧩 Plugin Architecture

  • Use by third-party software - the ability to use these data capabilities in third-party structure learning algorithms, so that comparisons are based on a common scoring or conditional independence framework and third-party algorithms benefit from the performance optimisations.

🏛️ Stability Integration

  • Stable scores - stable resolution of equal-score ties for algorithms that are otherwise unstable, e.g. Tabu
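
One way to picture stable resolution of equal scores: when several candidate graph changes improve the score by the same amount, choose among them with a rule that does not depend on the order in which the candidates were generated. The sketch below uses a simple lexicographic rule on the change description. This is an assumption for illustration only; the function name, the tolerance and the rule itself are hypothetical and may differ from what CausalIQ adopts.

```python
def pick_best_change(candidates, tol=1e-9):
    """Pick the highest-scoring change, breaking ties deterministically.

    candidates is a list of (score_delta, change) pairs, where change is
    a tuple such as ("add", "A", "B"). Changes whose score is within tol
    of the best are treated as tied, and the lexicographically smallest
    change wins, so the result is independent of candidate order.
    """
    best = max(delta for delta, _ in candidates)
    tied = [change for delta, change in candidates if best - delta <= tol]
    return min(tied)


candidates = [
    (1.0, ("add", "B", "A")),
    (1.0, ("add", "A", "B")),
    (0.5, ("add", "A", "C")),
]
# Both orderings select the same change.
print(pick_best_change(candidates))
print(pick_best_change(list(reversed(candidates))))
```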

🧠 LLM-assisted Causal Discovery

  • Data values - Data values and variable names may provide part of the context for LLM-assisted causal discovery
  • Knowledge integration - incorporation of LLM and human expertise in scores and priors via the CausalIQ Knowledge package.
  • Relationship explanations: Natural language descriptions of relationships in data

⚡ Optimised Performance

  • GPU Data provider - support for optimised data handling on GPU hardware
  • Intelligent data scanning - reduce number of full-row data scans
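
As a rough illustration of how full-row scans might be reduced, the sketch below caches the counts computed for each set of variables, so that a repeated request is answered from memory instead of by rescanning the rows. The CountCache class and its attributes are hypothetical, and the planned feature may well use a different strategy.

```python
from collections import Counter


class CountCache:
    """Hypothetical cache of value counts keyed by variable set."""

    def __init__(self, rows):
        self._rows = rows   # list of dicts: variable name -> value
        self._cache = {}
        self.scans = 0      # number of full passes over the rows

    def counts(self, variables):
        """Return counts of value combinations for the given variables.

        Keys of the returned Counter are value tuples ordered by sorted
        variable name, so ("B", "A") and ("A", "B") share one entry.
        """
        key = tuple(sorted(variables))
        if key not in self._cache:
            self.scans += 1
            self._cache[key] = Counter(
                tuple(row[v] for v in key) for row in self._rows
            )
        return self._cache[key]


rows = [{"A": 0, "B": 1}, {"A": 0, "B": 1}, {"A": 1, "B": 0}]
cache = CountCache(rows)
cache.counts(["A", "B"])
cache.counts(["B", "A"])  # served from the cache, no second scan
print(cache.scans)
```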

🎲 Enhanced Distribution Support

  • Mixed Types: scores and independence tests that support mixtures of continuous and categorical variables

Integration with CausalIQ Ecosystem

  • 🔍 CausalIQ Discovery makes use of this package to provide objective functions and conditional independence tests for structure learning algorithms.
  • 🧪 CausalIQ Analysis uses score functions as part of the evaluation of learnt graphs.
  • 💎 CausalIQ Core makes use of the BNFit interface to estimate parameters based on data.
  • 🤖 CausalIQ Workflow uses the in-memory randomisation of this package for stability experiments.
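
The in-memory randomisation mentioned above is not specified here. A plausible reading, assumed for this sketch, is that row order and variable order are shuffled reproducibly from a seed so that an algorithm's sensitivity to these arbitrary orderings can be measured. The function name and signature are hypothetical.

```python
import random


def randomise_order(rows, columns, seed):
    """Return rows and columns in a seeded random order.

    rows is a list of dicts keyed by column name. The inputs are left
    unmodified, and the same seed always gives the same result.
    """
    rng = random.Random(seed)
    new_columns = list(columns)
    rng.shuffle(new_columns)
    new_rows = list(rows)
    rng.shuffle(new_rows)
    # Rebuild each row so its keys follow the new column order.
    return [{c: row[c] for c in new_columns} for row in new_rows], new_columns


rows = [{"A": 0, "B": 1}, {"A": 1, "B": 0}, {"A": 1, "B": 1}]
shuffled_rows, shuffled_columns = randomise_order(rows, ["A", "B"], seed=42)
print(shuffled_columns)
print(shuffled_rows)
```

A stability experiment could then run the same learning algorithm over several seeds and compare the learnt graphs.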

LLM Support

The following project-specific context for this repo should be supplied to the LLM after the personal and ecosystem context:

I wish to migrate the code in legacy/code/data following all CausalIQ development guidelines
so that the legacy repo can use the migrated code instead. I also want my legacy Bayesian Network
code to be able to use the BNFit interface (see bnfit_interface_spec.md). I would start by migrating
the Data abstract class and pandas.py. Please do this a little at a time and advise me what you intend
to do before making any changes.

Quick Start

# To be completed - example will score a known graph

Getting started

Prerequisites

  • Git
  • Latest stable versions of Python 3.9, 3.10, 3.11 and 3.12

Clone the new repo locally and check that it works

Clone the causaliq-data repo locally as normal

git clone https://github.com/causaliq/causaliq-data.git

Set up the Python virtual environments and activate the default one. If you use VSCode as your IDE, you may see messages that new Python environments are being created while scripts/setup-env runs - these can be safely ignored at this stage.

scripts/setup-env -Install
scripts/activate

Check that the causaliq-data CLI is working, check that all CI tests pass, and start up the local mkdocs webserver. None of these should report any errors.

causaliq-data --help
scripts/check_ci
mkdocs serve

Enter http://127.0.0.1:8000/ in a browser and check that the causaliq-data documentation is visible.

If all of the above works, this confirms that the code is working successfully on your system.

Documentation

Full API documentation is available at: http://127.0.0.1:8000/ (when running mkdocs serve)

Contributing

This repository is part of the CausalIQ ecosystem. For development setup:

  1. Clone the repository
  2. Run scripts/setup-env -Install to set up environments
  3. Run scripts/check_ci to verify all tests pass
  4. Start documentation server with mkdocs serve

Supported Python Versions: 3.9, 3.10, 3.11, 3.12
Default Python Version: 3.11
License: MIT
