A template for CausalIQ repos
causaliq-data
This package provides data handling, statistical testing, and scoring infrastructure for causal discovery and Bayesian network operations.
Installation
Install from PyPI:
pip install causaliq-data
Status
🚧 Active Development - This repository is currently in active development, which involves:
- migrating functionality from the legacy monolithic discovery repo
- restructuring classes to reduce module size and improve maintainability and usability
- ensuring CausalIQ development standards are met
Features
Currently implemented:
- Release v0.1.0 - Foundation Data: CausalIQ-compliant Data provider interface and concrete implementations, with data stored internally as pandas DataFrames or NumPy 2D arrays.
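To give a feel for what a common Data provider interface over two storage backends might look like, here is a minimal sketch. All names in it (`DataProvider`, `PandasProvider`, `NumpyProvider`, `sample_size`, `value_counts`) are hypothetical and chosen for illustration; they are not taken from the causaliq-data API.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class DataProvider(ABC):
    """Hypothetical common interface; not the actual causaliq-data class."""

    @property
    @abstractmethod
    def nodes(self) -> tuple:
        """Names of the variables, in column order."""

    @property
    @abstractmethod
    def sample_size(self) -> int:
        """Number of rows held by the provider."""

    @abstractmethod
    def value_counts(self, node: str) -> dict:
        """Map each observed value of `node` to its row count."""


class PandasProvider(DataProvider):
    """Backend that keeps the data in a pandas DataFrame."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    @property
    def nodes(self):
        return tuple(self._df.columns)

    @property
    def sample_size(self):
        return len(self._df)

    def value_counts(self, node):
        counts = self._df[node].value_counts()
        return {str(k): int(v) for k, v in counts.items()}


class NumpyProvider(DataProvider):
    """Backend that keeps the data in a 2D NumPy array."""

    def __init__(self, values: np.ndarray, nodes):
        self._values = values
        self._nodes = tuple(nodes)

    @property
    def nodes(self):
        return self._nodes

    @property
    def sample_size(self):
        return self._values.shape[0]

    def value_counts(self, node):
        column = self._values[:, self._nodes.index(node)]
        values, counts = np.unique(column, return_counts=True)
        return {str(v): int(c) for v, c in zip(values, counts)}
```

Code written against the abstract class does not need to know which backend holds the data, which is the property a shared provider interface is meant to give.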
Planned releases (supporting legacy functionality):
- Release v0.2.0 - Score: Support for BIC and BDeu score functions
- Release v0.3.0 - CI Tests: Conditional Independence
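For readers unfamiliar with BIC, the sketch below computes the BIC contribution of a single categorical node given its parents, using only the standard library. It is an illustration of the general formula, not the planned v0.2.0 implementation; the function name `bic_node_score`, the list-of-dicts input format, and the choice to take cardinalities from the observed values are all assumptions.

```python
import math
from collections import Counter


def bic_node_score(rows, node, parents):
    """BIC contribution of one categorical node (natural logarithms).

    rows    -- list of dicts mapping variable name to category
    node    -- name of the child variable
    parents -- list of parent variable names (may be empty)
    """
    n = len(rows)

    # Counts of each (parent configuration, child value) pair and of each
    # parent configuration on its own.
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in rows)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in rows)

    # Maximised log-likelihood: sum of N_jk * log(N_jk / N_j).
    loglik = sum(
        count * math.log(count / parent_counts[pa])
        for (pa, _), count in joint.items()
    )

    # Free parameters: (child states - 1) * number of parent configurations.
    # Assumption: cardinalities are taken from the values seen in the data.
    child_states = len({r[node] for r in rows})
    parent_configs = math.prod(len({r[p] for r in rows}) for p in parents)
    free_params = (child_states - 1) * parent_configs

    return loglik - 0.5 * math.log(n) * free_params


rows = [
    {"A": "0", "B": "0"},
    {"A": "0", "B": "0"},
    {"A": "1", "B": "1"},
    {"A": "1", "B": "1"},
]
with_parent = bic_node_score(rows, "B", ["A"])
without_parent = bic_node_score(rows, "B", [])
```

In this toy data B is an exact copy of A, so the score with A as a parent is higher than the score with no parents, even after the penalty for the extra parameter.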
Upcoming Key Innovations
🧩 Plugin Architecture
- Use by third-party software - the ability to use these data capabilities in third-party structure learning algorithms, so that comparisons rest on a common scoring or conditional independence framework and the performance optimisations also speed up third-party algorithms.
🏛️ Stability Integration
- Stable scores - stable resolution of equal-score situations for unstable algorithms e.g. Tabu
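The kind of instability this targets can be sketched as follows: when several candidate changes have the same score, a search algorithm that simply takes the first maximum will return a result that depends on iteration order. One possible remedy, shown here purely as an assumption about how ties could be resolved, is to pick a deterministic winner among the tied candidates. The function name `best_change`, the tolerance, and the lexicographic tie-break rule are all hypothetical.

```python
def best_change(candidates, tol=1e-10):
    """Pick the highest-scoring change, breaking ties deterministically.

    candidates -- iterable of (score_delta, (source, target)) pairs
    tol        -- deltas within this distance of the best count as tied
    """
    candidates = list(candidates)
    best = max(delta for delta, _ in candidates)

    # Collect every edge whose delta is (numerically) equal to the best one,
    # then choose the lexicographically smallest edge so that the result does
    # not depend on the order in which candidates were generated.
    tied = [edge for delta, edge in candidates if best - delta <= tol]
    return best, min(tied)


changes = [(1.5, ("B", "C")), (0.2, ("A", "C")), (1.5, ("A", "B"))]
choice = best_change(changes)
choice_reversed = best_change(reversed(changes))
```

Both orderings of the candidate list select the same edge, which is the behaviour a stable score resolution needs to guarantee.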
🧠 LLM-assisted Causal Discovery
- Data values - Data values and variable names may provide part of the context for LLM-assisted causal discovery
- Knowledge integration - incorporation of LLM and human expertise in scores and priors via the CausalIQ Knowledge package.
- Relationship explanations: Natural language descriptions of relationships in data
⚡ Optimised Performance
- GPU Data provider - support for optimised data handling on GPU hardware
- Intelligent data scanning - reduce number of full-row data scans
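One simple way to cut down on full-row scans is to memoise the counts for each set of variables, so that repeated score or test requests reuse earlier work. The sketch below illustrates that idea only; the source does not say how the package intends to do this, and the class name `CountCache` and its `scans` counter are hypothetical.

```python
from collections import Counter


class CountCache:
    """Memoises value counts per variable set to avoid repeated scans."""

    def __init__(self, rows):
        self._rows = rows      # list of dicts: variable name -> value
        self._cache = {}       # sorted variable tuple -> Counter
        self.scans = 0         # number of full passes over the rows

    def counts(self, variables):
        # Sort the names so that ("A", "B") and ("B", "A") share one entry.
        key = tuple(sorted(variables))
        if key not in self._cache:
            self.scans += 1
            self._cache[key] = Counter(
                tuple(row[v] for v in key) for row in self._rows
            )
        return self._cache[key]


rows = [{"A": "0", "B": "x"}, {"A": "1", "B": "x"}, {"A": "1", "B": "x"}]
cache = CountCache(rows)
first = cache.counts(["A", "B"])
second = cache.counts(["B", "A"])   # served from the cache, no new scan
```

After both calls `cache.scans` is still 1, because the second request is answered from the stored counts.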
🎲 Enhanced Distribution Support
- Mixed Types: scores and independence tests that support mixtures of continuous and categorical variables
Integration with CausalIQ Ecosystem
- 🔍 CausalIQ Discovery makes use of this package to provide objective functions and conditional independence tests for structure learning algorithms.
- 🧪 CausalIQ Analysis uses score functions as part of the evaluation of learnt graphs.
- 💎 CausalIQ Core makes use of the BNFit interface to estimate parameters based on data.
- 🤖 CausalIQ Workflow uses the in-memory randomisation of this package for stability experiments.
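For context, stability experiments typically re-run an algorithm on the same data with the row order and variable order shuffled, and check whether the learnt graph changes. The sketch below shows one way such a shuffle could be done in memory with NumPy. It assumes that "randomisation" here means reordering rows and columns, and the function name `randomise_order` is hypothetical rather than part of the package.

```python
import numpy as np


def randomise_order(values, names, seed):
    """Return a copy of `values` with rows and columns shuffled.

    values -- 2D NumPy array, one column per variable
    names  -- list of variable names, one per column
    seed   -- seed for the random generator, so shuffles are reproducible
    """
    rng = np.random.default_rng(seed)
    row_order = rng.permutation(values.shape[0])
    col_order = rng.permutation(values.shape[1])

    # Fancy indexing copies the data, so the input array is left unchanged.
    shuffled = values[row_order][:, col_order]
    return shuffled, [names[i] for i in col_order]


data = np.arange(12).reshape(4, 3)
names = ["A", "B", "C"]
shuffled, shuffled_names = randomise_order(data, names, seed=42)
```

Each column keeps its name and its set of values; only the ordering changes, and the same seed always produces the same ordering.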
LLM Support
The following provides project-specific context for this repo which should be provided after the personal and ecosystem context:
I wish to migrate the code in legacy/code/data following all CausalIQ development guidelines
so that the legacy repo can use the migrated code instead. I also want my legacy Bayesian Network
code to be able to use the BNFit interface (see bnfit_interface_spec.md). I would start by migrating
the Data abstract class and pandas.py. Please do this a little at a time and advise me what you intend
to do before making any changes.
Quick Start
# To be completed - example will score a known graph
Getting started
Prerequisites
- Git
- Latest stable versions of Python 3.9, 3.10, 3.11 and 3.12
Clone the new repo locally and check that it works
Clone the causaliq-data repo locally as normal
git clone https://github.com/causaliq/causaliq-data.git
Set up the Python virtual environments and activate the default one. If you use VSCode as your IDE, you may see messages that new Python environments are being created while scripts/setup-env runs; these can be safely ignored at this stage.
scripts/setup-env -Install
scripts/activate
Check that the causaliq-data CLI is working, check that all CI tests pass, and start up the local mkdocs webserver. None of these should report any errors.
causaliq-data --help
scripts/check_ci
mkdocs serve
Enter http://127.0.0.1:8000/ in a browser and check that the causaliq-data documentation is visible.
If all of the above works, this confirms that the code is working successfully on your system.
Documentation
Full API documentation is available at: http://127.0.0.1:8000/ (when running mkdocs serve)
Contributing
This repository is part of the CausalIQ ecosystem. For development setup:
- Clone the repository
- Run scripts/setup-env -Install to set up environments
- Run scripts/check_ci to verify all tests pass
- Start documentation server with mkdocs serve
Supported Python Versions: 3.9, 3.10, 3.11, 3.12
Default Python Version: 3.11
License: MIT