Package for causal, scalable forecasting
divina: scalable and hyper-interpretable causal forecasting toolkit
What is it?
divina is a convenience wrapper that facilitates training, prediction, validation and deployment of an ensemble: a causal, interpretable model boosted by an endogenous time-series model. This allows for high levels of automation and accuracy while still relying on the causal relationships discovered by the user. The ensemble structure is delivered with swappable model types to suit many different kinds of forecasting problems. divina is also fully integrated with both dask and prefect, meaning that distributed compute and pipeline orchestration can be enabled with the flip of a switch. For more information on divina's features, check out the documentation.
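To make the ensemble idea concrete, here is a minimal sketch of a causal model boosted by an endogenous time-series model. It is an illustration only and does not use divina's API: the function names (`fit_linear`, `fit_ar1`, `fit_ensemble`, `predict_next`) are hypothetical, the causal model is assumed to be a one-feature least-squares regression, and the booster is assumed to be an AR(1) model fitted on the regression residuals.

```python
from statistics import mean


def fit_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x (one causal feature)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_ar1(residuals):
    """Least-squares AR(1) coefficient for r[t] ~ phi * r[t-1]."""
    prev, curr = residuals[:-1], residuals[1:]
    denom = sum(p * p for p in prev)
    # With no residual signal there is nothing to boost, so phi is 0.
    return sum(p * c for p, c in zip(prev, curr)) / denom if denom else 0.0


def fit_ensemble(x, y):
    """Fit the causal model first, then an AR(1) model on what it leaves unexplained."""
    intercept, slope = fit_linear(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return {
        "intercept": intercept,
        "slope": slope,
        "phi": fit_ar1(residuals),
        "last_residual": residuals[-1],
    }


def predict_next(model, x_next):
    """One-step-ahead forecast: causal prediction plus the predicted residual."""
    causal = model["intercept"] + model["slope"] * x_next
    return causal + model["phi"] * model["last_residual"]
```

In this sketch the causal coefficients stay readable on their own (`intercept` and `slope`), while the time-series stage only corrects the remaining error, which is the property the description above emphasizes.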
Main Features
Here are just a few of the things that divina does well:
- Abstraction of all necessary pipeline configuration, from feature selection and engineering to target transformations and confidence intervals, into a single Python Pipeline object that follows the scikit-learn interface for ease of use and transparency.
- A user-centric, two-way interpretation interface that allows for granular interpretation of models and predictions while also allowing domain experts to override factors. (In progress)
- Abstracted and scalable feature engineering. Computation is handled by the Dask back-end with minimal configuration required from the user, and can run on the cloud provider of the user's choice by leveraging Dask Cloud Provider.
- Built-in pipeline orchestration tools, such as log collection, task graph synthesis, task parallelization, task automation and artifact tracing, leveraging Prefect.
- Automatic persistence of all experiment artifacts, including models, predictions and validation metrics, to S3 for posterity, traceability and easy integration.
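For readers unfamiliar with the scikit-learn interface mentioned in the first feature, the sketch below shows the general convention: hyperparameters are set in `__init__`, `fit` learns attributes whose names end in an underscore and returns `self`, and `predict` produces forecasts. The class `LogMeanForecaster` and its `predict_interval` method are hypothetical and exist only for illustration; they are not divina's Pipeline object, and the log transformation and normal-style interval are assumptions chosen to keep the example small.

```python
import math
from statistics import mean, pstdev


class LogMeanForecaster:
    """Hypothetical estimator; not part of divina or scikit-learn."""

    def __init__(self, z=1.96):
        # Hyperparameters are set in __init__, as scikit-learn estimators do.
        self.z = z

    def fit(self, X, y):
        # Target transformation: model log(y) instead of y.
        logs = [math.log(value) for value in y]
        # Learned attributes end with an underscore by convention.
        self.mean_ = mean(logs)
        self.std_ = pstdev(logs)
        return self  # returning self allows chaining fit and predict

    def predict(self, X):
        # Invert the transformation so forecasts are on the original scale.
        return [math.exp(self.mean_) for _ in X]

    def predict_interval(self, X):
        # Hypothetical helper: +/- z standard deviations in log space.
        low = math.exp(self.mean_ - self.z * self.std_)
        high = math.exp(self.mean_ + self.z * self.std_)
        return [(low, high) for _ in X]


X = [[1], [2], [3]]
y = [10.0, 100.0, 1000.0]
model = LogMeanForecaster().fit(X, y)
forecasts = model.predict(X)
intervals = model.predict_interval(X)
```

Bundling the transformation, the model and the interval logic behind one `fit`/`predict` object is what lets a single configuration object stand in for a whole pipeline.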
Roadmap
Current development priorities and improvements slated for the next and beta releases are:
- Addition of an interpretability and interference application that makes consuming, understanding and interacting with forecasts easy and seamless
- Additional boosting options, such as RNNs, LSTMs, ARIMA, SARIMA, etc.
- Addition of more realistic test cases, useful error messages and robust documentation
- Inversion of control of Dask cluster creation, allowing for customization of location and size of cloud compute clusters
Where to get it
The source code is currently hosted on GitHub at: https://github.com/secrettoad/divina
Documentation
divina's documentation is available here.
Binary installers for the latest released version are available at the Python Package Index (PyPI):
pip install divina
Dependencies
- dask - Adds support for arbitrarily large datasets via remote, parallelized compute
- dask-ml - Provides distributed-optimized implementations of many popular models
- s3fs - Allows for easy and efficient access to S3
- pyarrow - Enables persistence of datasets as storage- and compute-efficient Parquet files
- prefect - Enables task orchestration, tracking and persistence
Testing
For local integration testing, run the following commands in order. They pull and start the necessary Prefect and MinIO containers in the background, then run the test suite.
docker pull jhurdle/divina-storage
docker pull jhurdle/divina-prefect
docker run -d -p 9000:9000 jhurdle/divina-storage
docker run -d -p 4200:4200 jhurdle/divina-prefect
pytest divina/divina/tests
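Integration tests tend to fail in confusing ways when a container is not yet accepting connections. As an optional pre-flight check, the sketch below tests whether the two ports published by the docker commands above (9000 and 4200) are reachable on localhost. The helper names `port_is_open` and `missing_services`, and the service labels, are hypothetical and not part of divina.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_services(services, host="localhost"):
    """Return the names of services whose port does not accept connections."""
    return [name for name, port in services.items() if not port_is_open(host, port)]


# Ports taken from the docker commands above; the labels are illustrative.
SERVICES = {"storage": 9000, "prefect": 4200}

if __name__ == "__main__":
    down = missing_services(SERVICES)
    if down:
        print("not reachable: " + ", ".join(down))
    else:
        print("all services reachable")
```

A successful TCP connection only shows that something is listening on the port, not that the service has finished starting, so treat this as a quick sanity check rather than a health check.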
License
Background
Work on divina started at Coysu Consulting (a technology consulting firm) in 2020 and has been under active development since then.
Getting Help
For usage questions, the best place to go is Stack Overflow.
Discussion and Development
Most development discussions take place on GitHub in this repo.
Contributing to divina
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
If you are simply looking to start working with the divina codebase, navigate to the GitHub "issues" tab and start looking through interesting issues.
Built Distribution
Hashes for divina-2023.2.7.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7498068d4efb2020b41051903da983397d6f2058fa2bbb64d6b71ef9fccc9da4
MD5 | 2b673d439c8ee8b5ede721b400dcae5f
BLAKE2b-256 | 5cfbd60bb5e92b0884d3eef8050bd95b9d2ff0a83a6bb711cfe674a301e424c6