Package for causal, scalable forecasting
divina: scalable and hyper-interpretable causal forecasting toolkit
What is it?
divina is a Python package that provides scalable, interpretable and performant forecasting capabilities designed to make causal forecasting modular, efficient and simple. It aims to reduce the challenge of causal forecasting on datasets of any size to configuration via JSON as opposed to construction and consumption of Python objects. At its core, divina aims to reduce the complexity and increase the consistency of performant causal forecasting at scale.
Main Features
Here are just a few of the things that divina does well:
- All necessary configuration of an experiment, from feature selection and engineering to target transformations and confidence intervals, is abstracted to a single JSON file for ease of consumption and transparency.
- A user-centric, two-way interpretation interface that allows for granular interpretation of models and predictions while also allowing domain experts to override factors.
- Abstracted and scalable feature engineering. Encoding, interaction, normalization, binning and joining of datasets are handled scalably by the Dask back-end with minimal configuration required by the user.
- Simulation of user-defined factors in support of forward-looking, multi-signal and decision-enabling causal forecasts.
- Automatic persistence of all experiment artifacts, including models, predictions and validation metrics, to S3 for posterity, traceability and easy integration.
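To make the "single JSON file" idea concrete, here is a minimal sketch of loading and sanity-checking an experiment definition with the standard library. The key names (`target`, `time_index`, `features`, `confidence_intervals`) and the helper `load_experiment_config` are hypothetical and chosen only for illustration. They are not divina's documented schema, so consult the documentation for the real field names.

```python
import json

# Hypothetical experiment definition. These keys are illustrative only
# and are NOT divina's documented schema.
EXAMPLE_CONFIG = """
{
  "target": "sales",
  "time_index": "date",
  "features": ["price", "promotion"],
  "confidence_intervals": [10, 90]
}
"""

# Keys this sketch assumes every experiment definition must provide.
REQUIRED_KEYS = {"target", "time_index", "features"}


def load_experiment_config(text):
    """Parse a JSON experiment definition and check the assumed required keys."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    return config


config = load_experiment_config(EXAMPLE_CONFIG)
print(config["features"])
```

Keeping the whole experiment in one declarative file means it can be versioned, diffed and reviewed without reading any Python.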
Roadmap
Current development priorities and improvements slated for the next and beta releases are:
- Addition of automated experiment summaries as persisted artifacts enabling ease of consumption and increased transparency into the forecasts and models divina produces.
- Improvement of the core model's performance, with the addition of attention mechanisms and the ability to adapt to signals with dynamic mean and variance.
- Addition of more realistic test cases, useful error messages and robust documentation.
- Cleanup of various pieces of the codebase and addition of convenience features such as filepath validation, signal filtering and a maximum lifespan for all EC2 instances divina creates.
Where to get it
The source code is currently hosted on GitHub at: https://github.com/secrettoad/divina
Binary installers for the latest released version are available at the Python Package Index (PyPI):
pip install divina
Dependencies
- dask - Adds support for arbitrarily large datasets via remote, parallelized compute
- dask-ml - Provides distributed-optimized implementations of many popular models
- s3fs - Allows for easy and efficient access to S3
- pyarrow - Enables persistence of datasets as storage- and compute-efficient Parquet files
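After installing, a quick way to confirm that these dependencies are present is to query the installed distribution metadata. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_versions` is an assumption made for this example and is not part of divina.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed above (PyPI names, not import names).
DEPENDENCIES = ["dask", "dask-ml", "s3fs", "pyarrow"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(DEPENDENCIES).items():
    print(f"{name}: {ver or 'not installed'}")
```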
License
Documentation
divina's documentation is available here.
Background
Work on divina started at Coysu Consulting (a technology consulting firm) in 2020 and has been under active development since then.
Getting Help
For usage questions, the best place to go is Stack Overflow.
Discussion and Development
Most development discussions take place on GitHub in this repo.
Contributing to divina
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
If you are simply looking to start working with the divina codebase, navigate to the GitHub "issues" tab and start looking through interesting issues.
Built Distribution
Hashes for divina-2021.11.29-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6d606bc7466f083f90203569e242f7bd8679b4b621ad9c91471b07e64166be57
MD5 | aba3be73b4b672b533d63e500b3b4ef3
BLAKE2b-256 | 92921e3e947ba6e942fc2c33a1350712d00f7820ab20b2c2e84bc1aee16554f3