fifeforspark

Finite-Interval Forecasting Engine for Spark: Machine learning models for discrete-time survival analysis and multivariate time series forecasting for Apache Spark

These details have not been verified by PyPI

Project links

Homepage

Project description

The Finite-Interval Forecasting Engine for Spark (FIFEforSpark) is an adaptation of the Finite-Interval Forecasting Engine (FIFE) for the Apache Spark environment. Currently, it provides a gradient-boosted tree model for discrete-time survival analysis.

If you are already familiar with FIFE, you will recognize the following explanation of how FIFEforSpark approaches survival analysis. Many sections are borrowed heavily from FIFE, because FIFEforSpark adapts that package to the Spark environment using exactly the same methodology. For more information, see the FIFE documentation or the FIFEforSpark documentation.

Suppose you have a dataset that looks like this:

ID	period	feature_1	feature_2	feature_3	...
0	2016	7.2	A	2AX	...
0	2017	6.4	A	2AX	...
0	2018	6.6	A	1FX	...
0	2019	7.1	A	1FX	...
1	2016	5.3	B	1RM	...
1	2017	5.4	B	1RM	...
2	2017	6.7	A	1FX	...
2	2018	6.9	A	1RM	...
2	2019	6.9	A	1FX	...
3	2017	4.3	B	2AX	...
3	2018	4.1	B	2AX	...
4	2019	7.4	B	1RM	...
...	...	...	...	...	...

The entities with IDs 0, 2, and 4 are observed in the dataset in 2019.
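As a quick check of that statement, the following sketch finds the IDs present in the latest period. It uses only the rows shown in the table above (the elided rows are not included), keeps just the ID and period columns, and the function name is hypothetical rather than part of FIFEforSpark.

```python
# Rows of the example panel as (ID, period) pairs; feature columns are omitted.
observations = [
    (0, 2016), (0, 2017), (0, 2018), (0, 2019),
    (1, 2016), (1, 2017),
    (2, 2017), (2, 2018), (2, 2019),
    (3, 2017), (3, 2018),
    (4, 2019),
]


def ids_observed_in_last_period(rows):
    """Return the sorted IDs that appear in the latest period of the panel."""
    last_period = max(period for _, period in rows)
    return sorted({entity for entity, period in rows if period == last_period})


still_observed = ids_observed_in_last_period(observations)
print(still_observed)  # [0, 2, 4]
```

These are the entities whose future is still unknown, so they are the ones a survival forecast is made for.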

While FIFE offers a significantly larger suite of models designed to answer a variety of questions, FIFEforSpark focuses mainly on one question: what is each individual's probability of being observed in any given future year? FIFEforSpark can estimate an answer to this question for any unbalanced panel dataset.
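For intuition, discrete-time survival forecasts are commonly built from conditional probabilities: the chance of being observed in a period given that the individual was observed in the previous one. Multiplying these along the horizon gives the probability of still being observed at each future period. The sketch below illustrates only that arithmetic; it is not FIFEforSpark's internal code, the function name is hypothetical, and the probabilities are made up.

```python
from itertools import accumulate
from operator import mul


def cumulative_survival(conditional_probs):
    """Convert per-period conditional survival probabilities to cumulative ones."""
    # Running product: P(observed at t) = product of conditional probabilities up to t.
    return list(accumulate(conditional_probs, mul))


# Hypothetical conditional probabilities for one individual over three future years.
conditional = [0.9, 0.8, 0.5]
cumulative = cumulative_survival(conditional)
print([round(p, 2) for p in cumulative])
```

Because each factor is at most 1, the cumulative probabilities can never increase as the horizon grows.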

Exactly like FIFE, FIFEforSpark unifies survival analysis and multivariate time series analysis. Tools for the former neglect future states of survival; tools for the latter neglect the possibility of discontinuation. Traditional forecasting approaches for each, such as proportional hazards and vector autoregression (VAR), respectively, impose restrictive functional forms that limit forecasting performance. FIFEforSpark supports one of the best approaches for maximizing forecasting performance: gradient-boosted trees (using MMLSpark's LightGBM).

FIFEforSpark is simple to use, and its syntax is almost identical to that of FIFE. However, because it is meant to run in a Spark environment from a Python notebook, there are some notable differences. First, the package 'mmlspark' must already be installed and attached to the cluster. The PyPI version of MMLSpark is not compatible with FIFEforSpark, so FIFEforSpark is best used in a Databricks notebook. For instructions on installing MMLSpark on Databricks, see the linked tutorial.
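Before going further, it can help to confirm that the prerequisites are importable in the current environment. This is a minimal sketch using only the Python standard library; the helper name is hypothetical, and the check only tells you that a module can be found, not that its version is compatible with FIFEforSpark.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# Module names taken from the requirements described above.
missing = [m for m in ("pyspark", "mmlspark") if not module_available(m)]
if missing:
    print("Missing required modules:", ", ".join(missing))
else:
    print("pyspark and mmlspark are both importable.")
```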

FIFEforSpark is available on PyPI (the Python Package Index), so you can install it either by entering the package name in the 'Create Library' tab on Databricks (with Library Source set to PyPI) or by running the following command at the command prompt:

pip install fifeforspark

Once installed, generating forecasts is simple. If you are working in a Databricks Python notebook, you can run something like the following code, where 'your_table' is the name of your table.

from fifeforspark.processors import PanelDataProcessor
from fifeforspark.lgb_modelers import LGBSurvivalModeler

# `spark` is the SparkSession that Databricks notebooks provide by default
data_processor = PanelDataProcessor(data=spark.sql("select * from your_table"))
data_processor.build_processed_data()

# Fit the gradient-boosted tree survival model on the processed panel
modeler = LGBSurvivalModeler(data=data_processor.data)
modeler.build_model()

# Generate the forecasts
forecasts = modeler.forecast()

If you are working in a Python IDE and have both PySpark and MMLSpark successfully installed, you can run the following, where path_to_your_data is the path to your CSV file:

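import findspark
findspark.init()  # locate the local Spark installation
import pyspark  # only run after findspark.init()
from pyspark.sql import SparkSession

from fifeforspark.processors import PanelDataProcessor
from fifeforspark.lgb_modelers import LGBSurvivalModeler

spark = SparkSession.builder.getOrCreate()
# Assumes the CSV has a header row; inferSchema parses numeric columns
# as numbers instead of leaving every column as a string
data = spark.read.csv(path_to_your_data, header=True, inferSchema=True)
data_processor = PanelDataProcessor(data=data)
data_processor.build_processed_data()

# Fit the gradient-boosted tree survival model on the processed panel
modeler = LGBSurvivalModeler(data=data_processor.data)
modeler.build_model()

# Generate the forecasts
forecasts = modeler.forecast()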

Here's a notebook with real data, where we forecast when world leaders will lose power: REIGN Example Notebook

If you would like more information on FIFEforSpark, you can read the documentation here: FIFEforSpark Documentation

Project details

Release history

0.0.2 (this version): Apr 1, 2022
0.0.1: Sep 25, 2021

Download files

Source Distribution

fifeforspark-0.0.2.tar.gz (709.5 kB)

Uploaded Apr 1, 2022 Source

Built Distribution

fifeforspark-0.0.2-py3-none-any.whl (35.5 kB)

Uploaded Apr 1, 2022 Python 3

File details

Details for the file fifeforspark-0.0.2.tar.gz.

File metadata

Download URL: fifeforspark-0.0.2.tar.gz
Upload date: Apr 1, 2022
Size: 709.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.6

File hashes

Hashes for fifeforspark-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`3dba11f1c1d52e4a628aeb6e11dae8c192186763e761a13ebf284555ff1e1f59`
MD5	`c9dc9a110980198386c5b65a6a0ec2dd`
BLAKE2b-256	`dd55b44d0d3bc37ee18165a066e3a3cc84de50072d6d095fc130fc4357e1cefa`

File details

Details for the file fifeforspark-0.0.2-py3-none-any.whl.

File metadata

Download URL: fifeforspark-0.0.2-py3-none-any.whl
Upload date: Apr 1, 2022
Size: 35.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.6

File hashes

Hashes for fifeforspark-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7a754715d11d6e381534abce416e3b7f9273094bb732b7e51d5874d622cd8e33`
MD5	`916b7c3a1a752cb5beb1e3721fa3c9ce`
BLAKE2b-256	`cf7dc7937b4c8ad107c16a63303ae05d1b19a9d1e3594c3018e0b59feefc1dce`

