Facilitating machine learning in the Norwegian Avalanche Forecasting Service

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

avaml - helper functions for Avalanche machine learning

This package contains functions used to prepare data from the Norwegian Avalanche Forecasting Service to facilitate machine learning.

Installation

To install using pip:

pip install avaml

Example program

Searching for data

Here is a short example program that uses the package to prepare data, train a RandomForestClassifier for each avalanche problem, and report F1 scores on the validation regions:

import sys

from regobslib import SnowRegion
import avaml
import datetime as dt
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier

DAYS_IN_TIMELINE = 4
START_DATE = dt.date(2017, 11, 1)
STOP_DATE = dt.date(2021, 7, 1)
TRAINING_REGIONS = [
    SnowRegion.VEST_FINNMARK,
    SnowRegion.NORD_TROMS,
    SnowRegion.LYNGEN,
    SnowRegion.SOR_TROMS,
    SnowRegion.INDRE_TROMS,
    SnowRegion.LOFOTEN_OG_VESTERALEN,
    SnowRegion.SVARTISEN,
    SnowRegion.HELGELAND,
    SnowRegion.TROLLHEIMEN,
    SnowRegion.ROMSDAL,
    SnowRegion.SUNNMORE,
    SnowRegion.INDRE_FJORDANE,
    SnowRegion.JOTUNHEIMEN,
    SnowRegion.HALLINGDAL,
    SnowRegion.INDRE_SOGN,
    SnowRegion.VOSS,
    SnowRegion.HEIANE,
]
VALIDATION_REGIONS = [
    SnowRegion.TROMSO,
    SnowRegion.SALTEN,
    SnowRegion.VEST_TELEMARK,
]
TEST_REGIONS = [
    SnowRegion.FINNMARKSKYSTEN,
    SnowRegion.OFOTEN,
    SnowRegion.HARDANGER,
]
REGIONS = sorted(TRAINING_REGIONS + VALIDATION_REGIONS + TEST_REGIONS)

# fetch_and_prepare_aps_varsom() does a number of things:
# * Checks whether a call with the same parameters has been made earlier.
#   If so, it loads the prepared CSVs from disk.
# * If there is no prepared data on disk, it looks for raw data
#   instead.
# * If no previously downloaded data is found, it downloads data from
#   APS and Varsom.
# * Saves the data as CSVs to 'cache_dir'.
# * Prepares the data:
#   * Transforms the Varsom DataFrame to have the same index as the
#     APS DataFrame, replicating the forecast for every elevation
#     level. It then removes avalanche problems that do not exist at
#     the row's elevation level and removes the elevation data from
#     the avalanche problems. All rows that do not contain complete
#     APS data are removed.
#   * Creates a timeline out of the APS DataFrame. This means that
#     the APS data is concatenated onto a shifted version of itself,
#     so that each APS row contains several days' worth of data.
#     This leaves some rows incomplete; they are removed.
aps, varsom = avaml.prepare.fetch_and_prepare_aps_varsom(START_DATE,
                                                         STOP_DATE,
                                                         DAYS_IN_TIMELINE,
                                                         REGIONS,
                                                         print_to=sys.stdout,  # Print progress to the terminal
                                                         cache_dir=".",  # Use the current directory for CSV files
                                                         read_cache=True,
                                                         write_cache=True)

print("Training and predicting problems")
# split_avalanche_problems() takes the Varsom DataFrame and splits it into
# several DataFrames, each containing information about one avalanche problem.
# If a problem row contains some values but also some NaNs, it is invalid and
# is removed from both the Varsom and the APS data.
#
# A 3-tuple is returned, (problems_X: Dict, problems_Y: Dict, problems: List):
# * problems_X maps each problem to a DataFrame of input data
# * problems_Y maps each problem to a DataFrame of labels
# * problems is a list of avalanche problem names
problems_X, problems_Y, problems = avaml.prepare.split_avalanche_problems(aps, varsom)
f1_dict = {}
for problem in problems:
    X = problems_X[problem]
    Y = problems_Y[problem]

    # Splitting data into TRAINING and VALIDATION sets
    training_index = Y.index.isin(TRAINING_REGIONS, level="region")
    validation_index = Y.index.isin(VALIDATION_REGIONS, level="region")
    training_X = X.loc[training_index]
    training_Y = Y.loc[training_index].any(axis=1)
    validation_X = X.loc[validation_index]
    validation_Y = Y.loc[validation_index].any(axis=1)

    # Training and validating
    classifier = RandomForestClassifier(n_estimators=10)
    classifier.fit(training_X.values, training_Y)
    prediction = pd.Series(classifier.predict(validation_X.values), index=validation_X.index)

    # Calculate F1 scores per elevation and per forecast, and store them in f1_dict
    elevation_prediction = prediction
    elevation_ground_truth = validation_Y
    problem_prediction = elevation_prediction.unstack().any(axis=1)
    problem_ground_truth = elevation_ground_truth.unstack().any(axis=1)
    training_elevation_ground_truth = training_Y
    training_problem_ground_truth = training_elevation_ground_truth.unstack().any(axis=1)
    elevation_f1 = metrics.f1_score(elevation_ground_truth, elevation_prediction)
    problem_f1 = metrics.f1_score(problem_ground_truth, problem_prediction)
    f1_dict[problem] = {
        ("f1", "per_elevation"): elevation_f1,
        ("f1", "per_forecast"): problem_f1,
        ("training_n_true", "per_elevation"):
            f"{training_elevation_ground_truth.sum()}/{len(training_elevation_ground_truth)}",
        ("training_n_true", "per_forecast"):
            f"{training_problem_ground_truth.sum()}/{len(training_problem_ground_truth)}",
        ("validation_n_true", "per_elevation"):
            f"{elevation_ground_truth.sum()}/{len(elevation_ground_truth)}",
        ("validation_n_true", "per_forecast"):
            f"{problem_ground_truth.sum()}/{len(problem_ground_truth)}",
    }

with pd.option_context('display.max_rows', None,
                       'display.max_columns', None,
                       'display.expand_frame_repr', False):
    print(pd.DataFrame(f1_dict).T)
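The timeline step described in the comments above (concatenating the APS data onto a shifted version of itself) can be illustrated in plain pandas. This is a minimal sketch under assumptions, not avaml's implementation: the helper `make_timeline`, the toy column `precip` and the suffix scheme are hypothetical, and the sketch assumes one row per consecutive date for a single region and elevation, sorted ascending.

```python
import pandas as pd


def make_timeline(df: pd.DataFrame, days: int) -> pd.DataFrame:
    """Hypothetical helper: give each row `days` days of history.

    Assumes one row per consecutive date, sorted ascending. The column
    suffix "_t-k" holds the value from k rows (days) earlier.
    """
    shifted = [
        df.shift(k).add_suffix(f"_t-{k}")  # values from k days before
        for k in range(days)
    ]
    timeline = pd.concat(shifted, axis=1)
    # The first days-1 rows lack a full history (they contain NaNs),
    # so they are incomplete and dropped.
    return timeline.dropna()


dates = pd.date_range("2021-01-01", periods=5, freq="D")
aps = pd.DataFrame({"precip": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates)
timeline = make_timeline(aps, days=3)
print(timeline)
```

With several regions and elevations in a MultiIndex, the shift would have to happen within each group (for example `df.groupby(level=["region", "elevation"]).shift(k)`, with those level names assumed), and gaps in the dates would make row shifts mix non-adjacent days.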

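The training loop relies on the label index having a `region` level for the split, and on elevation being the last index level so that `unstack().any(axis=1)` collapses per-elevation labels into one label per forecast. The toy example below uses made-up data and assumes the level names `date`, `region` and `elevation` (only `region` appears in the example program). It shows how `MultiIndex.isin(..., level="region")` selects rows and how the per-forecast F1 score can differ from the per-elevation one.

```python
import pandas as pd
from sklearn import metrics

# Hypothetical per-elevation labels: one boolean per (date, region, elevation).
index = pd.MultiIndex.from_product(
    [["2021-01-01", "2021-01-02"], ["A", "B"], ["low", "high"]],
    names=["date", "region", "elevation"],
)
truth = pd.Series([True, False, False, False, True, True, False, True], index=index)
pred = pd.Series([True, True, False, False, False, True, False, False], index=index)

# Select rows by region, as the example program does with VALIDATION_REGIONS.
mask = truth.index.isin(["A"], level="region")
truth_a = truth.loc[mask]
pred_a = pred.loc[mask]

# unstack() moves the last level (elevation) into columns; any(axis=1)
# marks a forecast positive if the problem is present at any elevation.
per_forecast_truth = truth_a.unstack().any(axis=1)
per_forecast_pred = pred_a.unstack().any(axis=1)

print(metrics.f1_score(truth_a, pred_a))  # per elevation: 2 TP, 1 FP, 1 FN
print(metrics.f1_score(per_forecast_truth, per_forecast_pred))  # per forecast
```

Here the per-elevation score is 2/3, while both forecasts for region A are positive in truth and prediction, so the per-forecast score is 1.0. Aggregating to the forecast level forgives predictions placed at the wrong elevation.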
Release history

0.0.6 (this version): Feb 17, 2022
0.0.5: Feb 17, 2022
0.0.4: Feb 17, 2022
0.0.3: Feb 17, 2022
0.0.2: Feb 17, 2022
0.0.1: Feb 17, 2022

Download files

Source Distribution

avaml-0.0.6.tar.gz (5.9 kB), uploaded Feb 17, 2022

Built Distribution

avaml-0.0.6-py3-none-any.whl (6.0 kB), uploaded Feb 17, 2022, Python 3

File details

Details for the file avaml-0.0.6.tar.gz.

File metadata

Download URL: avaml-0.0.6.tar.gz
Upload date: Feb 17, 2022
Size: 5.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.6.4 pkginfo/1.8.2 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.8.10

File hashes

Hashes for avaml-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`75473fd4f1aad92c6aea9aed989f10f9be05e7271d481c200722435619fbe291`
MD5	`824ba4aa04af91b32652953d6b2a6251`
BLAKE2b-256	`2ae871532685210c00eb60e9f09b12c3324f3c4b1c77b6ae3ea01b9c377b6e2b`
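To check that a downloaded file matches the SHA256 digest listed above, the digest can be recomputed locally. A minimal sketch using only the standard library; `sha256_of` is a hypothetical helper name, and the script assumes the archive was saved under its original file name in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest of avaml-0.0.6.tar.gz, copied from the table above.
EXPECTED_SHA256 = "75473fd4f1aad92c6aea9aed989f10f9be05e7271d481c200722435619fbe291"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("avaml-0.0.6.tar.gz")
    if archive.exists():
        ok = sha256_of(archive) == EXPECTED_SHA256
        print("hash OK" if ok else "hash MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can also enforce digests at install time when a requirements file pins them with `--hash=sha256:...` and pip is run with `--require-hashes`.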

File details

Details for the file avaml-0.0.6-py3-none-any.whl.

File metadata

Download URL: avaml-0.0.6-py3-none-any.whl
Upload date: Feb 17, 2022
Size: 6.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.6.4 pkginfo/1.8.2 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.8.10

File hashes

Hashes for avaml-0.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`df2ae1d269147704c5208e5f74bbb75062939be9ede0116b40c4ce83f0c2ca94`
MD5	`23802d3e18920f18063fa916f2fa998a`
BLAKE2b-256	`a89908fbae45a97ff6ea71f8bccca0b3bea0a675d0070443d64c6acc2e93e3ea`
