Facilitating machine learning in the Norwegian Avalanche Forecasting Service
Project description
avaml - helper functions for avalanche machine learning
This package contains functions used to prepare data from the Norwegian Avalanche Forecasting Service to facilitate machine learning.
Installation
To install using pip:

```
pip install avaml
```
Example program
Searching for data
Here is a short example program using the package to prepare data for training a RandomForestClassifier:
```python
import sys
import datetime as dt

import pandas as pd
from regobslib import SnowRegion
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier

import avaml

DAYS_IN_TIMELINE = 4
START_DATE = dt.date(2017, 11, 1)
STOP_DATE = dt.date(2021, 7, 1)

TRAINING_REGIONS = [
    SnowRegion.VEST_FINNMARK,
    SnowRegion.NORD_TROMS,
    SnowRegion.LYNGEN,
    SnowRegion.SOR_TROMS,
    SnowRegion.INDRE_TROMS,
    SnowRegion.LOFOTEN_OG_VESTERALEN,
    SnowRegion.SVARTISEN,
    SnowRegion.HELGELAND,
    SnowRegion.TROLLHEIMEN,
    SnowRegion.ROMSDAL,
    SnowRegion.SUNNMORE,
    SnowRegion.INDRE_FJORDANE,
    SnowRegion.JOTUNHEIMEN,
    SnowRegion.HALLINGDAL,
    SnowRegion.INDRE_SOGN,
    SnowRegion.VOSS,
    SnowRegion.HEIANE,
]
VALIDATION_REGIONS = [
    SnowRegion.TROMSO,
    SnowRegion.SALTEN,
    SnowRegion.VEST_TELEMARK,
]
TEST_REGIONS = [
    SnowRegion.FINNMARKSKYSTEN,
    SnowRegion.OFOTEN,
    SnowRegion.HARDANGER,
]
REGIONS = sorted(TRAINING_REGIONS + VALIDATION_REGIONS + TEST_REGIONS)

# fetch_and_prepare_aps_varsom() does a number of things:
# * Checks whether a call with the same parameters has been made
#   earlier. If so, loads the prepared CSVs from disk.
# * If there is no prepared data on disk, checks whether raw data
#   is available instead.
# * If no previously downloaded data is found, downloads data from
#   APS and Varsom.
# * Saves the data as CSVs to 'cache_dir'.
# * Prepares the data:
#   * It transforms the Varsom DataFrame to use the same index as
#     the APS DataFrame, replicating the forecast for every
#     elevation level. It then removes avalanche problems that
#     do not exist at the row's elevation level and strips the
#     elevation data from the avalanche problems. All rows that
#     do not contain complete APS data are removed.
#   * It creates a timeline out of the APS DataFrame. This means
#     that the APS data is concatenated onto shifted versions of
#     itself, making each APS row contain several days' worth of
#     data. This makes some rows incomplete; they are removed.
aps, varsom = avaml.prepare.fetch_and_prepare_aps_varsom(
    START_DATE,
    STOP_DATE,
    DAYS_IN_TIMELINE,
    REGIONS,
    print_to=sys.stdout,  # Print progress to the terminal
    cache_dir=".",        # Use the current dir for CSV files
    read_cache=True,
    write_cache=True,
)

print("Training and predicting problems")

# split_avalanche_problems() takes the Varsom DataFrame and splits it into
# several DataFrames, each containing information about one avalanche problem.
# If a problem row contains some values but also some NaNs, it is invalid and
# is removed from both Varsom and APS.
#
# A 3-tuple is returned, (problems_X: Dict, problems_Y: Dict, problems: List).
# * problems_X is a dict with the problem as key and a DataFrame of input data as value.
# * problems_Y is a dict with the problem as key and a DataFrame of labels as value.
# * problems is a list of avalanche problem names.
problems_X, problems_Y, problems = avaml.prepare.split_avalanche_problems(aps, varsom)

f1_dict = {}
for problem in problems:
    X = problems_X[problem]
    Y = problems_Y[problem]

    # Split the data into training and validation sets.
    training_index = Y.index.isin(TRAINING_REGIONS, level="region")
    validation_index = Y.index.isin(VALIDATION_REGIONS, level="region")
    training_X = X.loc[training_index]
    training_Y = Y.loc[training_index].any(axis=1)
    validation_X = X.loc[validation_index]
    validation_Y = Y.loc[validation_index].any(axis=1)

    # Train and validate.
    classifier = RandomForestClassifier(n_estimators=10)
    classifier.fit(training_X.values, training_Y)
    prediction = pd.Series(classifier.predict(validation_X.values),
                           index=validation_X.index)

    # Calculate scores and store them in the dict.
    elevation_prediction = prediction
    elevation_ground_truth = validation_Y
    problem_prediction = elevation_prediction.unstack().any(axis=1)
    problem_ground_truth = elevation_ground_truth.unstack().any(axis=1)
    training_elevation_ground_truth = training_Y
    training_problem_ground_truth = training_elevation_ground_truth.unstack().any(axis=1)

    elevation_f1 = metrics.f1_score(elevation_ground_truth, elevation_prediction)
    problem_f1 = metrics.f1_score(problem_ground_truth, problem_prediction)
    f1_dict[problem] = {
        ("f1", "per_elevation"): elevation_f1,
        ("f1", "per_forecast"): problem_f1,
        ("training_n_true", "per_elevation"):
            f"{training_elevation_ground_truth.sum()}/{len(training_elevation_ground_truth)}",
        ("training_n_true", "per_forecast"):
            f"{training_problem_ground_truth.sum()}/{len(training_problem_ground_truth)}",
        ("validation_n_true", "per_elevation"):
            f"{elevation_ground_truth.sum()}/{len(elevation_ground_truth)}",
        ("validation_n_true", "per_forecast"):
            f"{problem_ground_truth.sum()}/{len(problem_ground_truth)}",
    }

with pd.option_context('display.max_rows', None,
                       'display.max_columns', None,
                       'display.expand_frame_repr', False):
    print(pd.DataFrame(f1_dict).T)
```
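The timeline construction described in the comments above (concatenating the APS data onto shifted versions of itself) can be sketched on a toy DataFrame. The column name, values, and three-day window here are illustrative only, not the actual avaml layout:

```python
import pandas as pd

# Toy single-region, APS-like frame: one feature column, indexed by date.
dates = pd.date_range("2021-01-01", periods=5, freq="D")
aps = pd.DataFrame({"temp": [-3.0, -5.0, -1.0, 0.0, -2.0]}, index=dates)

DAYS = 3  # days of history carried by each row

# Concatenate the frame onto shifted copies of itself so that each row
# holds DAYS days of data. shift(d) moves values d rows forward, so the
# "day-d" columns contain the values from d days earlier.
timeline = pd.concat(
    {f"day-{d}": aps.shift(d) for d in range(DAYS)},
    axis=1,
)

# The first DAYS-1 rows are incomplete (NaNs introduced by shifting);
# dropping them mirrors avaml's removal of incomplete rows.
timeline = timeline.dropna()
print(timeline)
```

After `dropna()`, the row for 2021-01-03 carries the values for the 1st, 2nd, and 3rd of January under the MultiIndex columns `("day-2", "temp")`, `("day-1", "temp")`, and `("day-0", "temp")` respectively.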