Facilitating machine learning in the Norwegian Avalanche Forecasting Service
avaml - helper functions for Avalanche machine learning
This package contains functions used to prepare data from the Norwegian Avalanche Forecasting Service to facilitate machine learning.
Installation
To install using pip:
pip install avaml
Example program
Searching for data
Here is a short example program using the package to prepare data for training a RandomForestClassifier:
import sys
from regobslib import SnowRegion
import avaml
import datetime as dt
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
DAYS_IN_TIMELINE = 4
START_DATE = dt.date(2017, 11, 1)
STOP_DATE = dt.date(2021, 7, 1)
TRAINING_REGIONS = [
    SnowRegion.VEST_FINNMARK,
    SnowRegion.NORD_TROMS,
    SnowRegion.LYNGEN,
    SnowRegion.SOR_TROMS,
    SnowRegion.INDRE_TROMS,
    SnowRegion.LOFOTEN_OG_VESTERALEN,
    SnowRegion.SVARTISEN,
    SnowRegion.HELGELAND,
    SnowRegion.TROLLHEIMEN,
    SnowRegion.ROMSDAL,
    SnowRegion.SUNNMORE,
    SnowRegion.INDRE_FJORDANE,
    SnowRegion.JOTUNHEIMEN,
    SnowRegion.HALLINGDAL,
    SnowRegion.INDRE_SOGN,
    SnowRegion.VOSS,
    SnowRegion.HEIANE,
]
VALIDATION_REGIONS = [
    SnowRegion.TROMSO,
    SnowRegion.SALTEN,
    SnowRegion.VEST_TELEMARK,
]
TEST_REGIONS = [
    SnowRegion.FINNMARKSKYSTEN,
    SnowRegion.OFOTEN,
    SnowRegion.HARDANGER,
]
REGIONS = sorted(TRAINING_REGIONS + VALIDATION_REGIONS + TEST_REGIONS)
# fetch_and_prepare_aps_varsom() does a number of things:
# * Checks whether a call with the same parameters has been made earlier.
#   If so, loads the prepared CSVs from disk.
# * If there is no prepared data on disk, looks for raw data instead.
# * If no previously downloaded data is found, downloads data from
#   APS and Varsom.
# * Saves the data as CSVs to the 'cache_dir'.
# * Prepares the data:
#   * Transforms the Varsom DataFrame to have the same index
#     as the APS DataFrame, replicating the forecast for every
#     elevation level. It then removes avalanche problems that
#     do not exist at the row's elevation level and removes the
#     elevation data from the avalanche problems. All rows that
#     do not contain complete APS data are removed.
#   * Creates a timeline out of the APS DataFrame. This means that
#     the APS data is concatenated onto a shifted version of itself,
#     making each APS row contain several days' worth of data.
#     This leaves some rows incomplete; they are removed.
aps, varsom = avaml.prepare.fetch_and_prepare_aps_varsom(
    START_DATE,
    STOP_DATE,
    DAYS_IN_TIMELINE,
    REGIONS,
    print_to=sys.stdout,  # Print progression to terminal
    cache_dir=".",  # Use current dir for csv-files
    read_cache=True,
    write_cache=True,
)
print("Training and predicting problems")
# split_avalanche_problems() takes the Varsom DataFrame and splits it into
# several DataFrames, containing information about one avalanche problem each.
# If a problem row contains some values, but also some NaNs, it is invalid and
# is removed from both Varsom and APS.
#
# A 3-tuple is returned, (problems_X: Dict, problems_Y: Dict, problems: List).
# * problems_X is a dict with the problem as key and a DataFrame with indata as value
# * problems_Y is a dict with the problem as key and a DataFrame with labels as value
# * problems is a list of avalanche problem names
problems_X, problems_Y, problems = avaml.prepare.split_avalanche_problems(aps, varsom)
f1_dict = {}
for problem in problems:
    X = problems_X[problem]
    Y = problems_Y[problem]
    # Splitting data into TRAINING and VALIDATION sets
    training_index = Y.index.isin(TRAINING_REGIONS, level="region")
    validation_index = Y.index.isin(VALIDATION_REGIONS, level="region")
    training_X = X.loc[training_index]
    training_Y = Y.loc[training_index].any(axis=1)
    validation_X = X.loc[validation_index]
    validation_Y = Y.loc[validation_index].any(axis=1)
    # Training and validating
    classifier = RandomForestClassifier(n_estimators=10)
    classifier.fit(training_X.values, training_Y)
    prediction = pd.Series(classifier.predict(validation_X.values), index=validation_X.index)
    # Calculating and storing scores to dict
    elevation_prediction = prediction
    elevation_ground_truth = validation_Y
    problem_prediction = elevation_prediction.unstack().any(axis=1)
    problem_ground_truth = elevation_ground_truth.unstack().any(axis=1)
    training_elevation_ground_truth = training_Y
    training_problem_ground_truth = training_elevation_ground_truth.unstack().any(axis=1)
    elevation_f1 = metrics.f1_score(elevation_ground_truth, elevation_prediction)
    problem_f1 = metrics.f1_score(problem_ground_truth, problem_prediction)
    f1_dict[problem] = {
        ("f1", "per_elevation"): elevation_f1,
        ("f1", "per_forecast"): problem_f1,
        ("training_n_true", "per_elevation"):
            f"{training_elevation_ground_truth.sum()}/{len(training_elevation_ground_truth)}",
        ("training_n_true", "per_forecast"):
            f"{training_problem_ground_truth.sum()}/{len(training_problem_ground_truth)}",
        ("validation_n_true", "per_elevation"):
            f"{elevation_ground_truth.sum()}/{len(elevation_ground_truth)}",
        ("validation_n_true", "per_forecast"):
            f"{problem_ground_truth.sum()}/{len(problem_ground_truth)}",
    }
with pd.option_context('display.max_rows', None,
                       'display.max_columns', None,
                       'display.expand_frame_repr', False):
    print(pd.DataFrame(f1_dict).T)
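The comments in the example describe the timeline step only in words: the APS data is concatenated onto shifted versions of itself, and rows left incomplete are dropped. The sketch below illustrates that idea with plain pandas on toy data. It is not avaml's implementation: `make_timeline`, the `_d<N>` column suffixes, and the simplified `(date, region)` index (without an elevation level) are assumptions made for illustration, and the shift is by row position, so it assumes one row per region per day with no gaps.

```python
import pandas as pd


def make_timeline(df: pd.DataFrame, days: int) -> pd.DataFrame:
    """Hypothetical helper (not part of avaml): widen each row with earlier days.

    Assumes `df` has a MultiIndex with levels named "date" and "region",
    one row per (date, region) pair, and consecutive daily dates.
    """
    df = df.sort_index()
    # Day 0 is the row's own data; the suffix records how many days back.
    parts = [df.add_suffix("_d0")]
    for lag in range(1, days):
        # Shift within each region so history never leaks between regions.
        lagged = df.groupby(level="region").shift(lag)
        parts.append(lagged.add_suffix(f"_d{lag}"))
    timeline = pd.concat(parts, axis=1)
    # The first `days - 1` rows of each region lack history and are removed.
    return timeline.dropna()


# Toy data: four days, two regions, one feature.
index = pd.MultiIndex.from_product(
    [pd.date_range("2021-01-01", periods=4), ["A", "B"]],
    names=["date", "region"],
)
demo = pd.DataFrame({"snow": [1, 10, 2, 20, 3, 30, 4, 40]}, index=index)
timeline = make_timeline(demo, days=3)
print(timeline)
```

With `days=3`, each remaining row carries the current value plus the two preceding days for the same region, and the first two days of every region disappear because their history is incomplete. This is the reason a larger `DAYS_IN_TIMELINE` leaves fewer usable rows.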
File details
Details for the file avaml-0.0.6.tar.gz.
File metadata
- Download URL: avaml-0.0.6.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.6.4 pkginfo/1.8.2 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 75473fd4f1aad92c6aea9aed989f10f9be05e7271d481c200722435619fbe291 |
| MD5 | 824ba4aa04af91b32652953d6b2a6251 |
| BLAKE2b-256 | 2ae871532685210c00eb60e9f09b12c3324f3c4b1c77b6ae3ea01b9c377b6e2b |
File details
Details for the file avaml-0.0.6-py3-none-any.whl.
File metadata
- Download URL: avaml-0.0.6-py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.6.4 pkginfo/1.8.2 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.62.1 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | df2ae1d269147704c5208e5f74bbb75062939be9ede0116b40c4ce83f0c2ca94 |
| MD5 | 23802d3e18920f18063fa916f2fa998a |
| BLAKE2b-256 | a89908fbae45a97ff6ea71f8bccca0b3bea0a675d0070443d64c6acc2e93e3ea |