Machine learning methods for postprocessing ensemble forecasts of wind gusts: A systematic comparison

This repository provides the R code accompanying the paper

Schulz, B. and Lerch, S. (2022). Machine learning methods for postprocessing ensemble forecasts of wind gusts: A systematic comparison. Monthly Weather Review, 150, 235-257, https://doi.org/10.1175/MWR-D-21-0150.1 (preprint version available at https://arxiv.org/abs/2106.09512).

In particular, code for the implementation and evaluation of the proposed postprocessing methods is available. In addition, two scripts exemplify the usage of the postprocessing functions.

Update (May 2024): Description of uploaded data set

The data is now publicly available at

Schulz, B. and Lerch, S. (2024). Machine learning methods for postprocessing ensemble forecasts of wind gusts: Data. Karlsruhe Institute of Technology, https://doi.org/10.35097/afEBrMYqNrxxvrLX.

In the following, we provide information on the uploaded data set.

First, we supply the score_data file, which contains the scores of all methods in the test period, and the loc_data file, which contains the names of the stations, their coordinates, their height and the height of the closest grid point. Note that the station data is required to generate the spatial predictors.

Ensemble data

The ensemble forecast data and observations are provided in the ens_data directory, where one file is provided for each lead time. Each file contains one data frame for training and testing. Further, one vector with the names of the additional meteorological variables is included.

Postprocessing data

The postprocessed forecasts for the test period 2016 are provided in the pp_forecasts directory, where one file is provided for each lead time. Each file contains one list with forecasts for each postprocessing method and the ensemble.

The data for the generation of the postprocessed forecasts are provided in the pp_data directory, where one file is provided for each lead time and method. Next to scores, sample sizes, run times, hyperparameters and predictors, we supply the following information:

Method	Information
`emos_sea`	Forecast distribution parameters, EMOS coefficients.
`mbm_sea`	Forecast ensembles, MBM coefficients.
`idr`	Forecast quantiles, fitted models.
`emos_bst`	Forecast distribution parameters, fitted crch-models (coefficients, residuals, coefficient path,...).
`qrf_loc`	Forecast quantiles, fitted RF models (feature importance, trees,...).
`bqn`	Forecast quantiles and Bernstein coefficients.
`drn`	Forecast distribution parameters.
`hen`	Forecasts for bin edges and bin probabilities.
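
As a rough illustration of how the `hen` output (bin edges and bin probabilities) can be turned into quantiles, the following sketch assumes that the probability mass is spread uniformly within each bin, so that the CDF is piecewise linear. The function name `hist_quantile`, its arguments and the toy numbers are hypothetical and not part of the repository.

```r
# Quantile function of a histogram forecast, assuming a uniform density
# within each bin (piecewise linear CDF).
# edges: increasing bin edges (length K + 1); probs: bin probabilities (length K).
hist_quantile <- function(tau, edges, probs) {
  stopifnot(length(edges) == length(probs) + 1, all(diff(edges) > 0),
            all(probs >= 0), sum(probs) > 0, all(tau >= 0 & tau <= 1))
  probs <- probs / sum(probs)   # renormalize to guard against rounding
  cdf <- cumsum(probs)          # CDF at the upper edge of each bin
  sapply(tau, function(t) {
    # first bin with positive mass whose upper-edge CDF reaches t
    k <- which(cdf >= t - 1e-12 & probs > 0)[1]
    lower <- cdf[k] - probs[k]  # CDF at the lower edge of bin k
    frac <- min(max((t - lower) / probs[k], 0), 1)
    edges[k] + frac * (edges[k + 1] - edges[k])
  })
}

# Toy histogram (made-up numbers, not taken from the data set)
edges <- c(0, 2, 4, 10)
probs <- c(0.5, 0.25, 0.25)
q <- hist_quantile(c(0.25, 0.5, 0.875), edges, probs)
```

Bins with zero probability are skipped, so the function also works for sparse histograms.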

Note that we reduced the size of the fitted EMOS-GB models, hence they cannot be used with the predict function. Still, one can extract the coefficients to generate predictions.
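
A minimal sketch of generating predictions from extracted coefficients could look as follows. It rests on several assumptions that should be checked against the stored objects: the coefficients are split into a location part and a scale part, the first entry of each is the intercept, the location uses an identity link and the scale a log link, and the forecast distribution is a logistic distribution left-truncated at zero. The names `emos_gb_params` and `qtlogis0` and all numbers are hypothetical.

```r
# Distribution parameters from (hypothetical) coefficient vectors.
# Assumed: intercept first, identity link for location, log link for scale.
emos_gb_params <- function(X, coef_loc, coef_scale) {
  X1 <- cbind(1, as.matrix(X))   # prepend the intercept column
  stopifnot(ncol(X1) == length(coef_loc), ncol(X1) == length(coef_scale))
  list(location = drop(X1 %*% coef_loc),
       scale    = exp(drop(X1 %*% coef_scale)))
}

# Quantiles of a logistic distribution left-truncated at zero (assumed family)
qtlogis0 <- function(tau, location, scale) {
  p0 <- plogis(0, location, scale)   # mass of the untruncated law below zero
  qlogis(p0 + tau * (1 - p0), location, scale)
}

# Toy predictors and made-up coefficients (not taken from the data set)
X <- data.frame(ens_mean = c(5, 10), ens_sd = c(1, 2))
pars <- emos_gb_params(X, coef_loc = c(0.5, 1, 0), coef_scale = c(0, 0, 0.5))
med <- qtlogis0(0.5, pars$location, pars$scale)   # predictive medians
```

If the models were fitted on standardized predictors, the same standardization has to be applied to `X` before the coefficients are used.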

Further, we supply the epc_data file, which includes the observational database used to generate the EPC ensemble forecasts and the scores of the EPC forecasts on the test period (including the ensemble size of each EPC forecast).

Data

The data was supplied by the German weather service (Deutscher Wetterdienst, DWD) and was made publicly available by the authors in May 2024.

Ensemble Forecasts

COSMO-DE-EPS

Variables: Wind gust and various additional variables (see Table 1).
Time period: 2010-12-08 - 2016-12-31
Forecast initialization time: 00 UTC
Ensemble members: 20
Forecast lead times: 00-21 h
Area: Germany
Resolution: 2.8 km horizontal resolution

More information on the COSMO model can be found here: http://www.cosmo-model.org/.

Observations

DWD SYNOP stations

Forecasts: Taken from closest grid points
Number of stations: 175
Attributes: longitude, latitude, altitude, altitude of closest COSMO grid point

Exemplary data set

Update (May 2024): Note that the full data set is now available.

We additionally supply a training set (df_train.RData) and a test set (df_test.RData) derived from the data used in the paper, with the following comments:

The forecasts in the training set are from the period of 2010-2015, those in the test set from 2016. The initialization times are not supplied.
The forecasts have a common lead time, which is not supplied.
Ten different stations are used and labeled by 1,...,10. These labels do not correspond to the actual station IDs, which are not supplied.
We are using only a small subset of the available predictor variables.
We added random noise to all numeric variables besides the transformed day of the year.
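
One possible way to inspect the two example files is sketched below. The helper `load_rdata` is hypothetical and not part of the repository; it loads an .RData file into a separate environment so that the names of the stored objects, which are not documented here, can be looked up without overwriting anything in the workspace.

```r
# Load an .RData file into its own environment and return the stored
# objects as a named list.
load_rdata <- function(path) {
  env <- new.env()
  load(path, envir = env)
  mget(ls(env), envir = env)
}

# Usage, assuming the files sit in the working directory:
# train <- load_rdata("df_train.RData")
# test  <- load_rdata("df_test.RData")
# str(train, max.level = 1)   # shows which objects the file contains
```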

Post-processing

All models are estimated based on the period 2011-2015 and evaluated on 2016.

Basic methods

The basic models use only the ensemble forecasts of wind gusts.

Local Ensemble Model Output Statistics (EMOS) with CRPS estimation
Local Member-by-Member (MBM) with CRPS estimation
Local Isotonic Distributional Regression (IDR)
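
For orientation, the continuous ranked probability score (CRPS) of an ensemble forecast with members x_1, ..., x_m and observation y can be estimated by mean|x_i - y| - (1 / (2 m^2)) * sum_{i,j} |x_i - x_j|. The sketch below implements this standard estimator in base R. It is an illustration only and not the code from fn_eval; the function name `crps_ensemble` is hypothetical.

```r
# Sample CRPS of one ensemble forecast `ens` (numeric vector) for the
# observation `y`; lower values indicate better forecasts.
crps_ensemble <- function(ens, y) {
  m <- length(ens)
  mean(abs(ens - y)) - sum(abs(outer(ens, ens, "-"))) / (2 * m^2)
}

# Toy example with made-up numbers
score <- crps_ensemble(c(1, 3), y = 2)
```

For a single-member ensemble the estimator reduces to the absolute error.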

Incorporating additional information

The machine learning-based methods make it feasible to incorporate additional features.

Local EMOS with gradient boosting (EMOS-GB) estimated via MLE
Local Quantile Regression Forests (QRF)

Locally adaptive networks

Based on neural networks and station embedding we built a locally adaptive joint model for all stations. The three variants are based on the same architecture and differ only in estimation and forecast type. All network models are estimated using stochastic gradient descent.

Distributional Regression Network (DRN) estimated via CRPS and aggregated via parameter averaging
Bernstein Quantile Network (BQN) estimated via Quantile Loss and aggregated via quantile averaging (Vincentization)
Histogram Estimation Network (HEN) estimated via MLE and aggregated via quantile averaging (Vincentization)
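
To illustrate the quantile averaging mentioned above, the following sketch evaluates a Bernstein quantile function Q(tau) = sum_{l=0}^{d} alpha_l B_{l,d}(tau) and averages the quantiles of several runs. Because Q is linear in the coefficients, averaging the quantiles gives the same result as averaging the coefficients. The function name `bern_quantile`, the degree and the coefficients are made up for illustration and are not taken from pp_bqn.

```r
# Bernstein quantile function with coefficients alpha_0, ..., alpha_d.
# dbinom(l, d, tau) equals the basis polynomial
# B_{l,d}(tau) = choose(d, l) * tau^l * (1 - tau)^(d - l).
bern_quantile <- function(tau, alpha) {
  d <- length(alpha) - 1
  basis <- sapply(0:d, function(l) dbinom(l, size = d, prob = tau))
  drop(matrix(basis, nrow = length(tau)) %*% alpha)
}

# Made-up coefficients of three hypothetical runs (rows), degree d = 2
alpha_runs <- rbind(c(1, 4, 9), c(2, 5, 8), c(0, 6, 10))
tau <- c(0.1, 0.5, 0.9)

# Quantile averaging (Vincentization): average the quantiles level by level
q_runs <- apply(alpha_runs, 1, function(a) bern_quantile(tau, a))  # levels x runs
q_vinc <- rowMeans(q_runs)

# Equivalent for Bernstein quantile functions: average the coefficients first
q_coef <- bern_quantile(tau, colMeans(alpha_runs))
```

Increasing coefficients, as in the toy example, yield a non-decreasing quantile function.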

Code

Here, we supply the code used for data preprocessing, postprocessing and evaluation. Two exemplary scripts for the usage of the postprocessing functions are given. Each of the R-files includes functions that can be applied to data of the desired format.

File	Description
`example_drn`	Example for the use of DRN.
`example_emos`	Example for the use of EMOS.
`fn_data`	Data processing.
`fn_eval`	Evaluation.
`pp_bqn`	Postprocessing via BQN.
`pp_drn`	Postprocessing via DRN.
`pp_emos`	Postprocessing via EMOS.
`pp_emos_bst`	Postprocessing via EMOS-GB.
`pp_hen`	Postprocessing via HEN.
`pp_idr`	Postprocessing via IDR.
`pp_mbm`	Postprocessing via MBM.
`pp_qrf`	Postprocessing via QRF.

Data structure used

The functions are based on the structure of the COSMO data, which is given by R-dataframes (generated from netCDF-files in a preprocessing step). We hope that the following comments will help the reader in understanding the code.

The following table describes variable names that are referred to in the code (and the example data):

Variable	Description
`obs`	Observed speed of a wind gust in m/s.
`location`	Station ID.
`ens_mean`	Mean of the wind gust ensemble in m/s.
`ens_sd`	Standard deviation of the wind gust ensemble in m/s.
`ens_i`	i-th member of the wind gust ensemble in m/s, i = 1, ..., 20.
`sens_mean_i`	Mean of the i-th sub-ensemble of the wind gust ensemble in m/s, i = 1, ..., 4. The four sub-ensembles are given by the members 1-5 (i = 1), 6-10 (i = 2), 11-15 (i = 3) and 16-20 (i = 4).
`sens_spread_i`	Spread of the i-th sub-ensemble of the wind gust ensemble in m/s, i = 1, ..., 4. The four sub-ensembles are given by the members 1-5 (i = 1), 6-10 (i = 2), 11-15 (i = 3) and 16-20 (i = 4).
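
The derived variables in the table can be computed from the member columns roughly as follows. This is a sketch on randomly generated toy data, not the preprocessing code from fn_data. In particular, the spread of a sub-ensemble is taken to be its standard deviation here, which is an assumption.

```r
# Toy data: 3 forecast cases with 20 members each (random numbers, not real data)
set.seed(1)
df <- as.data.frame(matrix(rgamma(3 * 20, shape = 4, scale = 2), nrow = 3))
names(df) <- paste0("ens_", 1:20)

# Summary statistics of the full ensemble
ens <- as.matrix(df[, paste0("ens_", 1:20)])
df$ens_mean <- rowMeans(ens)
df$ens_sd   <- apply(ens, 1, sd)

# Sub-ensembles: members 1-5, 6-10, 11-15 and 16-20
for (i in 1:4) {
  members <- ens[, (5 * (i - 1) + 1):(5 * i), drop = FALSE]
  df[[paste0("sens_mean_", i)]] <- rowMeans(members)
  # "Spread" is assumed to be the standard deviation of the sub-ensemble
  df[[paste0("sens_spread_", i)]] <- apply(members, 1, sd)
}
```

Since all sub-ensembles have the same size, the mean of the four `sens_mean_i` values equals `ens_mean`.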
