Simple package to compare and analyse 2 time series. We use the following convention in this repo:

- `sim`: modelled surge time series
- `obs`: observed surge time series
The following metrics are available in this repository:
We need metrics to assess the quality of the model. We define the most important ones, as stated on this Stats wiki:
- Performance Scores (PS) or Nash-Sutcliffe Efficiency (NSE): $$1 - \frac{\langle (x_c - x_m)^2 \rangle}{\langle (x_m - x_R)^2 \rangle}$$ where $x_c$ is the computed (modelled) series, $x_m$ the measured (observed) series and $x_R$ a reference level (typically the observed mean).
- `r`: the correlation
- `b`: the modified bias term (see ref): $$\frac{\langle x_c \rangle - \langle x_m \rangle}{\sigma_m}$$
- `g`: the std dev term: $$\frac{\sigma_c}{\sigma_m}$$
- with $$\kappa = 2 \cdot \left| \sum{((x_m - \overline{x}_m) \cdot (x_c - \overline{x}_c))} \right|$$
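
As a rough illustration, the formulas above can be computed with plain numpy. This is a minimal sketch assuming `sim` and `obs` are `pd.Series` aligned on the same time index; it is not the package's implementation, and the function name and dictionary keys are illustrative only:

```python
import numpy as np
import pandas as pd

def illustrative_stats(sim: pd.Series, obs: pd.Series) -> dict:
    """Sketch of the metrics above; not the package's implementation."""
    x_c = sim.to_numpy()  # computed (modelled) series
    x_m = obs.to_numpy()  # measured (observed) series
    sigma_c, sigma_m = x_c.std(), x_m.std()

    # NSE, taking the reference x_R as the observed mean (the usual choice)
    nse = 1 - np.mean((x_c - x_m) ** 2) / np.mean((x_m - x_m.mean()) ** 2)
    r = np.corrcoef(x_c, x_m)[0, 1]          # correlation
    b = (x_c.mean() - x_m.mean()) / sigma_m  # modified bias term
    g = sigma_c / sigma_m                    # std dev term
    # kappa = 2 * |sum((x_m - mean(x_m)) * (x_c - mean(x_c)))|
    kappa = 2 * abs(np.sum((x_m - x_m.mean()) * (x_c - x_c.mean())))

    return {"nse": nse, "r": r, "b": b, "g": g, "kappa": kappa}
```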
The metrics are returned by the function `storm_metrics(sim: pd.Series, obs: pd.Series, quantile: float, cluster_duration: int = 72)`, which returns this dictionary:
"db_match" : val,
"R1_norm": val,
"R1": val,
"R3_norm": val,
"R3": val,
"error": val,
"error_metric": val
We define the following metrics for storm events:
- `R1`/`R3`/`error_metric`: we select the biggest observed storms, then calculate the error, i.e. the absolute value of the difference between the modelled and observed peaks (see the sketch after this list):
  - `R1` is the error for the biggest storm
  - `R3` is the mean error for the 3 biggest storms
  - `error_metric` is the mean error for all the storms above the threshold
- `R1_norm`/`R3_norm`/`error`: same methodology, but the values are normalised (in %) by the observed peaks.
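
To make the definitions concrete, here is a deliberately naive sketch of that computation. The helper name is hypothetical and it pairs each observed peak with the modelled value at the same timestamp, with no declustering; the package's `storms/match_extremes.py` does the real event matching:

```python
import pandas as pd

def naive_peak_errors(sim: pd.Series, obs: pd.Series, quantile: float = 0.99) -> dict:
    """Hypothetical, simplified sketch of the storm-error metrics."""
    threshold = obs.quantile(quantile)
    peaks = obs[obs > threshold].sort_values(ascending=False)  # biggest first
    abs_err = (sim.reindex(peaks.index) - peaks).abs()  # |model - obs| at each peak
    norm_err = abs_err / peaks                          # normalised by observed peaks

    return {
        "R1": abs_err.iloc[0],           # error for the biggest storm
        "R3": abs_err.iloc[:3].mean(),   # mean error for the 3 biggest storms
        "error_metric": abs_err.mean(),  # mean error for all storms above threshold
        "R1_norm": norm_err.iloc[0],
        "R3_norm": norm_err.iloc[:3].mean(),
        "error": norm_err.mean(),
    }
```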
The `storm_metrics()` function might return:
"db_match" : np.nan,
"R1_norm": np.nan,
"R1": np.nan,
"R3_norm": np.nan,
"R3": np.nan,
"error": np.nan,
"error_metric": np.nan
This happens when the function in `storms/match_extremes.py` couldn't find concomitant storms in the observed and modelled time series. See the notebook for details.
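
It is therefore worth guarding against this case before relying on the values, for example:

```python
import numpy as np

metrics = storm_metrics(sim, obs, quantile=0.99, cluster_duration=72)
if np.isnan(metrics["error_metric"]):
    # no concomitant storms were matched; fall back to the general stats
    print("no matched storm events above the chosen quantile")
```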
Example usage:

```python
# assuming get_stats and storm_metrics are imported from this package
stats = get_stats(sim, obs)
metric99 = storm_metrics(sim, obs, quantile=0.99, cluster_duration=72)
metric95 = storm_metrics(sim, obs, quantile=0.95, cluster_duration=72)
```