---
title: "Minimal example: Wien, ET-diagnostics scenario grid"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Minimal example: Wien, ET-diagnostics scenario grid}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE, eval = TRUE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

is_ghactions <- tolower(Sys.getenv("GITHUB_ACTIONS")) == "true" ||
  tolower(Sys.getenv("CI")) %in% c("true", "1", "yes")

path_base <- system.file("extdata/models/wien/base.h5", package = "kwb.raindrop")
data_available <- nzchar(path_base) && file.exists(path_base)
is_windows <- Sys.info()[["sysname"]] == "Windows"
```

This vignette is a small, self-contained reduction of the full Wien
workflow used for end-to-end checks (CI and review) and — since
2026-05-07 — for diagnosing the very high modelled ET share
(~47-58 %) versus the SWIMM Urban Eva reference of ~7 % for a
comparable parametrisation. Geometry is fixed at Daniel's reference
design (`mulde_area = 75`, `mulde_height = 200`, `filter_height = 300`,
`storage_height = 500`, `filter_kf = 36`, `bottom_kf = 12`).

**`base.h5` defaults are used as-is on every row** (`Dach/Evapotranspiration_aktiv = 1`,
`EvapPond = 1`, `LAI = 8.5`). Daniel's three review corrections
(`Dach/Evapotranspiration_aktiv = 0`, `EvapPond = 0`, `LAI = 3.9`)
were applied between PR #11 and the current branch but made the
Tandler engine return Status 1 for **every** scenario, leaving the
rendered datatable with no usable result columns. They are reverted
on this branch so the engine produces output again. The corrections
will be re-introduced one-by-one as sweep dimensions in a follow-up
diagnostic vignette so the failing combination can be isolated.

The 12-scenario grid varies three engine-side switches that are
candidate ET-discrepancy drivers:

* `//Berechnungsparameter/keineVerdunstungBeiRegen` — `0` (default,
  ET active during rain — physically suspect) vs `1` (ET suppressed
  while raining).
* `//Berechnungsparameter/Hoernschemeyer_aktiv` — `0` (default,
  generic ET model) vs `1` (Hoernschemeyer et al. 2023 scheme with
  seasonal Growth/Shading modulation; with the current flat
  `Growth_1 = Shading_1 = 1` curves the legacy path effectively
  unmodulates ET).
* `//Massnahmenelemente/Mulde_Rigole/Parameter_Evapotranspiration/ET0ref_GrasReferenzverdunstung`
  — default `0.01` looks like a scaling factor, not a reference
  ET₀ value. Sweep `0`, `1`, `100` to check whether the parameter
  acts as a multiplier on the ET₀ curve (in which case `1` is "normal"
  behaviour and the `0.01` default is what is currently throttling
  / unbalancing ET).

After the run, the per-scenario `*.h5` inputs are dumped to a single
XLSX file (`raindrop_wien_minimal_params.xlsx`) with one sheet per
scenario and a `base` sheet for the un-modified template, so the
exact engine input for any row can be diffed against the others.

### Input data

Precipitation (10-minute resolution) and evapotranspiration (daily
ET0) for 2011-2025 are provided by
[GeoSphere Austria](https://www.geosphere.at/) (Österreichischer
Wetterdienst, formerly ZAMG) and ship with the package under
`inst/extdata/models/wien/`. The HDF5 model template (`base.h5`)
is produced with the Tandler "Regenwasserbewirtschaftung" calculation
engine; the engine itself is downloaded from the
`KWB-R/kwb.raindrop.binaries` GitHub Release on demand via
`kwb.raindrop::download_engine()`.

### Static `base.h5` parameter overview

All values present in the shipped `base.h5` template. The scenarios
below override geometry, three ET-related engine switches, plus the
three corrections listed above (roof ET off, pond ET off, LAI = 3.9).
Everything else listed here is the static default that drives the
simulation.

```{r show_static_params, eval = data_available}
library(kwb.raindrop)

h5 <- hdf5r::H5File$new(path_base, mode = "r")
all_vals <- kwb.raindrop::h5_read_values(h5)
h5$close_all()

format_value <- function(v) {
  if (is.data.frame(v)) {
    sprintf("[time series: %d rows, columns: %s]",
            nrow(v), paste(names(v), collapse = ", "))
  } else if (length(v) > 1L) {
    paste(format(v, trim = TRUE), collapse = ", ")
  } else {
    format(v, trim = TRUE)
  }
}

base_params <- tibble::tibble(
  parameter = names(all_vals),
  value = vapply(all_vals, format_value, character(1L))
) |>
  dplyr::arrange(parameter)

DT::datatable(
  base_params,
  filter = "top",
  options = list(pageLength = 25, autoWidth = TRUE),
  caption = sprintf("All %d parameters in base.h5 (Wien)", nrow(base_params))
)
```

### Twelve-scenario parameter grid

Daniel's reference geometry at every row; the cross product of
`keineVerdunstungBeiRegen × Hoernschemeyer_aktiv × ET0ref` gives
2 × 2 × 3 = 12 scenarios.

```{r preparation, eval = data_available}
path_list <- list(
  modelname = "Wien",
  root_path = file.path(tempdir(), "raindrop_wien_minimal"),
  dir_input  = "<root_path>/models/<modelname>/input",
  dir_output = "<root_path>/models/<modelname>/output",
  dir_target_output = "<dir_output>/<dir_target>",
  file_errors_hdf5 = "Fehlerprotokoll.h5",
  file_results_hdf5_element = "Mulde_Rigole.h5",
  file_results_hdf5_flaeche = "Dach.h5",
  file_results_hdf5_verschaltungen = "<dir_target>_Verschaltungen.h5",
  file_results_txt = "Mulde_Rigole_RAINDROP.txt",
  file_results_txt_multilayer = "Mulde_Rigole_RAINDROP_multi_layer.txt",
  file_target = "<dir_target>.h5",
  path_base = system.file("extdata/models/wien/base.h5",  package = "kwb.raindrop"),
  path_exe  = if (is_windows) kwb.raindrop::download_engine() else NA_character_,
  path_et   = system.file("extdata/models/wien/et.csv",   package = "kwb.raindrop"),
  path_rain = system.file("extdata/models/wien/rain.csv.gz", package = "kwb.raindrop"),
  path_errors_hdf5 = "<dir_target_output>/<file_errors_hdf5>",
  path_results_hdf5_element = "<dir_target_output>/<file_results_hdf5_element>",
  path_results_hdf5_flaeche = "<dir_target_output>/<file_results_hdf5_flaeche>",
  path_results_hdf5_verschaltungen = "<dir_target_output>/<file_results_hdf5_verschaltungen>",
  path_results_txt = "<dir_target_output>/<file_results_txt>",
  path_results_txt_multilayer = "<dir_target_output>/<file_results_txt_multilayer>",
  path_target_input = "<dir_input>/<file_target>"
)

param_grid <- expand.grid(
  keineVerdunstungBeiRegen = c(0L, 1L),
  Hoernschemeyer_aktiv     = c(0L, 1L),
  ET0ref_factor            = c(0, 1, 100),
  KEEP.OUT.ATTRS = FALSE,
  stringsAsFactors = FALSE
)
param_grid <- tibble::as_tibble(param_grid) |>
  dplyr::mutate(
    scenario_name                = sprintf("s%05d", seq_len(dplyr::n())),
    connected_area               = 1000,
    mulde_area                   = 75,
    mulde_height                 = 200,
    filter_hydraulicconductivity = 36,
    filter_height                = 300,
    storage_height               = 500,
    bottom_hydraulicconductivity = 12,
    rain_factor                  = 1
  ) |>
  dplyr::relocate(scenario_name, .before = dplyr::everything())

DT::datatable(
  param_grid,
  filter = "top",
  options = list(pageLength = 12, autoWidth = TRUE),
  caption = "Twelve scenarios — fixed Daniel-reference geometry, sweep of three ET-related engine switches."
)

psi_s_mm <- function(kf_mmh) (3.237 * (kf_mmh / 25.4)^(-0.328)) * 25.4

paths <- kwb.utils::resolve(path_list)

timeseries_et <- readr::read_delim(paths$path_et, delim = ";", col_types = "cd") |>
  dplyr::mutate(
    date = lubridate::dmy(date),
    time = as.integer(difftime(date, min(date), units = "hours"))
  ) |>
  dplyr::select(-date) |>
  dplyr::filter(!is.na(value)) |>
  dplyr::relocate(time, .before = value)
timeseries_et$time[nrow(timeseries_et)] <- ceiling(timeseries_et$time[nrow(timeseries_et)])

timeseries_rain <- readr::read_csv(paths$path_rain, show_col_types = FALSE) |>
  dplyr::rename(datetime = time, value = rr) |>
  dplyr::mutate(time = as.double(difftime(datetime, min(datetime), units = "secs")) / 3600) |>
  dplyr::filter(!is.na(value)) |>
  dplyr::select(-datetime, -station) |>
  dplyr::relocate(time, .before = value)
timeseries_rain$time[nrow(timeseries_rain)] <- ceiling(timeseries_rain$time[nrow(timeseries_rain)])

# Align ET / rain end-times to the longer of the two
if (max(timeseries_rain$time) > max(timeseries_et$time)) {
  timeseries_et <- dplyr::bind_rows(
    timeseries_et,
    tibble::tibble(time = max(timeseries_rain$time),
                   value = timeseries_et$value[nrow(timeseries_et)])
  )
}
if (max(timeseries_et$time) > max(timeseries_rain$time)) {
  timeseries_rain <- dplyr::bind_rows(
    timeseries_rain,
    tibble::tibble(time = max(timeseries_et$time),
                   value = timeseries_rain$value[nrow(timeseries_rain)])
  )
}

# Convert rain from mm to mm/h (per 10-min interval)
period <- c(diff(timeseries_rain$time), mean(diff(timeseries_rain$time)))
timeseries_rain$value <- timeseries_rain$value / period

# Convert ET0 from mm/day to mm/h. The engine reads //Kurven/ET0 as a rate in
# mm/h (same convention as //Kurven/Regen on the hour-based time axis), so the
# daily ET0 values must be divided by their interval in hours (= 24 for daily
# data) -- exactly the symmetric step rain already gets. Without it the kernel
# integrates ET0 24x too high, which is the cause of the implausibly large
# modelled ET share.
period_et <- c(diff(timeseries_et$time), mean(diff(timeseries_et$time)))
timeseries_et$value <- timeseries_et$value / period_et
```

### Run the twelve scenarios

```{r run_model, eval = data_available && is_windows}
run_one <- function(i, timestep_hours, debug = FALSE, ...) {
  param_grid_tmp <- param_grid[i, ]
  paths <- kwb.utils::resolve(path_list, dir_target = param_grid_tmp$scenario_name)

  fs::dir_create(paths$dir_input, recurse = TRUE)
  fs::dir_create(paths$dir_output, recurse = TRUE)
  fs::dir_create(paths$dir_target_output, recurse = TRUE)

  fs::file_copy(path = paths$path_base,
                new_path = paths$path_target_input,
                overwrite = TRUE)

  h5 <- hdf5r::H5File$new(paths$path_target_input, mode = "a")
  new_path <- stringr::str_c(normalizePath(fs::path_abs(paths$dir_target_output)), "\\")
  vals <- kwb.raindrop::h5_read_values(h5)

  vals$`//Berechnungsparameter/Ergebnispfad` <- new_path
  vals$`//Berechnungsparameter/Zeitschritt_Infiltration` <- timestep_hours
  vals$`//Berechnungsparameter/Zeitschritt_ET` <- timestep_hours
  vals$`//Berechnungsparameter/Zeitschritt_Verschaltungen` <- timestep_hours
  vals$`//Berechnungsparameter/R-Plots` <- 0
  vals$`//Berechnungsparameter/Ausgabemodus` <- "Optimierung"
  vals$`//Berechnungsparameter/Evapotranspiration_aktiv` <- 1

  # ET-diagnostics sweep dimensions
  vals$`//Berechnungsparameter/keineVerdunstungBeiRegen` <- param_grid_tmp$keineVerdunstungBeiRegen
  vals$`//Berechnungsparameter/Hoernschemeyer_aktiv` <- param_grid_tmp$Hoernschemeyer_aktiv
  vals$`//Massnahmenelemente/Mulde_Rigole/Parameter_Evapotranspiration/ET0ref_GrasReferenzverdunstung` <-
    param_grid_tmp$ET0ref_factor

  # NOTE: Daniel's three review corrections (Dach/ET = 0, EvapPond = 0,
  # LAI = 3.9) were applied here from PR #11 onward but made the Tandler
  # engine return Status 1 for every scenario, which left every result
  # column NA in the rendered datatable. Reverted on this branch so the
  # engine produces output; corrections will be re-introduced one-by-one
  # as sweep dimensions in a follow-up so the failing combination can be
  # isolated.
  vals$`//Massnahmenelemente/Dach/Berechnungsparameter/Evapotranspiration_aktiv` <- 1
  vals$`//Massnahmenelemente/Dach/Allgemein/Flaeche` <- param_grid_tmp$connected_area

  vals$`//Massnahmenelemente/Mulde_Rigole/Berechnungsparameter/Evapotranspiration_aktiv` <- 1
  vals$`//Massnahmenelemente/Mulde_Rigole/Allgemein/Regen-Skalierungsfaktor` <- param_grid_tmp$rain_factor
  vals$`//Massnahmenelemente/Mulde_Rigole/Allgemein/Flaeche` <- param_grid_tmp$mulde_area
  vals$`//Massnahmenelemente/Mulde_Rigole/Eigenschaften_Oberflaeche/Ueberlaufhoehe` <- param_grid_tmp$mulde_height
  vals$`//Massnahmenelemente/Mulde_Rigole/Bodenschichtung/Startwerte_theta_ActualSoilMoisture` <- c(0.3, 0)
  vals$`//Massnahmenelemente/Mulde_Rigole/Bodenschichtung/Schichtdicken` <- c(
    param_grid_tmp$filter_height, param_grid_tmp$storage_height
  )
  vals$`//Massnahmenelemente/Mulde_Rigole/Allgemein/Endversickerungsrate` <-
    param_grid_tmp$bottom_hydraulicconductivity

  vals$`//Bodenarten/Bodenfilter/Ks_HydraulicConductivity` <- param_grid_tmp$filter_hydraulicconductivity
  vals$`//Bodenarten/Bodenfilter/Psi_Saugspannung_CapillarySuction` <-
    psi_s_mm(param_grid_tmp$filter_hydraulicconductivity)


  vals$`//Kurven/ET0` <- timeseries_et
  vals$`//Kurven/Regen` <- timeseries_rain
  vals$`//Kurven/Growth_1`$time[2] <- max(timeseries_rain$time)
  vals$`//Kurven/Shading_1`$time[2] <- max(timeseries_rain$time)

  # h5_write_values(strict = FALSE) downgrades errors (e.g. a scalar dataset
  # given a length-2 override) to warnings so the scenario still produces an
  # input H5 the engine can attempt to run. Combined with the per-scenario
  # tryCatch below, this means one broken scenario no longer aborts
  # run_scenarios for everyone else. The diagnostic goes through message()
  # (not cat()) so it survives the future.apply worker boundary should this
  # vignette ever switch to parallel = TRUE -- the call below passes
  # parallel = FALSE today so the choice is forward-looking.
  tryCatch({
    kwb.raindrop::h5_write_values(h5, vals, resize = TRUE, strict = FALSE,
                                 scalar_strategy = "first", verbose = FALSE)
    if (h5$is_valid) h5$close_all()
    kwb.raindrop::run_model(path_exe = paths$path_exe,
                            path_input = paths$path_target_input,
                            debug = debug)
  }, error = function(e) {
    if (h5$is_valid) try(h5$close_all(), silent = TRUE)
    message(sprintf("[%s] run failed: %s",
                    param_grid_tmp$scenario_name, conditionMessage(e)))
  })
  invisible(NULL)
}

kwb.raindrop::run_scenarios(
  indices = seq_len(nrow(param_grid)),
  run_one_scenario = run_one,
  timestep_hours = 0.1,
  debug = FALSE,
  parallel = FALSE,
  show_progress = FALSE
)
```

### Persist per-scenario engine inputs to XLSX

The XLSX dump bundles, in one file:

* `base` — full flattened list of `base.h5` values (un-modified template).
* `timeseries_info` — period and water-balance totals for the rain and
  ET0 series fed to every scenario (these are identical across runs).
* `applied_settings` — long-format diff of every key the package writes
  on top of `base.h5`, per scenario (`scenario × parameter × base_value
  × scenario_value`). Use this to see at a glance exactly what RAINDROP
  is being told to change.
* `s00001`, `s00002`, ..., `s00012` — full per-scenario H5 dump (the
  same view as `base`, but for each scenario's modified input file).

```{r xlsx_dump, eval = data_available && is_windows}
format_h5_value <- function(v) {
  if (is.null(v)) {
    "<NULL>"
  } else if (is.data.frame(v)) {
    sprintf("[time series: %d rows; cols: %s]",
            nrow(v), paste(names(v), collapse = ", "))
  } else if (length(v) > 1L) {
    paste(format(v, trim = TRUE), collapse = ", ")
  } else {
    format(v, trim = TRUE)
  }
}

vals_to_tibble <- function(vals) {
  tibble::tibble(
    parameter = names(vals),
    value     = vapply(vals, format_h5_value, character(1L))
  ) |>
    dplyr::arrange(parameter)
}

vals_equal <- function(a, b) {
  if (is.null(a) || is.null(b)) return(identical(a, b))
  isTRUE(all.equal(a, b))
}

# --- Base sheet -----------------------------------------------------------
h5_base <- hdf5r::H5File$new(path_base, mode = "r")
base_vals <- kwb.raindrop::h5_read_values(h5_base)
h5_base$close_all()

sheets <- list(base = vals_to_tibble(base_vals))

# --- Timeseries info (identical across all scenarios) ---------------------
# `timeseries_rain$value` and `timeseries_et$value` are both mm/h after the
# per-interval scaling earlier; multiply by the original interval (in h) to
# recover total mm.
rain_period_h <- c(diff(timeseries_rain$time),
                   mean(diff(timeseries_rain$time)))
rain_total_mm <- sum(timeseries_rain$value * rain_period_h)
et0_period_h  <- c(diff(timeseries_et$time),
                   mean(diff(timeseries_et$time)))
et0_total_mm  <- sum(timeseries_et$value * et0_period_h)
sim_period_h  <- max(timeseries_rain$time) - min(timeseries_rain$time)
sim_period_yr <- sim_period_h / (24 * 365.25)

sheets$timeseries_info <- tibble::tibble(
  series        = c("Rain (//Kurven/Regen)", "ET0 (//Kurven/ET0)"),
  n_rows        = c(nrow(timeseries_rain), nrow(timeseries_et)),
  unit          = rep("mm/h (engine input; convert via *period_h to mm)", 2),
  period_hours  = round(c(sim_period_h, max(timeseries_et$time) -
                                          min(timeseries_et$time)), 1),
  period_years  = round(rep(sim_period_yr, 2), 2),
  total_mm      = round(c(rain_total_mm, et0_total_mm), 1),
  mean_per_year = round(c(rain_total_mm, et0_total_mm) / sim_period_yr, 1),
  min_value     = c(min(timeseries_rain$value), min(timeseries_et$value)),
  max_value     = c(max(timeseries_rain$value), max(timeseries_et$value))
)

# --- Applied settings: per-scenario diff vs. base.h5 ----------------------
applied_rows <- list()
for (sname in param_grid$scenario_name) {
  p <- kwb.utils::resolve(path_list, dir_target = sname)
  h5_s <- hdf5r::H5File$new(p$path_target_input, mode = "r")
  scenario_vals <- kwb.raindrop::h5_read_values(h5_s)
  h5_s$close_all()
  sheets[[sname]] <- vals_to_tibble(scenario_vals)

  for (k in union(names(base_vals), names(scenario_vals))) {
    if (!vals_equal(base_vals[[k]], scenario_vals[[k]])) {
      applied_rows[[length(applied_rows) + 1L]] <- tibble::tibble(
        scenario       = sname,
        parameter      = k,
        base_value     = format_h5_value(base_vals[[k]]),
        scenario_value = format_h5_value(scenario_vals[[k]])
      )
    }
  }
}
sheets$applied_settings <- if (length(applied_rows) > 0L) {
  dplyr::bind_rows(applied_rows) |>
    dplyr::arrange(parameter, scenario)
} else {
  tibble::tibble(scenario = character(),
                 parameter = character(),
                 base_value = character(),
                 scenario_value = character())
}

# Re-order: base + timeseries_info + applied_settings + per-scenario sheets.
sheets <- c(sheets[c("base", "timeseries_info", "applied_settings")],
            sheets[param_grid$scenario_name])

xlsx_path <- file.path(tempdir(), "raindrop_wien_minimal_params.xlsx")
writexl::write_xlsx(sheets, path = xlsx_path)
message("Wrote XLSX (", length(sheets),
        " sheets: base + timeseries_info + applied_settings + ",
        nrow(param_grid), " scenarios) to:\n  ",
        xlsx_path)
```

### Results

```{r analyse_results, eval = data_available && is_windows}
simulation_results <- kwb.raindrop::get_simulation_results_optim_parallel(
  paths = paths,
  path_list = path_list,
  simulation_names = param_grid$scenario_name,
  debug = FALSE
)

# get_simulation_results_optim_parallel() now returns NULL only when the
# element H5 (Mulde_Rigole.h5) is missing; a missing connected-area H5
# (Dach.h5) yields a partial result with connected_area = NULL. Both cases
# are absorbed by add_overflow_events_and_waterbalance(), which emits a
# warning and fills the corresponding columns with NA.
# canonical_variables provides a fallback column template so the rendered
# datatable still exposes element.WB_*_ and connectedarea.WB_*_ columns
# even when every scenario fails (e.g. the engine crashes for all 12
# scenarios with a given override combination, leaving simulation_results
# as a list of NULLs).
simulation_results_optimisation <- kwb.raindrop::add_overflow_events_and_waterbalance(
  simulation_results = simulation_results,
  event_separation_hours = 4,
  canonical_variables = kwb.raindrop::default_canonical_wb_variables()
)

results <- param_grid |>
  dplyr::left_join(simulation_results_optimisation, by = c("scenario_name" = "s_name")) |>
  dplyr::relocate(scenario_name, .before = connected_area)

# Attach construction-cost columns (storage layer assumed infiltration box;
# switch to gravel trench via storage_type = "gravel_trench" or a per-row
# storage_type column in param_grid).
results <- kwb.raindrop::compute_costs(results)

DT::datatable(
  results,
  filter = "top",
  options = list(pageLength = 12, autoWidth = TRUE),
  caption = "Twelve-scenario simulation results — water balance + overflow events + construction costs (EUR). Sort by the ET share column to see which combination of `keineVerdunstungBeiRegen` / `Hoernschemeyer_aktiv` / `ET0ref_factor` brings the modelled ET share closest to the SWIMM-Urban-Eva reference of ~7%. Scenarios whose water-balance step errored (see chunk log) are present in the table but have NA in the result columns."
)
```

The `cost_*` columns enable cost-aware optimisation: filter or sort the
table by `cost_total` to find the cheapest scenario that still meets a
hydraulic target (low overflow count, high evapotranspiration share, ...).
Switching the storage layer between **infiltration box**
(Austrian "Sickerbox", ≈ 95 % porosity, 350 EUR/m³) and **gravel trench**
(Austrian "Schotterrigol", ≈ 30 % porosity, 50 EUR/m³) trades storage
volume against cost; pass `storage_type = "gravel_trench"` to
`compute_costs()` (or add a `storage_type` column to `param_grid`) to
explore that trade-off.