r-lib/actions/setup-r-dependencies@v2
and r-lib/actions/check-r-package@v2 on ubuntu-latest instead of the
deprecated v2/ubuntu-20.04/r-hub/sysreqs toolchain

actions/checkout@v5, actions/upload-artifact@v5) and set
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true so that transitive r-lib/actions/*@v2
steps opt into Node 24 as well, ahead of the June 2nd 2026 deprecation of
Node 20 on GitHub Actions runners

claude.yaml, claude-code-review.yaml)

get_wasserportal_master_data(): match the new HTML5 markup of the
master-data table (<caption>Pegel Berlin</caption> instead of the legacy
summary="Pegel Berlin" attribute)

windows-1252. The pages declare UTF-8 in <meta charset>, but the server
actually emits Latin-1 bytes (e.g. 0xE4 for ä); trusting the meta declaration
left those bytes mis-marked as UTF-8 and broke the ä→ae / ü→ue substitutions
of subst_special_chars() on Windows R

rvest::html_table() and xml2::xml_text(trim = TRUE) in
get_wasserportal_master_data() and get_wasserportal_stations_table():
both delegate to a sub("^[[:space:] ]+", ...) pass that fails on Windows
R when the cell text contains umlauts. Tables are now extracted directly
via xml2 and trimmed with a locale-safe gsub(..., useBytes = TRUE)
helper (trim_bytes())

get_stations() and get_wasserportal_masters_data() resilient when
parallel workers cannot fetch a station overview: load the wasserportal
namespace into the cluster and drop try-error results before
data.table::rbindlist() / dplyr::left_join()

wasserportal.berlin.de is
unreachable from the test host (CRAN, sandboxed CI)

get_wasserportal_masters_data() test expectations to include the
new Anmerkung column that wasserportal added to surface-water master data

get_surfacewater_qualities()

v2, v3, not from master

get_stations(): add argument n_cores

get_wasserportal_stations_table(): use new (three-letter) variable codes

read_wasserportal_raw(): adapt request to new API version, add argument
api_version

read_wasserportal_raw_gw(): adapt request to new API version

Add functions for exporting time series data to zip files (list_timeseries_data_to_zip())
and master data to csv files (list_masters_data_to_csv()), which will be
uploaded to https://kwb-r.github.io/wasserportal/<filename>.
In addition, import functions for downloading the datasets above and importing
them into R as lists were added (wp_timeseries_data_to_list(), wp_masters_data_to_list())
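The windows-1252 decoding fix and the trim_bytes() helper listed above can be illustrated with base R alone. This is a minimal sketch, not the package's actual code: the byte vector stands in for a server response, and the trim_bytes() defined here is a hypothetical reimplementation that only strips ASCII whitespace.

```r
# Bytes as a server might send them: " März " encoded as windows-1252
# (0xE4 = ä), although the page's <meta charset> declares UTF-8.
raw_bytes <- as.raw(c(0x20, 0x4d, 0xe4, 0x72, 0x7a, 0x20))

# Decode explicitly from windows-1252 ("CP1252") to UTF-8 instead of
# trusting the meta declaration; iconv() marks the result as UTF-8.
text <- iconv(rawToChar(raw_bytes), from = "CP1252", to = "UTF-8")

# Hypothetical trimming helper in the spirit of trim_bytes(): match byte by
# byte so that umlauts cannot trip up the regex engine in any locale.
trim_bytes <- function(x) {
  y <- gsub("^[[:space:]]+|[[:space:]]+$", "", x, useBytes = TRUE)
  # useBytes = TRUE may drop or change the encoding mark; restore UTF-8
  Encoding(y) <- "UTF-8"
  y
}

trimmed <- trim_bytes(text)
print(trimmed)
```

Matching byte by byte is safe here because leading and trailing ASCII whitespace never occurs inside a multibyte UTF-8 sequence.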
Code cleaning by @hsonne started
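The get_stations() resilience fix listed above (drop try-error results before combining) follows a common pattern with the parallel package. The sketch below assumes a hypothetical fetch_overview() that fails for some stations; it is exported to the workers with clusterExport(), whereas the package itself loads the wasserportal namespace into the cluster. Base rbind() stands in for data.table::rbindlist().

```r
library(parallel)

# Hypothetical stand-in for fetching one station overview; it fails for
# every third station to mimic unreachable pages.
fetch_overview <- function(station_id) {
  if (station_id %% 3 == 0) stop("station overview not reachable: ", station_id)
  data.frame(station_id = station_id, value = station_id * 10)
}

cl <- makeCluster(2)
# Make the helper available on the workers
clusterExport(cl, "fetch_overview")

# try() turns a failing request into a "try-error" object instead of an error
results <- parLapply(cl, 1:7, function(id) try(fetch_overview(id), silent = TRUE))
stopCluster(cl)

# Drop failed requests before combining, so one bad station does not break
# the whole result.
ok <- !vapply(results, inherits, logical(1), what = "try-error")
overview <- do.call(rbind, results[ok])
print(overview)
```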
Fix master data requests by using the master_url instead of station_id,
as the latter was not unique. The functions get_wasserportal_master_data() and
its wrapper function get_wasserportal_masters_data() now require the master_url
instead of station_id as input parameter. The function get_stations() now adds
the column stammdaten_link to each element of the sublist overview_list.
Fix to scrape groundwater level data from all available monitoring stations (instead of only 5!) and export to a .csv file. In addition, also switch
to .csv export for groundwater quality instead of .json to reduce storage space
(the stations_gwq_data.json file is already 98.4 MB large).
Add functions (get_daily_surfacewater_data()) and adapt the article
Surface Water for scraping all available daily
surface water data and exporting it to one .csv file per parameter (containing
all monitoring stations)
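The "one .csv file per parameter, containing all monitoring stations" layout described above can be sketched with base R by splitting a long-format table on its parameter column. The column names and values below are hypothetical toy data, not the structure returned by get_daily_surfacewater_data().

```r
# Hypothetical long-format daily data: one row per station, parameter and day
daily <- data.frame(
  station_id = c("A", "B", "A", "B"),
  parameter  = c("flow", "flow", "water_temperature", "water_temperature"),
  date       = rep(as.Date("2024-01-01"), 4),
  value      = c(1.2, 3.4, 5.6, 7.8)
)

out_dir <- file.path(tempdir(), "surfacewater_daily")
dir.create(out_dir, showWarnings = FALSE)

# One CSV file per parameter, each containing all monitoring stations
files <- vapply(split(daily, daily$parameter), function(df) {
  path <- file.path(out_dir, paste0(df$parameter[1], ".csv"))
  write.csv(df, path, row.names = FALSE)
  path
}, character(1))

print(basename(files))
```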
Deactivate PROMISCES workflows (see wasserportal v0.1.0) due to a failing Zenodo download. They will be moved into their own R package, most probably kwb.promisces.
get_wasserportal_stations_table() now correctly names the parameter
temperature (formerly incorrectly named level)

R package for scraping groundwater data (groundwater level and quality) from Wasserportal Berlin. Please note that the
support for scraping surface water monitoring stations is currently very limited!
Functions:
get_stations(): returns metadata for all available monitoring stations

get_wasserportal_masters_data(): get master data for selected station_ids

read_wasserportal_raw_gw(): enables the download of groundwater data.
Check out the Tutorial article on how to use it for downloading one or multiple
stations at once.

read_wasserportal(): works for surface water monitoring stations, but is
outdated, as it is based on a static file with station/variable names
(i.e. only for 11 instead of the 82 stations currently provided!) instead of
relying on new metadata provided online. This will be fixed within the next release. For progress on this issue check out #21

Workflows:
Tutorial article on how to download groundwater level and quality data
Further usage by combining previously scraped data (see tutorial above) and performing some analysis:
Groundwater, e.g. creating a map with GW level trends
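The trend part of that groundwater analysis could look roughly like the sketch below, which fits a linear trend per station in metres per year; drawing the map is left out. The data frame and its columns (station_id, date, level) are hypothetical toy data, not the output of the scraping functions.

```r
# Hypothetical groundwater level series for two stations
gw <- data.frame(
  station_id = rep(c("100", "200"), each = 4),
  date  = rep(as.Date(c("2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01")), 2),
  level = c(34.0, 33.8, 33.6, 33.4, 30.0, 30.1, 30.2, 30.3)
)

# Slope of a linear regression of level on time, expressed per year
trend_per_station <- function(df) {
  years <- as.numeric(df$date - min(df$date)) / 365.25
  unname(coef(lm(df$level ~ years))[2])
}

# Named vector: one trend (m/year) per station, ready to join to coordinates
trends <- vapply(split(gw, gw$station_id), trend_per_station, numeric(1))
print(round(trends, 3))
```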
Two workflows (REACH UBA, Norman List) created within the project PROMISCES for assessing the prevalence and spatial distribution of persistent, mobile and toxic (PMT) substances in the Berlin groundwater, based on different PMT lists, i.e. REACH UBA or Norman List.
Added a NEWS.md file to track changes to the package.
see https://style.tidyverse.org/news.html for writing a good NEWS.md