| Title: | R Package with Functions for Scraping Data of Wasserportal Berlin |
|---|---|
| Description: | R Package with Functions for Scraping Data of Wasserportal Berlin (https://wasserportal.berlin.de), which contains real-time data of surface water and groundwater monitoring stations. |
| Authors: | Hauke Sonnenberg [aut] (ORCID: <https://orcid.org/0000-0001-9134-2871>), Michael Rustler [aut, cre] (ORCID: <https://orcid.org/0000-0003-0647-7726>), AD4GD [fnd], DWC [fnd], IMPETUS [fnd], PROMISCES [fnd], Kompetenzzentrum Wasser Berlin gGmbH (KWB) [cph] |
| Maintainer: | Michael Rustler <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.7.0 |
| Built: | 2026-06-19 14:11:56 UTC |
| Source: | https://github.com/KWB-R/wasserportal |
Helper function: base url for download
base_url_download()
base url for download of csv/zip files prepared by R package
Create Text Labels from Data Frame Columns
columns_to_labels(data, columns, fmt = "%s: %s", sep = ", ")
data |
data frame |
columns |
names of columns from which to create labels |
fmt |
format string passed to |
sep |
separator (default: ", ") |
vector of character with as many elements as there are rows in data
data <- data.frame(number = 1:2, name = c("adam", "eva"), value = 3:4)
columns <- c("name", "value")
columns_to_labels(data, columns)
columns_to_labels(data, columns, fmt = "<p>%s: %s</p>", sep = "")
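To illustrate what the fmt and sep arguments control, here is a minimal base-R sketch of label building. It assumes that each selected column contributes one sprintf(fmt, column_name, value) fragment and that the fragments of a row are joined with sep. labels_sketch is a hypothetical stand-in, not the package's implementation, and the exact output of columns_to_labels() may differ.

```r
# Hypothetical stand-in for illustration only (not the package's code).
labels_sketch <- function(data, columns, fmt = "%s: %s", sep = ", ") {
  # One fragment per column: the column name and the value in each row
  parts <- lapply(columns, function(column) {
    sprintf(fmt, column, as.character(data[[column]]))
  })
  # Join the fragments of each row with the separator
  do.call(paste, c(parts, sep = sep))
}

data <- data.frame(number = 1:2, name = c("adam", "eva"), value = 3:4)
labels <- labels_sketch(data, c("name", "value"))
print(labels)
```

The result has one element per row of data, which matches the documented return value.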
The tables that appear in the API documentation of the wasserportal (https://wasserportal.berlin.de/download/wasserportal_berlin_getting_data.pdf) have been added to the wasserportal package. This function returns a list of data frames with each element representing one of these tables.
get_api_tables(name = NULL)
name |
of element from the list of data frames to be selected. If this argument is left blank (name = NULL), the default, the list of data frames is returned. |
list of data frames or data frame specified by the name argument
get_api_tables()
Get Daily Surfacewater Data: wrapper to scrape daily surface water data
get_daily_surfacewater_data(
  stations,
  variables = get_surfacewater_variables(),
  list2df = FALSE
)
stations |
stations as retrieved by |
variables |
variables as retrieved by |
list2df |
convert result list to data frame (default: FALSE) |
list or data frame with all available data from Wasserportal
## Not run:
stations <- wasserportal::get_stations()
variables <- wasserportal::get_surfacewater_variables()
variables
sw_data_daily <- wasserportal::get_daily_surfacewater_data(stations, variables)
## End(Not run)
Wrapper function to scrape all available raw data, i.e. groundwater level and quality data, and save them in a list
get_groundwater_data(
  stations,
  groundwater_options = get_groundwater_options(),
  debug = TRUE,
  stations_list = NULL
)
stations |
list as retrieved by |
groundwater_options |
as retrieved by |
debug |
print debug messages (default: TRUE) |
stations_list |
list of station metadata as returned by
|
list with elements "groundwater.level" and "groundwater.quality" data frames
## Not run:
stations <- wasserportal::get_stations()
gw_data_list <- get_groundwater_data(stations)
str(gw_data_list)
## End(Not run)
Helper function: get groundwater options
get_groundwater_options()
return available groundwater data options and prepare for being used
as input for get_groundwater_data
get_groundwater_options()
Wasserportal Berlin: get overview options for stations
get_overview_options()
list with shortcuts to station overview tables
(wasserportal.berlin.de/messwerte.php?anzeige=tabelle&thema=<shortcut>)
get_overview_options()
Helper function: get available station variables
get_station_variables(station_df)
station_df |
data frame with one row per station and columns "Messstellennummer", "Messstellenname" and additional columns each of which represents a variable that is measured at that station. If the variable columns contain the value "x" it means that the corresponding variable is measured and the name of the column is contained in the returned vector of variable names. |
returns names of available variables for station
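The "x" marker convention described for station_df can be sketched in base R as follows. The toy data frame and the helper variables_sketch are hypothetical and only mirror the documented column layout; the package's own get_station_variables() may handle edge cases differently.

```r
# Toy crosstable following the documented layout (values are made up)
station_df <- data.frame(
  Messstellennummer = c("100", "200"),
  Messstellenname = c("Station A", "Station B"),
  Wasserstand = c("x", NA),
  Durchfluss = c("x", "x"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the variable columns that contain "x"
variables_sketch <- function(station_df) {
  id_columns <- c("Messstellennummer", "Messstellenname")
  variable_columns <- setdiff(names(station_df), id_columns)
  is_measured <- vapply(
    station_df[variable_columns],
    function(x) any(x == "x", na.rm = TRUE),
    logical(1L)
  )
  variable_columns[is_measured]
}

vars_a <- variables_sketch(station_df[1L, ])
vars_b <- variables_sketch(station_df[2L, ])
print(vars_a)
print(vars_b)
```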
Get Stations
get_stations(
  type = c("list", "data.frame", "crosstable"),
  run_parallel = TRUE,
  n_cores = parallel::detectCores() - 1L,
  debug = TRUE
)
type |
vector of character describing the type(s) of output(s) to be
returned. Expected values (and default): |
run_parallel |
default: TRUE |
n_cores |
number of cores to use if |
debug |
logical indicating whether or not to show debug messages |
list with general station "overview" (either as list "overview_list" or as data.frame "overview_df") and a crosstable showing which parameters are available per station ("x" if available, NA if not)
stations <- wasserportal::get_stations(n_cores = 2L)
str(stations)
Get Surface Water Quality for Multiple Monitoring Stations
get_surfacewater_qualities(station_ids, dbg = TRUE)
station_ids |
vector with ids of multiple (or one) monitoring stations |
dbg |
print debug messages (default: TRUE) |
data frame with water quality data for multiple monitoring stations
## Not run:
stations <- wasserportal::get_stations()
station_ids <- stations$overview_list$surface_water.quality$Messstellennummer
swq <- wasserportal::get_surfacewater_qualities(station_ids)
str(swq)
## End(Not run)
Get Surface Water Quality for One Monitoring Station
get_surfacewater_quality(station_id)
station_id |
id of surface water measurement station |
data frame with water quality data for one monitoring station
## Not run:
stations <- wasserportal::get_stations()
station_id <- stations$overview_list$surface_water.quality$Messstellennummer[1]
swq <- wasserportal::get_surfacewater_quality(station_id)
str(swq)
## End(Not run)
Helper function: get surface water variables
get_surfacewater_variables()
vector with surface water variables
Wasserportal Berlin: get master data for a single station
get_wasserportal_master_data(master_url)
master_url |
url with master data for single station as retrieved by
|
data frame with metadata for selected station
## Not run:
stations_list <- wasserportal::get_stations(type = "list")

# GW Station
master_url <- stations_list %>%
  kwb.utils::selectElements("groundwater.level") %>%
  kwb.utils::selectColumns("stammdaten_link")[1L]
get_wasserportal_master_data(master_url)

# SW Station
# Reduce to monitoring stations maintained by Berlin
master_urls <- stations_list %>%
  kwb.utils::selectElements("surface_water.water_level") %>%
  dplyr::filter(.data$Betreiber == "Land Berlin") %>%
  dplyr::pull(.data$stammdaten_link)
get_wasserportal_master_data(master_urls[1L])
## End(Not run)
Wasserportal Berlin: get master data for multiple stations
get_wasserportal_masters_data(master_urls, run_parallel = TRUE)
master_urls |
URLs to master data as found in column "stammdaten_link"
of the data frame returned by
|
run_parallel |
default: TRUE |
data frame with metadata for selected master urls
## Not run:
stations_list <- wasserportal::get_stations(type = "list")

# Reduce to monitoring stations maintained by Berlin
master_urls <- stations_list$surface_water.water_level %>%
  dplyr::filter(.data$Betreiber == "Land Berlin") %>%
  dplyr::pull(.data$stammdaten_link)

system.time(master_parallel <- get_wasserportal_masters_data(
  master_urls
))
system.time(master_sequential <- get_wasserportal_masters_data(
  master_urls, run_parallel = FALSE
))
## End(Not run)
Get Names and IDs of the Stations of wasserportal.berlin.de
get_wasserportal_stations(type = "quality")
type |
one of "quality", "level", "flow" |
Wasserportal Berlin: get stations overview table
get_wasserportal_stations_table(
  type = get_overview_options()$groundwater$level,
  url_wasserportal = wasserportal_base_url()
)
type |
type of stations table to retrieve. Valid options defined in
|
url_wasserportal |
base url to Wasserportal berlin (default:
|
data frame with master data of selected monitoring stations
types <- wasserportal::get_overview_options()
str(types)
sw_l <- wasserportal::get_wasserportal_stations_table(type = types$surface_water$water_level)
str(sw_l)
Get Names and IDs of the Variables of wasserportal.berlin.de
get_wasserportal_variables(station = NULL)
station |
station id. If given, only variables that are available for the given station are returned. |
Convenience helper for local debugging of the daily ZIP artefacts
published at https://kwb-r.github.io/wasserportal. Downloads each ZIP,
extracts the CSV, reads it with readr::read_csv() and prints a short
summary (columns, row count, unique Messstellennummer count, head of
the data). The intersection of Messstellennummer values across all
loaded files is reported at the end so you can quickly see how many
stations have measurements in every file.
inspect_gh_pages_zips(
  files = c("groundwater_level.zip", "groundwater_quality.zip"),
  base_url = "https://kwb-r.github.io/wasserportal",
  destdir = tempfile("wasserportal-inspect-"),
  head_rows = 5L
)
files |
character vector of ZIP file names hosted under
|
base_url |
base URL where the ZIPs are hosted, without trailing
slash. Default: |
destdir |
directory used to download and extract the ZIPs. Default is a fresh tempdir; pass an explicit path to keep the unpacked CSVs around for further inspection. |
head_rows |
number of rows to print from the top of every loaded data frame. Default 5. |
Returns the loaded data frames invisibly so the caller can further
inspect them in R, e.g. dat$groundwater_level$Parameter |> table().
invisibly a named list of tibbles, one per input file. Names
are derived from the ZIP basename without the extension.
## Not run:
# default: groundwater level + groundwater quality
dat <- inspect_gh_pages_zips()

# any ZIPs you want to inspect:
dat <- inspect_gh_pages_zips(files = c(
  "daily_surface-water_water-level.zip",
  "daily_surface-water_temperature.zip"
))

# keep the extracted CSVs:
dat <- inspect_gh_pages_zips(destdir = "~/tmp/wasserportal-inspect")
## End(Not run)
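The station intersection reported at the end of the summary can be reproduced on the returned list with base R. The sketch below uses a made-up list in place of the real return value; only the column name Messstellennummer is taken from the description above, and how the function itself computes the intersection is not shown in this manual.

```r
# Toy stand-in for the named list returned by inspect_gh_pages_zips()
dat <- list(
  groundwater_level = data.frame(Messstellennummer = c(1, 1, 2, 3)),
  groundwater_quality = data.frame(Messstellennummer = c(2, 3, 3, 4))
)

# Unique station numbers per file
ids_per_file <- lapply(dat, function(df) unique(df$Messstellennummer))

# Stations that occur in every file
common_ids <- Reduce(intersect, ids_per_file)
print(common_ids)
print(length(common_ids))
```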
Helper function: list data to csv or zip
list_data_to_csv_or_zip(data_list, file_prefix, to_zip)
data_list |
data in list form |
file_prefix |
file prefix |
to_zip |
whether or not to convert to zip file |
loops through list of data frames and uses list names as filenames
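The idea of looping over a named list and using the names as file names can be sketched in base R as below. This is an assumption-laden illustration: it writes plain CSV files to a temporary directory and ignores file_prefix and to_zip, whose exact effect on the file names is not documented here.

```r
# Toy list of data frames; the list names become the file names
data_list <- list(
  groundwater_level = data.frame(id = 1:2, value = c(35.6, 35.7)),
  groundwater_quality = data.frame(id = 1:2, value = c(7.1, 7.3))
)

# Write into a fresh temporary directory (hypothetical target location)
target_dir <- tempfile("csv-sketch-")
dir.create(target_dir)

csv_files <- vapply(names(data_list), function(name) {
  file <- file.path(target_dir, paste0(name, ".csv"))
  utils::write.csv(data_list[[name]], file, row.names = FALSE)
  file
}, character(1L))

print(basename(csv_files))
```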
Helper function: list masters data to csv
list_masters_data_to_csv(masters_data_list)
masters_data_list |
masters data in list form as retrieved by
|
loops through list of data frames and uses list names as filenames
## Not run:
stations_list <- get_stations(type = "list")
masters_data_csv_files <- list_masters_data_to_csv(stations_list)
masters_data_csv_files
## End(Not run)
Helper function: list timeseries data to zip
list_timeseries_data_to_zip(timeseries_data_list)
timeseries_data_list |
time series data in list form as retrieved by
|
loops through list of data frames and uses list names as filenames
## Not run:
stations <- wasserportal::get_stations()

# Groundwater Time Series
gw_tsdata_list <- wasserportal::get_groundwater_data(stations)
gw_tsdata_files <- wasserportal::list_timeseries_data_to_zip(gw_tsdata_list)

# Surface Water Time Series
sw_tsdata_list <- wasserportal::get_daily_surfacewater_data(stations)
sw_tsdata_files <- wasserportal::list_timeseries_data_to_zip(sw_tsdata_list)
## End(Not run)
Helper function to read CSV
read(text, ...)
text |
text |
... |
... additional arguments passed to |
data frame with values
This function downloads and reads CSV files from wasserportal.berlin.de.
read_wasserportal(
  station,
  variables = NULL,
  from_date = as.character(Sys.Date() - 90L),
  type = "single",
  include_raw_time = FALSE,
  stations_crosstable
)
station |
station number, as found in column "Messstellennummer" of the
data frame returned by |
variables |
vector of variable identifiers, as returned by
|
from_date |
|
type |
one of "single" (the default), "daily", "monthly" |
include_raw_time |
if |
stations_crosstable |
data frame as returned by
|
The original timestamps (column timestamp_raw in the example below)
are not all plausible, e.g. "31.03.2019 03:00" appears twice! They are
corrected (column timestamp_corr) to represent a plausible sequence of
timestamps in Berlin Normal Time (UTC+01). Finally, a valid POSIXct timestamp
in timezone "Europe/Berlin" (UTC+01 in winter, UTC+02 in summer) is created,
together with the additional information on the UTC offset (column
UTCOffset, 1 in winter, 2 in summer).
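The conversion from a fixed-offset sequence to local Berlin time can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the input values are made up, and the variable names local_time and utc_offset are hypothetical. "Etc/GMT-1" is the time zone identifier for a fixed offset of UTC+01 (the sign is inverted by convention), and "Europe/Berlin" is the identifier for local Berlin time.

```r
# Made-up timestamps, assumed to be in Berlin Normal Time (UTC+01),
# around the switch to summer time on 31 March 2019
timestamp_corr <- c("31.03.2019 01:00", "31.03.2019 02:00", "31.03.2019 03:00")

# Parse with a fixed offset of UTC+01
times <- as.POSIXct(timestamp_corr, format = "%d.%m.%Y %H:%M", tz = "Etc/GMT-1")

# The same instants expressed in local Berlin time
local_time <- format(times, "%Y-%m-%d %H:%M", tz = "Europe/Berlin")

# UTC offset in hours: "+0100" becomes 1, "+0200" becomes 2
utc_offset <- as.integer(format(times, "%z", tz = "Europe/Berlin")) / 100

print(data.frame(timestamp_corr, local_time, utc_offset))
```

Because the fixed-offset sequence has no gap or duplicate, every instant maps to exactly one local time, which is why the corrected column is a safer basis than the raw one.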
data frame read from the CSV file that the download provides. IMPORTANT: It is not yet clear how to interpret the timestamp, see example
## Not run:
# Get a list of available water quality stations and variables
stations_crosstable <- wasserportal::get_stations(type = "crosstable")

# Set the start date
from_date <- "2021-03-01"

# Read the timeseries (multiple variables for one station)
water_quality <- wasserportal::read_wasserportal(
  station = stations_crosstable$Messstellennummer[1L],
  from_date = from_date,
  include_raw_time = TRUE,
  stations_crosstable = stations_crosstable
)

# Look at the first few records
head(water_quality)

# Check the metadata
#kwb.utils::getAttribute(water_quality, "metadata")

# Set missing values to NA
water_quality[water_quality == -777] <- NA

# Look at the first few records again
head(water_quality)

### How was the original timestamp interpreted?

# Determine the days at which summer time starts and ends, respectively
from_year <- as.integer(substr(from_date, 1L, 4L))
switches <- kwb.datetime::date_range_CEST(from_year)

# Reformat to dd.mm.yyyy
switches <- kwb.datetime::reformatTimestamp(switches, "%Y-%m-%d", "%d.%m.%Y")

# Define a pattern to look for timestamps "around" the switches
pattern <- paste(switches, "0[1-4]", collapse = "|")

# Look at the data for these timestamps
water_quality[grepl(pattern, water_quality$timestamp_raw), ]

# The original timestamps (timestamp_raw) were not all plausible, e.g.
# for March 2019. This seems to have been fixed by the "wasserportal"!
sum(water_quality$timestamp_raw != water_quality$timestamp_corr)
## End(Not run)
Read Wasserportal Raw
read_wasserportal_raw(
  variable,
  station,
  from_date,
  type = "single",
  include_raw_time = FALSE,
  handle = NULL,
  stations_crosstable,
  api_version = 2L
)
variable |
variable |
station |
station id |
from_date |
start date |
type |
one of "single", "daily", "monthly" (default: "single") |
include_raw_time |
TRUE or FALSE (default: FALSE) |
handle |
handle (default: NULL) |
stations_crosstable |
data frame as returned by
|
api_version |
a single integer representing the version of wasserportal's API. 1L: before 2023, 2L: since 2023. Default: 2L |
????
read_wasserportal_raw_gw
read_wasserportal_raw_gw(
  station = 149,
  stype = "gws",
  type = "single_all",
  from_date = "",
  include_raw_time = FALSE,
  handle = NULL,
  as_text = FALSE,
  dbg = FALSE
)
station |
station id |
stype |
"gws" or "gwq" |
type |
"single" or "single_all" (if stype = "gwq") |
from_date |
(default: "") |
include_raw_time |
default: FALSE |
handle |
default: NULL |
as_text |
if TRUE, the function returns the raw text of the HTTP response from the Wasserportal. Otherwise (the default) it tries to interpret the raw text as comma-separated values and returns a corresponding data frame. Use as_text = TRUE to analyse the raw text if an error occurs when converting the text to a data frame. |
dbg |
logical indicating whether or not to show debug messages. The default is FALSE |
data.frame with values
## Not run:
read_wasserportal_raw_gw(station = 149, stype = "gws")
read_wasserportal_raw_gw(station = 149, stype = "gwq")
## End(Not run)
Read CSV File from Package's "extdata" Folder
readPackageFile(file, ...)
file |
file name (without path) |
... |
additional arguments passed to |
data frame representing the content of file
Wipes historical telemetry rows from ThingsBoard for the given device and
keys via DELETE /api/plugins/telemetry/DEVICE/{id}/timeseries/delete.
Pass keys = NULL (the default) to clear every key the device currently
knows – the function then calls
tb_list_device_telemetry_keys first to discover them.
tb_delete_device_telemetry(
  device_id,
  keys = NULL,
  api_key = Sys.getenv("TB_API_KEY"),
  host = tb_default_host(),
  delete_latest = TRUE,
  username = Sys.getenv("TB_USERNAME"),
  password = Sys.getenv("TB_PASSWORD")
)
device_id |
device UUID. |
keys |
character vector of telemetry keys to delete, or |
api_key |
account-level API key. Defaults to env var |
host |
base URL of the ThingsBoard instance. Defaults to env var
|
delete_latest |
if |
username |
ThingsBoard user for the username/password (JWT) login
(self-hosted / Community Edition). Defaults to env var |
password |
ThingsBoard password. Defaults to env var |
Server-side attributes (latitude, longitude, Bezirk, ...) and the device itself are NOT touched, only the time-series telemetry store. Re-running the demo push afterwards re-fills the cleared keys with the real Wasserportal timestamps.
invisibly the number of keys submitted for deletion.
## Not run:
id <- tb_get_device_id("wasserportal-gw-6038")

# wipe everything currently stored:
tb_delete_device_telemetry(id)

# wipe just the smoke-test GW-Stand value:
tb_delete_device_telemetry(id, keys = "GW-Stand")
## End(Not run)
Lightweight read-only companion to tb_setup_devices: when you
only need a device's internal UUID (e.g. to call the telemetry-delete
endpoint), this returns it directly without touching the access token or
creating the device on the side. Returns NA_character_ when the device
does not exist.
tb_get_device_id(
  device_name,
  api_key = Sys.getenv("TB_API_KEY"),
  host = tb_default_host(),
  username = Sys.getenv("TB_USERNAME"),
  password = Sys.getenv("TB_PASSWORD")
)
device_name |
device name as shown in the ThingsBoard UI. |
api_key |
account-level API key. Defaults to env var |
host |
base URL of the ThingsBoard instance. Defaults to env var
|
username |
ThingsBoard user for the username/password (JWT) login
(self-hosted / Community Edition). Defaults to env var |
password |
ThingsBoard password. Defaults to env var |
device UUID (character) or NA_character_ if the lookup did not
resolve.
## Not run:
tb_get_device_id("wasserportal-gw-6038")
## End(Not run)
Wraps GET /api/plugins/telemetry/DEVICE/{id}/keys/timeseries. Useful to
discover what's actually in the device-side time-series store before a
wipe, or to compare against the Parameter column of the gh-pages
source data.
tb_list_device_telemetry_keys(
  device_id,
  api_key = Sys.getenv("TB_API_KEY"),
  host = tb_default_host(),
  username = Sys.getenv("TB_USERNAME"),
  password = Sys.getenv("TB_PASSWORD")
)
device_id |
device UUID. Use |
api_key |
account-level API key. Defaults to env var |
host |
base URL of the ThingsBoard instance. Defaults to env var
|
username |
ThingsBoard user for the username/password (JWT) login
(self-hosted / Community Edition). Defaults to env var |
password |
ThingsBoard password. Defaults to env var |
character vector of telemetry key names. May be of length 0.
## Not run:
id <- tb_get_device_id("wasserportal-gw-6038")
tb_list_device_telemetry_keys(id)
## End(Not run)
Calls POST /api/auth/login with a username/password pair and returns the
JWT access token. This is the standard ThingsBoard REST API
authentication and the only one available on self-hosted ThingsBoard
Community Edition: unlike the account-level API key (a ThingsBoard Cloud
convenience, generated under Account > Security > API keys), every
edition – CE, PE and Cloud – accepts a username/password login.
tb_login(
  username = Sys.getenv("TB_USERNAME"),
  password = Sys.getenv("TB_PASSWORD"),
  host = tb_default_host()
)
username |
ThingsBoard user (usually the account e-mail). Defaults to
env var |
password |
ThingsBoard password. Defaults to env var |
host |
base URL of the ThingsBoard instance, without trailing slash.
Defaults to env var |
The token is short-lived (ThingsBoard's default JWT expiration is 2.5 h),
which is ample for the one-off device setup done by
tb_setup_devices: the subsequent telemetry push uses the
per-device access token, not this JWT, so no token refresh is implemented.
Transient failures (HTTP 408 / 429 / 500 / 502 / 503 / 504 and transport dropouts) are retried with exponential backoff, matching the predicate used for the telemetry POSTs so a flaky upstream does not abort the device-setup run on the first 5xx.
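The retry behaviour described above can be sketched without any HTTP library. The status codes are the ones listed in the text; everything else (retry_sketch, fake_request, the number of tries and the delays) is hypothetical and only illustrates the pattern of a transient-failure predicate combined with exponential backoff. The package itself relies on httr2::req_retry(), whose configuration is not shown in this manual.

```r
# Status codes named in the text as transient
transient_status <- c(408L, 429L, 500L, 502L, 503L, 504L)
is_transient <- function(status) status %in% transient_status

# Hypothetical retry loop; request_fun() returns an HTTP status code
retry_sketch <- function(request_fun, max_tries = 4L, base_delay = 0.01) {
  for (attempt in seq_len(max_tries)) {
    status <- request_fun()
    # Stop on success, on a non-transient error or when tries are used up
    if (!is_transient(status) || attempt == max_tries) {
      return(list(status = status, attempts = attempt))
    }
    # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
    Sys.sleep(base_delay * 2^(attempt - 1L))
  }
}

# Fake request that answers 503 twice before it succeeds
responses <- c(503L, 503L, 200L)
call_count <- 0L
fake_request <- function() {
  call_count <<- call_count + 1L
  responses[call_count]
}

result <- retry_sketch(fake_request)
print(result)
```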
the JWT access token as a character scalar, ready to be sent in an
X-Authorization: Bearer <token> request header.
On a non-2xx response this helper surfaces an excerpt of the server's
response body (via tb_error_body(), up to ~800 chars) in the R error
message, and httr2::req_retry() prints retry messages to stderr.
Stock ThingsBoard only echoes back the error description, not the
request payload, so the password does not leak. If a self-hosted
instance or reverse proxy is configured to echo request fields back in
the error body, that excerpt would surface in R errors and – when
captured with 2>&1 – in CI logs. Mask the relevant secrets in such
environments.
## Not run:
Sys.setenv(
  TB_HOST = "https://dashboards.inowas.org",
  TB_USERNAME = "[email protected]",
  TB_PASSWORD = "secret"
)
token <- tb_login()
## End(Not run)
Wraps the per-device transport rate limits documented at
https://thingsboard.io/docs/paas/eu/subscriptions/ into the
parameters this package's push functions take. Pass the result into
tb_push_station_telemetry() via mode, chunk_size and
throttle_seconds so you stay below your plan's
"Telemetry Transport messages/data points (Device)" thresholds.
tb_plan_defaults(plan = "free")
plan |
one of
Case-insensitive. Unknown values raise an error. |
Across all paid PaaS tiers the per-device sustained limits are
identical (2 000 telemetry data points per minute, 15 000 per hour);
the only thing that changes is how aggressive a burst the platform
tolerates before it drops a request. Free additionally rejects the
bulk array form on the device telemetry endpoint, so its default is
mode = "single".
Self-hosted ThingsBoard CE has no per-tenant rate limit by default, hence the much larger chunk size and zero throttle.
named list with mode, chunk_size and throttle_seconds,
ready to be spread into tb_push_station_telemetry().
tb_plan_defaults("free")
tb_plan_defaults("free-bulk")
tb_plan_defaults("ce")
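One way to "spread" such a named list into a function call is do.call(). The sketch below uses a stub, push_stub, in place of tb_push_station_telemetry(), and the values in defaults are made up for illustration; they are not the values that tb_plan_defaults() returns for any plan.

```r
# Made-up list with the three documented element names
defaults <- list(mode = "single", chunk_size = 100L, throttle_seconds = 1)

# Stub standing in for tb_push_station_telemetry() (hypothetical)
push_stub <- function(data, device_token, mode, chunk_size, throttle_seconds) {
  sprintf(
    "%d rows, mode=%s, chunk_size=%d, throttle=%gs",
    nrow(data), mode, chunk_size, throttle_seconds
  )
}

data <- data.frame(Datum = Sys.Date(), Parameter = "GW-Stand", Messwert = 35.6)

# Combine the fixed arguments with the plan defaults and call the function
args <- c(list(data = data, device_token = "dummy-token"), defaults)
summary_text <- do.call(push_stub, args)
print(summary_text)
```

Passing the three values individually, e.g. mode = defaults$mode, works just as well; do.call() mainly saves typing when the same defaults are reused across several calls.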
Sends a flat {"key": value, ...} JSON object to the ThingsBoard
telemetry endpoint, letting the server stamp it with the current time.
This is the simplest possible telemetry POST and is useful both as a
smoke test for the device-token auth path and as a fallback when the
bulk-with-ts format is rejected (some ThingsBoard Cloud Maker tier
configurations return an opaque HTTP 500 to the array-of-records form
even though the same device accepts attributes and latest-style
single records).
tb_push_latest_telemetry(
  values,
  device_token,
  host = tb_default_host()
)
values |
named list (or named numeric vector) of telemetry key/value pairs. |
device_token |
ThingsBoard device access token. |
host |
base URL of the ThingsBoard instance. Defaults to env var
|
invisibly the number of keys that were sent.
## Not run:
tb_push_latest_telemetry(
  values = list(`GW-Stand` = 35.6, `Wassertemperatur` = 11.2),
  device_token = Sys.getenv("TB_DEVICE_TOKEN")
)
## End(Not run)
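To make the difference between the two payload shapes concrete, the following base-R sketch builds both as strings. The helper to_flat_json is hypothetical and handles numeric values only; how the package serialises its requests is not documented here. The timestamped record shape with "ts" (milliseconds since the epoch) and "values" follows ThingsBoard's device telemetry format as far as I am aware, so treat it as an assumption to check against the ThingsBoard documentation.

```r
# Hypothetical helper: serialise a named list of numbers as a flat JSON object
to_flat_json <- function(values) {
  numbers <- vapply(values, as.character, character(1L))
  pairs <- sprintf("\"%s\": %s", names(values), numbers)
  paste0("{", paste(pairs, collapse = ", "), "}")
}

values <- list(`GW-Stand` = 35.6, Wassertemperatur = 11.2)

# Flat form: the server stamps the values with the current time
flat <- to_flat_json(values)

# Timestamped form (assumed shape): one record with an explicit "ts"
ts_ms <- as.numeric(as.POSIXct("2024-01-01 00:00:00", tz = "UTC")) * 1000
record <- sprintf("{\"ts\": %.0f, \"values\": %s}", ts_ms, flat)

# Bulk form: an array of such records
bulk <- paste0("[", record, "]")

cat(flat, record, bulk, sep = "\n")
```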
Sends station metadata (coordinates, level reference, operator, ...) as
client-side attributes to the ThingsBoard device attributes endpoint
(/api/v1/{token}/attributes). Attributes are key/value pairs without a
timestamp; ThingsBoard overwrites the previous value on every push.
tb_push_station_attributes(
  attributes,
  device_token,
  host = tb_default_host()
)
attributes |
named list (or a one-row data frame, which is converted to a list). All entries must be JSON-serialisable scalars. |
device_token |
ThingsBoard device access token. |
host |
base URL of the ThingsBoard instance. Defaults to env var
|
invisibly the number of attributes that were sent.
## Not run:
tb_push_station_attributes(
  attributes = list(
    name = "Pegel Mueggelheim",
    latitude = 52.4291,
    longitude = 13.6450,
    pegelnullpunkt_m_NHN = 32.18
  ),
  device_token = Sys.getenv("TB_DEVICE_TOKEN_5867000")
)
## End(Not run)
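The `attributes` argument accepts either a named list or a one-row data frame, and every entry must be a scalar. A minimal sketch of such a normalisation step follows; `normalise_attributes()` is a hypothetical name, and the package's actual validation may differ.

```r
# Hypothetical helper (not part of the package): accept a named list or a
# one-row data frame and return a named list of scalar values.
normalise_attributes <- function(attributes) {
  if (is.data.frame(attributes)) {
    stopifnot(nrow(attributes) == 1)
    # Each column of a one-row data frame becomes one list element
    attributes <- as.list(attributes)
  }
  stopifnot(
    is.list(attributes),
    !is.null(names(attributes)),
    all(nzchar(names(attributes)))
  )
  is_scalar <- vapply(
    attributes,
    function(x) is.atomic(x) && length(x) == 1,
    logical(1)
  )
  if (!all(is_scalar)) {
    stop(
      "Non-scalar attributes: ",
      paste(names(attributes)[!is_scalar], collapse = ", ")
    )
  }
  attributes
}

station <- data.frame(
  name = "Pegel Mueggelheim",
  latitude = 52.4291,
  longitude = 13.6450,
  stringsAsFactors = FALSE
)
attrs <- normalise_attributes(station)
str(attrs)
```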
Sends long-format measurements of a single Wasserportal monitoring station
to the device telemetry endpoint of a ThingsBoard instance
(/api/v1/{token}/telemetry). Works against ThingsBoard Cloud
(e.g. the free Maker tier on https://thingsboard.cloud),
self-hosted ThingsBoard Community Edition and https://demo.thingsboard.io
since the device-token API is identical on all of them.
tb_push_station_telemetry(
  data,
  device_token,
  ts_col = "Datum",
  value_col = "Messwert",
  key_col = "Parameter",
  single_key = "value",
  host = tb_default_host(),
  chunk_size = 100L,
  mode = c("single", "bulk"),
  throttle_seconds = NULL,
  max_active = 10L,
  verbose = TRUE
)
data |
data frame for one station, in the long format produced by
|
device_token |
ThingsBoard device access token (taken from the device detail view in the ThingsBoard UI). |
ts_col |
name of the timestamp column. Default "Datum". |
value_col |
name of the numeric value column. Default "Messwert". |
key_col |
name of the column whose values become telemetry keys.
Default "Parameter". |
single_key |
telemetry key used when |
host |
base URL of the ThingsBoard instance, without trailing slash.
Defaults to env var |
chunk_size |
maximum number of timestamps per HTTP POST when
|
mode |
one of "single" or "bulk". |
throttle_seconds |
inter-request sleep, in seconds, between
consecutive HTTP POSTs. |
max_active |
number of concurrent HTTP POSTs in single mode
(passed to |
verbose |
print one line per chunk in bulk mode and one line
per parallel batch in single mode (default: TRUE). |
Long-format input is pivoted on the fly so that every distinct value of
key_col becomes one telemetry key inside ThingsBoard, sharing the same
timestamp.
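The pivot described above can be sketched in base R as follows. `pivot_to_records()` is a hypothetical helper, not the package's implementation. The `ts`/`values` record shape with `ts` in epoch milliseconds follows the timestamped form of the ThingsBoard device HTTP API; that the package builds records exactly this way is an assumption.

```r
# Hypothetical helper (not part of the package): group long-format rows by
# timestamp so that each distinct key becomes one entry of `values`.
pivot_to_records <- function(data,
                             ts_col = "Datum",
                             value_col = "Messwert",
                             key_col = "Parameter") {
  # Epoch milliseconds; character timestamps are interpreted as UTC here
  ts_ms <- as.numeric(as.POSIXct(data[[ts_col]], tz = "UTC")) * 1000
  groups <- split(seq_len(nrow(data)), ts_ms)
  lapply(unname(groups), function(idx) {
    list(
      ts = ts_ms[idx[1]],
      values = stats::setNames(
        as.list(data[[value_col]][idx]),
        data[[key_col]][idx]
      )
    )
  })
}

long <- data.frame(
  Datum = c("2024-01-01 00:00:00", "2024-01-01 00:00:00", "2024-01-02 00:00:00"),
  Parameter = c("GW-Stand", "Wassertemperatur", "GW-Stand"),
  Messwert = c(35.6, 11.2, 35.7),
  stringsAsFactors = FALSE
)

records <- pivot_to_records(long)
length(records)  # 2: one record per distinct timestamp
str(records[[1]])
```

The first record carries both keys because both rows share the same timestamp; the second carries only "GW-Stand".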
invisibly the number of telemetry timestamps that were sent.
## Not run:
stations <- wasserportal::get_stations()
gw <- wasserportal::get_groundwater_data(stations)
one_station <- dplyr::filter(
  gw$groundwater.level,
  .data$Messstellennummer == "149"
)
tb_push_station_telemetry(
  data = one_station,
  device_token = Sys.getenv("TB_DEVICE_TOKEN_149")
)
## End(Not run)
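To illustrate what `chunk_size` means for the number of requests, the sketch below splits a count of timestamps into consecutive index chunks. `chunk_indices()` is a hypothetical helper; the package may chunk differently.

```r
# Hypothetical helper (not part of the package): split 1..n into consecutive
# chunks of at most `chunk_size` elements, one chunk per HTTP POST.
chunk_indices <- function(n, chunk_size = 100L) {
  stopifnot(n >= 0, chunk_size >= 1)
  if (n == 0) {
    return(list())
  }
  unname(split(seq_len(n), ceiling(seq_len(n) / chunk_size)))
}

chunks <- chunk_indices(250L, chunk_size = 100L)
lengths(chunks)
# 100 100 50
```

With 250 timestamps and the default `chunk_size = 100L`, this yields three chunks. A pause between consecutive chunks, as controlled by `throttle_seconds`, could be added with `Sys.sleep()`.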
Convenience wrapper for the initial setup against a fresh ThingsBoard tenant. Authenticates with either a username/password login (JWT; works on every ThingsBoard edition and is required for self-hosted Community Edition) or an account-level API key (ThingsBoard Cloud), then:
1. creates one device per name (or fetches the device if it already exists),
2. reads each device's access token via the credentials endpoint.
The returned named character vector can be fed directly into
tb_push_station_telemetry as device_token.
tb_setup_devices(
  station_ids,
  api_key = Sys.getenv("TB_API_KEY"),
  host = tb_default_host(),
  name_prefix = "wasserportal-",
  device_type = "wasserportal",
  username = Sys.getenv("TB_USERNAME"),
  password = Sys.getenv("TB_PASSWORD")
)
station_ids |
character vector of Wasserportal |
api_key |
account-level API key (ThingsBoard Cloud only), generated
under Account > Security > API keys > Generate. Sent in the
|
host |
base URL of the ThingsBoard instance, without trailing slash.
Defaults to env var |
name_prefix |
prefix added in front of every station id when forming
the ThingsBoard device name. Default "wasserportal-". |
device_type |
ThingsBoard device profile / type. Default
"wasserportal". |
username |
ThingsBoard user for the username/password (JWT) login.
Defaults to env var TB_USERNAME. |
password |
ThingsBoard password. Defaults to env var TB_PASSWORD. |
Set TB_USERNAME + TB_PASSWORD for the login route, or generate an API
key in the ThingsBoard Cloud UI under
Account > Security > API keys > Generate and set TB_API_KEY.
named character vector. Names are the input station_ids, values
are device access tokens.
## Not run:
# Self-hosted ThingsBoard Community Edition (username/password login):
Sys.setenv(
  TB_HOST = "https://dashboards.inowas.org",
  TB_USERNAME = "[email protected]",
  TB_PASSWORD = "secret"
)
tokens <- tb_setup_devices(c("149", "5867000", "5803900"))

# ThingsBoard Cloud (account API key):
Sys.setenv(
  TB_HOST = "https://eu.thingsboard.cloud",
  TB_API_KEY = "<paste-your-API-key-here>"
)
tokens <- tb_setup_devices(c("149", "5867000", "5803900"))
## End(Not run)
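The sketch below illustrates two details from the description above: how device names are formed from `name_prefix` and the station ids, and how one of the two authentication routes might be selected from the credentials that are set. Both helpers are hypothetical. In particular, the precedence of the login route over the API key when both are set is an assumption, not documented behaviour.

```r
# Hypothetical helper (not part of the package): device names keyed by
# station id, mirroring the names of the vector tb_setup_devices() returns.
device_names <- function(station_ids, name_prefix = "wasserportal-") {
  station_ids <- as.character(station_ids)
  stopifnot(!anyDuplicated(station_ids))
  stats::setNames(paste0(name_prefix, station_ids), station_ids)
}

# Hypothetical helper: choose an authentication route from the credentials.
# Assumption: username/password wins if both routes are configured.
choose_auth_route <- function(username = Sys.getenv("TB_USERNAME"),
                              password = Sys.getenv("TB_PASSWORD"),
                              api_key = Sys.getenv("TB_API_KEY")) {
  if (nzchar(username) && nzchar(password)) {
    return("login")
  }
  if (nzchar(api_key)) {
    return("api_key")
  }
  stop("Set TB_USERNAME + TB_PASSWORD, or TB_API_KEY.")
}

devices <- device_names(c("149", "5867000", "5803900"))
print(devices)

choose_auth_route(username = "", password = "", api_key = "some-key")
# "api_key"
```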
Helper function: Base URL of Berlin Wasserportal
wasserportal_base_url()
string with the base URL of Berlin Wasserportal
Wasserportal Master Data: Download and Import into R List
wp_masters_data_to_list(
  overview_list_names,
  target_dir = tempdir(),
  file_prefix = "stations_",
  is_zipped = FALSE
)
overview_list_names |
names of elements in the list returned by
|
target_dir |
target directory for downloading data (default: tempdir()) |
file_prefix |
prefix given to file names |
is_zipped |
are the data to be downloaded zipped (default: FALSE) |
downloads csv master data from Wasserportal
## Not run:
overview_list_names <- names(wasserportal::get_stations(type = "list"))
wp_masters_data_list <- wp_masters_data_to_list(overview_list_names)
## End(Not run)
Wasserportal Time Series Data: Download and Import into R List
wp_timeseries_data_to_list(
  overview_list_names,
  target_dir = tempdir(),
  is_zipped = TRUE
)
overview_list_names |
names of elements in the list returned by
|
target_dir |
target directory for downloading data (default: tempdir()) |
is_zipped |
are the data to be downloaded zipped (default: TRUE) |
downloads (zipped) data from wasserportal
## Not run:
overview_list_names <- names(wasserportal::get_stations(type = "list"))
wp_timeseries_data_list <- wp_timeseries_data_to_list(overview_list_names)
## End(Not run)