Package 'kwb.misa' reference manual

Title:	Functions to be Used in Project MiSa
Description:	Assessment of oxygen course in rivers. Assessment is aimed at reducing critical situations and fish deaths.
Authors:	Malte Zamzow [aut, cre], Kompetenzzentrum Wasser Berlin gGmbH (KWB) [cph]
Maintainer:	Malte Zamzow <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.0
Built:	2024-10-18 03:29:53 UTC
Source:	https://github.com/KWB-R/kwb.misa

Equal time intervals

Description

Transforms the time vector so that every timestep is shifted to equally spaced points in time.

Usage

adjust_time(time_vector, time_interval)

Arguments

`time_vector`	A POSIXct vector
`time_interval`	Temporal resolution in seconds

Value

A POSIX vector with equally distanced points of time. The new timesteps start on the hour and

Average data per predefined timesteps

Description

Creates a data frame with fitted time column and corresponding average data values.

Usage

aggregate_measurements(time_vector, data_vector, time_interval = 60 * 15)

Arguments

`time_vector`	A POSIXct vector
`data_vector`	A numeric vector, with data corresponding to the time_vector
`time_interval`	Temporal resolution in seconds

Value

Data frame with POSIX time column "t" and data column "d"
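
The averaging described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name `aggregate_sketch`, snapping to the nearest multiple of `time_interval`, and returning times in UTC are all assumptions.

```r
# Hypothetical helper, not part of kwb.misa: average values per time step
aggregate_sketch <- function(time_vector, data_vector, time_interval = 60 * 15) {
  # Snap every timestamp to the nearest multiple of time_interval (seconds)
  secs <- as.numeric(time_vector)
  snapped <- round(secs / time_interval) * time_interval
  # Average all values that share a snapped timestamp
  steps <- sort(unique(snapped))
  means <- vapply(
    steps,
    function(s) mean(data_vector[snapped == s], na.rm = TRUE),
    numeric(1)
  )
  # Assumption: the result is reported in UTC
  data.frame(t = as.POSIXct(steps, origin = "1970-01-01", tz = "UTC"), d = means)
}

times <- as.POSIXct("2020-06-01 00:00:00", tz = "UTC") + c(10, 290, 880, 1810)
values <- c(8, 6, 5, 7)
averaged <- aggregate_sketch(times, values)
print(averaged)
```

The first two measurements fall into the same 15-minute step and are averaged; the remaining two each fill a step of their own.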

Force data into predefined time intervals

Description

The measurements are fitted into timesteps defined by the first point of time and a temporal resolution.

Usage

continuousTimeIntervals(
  time_vector,
  data_vector,
  res = 15,
  first_pointOfTime = min(time_vector, na.rm = T),
  last_pointOfTime = max(time_vector, na.rm = T)
)

Arguments

`time_vector`	A POSIXct vector
`data_vector`	A numeric vector, with data corresponding to the time_vector
`res`	Temporal resolution in minutes
`first_pointOfTime`	Starting point (POSIXct) of the newly defined time series. By default the minimum of the time_vector
`last_pointOfTime`	End point (POSIXct) of the newly defined time series. By default the maximum of the time_vector

Details

In a first step, a vector of continuous timesteps is generated, starting at first_pointOfTime and advancing by the defined time interval. Subsequently, the measured data is forced into these timesteps: each measurement is assigned to the timestep that is closest to its actual time of measurement. If more than one measurement is assigned to a timestep, the average is used. If there is no measurement, NA is used.
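
The three steps above can be sketched in base R as follows. This is a minimal illustration, not the package's code: the helper name `regular_grid_sketch` is hypothetical, the start and end points are simply taken from the data, and measurements beyond the last grid point are assigned to that last point.

```r
# Hypothetical helper, not part of kwb.misa: fit measurements to a regular grid
regular_grid_sketch <- function(time_vector, data_vector, res = 15) {
  step <- res * 60  # resolution in seconds
  first <- min(time_vector, na.rm = TRUE)
  last <- max(time_vector, na.rm = TRUE)
  # Step 1: continuous timesteps starting at the first point of time
  grid <- seq(from = first, to = last, by = step)
  # Step 2: index of the grid point closest to each measurement
  offset <- as.numeric(difftime(time_vector, first, units = "secs"))
  idx <- pmin(round(offset / step) + 1, length(grid))
  # Step 3: average per grid point; grid points without measurements stay NA
  d <- rep(NA_real_, length(grid))
  means <- tapply(data_vector, idx, mean, na.rm = TRUE)
  d[as.integer(names(means))] <- as.numeric(means)
  data.frame(t = grid, d = d)
}

# Offsets of 0, 1, 16 and 61 minutes from the start
times <- as.POSIXct("2020-06-01 00:00:00", tz = "UTC") + c(0, 60, 960, 3660)
values <- c(8, 6, 5, 7)
fitted <- regular_grid_sketch(times, values, res = 15)
print(fitted)
```

With a 15-minute resolution the grid has five points; the third and fourth contain NA because no measurement lies closest to them.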

Value

Dataframe with POSIX column "t" and data column "d"

Count Events of deficits

Description

Counts the number of intervals in which x data points in a row are below a predefined threshold value. Events are separated by a specified number of data points above that threshold value. Additionally, the exceedance of a recovery value can be used as a separation criterion.

Usage

count_def_events(
  data_vector,
  starting_data_points,
  threshold,
  separating_data_points,
  use_recovery_value = FALSE,
  recovery_value = NULL,
  return_event_positions = FALSE
)

Arguments

`data_vector`	Numeric vector (with data in the same unit as the threshold)
`starting_data_points`	Minimal number of data points that define the beginning of a deficiency event
`threshold`	Numeric in the same unit as the data vector
`separating_data_points`	Minimal number of data points needed to separate two events
`use_recovery_value`	If TRUE, two events are only separated if a recovery value is exceeded between two deficits
`recovery_value`	Numeric in the same unit as the data vector. Only used if use_recovery_value = TRUE.
`return_event_positions`	If TRUE, the start and end positions of the events (corresponding to the data vector) are returned instead of the number of events

Value

Either a number of events or a data frame with event start and end position

Examples

# Synthetic signal: a sine wave with increasing amplitude
data_vector <- sin(x = seq(0, 50, 0.5)) * 1:101 / 20

# Events without a recovery criterion
# (recovery_value is not used because use_recovery_value = FALSE)
a <- count_def_events(
  data_vector = data_vector,
  starting_data_points = 2,
  threshold = 0,
  separating_data_points = 4,
  use_recovery_value = FALSE,
  recovery_value = 7,
  return_event_positions = TRUE
)

# Mark the detected events in the plot
plot(data_vector, pch = 20, type = "b")
rect(xleft = a$tBeg, xright = a$tEnd, ybottom = -10, ytop = 10,
     col = "red", density = 4)

# Events that are only separated if the recovery value is exceeded
recovery_value <- 3
a <- count_def_events(
  data_vector = data_vector,
  starting_data_points = 2,
  threshold = 0,
  separating_data_points = 4,
  use_recovery_value = TRUE,
  recovery_value = recovery_value,
  return_event_positions = TRUE
)

plot(data_vector, pch = 20, type = "b")
rect(xleft = a$tBeg[a$start],
     xright = a$tEnd[a$end],
     ybottom = -10, ytop = 10,
     col = "red", density = 4)
abline(h = recovery_value, col = "blue")



Count hours of deficits

Description

Counts the hours during which the data values are below a threshold value.

Usage

count_def_hours(data_vector, threshold, res)

Arguments

`data_vector`	Numeric vector (with data in the same unit as the threshold)
`threshold`	Numeric in the same unit as the data vector
`res`	Temporal resolution of data in minutes

Value

A single value (hours of deficits)
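
A minimal sketch of how such a count could be computed, assuming that every value strictly below the threshold stands for one timestep of `res` minutes and that NA values are ignored. The helper name `deficit_hours_sketch` is hypothetical and the package's handling of NA values or of values equal to the threshold may differ.

```r
# Hypothetical helper, not part of kwb.misa
deficit_hours_sketch <- function(data_vector, threshold, res) {
  # Each value below the threshold represents one timestep of res minutes
  sum(data_vector < threshold, na.rm = TRUE) * res / 60
}

oxygen <- c(3.2, 1.4, 1.1, NA, 0.9, 2.5)
hours <- deficit_hours_sketch(oxygen, threshold = 1.5, res = 15)
print(hours)  # three values below 1.5 mg/L at 15 minutes each: 0.75 hours
```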

finding_o2Column

Description

This function looks for the oxygen data column in a data frame by column name

Usage

finding_o2Column(dataFrame, tryO2 = c("o2", "oxygen", "ox", "sauerstoff"))

Arguments

`dataFrame`	The data frame where the column is searched
`tryO2`	A vector of possible patterns in the names of columns with oxygen data (not case sensitive)

Value

A vector with column numbers of oxygen columns
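
A case-insensitive search of this kind can be sketched with `grep()` from base R. The helper name `find_o2_sketch` is hypothetical, and combining the patterns into one regular expression is an assumption about how the matching might work.

```r
# Hypothetical helper, not part of kwb.misa
find_o2_sketch <- function(dataFrame,
                           tryO2 = c("o2", "oxygen", "ox", "sauerstoff")) {
  # Combine all patterns into one alternation and ignore upper/lower case
  pattern <- paste(tryO2, collapse = "|")
  grep(pattern, colnames(dataFrame), ignore.case = TRUE)
}

df <- data.frame(Timestamp = "2020-06-01 00:00", O2_mgL = 7.5, Temp = 18.2)
o2_columns <- find_o2_sketch(df)
print(o2_columns)  # position of the column "O2_mgL"
```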

finding_timestampColumns

Description

This function looks for the timestamp column in a data frame by typical timestamp symbols

Usage

finding_timestampColumns(dataFrame)

Arguments

`dataFrame`	The data frame where the column is searched

Value

A vector with column numbers of timestamp columns

Linear interpolation for one or more missing values

Description

All sections of consecutive NA values that are no longer than a defined maximal number of NA values are linearly interpolated.

Usage

interpolate_multipleNA(data_vector, max_na)

Arguments

`data_vector`	Numeric vector of measurements (including NA values)
`max_na`	The maximal number of NA values in a row to be interpolated

Value

A list containing the data vector with interpolated NA values as well as information about the number of NA values interpolated
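
The gap-limited interpolation can be sketched with `rle()` and `approx()` from base R. This is an illustration under assumptions, not the package's implementation: the helper name `interpolate_sketch`, the list element names `data` and `n_interpolated`, and the decision to skip gaps at the very start or end of the vector are all hypothetical.

```r
# Hypothetical helper, not part of kwb.misa
interpolate_sketch <- function(data_vector, max_na) {
  # Find runs of NA / non-NA values and their start and end positions
  runs <- rle(is.na(data_vector))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  n <- length(data_vector)
  result <- data_vector
  n_interpolated <- 0
  for (i in which(runs$values & runs$lengths <= max_na)) {
    # Gaps at the edges lack a neighbour on one side and are skipped
    if (starts[i] == 1 || ends[i] == n) next
    left <- starts[i] - 1
    right <- ends[i] + 1
    gap <- starts[i]:ends[i]
    # Linear interpolation between the two neighbouring measurements
    result[gap] <- approx(
      x = c(left, right), y = data_vector[c(left, right)], xout = gap
    )$y
    n_interpolated <- n_interpolated + length(gap)
  }
  list(data = result, n_interpolated = n_interpolated)
}

x <- c(1, NA, 3, NA, NA, NA, 7, NA)
out <- interpolate_sketch(x, max_na = 2)
print(out)
```

In this example only the single NA at position 2 is filled: the gap of three NA values exceeds `max_na`, and the trailing NA has no right-hand neighbour.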

Filter MiSa Dataframe

Description

Filters the loaded data frame by sites and time

Usage

misa_filter_data(
  dataFrame,
  sites = "",
  tBeg = min(dataFrame$posixDateTime, na.rm = TRUE),
  tEnd = max(dataFrame$posixDateTime, na.rm = TRUE)
)

Arguments

`dataFrame`	Data frame loaded by a MiSa function (see details)
`sites`	Names of considered sites, written in the site column of a MiSa Dataframe
`tBeg`	POSIX value with the start time of the observation interval
`tEnd`	POSIX value with the end time of the observation interval

Details

The name of the site column must be "site", the name of the timestamp column must be "posixDateTime". The best way to ensure this is to load the oxygen data with one of the following functions: read_misa_oneSite(), read_misa_multipleSites() or read_misa_files().

Value

A filtered data frame with the same columns as the input data frame

Prepare MiSa Data for MiSa Assessment

Description

Timestamps are adapted, oxygen data is interpolated, and the data is filtered for summer months.

Usage

misa_prepare_data(df_MiSa, res = 15, max_na_interpolation = 60/res)

Arguments

`df_MiSa`	Data frame loaded with one of the MiSa Load functions
`res`	Temporal resolution in minutes
`max_na_interpolation`	Maximal number of NA values in a row to be interpolated. The default corresponds to one hour without measurements, so the number of NA values depends on the temporal resolution (60 / res)

Value

List with one data frame per site that is ready for the MiSa Assessment. Additional information about the number of interpolated NA values is printed. If many NA values are not interpolated, this is probably because there were no measurements during winter.

Relative Negative Deviation from a reference

Description

The cumulative sum of all negative deviations.

Usage

negative_deviation(data_vector, reference_vector)

Arguments

`data_vector`	Numeric data vector
`reference_vector`	Corresponding data of the reference

Details

First, the similarity between the data vector and the reference vector is calculated. Only complete pairs (no NA values) are used. For each data pair, the quotient of data and reference is calculated. If data > reference, the value is set to 1. All quotients are summed up (-> absolute similarity). This sum can be at most the number of data pairs. Dividing by the number of data pairs yields the relative similarity. One minus the relative similarity is the negative deviation.
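
The calculation described above can be written out in a few lines of base R. This is a sketch that follows the description step by step, not the package's source; the helper name `negative_deviation_sketch` is hypothetical and positive reference values are assumed.

```r
# Hypothetical helper, not part of kwb.misa
negative_deviation_sketch <- function(data_vector, reference_vector) {
  # Use only complete pairs (no NA in either vector)
  complete <- !is.na(data_vector) & !is.na(reference_vector)
  data <- data_vector[complete]
  reference <- reference_vector[complete]
  # Quotient per pair, capped at 1 where data exceeds the reference
  quotient <- data / reference
  quotient[data > reference] <- 1
  absolute_similarity <- sum(quotient)
  relative_similarity <- absolute_similarity / length(data)
  1 - relative_similarity
}

data <- c(4, 8, 5, NA)
reference <- c(8, 8, 4, 6)
deviation <- negative_deviation_sketch(data, reference)
print(deviation)
```

Here the three complete pairs give the quotients 0.5, 1 and 1 (capped), so the relative similarity is 2.5 / 3 and the negative deviation is 1/6.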

Value

Numeric value between 0 and 1

Read MiSa Files

Description

This function combines the functions read_misa_oneSite and read_misa_multipleSites and is strictly bound to the MiSa folder structure.

Usage

read_misa_files(input_path)

Arguments

`input_path`	The directory where the two folders "files_per_site" and "sites_per_file" are located

Details

All csv files from both folders will be loaded. They must contain a timestamp column and an oxygen column. The timestamp column is identified automatically by looking for a column whose entries contain ":" and one of the date-separating symbols ".", "/" or "-". The oxygen column is found by its column name, which should contain "O2", "o2", "Oxygen", "oxygen", "ox", "Ox", "Sauerstoff" or "sauerstoff".

For "files_per_site" files, all letters in the filename before the first "_" are used as the site name. For "sites_per_file" files, all column names are used as site names. Thus, all columns except the timestamp must be oxygen concentrations at different sites.
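
The timestamp detection rule described above can be sketched in base R. The helper name `find_timestamp_sketch` is hypothetical, and requiring that all non-missing entries of a column match is an assumption; the package may apply a looser or stricter check.

```r
# Hypothetical helper, not part of kwb.misa
find_timestamp_sketch <- function(dataFrame) {
  is_timestamp <- vapply(dataFrame, function(column) {
    entries <- as.character(column)
    entries <- entries[!is.na(entries)]
    # A timestamp contains ":" and one of the date separators ".", "/" or "-"
    length(entries) > 0 &&
      all(grepl(":", entries, fixed = TRUE)) &&
      all(grepl("[./-]", entries))
  }, logical(1))
  which(unname(is_timestamp))
}

df <- data.frame(
  time = c("01.06.2020 00:00", "01.06.2020 00:15"),
  o2 = c(7.5, 7.1)
)
timestamp_columns <- find_timestamp_sketch(df)
print(timestamp_columns)
```

The oxygen column contains "." but no ":", so only the first column is reported.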

Value

A data frame with 3 columns: timestamp, oxygen data and site name

read_misa_multipleSites

Description

This function reads csv tables with one timestamp column and several oxygen data columns, where the column names refer to the measurement sites.

Usage

read_misa_multipleSites(path, file)

Arguments

`path`	The path to the file
`file`	Filename (including the ".csv" ending)

Value

A data frame with 3 columns: timestamp, oxygen data and site name

read_misa_oneSite

Description

This function reads csv tables with one timestamp column and one oxygen data column

Usage

read_misa_oneSite(path, file, siteID)

Arguments

`path`	The path to the file
`file`	Filename (including the ".csv" ending)
`siteID`	A character vector specifying the site name

Value

A data frame with 3 columns: timestamp, oxygen data and site name

Repeating values in a row within a Vector

Description

Lists the values of a vector together with the number of times each is repeated in a row and the start and end positions of each run.

Usage

same_inarow(v)

Arguments

`v`	A character, factor or numeric vector

Value

A data frame with four columns: Value (-> listed value of the input vector), Repeats (times it is repeated in a row), starts_at (start position), ends_at (end position).
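
A result with these four columns can be derived from run-length encoding. The following base R sketch uses `rle()`; the helper name `same_inarow_sketch` is hypothetical, and converting the input to character (so that factors are accepted) is an assumption that the package may handle differently.

```r
# Hypothetical helper, not part of kwb.misa
same_inarow_sketch <- function(v) {
  # rle() does not accept factors, so compare values as character strings
  runs <- rle(as.character(v))
  ends_at <- cumsum(runs$lengths)
  data.frame(
    Value = runs$values,
    Repeats = runs$lengths,
    starts_at = ends_at - runs$lengths + 1,
    ends_at = ends_at
  )
}

runs <- same_inarow_sketch(c("a", "a", "b", "a", "a", "a"))
print(runs)
```

The example vector yields three runs: "a" at positions 1 to 2, "b" at position 3 and "a" at positions 4 to 6.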

Breaking down POSIX into months and year and filter by month

Description

Adds month and year column to data frame and filters by month. The addition of the year column is important for the following MiSa Assessment

Usage

SummerMonths(df, time_column = "t", months = 5:9)

Arguments

`df`	data frame with a POSIX time column
`time_column`	Name or number of the time column
`months`	the number of the months that should be kept. The default is 5:9 which are the important months for oxygen deficits in Berlin caused by CSOs

Value

The filtered input data frame with additional month and year columns
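
Adding the two columns and filtering by month can be sketched in base R as follows. The helper name `summer_months_sketch` and the column names "month" and "year" are assumptions made for this illustration.

```r
# Hypothetical helper, not part of kwb.misa
summer_months_sketch <- function(df, time_column = "t", months = 5:9) {
  time <- df[[time_column]]
  # Break the POSIX time down into month and year
  df$month <- as.integer(format(time, "%m"))
  df$year <- as.integer(format(time, "%Y"))
  # Keep only the rows that fall into the requested months
  df[df$month %in% months, , drop = FALSE]
}

df <- data.frame(
  t = as.POSIXct(
    c("2020-03-15 12:00:00", "2020-06-15 12:00:00", "2020-10-01 00:00:00"),
    tz = "UTC"
  ),
  d = c(9.1, 4.2, 8.0)
)
summer <- summer_months_sketch(df)
print(summer)
```

Only the June measurement remains, because March and October lie outside the default months 5 to 9.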

MiSa Assessment: Yearly Numbers of deficits

Description

Counting the events below threshold values on a yearly basis

Usage

yearly_crit_Events(
  dataFrame,
  res = 15,
  seperating_hours = 5 * 24,
  deficiency_hours = 0.25,
  thresholds = 1.5,
  max_missing = 25,
  use_recovery_value = FALSE,
  recovery_value = NULL
)

Arguments

`dataFrame`	MiSa data frame with the columns "d" (oxygen data) and "year" (year)
`res`	Temporal resolution of oxygen data in minutes
`seperating_hours`	TODO: describe (also: should be "separating_hours" with "a", not "e")
`deficiency_hours`	TODO: describe
`thresholds`	Oxygen threshold values used for the assessment in mg/L
`max_missing`	The maximal allowed percentage of missing oxygen data. If the share of NA values exceeds this number, hours below thresholds are set to NA
`use_recovery_value`	TODO: describe
`recovery_value`	TODO: describe

Value

Data frame with rows per year and columns per threshold as well as for missing data

MiSa Assessment: Yearly hours of deficits

Description

Counting the hours on a yearly basis below threshold values

Usage

yearly_deficiency_time(
  dataFrame,
  res = 15,
  thresholds = c(0.5, 1, 1.5, 2, 5),
  max_missing = 25
)

Arguments

`dataFrame`	MiSa data frame with the columns "d" (data) and "year" (year)
`res`	Temporal resolution of oxygen data in minutes
`thresholds`	Oxygen threshold values used for the assessment in mg/L
`max_missing`	The maximal allowed percentage of missing oxygen data. If the share of NA values exceeds this number, hours below thresholds are set to NA

Value

Data frame with rows per year and columns per threshold as well as for missing data

Negative deviation from a reference site

Description

The function cumulates the negative deviation (lower O2 concentrations) compared to a reference site without (significant) urban pollution.

Usage

yearly_negative_deviation(dataFrame, oxygen_ref, max_missing = 25)

Arguments

`dataFrame`	MiSa data frame with the columns "d" (oxygen data) and "year" (year)
`oxygen_ref`	The course of oxygen at the reference site
`max_missing`	The maximal allowed percentage of missing oxygen data. If the share of NA values exceeds this number, hours below thresholds are set to NA

Value

Data frame with negative deviation per year
