Title: | Functions to be Used in Project MiSa |
---|---|
Description: | Assessment of oxygen course in rivers. Assessment is aimed at reducing critical situations and fish deaths. |
Authors: | Malte Zamzow [aut, cre], Kompetenzzentrum Wasser Berlin gGmbH (KWB) [cph] |
Maintainer: | Malte Zamzow <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-10-18 03:29:53 UTC |
Source: | https://github.com/KWB-R/kwb.misa |
Transforms the time vector so that every timestep is shifted to equally spaced points in time
adjust_time(time_vector, time_interval)
time_vector |
A POSIXct vector |
time_interval |
Temporal resolution in seconds |
A POSIX vector with equally distanced points of time. The new timesteps start on the hour and
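The idea of shifting timestamps onto a regular grid can be illustrated with a minimal base R sketch. The helper name `snap_to_grid` is hypothetical and not part of kwb.misa, and rounding to the nearest multiple of the interval is an assumption; the package may align timestamps differently.

```r
# Hypothetical helper, not the package's adjust_time(): rounds each timestamp
# to the nearest multiple of time_interval (in seconds since the epoch).
snap_to_grid <- function(time_vector, time_interval) {
  secs <- as.numeric(time_vector)
  snapped <- round(secs / time_interval) * time_interval
  as.POSIXct(snapped, origin = "1970-01-01", tz = "UTC")
}

t_raw <- as.POSIXct(
  c("2020-06-01 10:02:10", "2020-06-01 10:16:40", "2020-06-01 10:29:05"),
  tz = "UTC"
)

# Snap to a 15-minute grid
t_grid <- snap_to_grid(t_raw, time_interval = 60 * 15)
format(t_grid, "%H:%M:%S")
```

In UTC, multiples of 900 seconds fall on the full hour and every quarter hour after it, so the three example timestamps should land on 10:00, 10:15 and 10:30.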
Creates a data frame with fitted time column and corresponding average data values.
aggregate_measurements(time_vector, data_vector, time_interval = 60 * 15)
time_vector |
A POSIXct vector |
data_vector |
A numeric vector, with data corresponding to the time_vector |
time_interval |
Temporal resolution in seconds |
Data frame with POSIX time column "t" and data column "d"
The measurements are fitted into timesteps defined by the first point in time and a temporal resolution
continuousTimeIntervals( time_vector, data_vector, res = 15, first_pointOfTime = min(time_vector, na.rm = T), last_pointOfTime = max(time_vector, na.rm = T) )
time_vector |
A POSIXct vector |
data_vector |
A numeric vector, with data corresponding to the time_vector |
res |
Temporal resolution in minutes |
first_pointOfTime |
Starting point (POSIXct) of the newly defined time series. By default the minimum of the time_vector |
last_pointOfTime |
End point (POSIXct) of the newly defined time series. By default the maximum of the time_vector |
In a first step, a vector with continuous timesteps is generated, starting at first_pointOfTime and advancing by the defined time interval. Subsequently, the measured data is forced into timesteps with the same time interval. Each measurement is assigned to the timestep closest to its actual time of measurement. If more than one measurement is assigned to a timestep, the average is used. If there is no measurement, NA is used.
Dataframe with POSIX column "t" and data column "d"
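The assignment described above (nearest timestep, averaging of multiple hits, NA for empty timesteps) can be sketched in base R. This is only an illustration under stated assumptions: `fit_to_timesteps` is a hypothetical name, the grid starts at the first timestamp, the time vector contains no NA, and the grid is extended to cover the last assigned timestep. It is not the package's implementation.

```r
# Hypothetical sketch of nearest-timestep fitting; not continuousTimeIntervals().
fit_to_timesteps <- function(time_vector, data_vector, res = 15) {
  step <- res * 60                               # resolution in seconds
  t0 <- as.numeric(min(time_vector))
  # Index of the grid step closest to each measurement (1-based)
  idx <- round((as.numeric(time_vector) - t0) / step) + 1
  grid <- t0 + (seq_len(max(idx)) - 1) * step
  # Average all measurements that fall on the same grid step
  means <- tapply(data_vector, idx, mean, na.rm = TRUE)
  d <- rep(NA_real_, length(grid))               # NA where nothing was measured
  d[as.integer(names(means))] <- as.numeric(means)
  data.frame(t = as.POSIXct(grid, origin = "1970-01-01", tz = "UTC"), d = d)
}

t_raw <- as.POSIXct(
  c("2020-06-01 10:01:00", "2020-06-01 10:14:00",
    "2020-06-01 10:17:00", "2020-06-01 10:46:00"),
  tz = "UTC"
)
fitted <- fit_to_timesteps(t_raw, data_vector = c(8, 6, 4, 5), res = 15)
fitted
```

Here the measurements at 10:14 and 10:17 share the 10:16 timestep and are averaged, while the 10:31 timestep has no measurement and stays NA.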
Counts the number of intervals in which x data points in a row are below a predefined threshold value. Events are separated by a specified number of data points above that threshold value. Furthermore, the exceedance of a recovery value can also be used as a separation criterion.
count_def_events( data_vector, starting_data_points, threshold, separating_data_points, use_recovery_value = FALSE, recovery_value = NULL, return_event_positions = FALSE )
data_vector |
Numeric vector (with data in the same unit as the threshold) |
starting_data_points |
Minimal number of data points to define the beginning of a deficiency event |
threshold |
Numeric in the same unit as the data vector |
separating_data_points |
Minimal number of data points to separate two events |
use_recovery_value |
If TRUE, two events are only separated if a recovery value is exceeded between the two deficits |
recovery_value |
Numeric in the same unit as the data vector. Only used if use_recovery_value = TRUE. |
return_event_positions |
Instead of the number of events, the start and end positions of the events are returned, corresponding to the data vector |
Either a number of events or a data frame with event start and end position
data_vector <- sin(x = seq(0, 50, 0.5)) * 1:101 / 20

a <- count_def_events(
  data_vector = data_vector,
  starting_data_points = 2,
  threshold = 0,
  separating_data_points = 4,
  use_recovery_value = FALSE,
  recovery_value = 7,
  return_event_positions = TRUE)

plot(data_vector, pch = 20, type = "b")
rect(xleft = a$tBeg, xright = a$tEnd, ybottom = -10, ytop = 10,
     col = "red", density = 4)

recovery_value <- 3
a <- count_def_events(
  data_vector = data_vector,
  starting_data_points = 2,
  threshold = 0,
  separating_data_points = 4,
  use_recovery_value = TRUE,
  recovery_value = recovery_value,
  return_event_positions = TRUE)

plot(data_vector, pch = 20, type = "b")
rect(xleft = a$tBeg[a$start], xright = a$tEnd[a$end], ybottom = -10, ytop = 10,
     col = "red", density = 4)
abline(h = recovery_value, col = "blue")
Count hours of deficits
count_def_hours(data_vector, threshold, res)
data_vector |
Numeric vector (with data in the same unit as the threshold) |
threshold |
Numeric in the same unit as the data vector |
res |
Temporal resolution of data in minutes |
A single value (hours of deficits)
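Converting a count of data points into hours only needs the temporal resolution. The following sketch assumes that "deficit" means strictly below the threshold and that NA values are ignored; `deficit_hours` is a hypothetical name, not the package function.

```r
# Hypothetical sketch: each data point below the threshold contributes
# res minutes, i.e. res / 60 hours. NA values are not counted (assumption).
deficit_hours <- function(data_vector, threshold, res) {
  sum(data_vector < threshold, na.rm = TRUE) * res / 60
}

o2 <- c(3.1, 1.4, 1.2, NA, 1.6, 0.9)   # made-up oxygen values in mg/L
hours_below <- deficit_hours(o2, threshold = 1.5, res = 15)
hours_below
```

Three of the example values are below 1.5 mg/L, so at a 15-minute resolution the result is 0.75 hours.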
This function looks for the oxygen data column in a data frame by column name
finding_o2Column(dataFrame, tryO2 = c("o2", "oxygen", "ox", "sauerstoff"))
dataFrame |
The data frame where the column is searched |
tryO2 |
A vector of possible patterns in the column names of oxygen data (not case sensitive) |
A vector with column numbers of oxygen columns
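A case-insensitive search over column names can be sketched with `grep()`. The helper name `find_columns_by_pattern` and the example data frame are hypothetical; how kwb.misa actually matches the patterns is not shown in this manual.

```r
# Hypothetical sketch: return the positions of all columns whose name
# contains at least one of the patterns, ignoring case.
find_columns_by_pattern <- function(dataFrame, patterns) {
  regex <- paste(patterns, collapse = "|")
  grep(regex, names(dataFrame), ignore.case = TRUE)
}

df <- data.frame(
  Zeit = "2020-06-01 10:00",
  Sauerstoff_mgL = 7.9,
  Temp = 18.2
)
o2_cols <- find_columns_by_pattern(df, c("o2", "oxygen", "ox", "sauerstoff"))
o2_cols
```

A short pattern such as "ox" also matches unrelated names that merely contain these letters, so the result is worth checking when a file has many columns.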
This function looks for the timestamp column in a data frame by typical timestamp symbols
finding_timestampColumns(dataFrame)
dataFrame |
The data frame where the column is searched |
A vector with column numbers of timestamp columns
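The details of read_misa_files() describe the timestamp rule as: entries contain ":" and one of the date separators ".", "/" or "-". A minimal sketch of such a check follows. The name `find_timestamp_columns` is hypothetical, and requiring the rule to hold for every non-missing entry of a column is an assumption.

```r
# Hypothetical sketch: a column counts as a timestamp column if all of its
# non-missing entries contain ":" and one of ".", "/" or "-".
find_timestamp_columns <- function(dataFrame) {
  is_timestamp <- vapply(dataFrame, function(column) {
    x <- as.character(column)
    x <- x[!is.na(x)]
    length(x) > 0 &&
      all(grepl(":", x, fixed = TRUE) & grepl("[./-]", x))
  }, logical(1))
  unname(which(is_timestamp))
}

df <- data.frame(
  Zeit = c("01.06.2020 10:00", "01.06.2020 10:15"),
  o2 = c(7.9, 7.4)
)
ts_cols <- find_timestamp_columns(df)
ts_cols
```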
All runs of consecutive NA values that are no longer than a defined maximal number of NAs are interpolated
interpolate_multipleNA(data_vector, max_na)
data_vector |
Numeric vector of measurements (including NA values) |
max_na |
the maximal number of NA values in a row to be interpolated |
A list containing the data vector with interpolated NA values as well as information about the number of NAs interpolated
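Gap-length-limited interpolation can be sketched with `rle()` and `approx()` from base R. The sketch assumes linear interpolation and leaves leading or trailing NA runs untouched; the helper name and the names of the list elements are hypothetical and may differ from what the package returns.

```r
# Hypothetical sketch: linearly interpolate only those NA runs that are
# at most max_na values long. Requires at least two non-NA values.
interpolate_short_gaps <- function(data_vector, max_na) {
  runs <- rle(is.na(data_vector))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  positions <- seq_along(data_vector)
  # Linear interpolation across all gaps; values outside the measured range stay NA
  filled <- approx(positions, data_vector, xout = positions)$y
  out <- data_vector
  n_interpolated <- 0
  for (i in which(runs$values & runs$lengths <= max_na)) {
    gap <- starts[i]:ends[i]
    out[gap] <- filled[gap]
    n_interpolated <- n_interpolated + sum(!is.na(filled[gap]))
  }
  list(data = out, n_interpolated = n_interpolated)
}

x <- c(1, NA, 3, NA, NA, NA, 7, NA)
res_interp <- interpolate_short_gaps(x, max_na = 2)
res_interp$data
res_interp$n_interpolated
```

In this example only the single NA between 1 and 3 is filled. The run of three NAs exceeds max_na, and the trailing NA has no right-hand neighbour to interpolate towards.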
Filters the loaded data frame by sites and time
misa_filter_data( dataFrame, sites = "", tBeg = min(dataFrame$posixDateTime, na.rm = TRUE), tEnd = max(dataFrame$posixDateTime, na.rm = TRUE) )
dataFrame |
Data frame loaded by a MiSa function (see details) |
sites |
Names of considered sites, written in the site column of a MiSa Dataframe |
tBeg |
POSIX value with the start time of the observation interval |
tEnd |
POSIX value with the end time of the observation interval |
The name of the site column must be "site", the name of the timestamp column must be "posixDateTime". The best way is to load the oxygen data with one of the following functions: read_misa_oneSite(), read_misa_multipleSites() or read_misa_files().
A filtered data frame with the same columns as the input data frame
Timestamps are adapted, oxygen data is interpolated, and the data is filtered for summer months
misa_prepare_data(df_MiSa, res = 15, max_na_interpolation = 60/res)
df_MiSa |
Data frame loaded with one of the MiSa Load functions |
res |
Temporal resolution in minutes |
max_na_interpolation |
Maximal number of NA values in a row to be interpolated. The default corresponds to one hour without measurements, so the number of NA values depends on the temporal resolution (60 / res) |
List with one data frame per site, ready for the MiSa assessment. Additional information is printed about the number of interpolated NA values. If many NA values are not interpolated, this is probably because there were no measurements during winter.
The cumulative sum of all negative deviations.
negative_deviation(data_vector, reference_vector)
data_vector |
Numeric data vector |
reference_vector |
Corresponding data of the reference |
First, the similarity between the data vector and the reference vector is calculated. Only complete pairs (no NA values) are used. For each data pair, the quotient of data and reference is calculated. If data > reference, the quotient is set to 1. All quotients are summed (-> absolute similarity); this sum can be at most the number of data pairs. Dividing by the number of data pairs gives the relative similarity. One minus the relative similarity is the negative deviation.
Numeric value between 0 and 1
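The calculation described above translates almost directly into base R. The following is a sketch of those steps under the assumption of positive reference values; `negative_deviation_sketch` is a hypothetical name, not the package's own code.

```r
# Hypothetical sketch of the steps described above.
negative_deviation_sketch <- function(data_vector, reference_vector) {
  # Use complete pairs only
  complete <- !is.na(data_vector) & !is.na(reference_vector)
  quotient <- data_vector[complete] / reference_vector[complete]
  # Data above the reference counts as full similarity
  quotient <- pmin(quotient, 1)
  relative_similarity <- sum(quotient) / length(quotient)
  1 - relative_similarity
}

d   <- c(8, 4, 6, NA, 5)    # made-up oxygen data
ref <- c(8, 8, 5, 7, 10)    # made-up reference data
nd <- negative_deviation_sketch(d, ref)
nd
```

The four complete pairs give the quotients 1, 0.5, 1 (capped) and 0.5, so the relative similarity is 3 / 4 and the negative deviation is 0.25.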
This function combines the functions read_misa_oneSite and read_misa_multipleSites and is strictly bound to the MiSa folder structure
read_misa_files(input_path)
input_path |
This is the directory where the two folders "files_per_site" and "sites_per_file" are located |
All csv files from both folders will be loaded. They must contain a timestamp column and an oxygen column. The timestamp column is identified automatically by looking for a column whose entries contain ":" and one of the date-separating symbols ".", "/" or "-". The oxygen column is found by its column name, which should contain "O2", "o2", "Oxygen", "oxygen", "ox", "Ox", "Sauerstoff" or "sauerstoff".
For "files_per_site" files: all letters in the filename before the first "_" are used as the site name. For "sites_per_file" files: all column names are used as site names. Thus, all columns except the timestamp must be oxygen concentrations at different sites.
A data frame with three columns: timestamp, oxygen data and site name
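The naming rule for "files_per_site" files (everything before the first "_" is the site name) can be illustrated with a short sketch. The helper `site_from_filename` and the example file names are hypothetical.

```r
# Hypothetical sketch: take everything before the first "_" of the file name.
site_from_filename <- function(file) {
  sub("_.*$", "", basename(file))
}

site_names <- site_from_filename(
  c("files_per_site/MUE_2019.csv", "files_per_site/SOW_oxygen_2020.csv")
)
site_names
```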
This function reads csv tables with one timestamp column and several oxygen data columns, where the column names refer to the measurement sites
read_misa_multipleSites(path, file)
path |
The path to the file |
file |
Filename (including ".csv" Ending) |
A data frame with three columns: timestamp, oxygen data and site name
This function reads csv tables with one timestamp column and one oxygen data column
read_misa_oneSite(path, file, siteID)
path |
The path to the file |
file |
Filename (including ".csv" Ending) |
siteID |
a character vector specifying the site name |
A data frame with three columns: timestamp, oxygen data and site name
Describes the values of a vector, the number of times they are repeated in a row, and the start and end positions of those runs
same_inarow(v)
v |
A character, factor or numeric vector |
A data frame with four columns: Value (-> listed value of the input vector), Repeats (times it is repeated in a row), starts_at (start position), ends_at (end position).
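A table of this shape can be derived from base R's run-length encoding. The sketch below uses the column names listed above, but the helper name `runs_sketch` is hypothetical, and converting the input to character (so that factors work with `rle()`) is an assumption about how mixed input types could be handled.

```r
# Hypothetical sketch based on rle(); values are returned as character.
runs_sketch <- function(v) {
  r <- rle(as.character(v))
  ends <- cumsum(r$lengths)
  data.frame(
    Value = r$values,
    Repeats = r$lengths,
    starts_at = ends - r$lengths + 1,
    ends_at = ends,
    stringsAsFactors = FALSE
  )
}

runs <- runs_sketch(c("a", "a", "b", "a", "a", "a"))
runs
```

For the example vector this yields three runs: "a" twice (positions 1 to 2), "b" once (position 3) and "a" three times (positions 4 to 6).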
Adds month and year column to data frame and filters by month. The addition of the year column is important for the following MiSa Assessment
SummerMonths(df, time_column = "t", months = 5:9)
df |
data frame with a POSIX time column |
time_column |
Name or number of the time column |
months |
the number of the months that should be kept. The default is 5:9 which are the important months for oxygen deficits in Berlin caused by CSOs |
The filtered input data frame with month and year columns
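Filtering by month only needs the month and year extracted from the timestamps. A minimal sketch follows; `keep_months` and the column names "month" and "year" are assumptions chosen for illustration and may not match the names used by the package.

```r
# Hypothetical sketch: add month and year columns, then keep selected months.
keep_months <- function(df, time_column = "t", months = 5:9) {
  timestamps <- df[[time_column]]            # works with a name or a number
  df$month <- as.integer(format(timestamps, "%m"))
  df$year <- as.integer(format(timestamps, "%Y"))
  df[df$month %in% months, , drop = FALSE]
}

df <- data.frame(
  t = as.POSIXct(
    c("2020-04-30 12:00", "2020-05-01 12:00",
      "2020-09-30 12:00", "2020-10-01 12:00"),
    tz = "UTC"
  ),
  d = c(9.1, 8.4, 7.7, 8.9)
)
summer <- keep_months(df)
summer
```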
Counting the events below threshold values on a yearly basis
yearly_crit_Events( dataFrame, res = 15, seperating_hours = 5 * 24, deficiency_hours = 0.25, thresholds = 1.5, max_missing = 25, use_recovery_value = FALSE, recovery_value = NULL )
dataFrame |
MiSa data frame with columns "d" (oxygen data) and "year" (year) |
res |
Temporal resolution of oxygen data in minutes |
seperating_hours |
TODO: describe (also: should be "separating_hours" with "a", not "e") |
deficiency_hours |
TODO: describe |
thresholds |
Oxygen threshold values used for the assessment in mg/L |
max_missing |
The maximal allowed percent of missing oxygen data. If NA Values exceed this number, hours below thresholds are set to NA |
use_recovery_value |
TODO: describe |
recovery_value |
TODO: describe |
Data frame with one row per year and one column per threshold as well as a column for missing data
Counting the hours on a yearly basis below threshold values
yearly_deficiency_time( dataFrame, res = 15, thresholds = c(0.5, 1, 1.5, 2, 5), max_missing = 25 )
dataFrame |
MiSa data frame with columns "d" (data) and "year" (year) |
res |
Temporal resolution of oxygen data in minutes |
thresholds |
Oxygen threshold values used for the assessment in mg/L |
max_missing |
The maximal allowed percent of missing oxygen data. If NA Values exceed this number, hours below thresholds are set to NA |
Data frame with one row per year and one column per threshold as well as a column for missing data
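A yearly summary of this kind can be sketched by splitting the data by year. Everything here is an illustration under assumptions: the helper name `yearly_hours_below`, the output column names, counting values strictly below each threshold, and measuring missing data as the share of NA among the rows present for that year.

```r
# Hypothetical sketch: hours below each threshold per year, set to NA
# if the share of missing values exceeds max_missing percent.
yearly_hours_below <- function(dataFrame, res = 15,
                               thresholds = c(0.5, 1, 1.5, 2, 5),
                               max_missing = 25) {
  rows <- lapply(split(dataFrame$d, dataFrame$year), function(d) {
    missing_pct <- mean(is.na(d)) * 100
    hours <- vapply(
      thresholds,
      function(th) sum(d < th, na.rm = TRUE) * res / 60,
      numeric(1)
    )
    if (missing_pct > max_missing) hours[] <- NA_real_
    c(hours, missing_pct)
  })
  out <- as.data.frame(do.call(rbind, rows))
  names(out) <- c(paste0("below_", thresholds), "missing_pct")
  cbind(year = as.integer(names(rows)), out)
}

df <- data.frame(
  year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020),
  d    = c(1, 2, 3, 4, NA, NA, 1, 0.4)
)
yearly <- yearly_hours_below(df, res = 30, thresholds = c(1.5, 5))
yearly
```

In the example, 2019 has no missing values and yields 0.5 and 2 hours, while 2020 has 50 percent missing data and its hours are therefore reported as NA.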
Function that cumulates the negative deviation (lower O2 concentrations) compared to a reference site without (significant) urban pollution
yearly_negative_deviation(dataFrame, oxygen_ref, max_missing = 25)
dataFrame |
MiSa data frame with columns "d" (oxygen data) and "year" (year) |
oxygen_ref |
The course of oxygen of the reference |
max_missing |
The maximal allowed percent of missing oxygen data. If NA Values exceed this number, hours below thresholds are set to NA |
Data frame with negative deviation per year