Title: | A Package for Condition Predictions for Drinking Water Wells |
---|---|
Description: | This package predicts the condition of drinking water wells using machine learning (ML) models. The models are trained on pump test results and a large set of input variables, e.g. the well material, the age and the number of regenerations. |
Authors: | Mathias Riechel [aut] |
Maintainer: | Michael Rustler |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2025-02-26 04:17:12 UTC |
Source: | https://github.com/KWB-R/dwc.wells |
Chi-Squared Test and Cramér's V
chi2.CramersV.test(data)
data | data frame on which to perform the chi-squared test |
Convert Qs_rel into a binary factor of low and high specific capacity
classify_Qs(x, split_point = 80, class_names = c("low", "high"))
x | vector of Qs_rel values |
split_point | threshold for classifying numeric Qs_rel values, default: 80 |
class_names | class names, default: c("low", "high") |
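A minimal base-R sketch of such a split, assuming the classification is a plain cut at split_point and that a value exactly equal to split_point counts as "high". The function name classify_Qs_sketch() and the boundary rule are assumptions for illustration, not the package's code.

```r
# Hypothetical stand-in for classify_Qs(); the boundary rule is an assumption.
classify_Qs_sketch <- function(x, split_point = 80, class_names = c("low", "high")) {
  # right = FALSE gives the intervals [-Inf, split_point) and [split_point, Inf),
  # so values below split_point get class_names[1], all others class_names[2];
  # NA values stay NA
  cut(x, breaks = c(-Inf, split_point, Inf), labels = class_names, right = FALSE)
}

classify_Qs_sketch(c(50, 80, 120, NA))
```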
Combined Pump Test and Q Monitoring Dataset
combine_pump_test_and_Q_monitoring_data(df_pump_tests_tidy, df_Q_monitoring, pump_test_vars)
df_pump_tests_tidy | data frame with pump test data in tidy (long) format |
df_Q_monitoring | data frame with Q monitoring data |
pump_test_vars | default: |
combined pump test and Q monitoring dataset
Plot Qs_rel against an input variable as a box plot (categorical input variable) or a scatterplot (numerical input variable)
correlation_plot(df, x, y = "Qs_rel", title = gsub("_", " ", x))
df | data frame |
x | column name of x variable |
y | column name of y variable, default: "Qs_rel" |
title | plot title |
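The choice between box plot and scatterplot can be sketched with base graphics, assuming the decision rests only on whether the x column is numeric. correlation_plot_sketch() is a hypothetical name, and the package itself may well use a different plotting system.

```r
# Hypothetical base-graphics stand-in for correlation_plot(); the package
# itself may use a different plotting system.
correlation_plot_sketch <- function(df, x, y = "Qs_rel", title = gsub("_", " ", x)) {
  if (is.numeric(df[[x]])) {
    # numerical input variable: scatterplot of y against x
    graphics::plot(df[[x]], df[[y]], xlab = x, ylab = y, main = title)
    type <- "scatterplot"
  } else {
    # categorical input variable: one box per category
    graphics::boxplot(split(df[[y]], df[[x]]), xlab = x, ylab = y, main = title)
    type <- "boxplot"
  }
  invisible(type)
}
```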
Get Path to File in This Package
extdata_file(...)
... | parts of path passed to |
Fill up NA values with medians from a lookup table
fill_up_na_with_median_from_lookup(df, df_lookup, matching_id = "well_id")
df | data frame with NA values |
df_lookup | data frame from which the median values are calculated |
matching_id | column with ids for which medians are calculated |
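One plausible reading of this function, sketched in base R: per id, take the median of each numeric column in df_lookup and use it to fill the NA cells of df. Which columns are filled and what happens to ids missing from the lookup are assumptions; fill_na_from_lookup_sketch() is a hypothetical name.

```r
# Hypothetical stand-in for fill_up_na_with_median_from_lookup(); assumes that
# every numeric column shared by df and df_lookup should be filled per id.
fill_na_from_lookup_sketch <- function(df, df_lookup, matching_id = "well_id") {
  value_cols <- setdiff(intersect(names(df), names(df_lookup)), matching_id)
  for (col in value_cols) {
    if (!is.numeric(df_lookup[[col]])) next
    # median of the lookup values for each id, as a plain named vector
    medians <- tapply(df_lookup[[col]], df_lookup[[matching_id]], stats::median, na.rm = TRUE)
    medians <- stats::setNames(as.vector(medians), names(medians))
    missing <- is.na(df[[col]])
    # ids without lookup values stay NA
    df[[col]][missing] <- unname(medians[as.character(df[[matching_id]][missing])])
  }
  df
}
```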
Calculate absolute and relative frequencies of categorical variables
frequency_table(x, perc_digits = 1, sort_freq = FALSE)
x | vector with categorical variable |
perc_digits | number of decimal digits for percentages, default: 1 |
sort_freq | sort according to frequency counts, logical, default: FALSE |
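A base-R sketch of such a table, built on table(); the output column names (value, n, perc) and the decision to count NA as its own category are assumptions, and frequency_table_sketch() is a hypothetical name.

```r
# Hypothetical stand-in for frequency_table(); column names are assumptions.
frequency_table_sketch <- function(x, perc_digits = 1, sort_freq = FALSE) {
  counts <- table(x, useNA = "ifany")  # NA is counted as its own category
  result <- data.frame(
    value = names(counts),
    n = as.integer(counts),
    perc = round(100 * as.numeric(counts) / sum(counts), perc_digits),
    stringsAsFactors = FALSE
  )
  # optionally put the most frequent categories first
  if (sort_freq) result <- result[order(result$n, decreasing = TRUE), ]
  rownames(result) <- NULL
  result
}

frequency_table_sketch(c("a", "b", "b"), sort_freq = TRUE)
```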
Get Default Pump Test Variables
get_pump_test_vars()
vector with column names of pump test variables
get_pump_test_vars()
Get W_static (static water level) measurement data from new-well pump tests (Neubaupumpversuche), short pump tests (Kurzpumpversuche) and other sources
get_W_static_data(path, renamings, df_wells)
path | path to static water level data (CSV file) |
renamings | list with renamings |
df_wells | data frame with prepared well data |
Interpolate and fill up static water level
interpolate_and_fill(df, x_col, y_col, group_by_col, origin_col)
df | data frame |
x_col | x column used for interpolation, e.g. date |
y_col | y column used for interpolation, e.g. measured values |
group_by_col | grouping variable within which interpolation is done |
origin_col | existing or to-be-created column that records the type (origin) of each value |
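A base-R sketch of grouped gap filling, assuming linear interpolation with stats::approx() inside the measured range, the nearest measured value outside it (rule = 2), and the labels "measured" and "interpolated" in the origin column. All of these, and the name interpolate_and_fill_sketch(), are assumptions for illustration.

```r
# Hypothetical stand-in for interpolate_and_fill(); linear interpolation,
# constant filling at the edges and the origin labels are assumptions.
interpolate_and_fill_sketch <- function(df, x_col, y_col, group_by_col, origin_col) {
  if (!origin_col %in% names(df)) {
    # create the origin column: measured values are labelled, gaps stay NA
    df[[origin_col]] <- ifelse(is.na(df[[y_col]]), NA_character_, "measured")
  }
  for (g in unique(df[[group_by_col]])) {
    rows <- which(df[[group_by_col]] == g)
    known <- rows[!is.na(df[[y_col]][rows])]
    missing <- rows[is.na(df[[y_col]][rows])]
    if (length(known) == 0 || length(missing) == 0) next
    if (length(known) == 1) {
      # a single measurement can only be carried over
      filled <- rep(df[[y_col]][known], length(missing))
    } else {
      # linear inside the measured range, nearest measured value outside
      filled <- stats::approx(
        x = as.numeric(df[[x_col]][known]), y = df[[y_col]][known],
        xout = as.numeric(df[[x_col]][missing]), rule = 2
      )$y
    }
    df[[y_col]][missing] <- filled
    df[[origin_col]][missing] <- "interpolated"
  }
  df
}
```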
Interpolates Qs time series data to a given time interval
interpolate_Qs(df, interval_days = 1)
df | data frame with date and Qs measurements |
interval_days | interpolation interval in days, default: 1 |
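For a single well, resampling onto a regular grid can be sketched with seq() on Date values and stats::approx(). The column names date and Qs, the restriction to one well and the use of linear interpolation are assumptions; interpolate_Qs_sketch() is a hypothetical name.

```r
# Hypothetical stand-in for interpolate_Qs(); assumes columns "date" (Date)
# and "Qs" for a single well, and linear interpolation.
interpolate_Qs_sketch <- function(df, interval_days = 1) {
  df <- df[order(df$date), ]
  # regular date grid from the first to the last measurement
  grid <- seq(min(df$date), max(df$date), by = interval_days)
  data.frame(
    date = grid,
    Qs = stats::approx(as.numeric(df$date), df$Qs, xout = as.numeric(grid))$y
  )
}
```

With two measurements four days apart and interval_days = 1, this returns five daily rows with linearly interpolated Qs values in between.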
Load renaming table from a CSV file
load_renamings_csv(infile)
infile | full path to CSV file |
Load renaming table from the original Excel file
load_renamings_excel(infile, sheet = "DATEN", old_name_col = "Feld", new_name_col = "Parametername-R")
infile | full path to Excel file |
sheet | sheet name |
old_name_col | name of column with original variable names |
new_name_col | name of column with new variable names |
A reduced dataset for well capacity prediction, created with the R script /data-raw/model_data.R
model_data_reduced
A data.frame with 6308 rows and 27 variables:
well id, for info
date of capacity measurement, for info
measurement key, e.g. operational_start, pump_test_1, pump_test_2, for info
specific capacity of well relative to operational start condition, output
days since operational start, redundant
years since operational start, input, numeric
year of well construction
screen material
well diameter (mm)
drilling method
allowed pumping rate
initial Qs at construction
confined / unconfined
standard deviation of static water level
distance to surface water
number of well rehabilitations
time since last well rehabilitation in years
mean daily abstraction volume (m³)
water quality: electrical conductivity (µS/cm)
water quality: dissolved oxygen (mg/l)
water quality: temperature (°C)
water quality: pH
water quality: electrical conductivity (µS/cm)
water quality: dissolved oxygen (mg/l)
water quality: Mn (mg/l)
water quality: NO3 (mg/l)
water quality: PO4 (mg/l)
water quality: SO4 (mg/l)
water quality: Total Suspended Solids (mg/l)
Append a percent sign to numbers
paste_percent(x)
x | numeric vector |
Plot the frequency distribution of a numerical variable
plot_distribution(Data, variable, binwidth = NULL, title, vertical_x_axis_labels = TRUE, boundary = 0)
Data | data to be plotted |
variable | name of the variable (column) to be plotted |
binwidth | bin width |
title | plot title |
vertical_x_axis_labels | should x-axis labels be plotted vertically? (TRUE / FALSE) |
boundary | left boundary of bars, default: 0 |
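How binwidth and boundary could jointly define the bars is sketched below, assuming they behave like the arguments of the same names in ggplot2::geom_histogram(): bins of width binwidth, aligned so that boundary is a bin edge. histogram_breaks_sketch() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: bin edges of width binwidth aligned to boundary, in the
# spirit of the binwidth/boundary arguments of ggplot2::geom_histogram().
histogram_breaks_sketch <- function(values, binwidth, boundary = 0) {
  values <- values[!is.na(values)]
  # move the first and last edge outwards onto the grid boundary + k * binwidth
  lower <- boundary + floor((min(values) - boundary) / binwidth) * binwidth
  upper <- boundary + ceiling((max(values) - boundary) / binwidth) * binwidth
  if (upper == lower) upper <- lower + binwidth
  seq(lower, upper, by = binwidth)
}

breaks <- histogram_breaks_sketch(c(1, 4, 9), binwidth = 5)
counts <- graphics::hist(c(1, 4, 9), breaks = breaks, plot = FALSE)$counts
```

Shifting boundary moves all edges together while keeping the bin width, e.g. boundary = 2.5 gives edges at -2.5, 2.5, 7.5 and 12.5 for the same values.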
Plot the frequency distribution of a factor variable
plot_frequencies(Data, variable, title = variable, offset_perc_labels = 0.1, size_perc_labels = 3, vertical_x_axis_labels = TRUE)
Data | data to be plotted |
variable | name of the variable (column) to be plotted |
title | plot title |
offset_perc_labels | distance of percent labels from bars |
size_perc_labels | size of percent labels |
vertical_x_axis_labels | should x-axis labels be plotted vertically? (TRUE / FALSE) |
Prepare pump test data with one row per Qs measurement, including the rehabilitation history
prepare_pump_test_data(path, renamings, df_wells, pump_test_vars)
path | path to pump test data |
renamings | list with renamings |
df_wells | prepared data frame with well characteristics |
pump_test_vars | default: |
Steps: i) read, rename and clean data, ii) correct wrong pump test dates, iii) fill up missing pump test dates, iv) get information for replaced wells, v) calculate Qs and Qs_rel, vi) determine action type, vii) select columns
prepare_pump_test_data_1(path, renamings, df_wells)
path | path to pump test data |
renamings | list with renamings |
df_wells | prepared data frame with well characteristics |
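Step v) can be illustrated with the textbook definition of specific capacity, pumping rate divided by drawdown. The sketch assumes that water levels are depths below ground (so drawdown = W_dynamic - W_static) and that Qs_rel is expressed in percent of the Qs at operational start, which fits the default split_point of 80 in classify_Qs() but is not stated here. calc_Qs_sketch() and its argument names are hypothetical.

```r
# Hypothetical helper for step v): specific capacity Qs = Q / drawdown and
# Qs_rel as a percentage of the Qs at operational start. Water levels are
# assumed to be depths below ground, so drawdown = W_dynamic - W_static.
calc_Qs_sketch <- function(Q, W_static, W_dynamic, Qs_initial) {
  drawdown <- W_dynamic - W_static
  Qs <- Q / drawdown
  data.frame(Qs = Qs, Qs_rel = 100 * Qs / Qs_initial)
}

calc_Qs_sketch(Q = 100, W_static = 10, W_dynamic = 15, Qs_initial = 25)
```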
Reformat untidy pump test data from wide to long format
prepare_pump_test_data_2(df_pump_tests_untidy, df_wells, pump_test_vars = get_pump_test_vars())
df_pump_tests_untidy | pump test data in wide format |
df_wells | prepared data frame with well characteristics |
pump_test_vars | default: |
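The wide-to-long step can be sketched with stats::reshape(), assuming the wide table holds one column per pump test named "Qs_<test number>". That naming scheme, the dropping of empty tests and the name pump_tests_to_long_sketch() are assumptions; the package's own pump test variables come from get_pump_test_vars().

```r
# Hypothetical stand-in for the reshaping step of prepare_pump_test_data_2();
# the column naming scheme "Qs_<test number>" is an assumption.
pump_tests_to_long_sketch <- function(df_wide, id_col = "well_id", prefix = "Qs_") {
  value_cols <- grep(paste0("^", prefix), names(df_wide), value = TRUE)
  long <- stats::reshape(
    df_wide, direction = "long", varying = value_cols, v.names = "Qs",
    timevar = "pump_test",
    times = as.integer(sub(prefix, "", value_cols, fixed = TRUE)),
    idvar = id_col
  )
  # drop pump tests that were never carried out and sort by well and test
  long <- long[!is.na(long$Qs), ]
  long <- long[order(long[[id_col]], long$pump_test), ]
  rownames(long) <- NULL
  long
}
```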
Prepare Quality Data
prepare_quality_data(path, renamings)
path | path to water quality data |
renamings | list with renamings |
prepared quality data
Prepare Volume Data
prepare_volume_data(path, renamings, df_wells)
path | path to abstraction volume data |
renamings | list with renamings |
df_wells | prepared data frame with well characteristics |
Prepared volume data
Heatmap / raster plot of Qs values over time, with one line per well
Qs_heatmap_plot(df, colours, dummy_labels, date_limits, title, n_wells_per_page)
df | data frame with columns date, well_id and Qs_rel |
colours | 3 colours for the low, middle and high colour limits |
dummy_labels | dummy labels used if there are fewer wells than expected |
date_limits | vector with two date strings in format "yyyy-mm-dd" |
title | plot title |
n_wells_per_page | number of wells to be shown per page |
Read CSV data file exported by Sebastian Schimmelpfennig from db2
read_csv(file, header = TRUE, fileEncoding = "UTF-8", skip = 2, dec = ".", sep = "\t", na.strings = "(null)")
file | path to CSV file |
header | logical, default: TRUE |
fileEncoding | file encoding, default: "UTF-8" |
skip | number of rows to skip, default: 2 |
dec | decimal separator, default: "." |
sep | column separator, default: tab ("\t") |
na.strings | string that represents NA, default: "(null)" |
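The documented defaults map directly onto arguments of utils::read.csv(), so the reading step can be sketched as a thin wrapper. The wrapper name read_db2_export_sketch() is hypothetical, and the package's read_csv() may handle quoting or comments differently.

```r
# Hypothetical stand-in for read_csv(): forwards the documented defaults to
# utils::read.csv(); the package's own implementation may differ.
read_db2_export_sketch <- function(file, header = TRUE, fileEncoding = "UTF-8",
                                   skip = 2, dec = ".", sep = "\t",
                                   na.strings = "(null)") {
  utils::read.csv(file, header = header, sep = sep, dec = dec,
                  skip = skip, fileEncoding = fileEncoding,
                  na.strings = na.strings, stringsAsFactors = FALSE)
}
```

With these defaults the first two lines of the export are skipped, the third line supplies the column names and every "(null)" cell becomes NA.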
Read a table from an MS Access database via an ODBC connection under 64-bit R
read_ms_access(path_db, tbl_name)
path_db | full path to database |
tbl_name | name of database table to be read |
Read a table from an MS Access database, then select and rename columns as defined in the renamings table ('old_name' -> 'new_name')
read_select_rename(path_db, tbl_name, renamings, old_name_col = "old_name", new_name_col = "new_name")
path_db | full path to database |
tbl_name | name of database table to be read |
renamings | name of data frame with renamings |
old_name_col | name of column with original variable names |
new_name_col | name of column with new variable names |
Rename values of a character vector according to a renamings table
rename_values(x, renamings, old_name_col = "old_name", new_name_col = "new_name")
x | character vector |
renamings | data frame consisting of old and new names |
old_name_col | name of column with original variable names |
new_name_col | name of column with new variable names |
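A lookup with match() is enough to sketch this renaming, assuming that values without an entry in the renamings table are kept unchanged. rename_values_sketch() is a hypothetical name.

```r
# Hypothetical stand-in for rename_values(); values without an entry in the
# renamings table are assumed to be kept unchanged.
rename_values_sketch <- function(x, renamings, old_name_col = "old_name",
                                 new_name_col = "new_name") {
  old <- as.character(renamings[[old_name_col]])
  new <- as.character(renamings[[new_name_col]])
  idx <- match(x, old)  # position of each value in the old-name column
  ifelse(is.na(idx), x, new[idx])
}
```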
Replace NAs with median
replace_na_with_median(x)
x | vector in which NA values should be replaced |
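In its simplest form this replacement is a single subassignment; the sketch below assumes the median is taken over the non-missing values of x itself. replace_na_with_median_sketch() is a hypothetical name.

```r
# Hypothetical stand-in for replace_na_with_median(): NA values are replaced
# by the median of the non-missing values of the same vector.
replace_na_with_median_sketch <- function(x) {
  x[is.na(x)] <- stats::median(x, na.rm = TRUE)
  x
}

replace_na_with_median_sketch(c(1, NA, 3, 10))
```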
Save a data frame in one or more formats: csv, RData, rds
save_data(Data, path, filename, formats = c("csv", "RData", "rds"))
Data | data frame |
path | output path for saving data |
filename | core of the file name |
formats | export formats: "csv", "RData", "rds", or several combined with c() |
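A base-R sketch of saving one data frame in several formats. The file naming "<filename>.<format>", the object name Data inside the .RData file and the name save_data_sketch() are assumptions for illustration.

```r
# Hypothetical stand-in for save_data(); file naming "<filename>.<format>" and
# the object name inside the .RData file are assumptions.
save_data_sketch <- function(Data, path, filename, formats = c("csv", "RData", "rds")) {
  formats <- match.arg(formats, several.ok = TRUE)
  files <- file.path(path, paste0(filename, ".", formats))
  for (i in seq_along(formats)) {
    switch(formats[i],
      csv = utils::write.csv(Data, files[i], row.names = FALSE),
      RData = save(Data, file = files[i]),  # stored under the name "Data"
      rds = saveRDS(Data, files[i])
    )
  }
  invisible(files)
}
```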
Scatterplot comparing numeric predictions with observations
scatterplot(df_pred, lines_80perc = FALSE, alpha = 1, pointsize = 1)
df_pred | data frame obtained with tune::collect_predictions() (attached by tidymodels), with columns Qs_rel and .pred |
lines_80perc | logical; should 80% lines be drawn? default: FALSE |
alpha | alpha (transparency) value for point colours, default: 1 |
pointsize | point size, default: 1 |
Select and rename columns of a data frame according to a renamings table
select_rename_cols(df, renamings, old_name_col = "old_name", new_name_col = "new_name")
df | data frame with columns to be renamed |
renamings | name of data frame with renamings |
old_name_col | name of column with original variable names |
new_name_col | name of column with new variable names |
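A base-R sketch of select-and-rename: only columns listed in the old-name column are kept, in table order, and then renamed. Silently skipping listed columns that are absent from df (rather than raising an error) is an assumption, as is the name select_rename_cols_sketch().

```r
# Hypothetical stand-in for select_rename_cols(); listed columns that are
# missing from df are assumed to be skipped silently.
select_rename_cols_sketch <- function(df, renamings, old_name_col = "old_name",
                                      new_name_col = "new_name") {
  old <- as.character(renamings[[old_name_col]])
  new <- as.character(renamings[[new_name_col]])
  keep <- old %in% names(df)
  out <- df[, old[keep], drop = FALSE]  # select in the order of the table
  names(out) <- new[keep]               # then rename
  out
}
```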
Summarise factor levels whose relative frequency is below a threshold
summarise_marginal_factor_levels(x, perc_threshold, marginal_name)
x | factor variable |
perc_threshold | percentage threshold below which levels are summarised |
marginal_name | name for the new summary factor level |
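Collapsing rare levels can be sketched with the levels<- replacement function, which merges levels that are assigned the same name. The sketch assumes perc_threshold is a percentage of all observations (NA included in the denominator); summarise_marginal_levels_sketch() is a hypothetical name.

```r
# Hypothetical stand-in for summarise_marginal_factor_levels(); percentages
# are assumed to refer to all observations, NA included.
summarise_marginal_levels_sketch <- function(x, perc_threshold, marginal_name) {
  x <- as.factor(x)
  perc <- 100 * table(x) / length(x)
  marginal <- names(perc)[perc < perc_threshold]
  # assigning the same name to several levels merges them into one level
  levels(x)[levels(x) %in% marginal] <- marginal_name
  x
}
```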
Turn a character vector into a factor, sort the factor levels and replace the NA level
tidy_factor(x, level_sorting = c("frequency", "alphabet")[1])
x | character vector to be turned into a factor |
level_sorting | sorting of factor levels; two options: "frequency" (default) and "alphabet"; the level "Unbekannt" (unknown) is always placed last |
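A base-R sketch of this behaviour, assuming NA values are replaced by the level "Unbekannt" before sorting (the documentation only says the NA level is replaced and that "Unbekannt" always comes last). tidy_factor_sketch() is a hypothetical name.

```r
# Hypothetical stand-in for tidy_factor(); replacing NA by "Unbekannt" is an
# assumption based on the level_sorting description.
tidy_factor_sketch <- function(x, level_sorting = c("frequency", "alphabet")[1]) {
  x <- as.character(x)
  x[is.na(x)] <- "Unbekannt"
  lvls <- if (level_sorting == "frequency") {
    names(sort(table(x), decreasing = TRUE))  # most frequent level first
  } else {
    sort(unique(x))
  }
  # move "Unbekannt" to the end, whatever its frequency or spelling
  lvls <- c(setdiff(lvls, "Unbekannt"), intersect("Unbekannt", lvls))
  factor(x, levels = lvls)
}
```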