Package 'kwb.file' reference manual

Title:	Functions Related to File and Path Operations
Description:	This package provides helper functions that were developed during different research projects at KWB. The functions deal with file operations and with the handling of file and folder paths. Functions from other scripts and packages that fit better here may be added.
Authors:	Hauke Sonnenberg [aut, cre], Kompetenzzentrum Wasser Berlin gGmbH (KWB) [cph]
Maintainer:	Hauke Sonnenberg
License:	MIT + file LICENSE
Version:	0.3.2
Built:	2025-02-06 03:56:24 UTC
Source:	https://github.com/KWB-R/kwb.file

Add File Information From File Database

Description

Add File Information From File Database

Usage

add_file_info(data)

Arguments

`data`	data frame with column `file_id` containing file identifiers and with an attribute `file_db` containing a "file database" as created by `to_file_database`

Value

data frame data with additional columns folder_path and file_name

Examples

# Define some paths
paths <- c(
  "/very/long/path/very_long_file_name_1",
  "/very/long/path/very_long_file_name_2",
  "/very/long/path/very_long_file_name_3"
)

# Create a "file database" from the paths
file_db <- kwb.file::to_file_database(paths, remove_common_base = FALSE)

# Create a data frame that relates some information to the files.
# Use the file identifier instead of the full name to keep the data clean
(df <- kwb.utils::noFactorDataFrame(
  file_id = file_db$files$file_id, 
  value = seq_along(paths)
))

# Store the file database in the attribute "file_db"
df <- structure(df, file_db = file_db)

# Restore the full file paths
add_file_info(df)

Copy Files to Flat Structure

Description

Calls file.copy under the hood but gives a message about the indices and paths of the files that could not be copied.

Usage

copy_files_to_target_dir(from_paths, target_dir, target_files)

Arguments

`from_paths`	paths to the files to be copied
`target_dir`	path to the target directory
`target_files`	relative paths to the target files, relative to `target_dir`

Examples

root <- system.file(package = "kwb.file")

relative_paths <- dir(root, recursive = TRUE)

# The original files are in root or in different subfolders
relative_paths

# Create a temporary target folder
target_dir <- kwb.utils::createDirectory(file.path(tempdir(), "target"))

# Copy all files into one target folder without subfolders
from_paths <- file.path(root, relative_paths)
to_paths <- basename(from_paths)

# Make sure that the target file names contain no duplicates, otherwise
# an error is raised
to_paths <- kwb.utils::makeUnique(to_paths, warn = FALSE)

# Copy the files
copy_files_to_target_dir(from_paths, target_dir, to_paths)

# Look at the result
dir(target_dir, recursive = TRUE)
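
The behaviour described above (copy with file.copy, then report which files failed) can be sketched in base R. This is only an illustration, not the package's implementation: the helper name `copy_and_report` and the wording of the message are assumptions.

```r
# Hypothetical helper (not part of kwb.file): copy files and report failures
copy_and_report <- function(from_paths, target_dir, target_files) {
  to_paths <- file.path(target_dir, target_files)

  # Make sure that all target folders exist
  for (folder in unique(dirname(to_paths))) {
    dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  }

  # file.copy() returns one logical value per file
  success <- file.copy(from_paths, to_paths)

  # Report indices and paths of the files that could not be copied
  failed <- which(!success)
  if (length(failed) > 0L) {
    message(
      "Could not copy ", length(failed), " file(s):\n",
      paste0("  [", failed, "] ", from_paths[failed], collapse = "\n")
    )
  }

  invisible(success)
}

# One existing source file and one path that does not exist
source_file <- tempfile(fileext = ".txt")
writeLines("some content", source_file)
missing_file <- file.path(tempdir(), "this-file-does-not-exist.txt")

# Fresh target folder so that no target file exists yet
target_dir <- tempfile("copy-sketch-")

result <- copy_and_report(
  from_paths = c(source_file, missing_file),
  target_dir = target_dir,
  target_files = c("a.txt", "b.txt")
)

result
```

Returning the logical vector invisibly keeps the console quiet on success while still allowing the caller to check which copies failed.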

Helper function to return full paths

Description

This function provides a shortcut to dir(..., full.names = TRUE)

Usage

dir_full(...)

Arguments

`...`	arguments passed to `dir`

Examples

dir_full(system.file(package = "kwb.file"))

Get Full Paths to all XML files Below a Root Folder

Description

Get Full Paths to all XML files Below a Root Folder

Usage

dir_full_recursive_xml(root)

Arguments

`root`	path to root folder

Value

vector of character
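
The manual gives no example for this function. A rough base-R equivalent might look as follows. It assumes that "XML files" are identified by the file extension .xml, which the manual does not state; the helper name `list_xml_files` is hypothetical.

```r
# Hypothetical stand-in, assuming "XML files" means files ending in ".xml"
list_xml_files <- function(root) {
  dir(
    root,
    pattern = "\\.xml$",
    recursive = TRUE,
    full.names = TRUE,
    ignore.case = TRUE
  )
}

# Small folder tree with one XML file in a subfolder and one other file
root <- tempfile("xml-sketch-")
dir.create(file.path(root, "sub"), recursive = TRUE)
file.create(file.path(root, "sub", "a.xml"), file.path(root, "b.txt"))

xml_files <- list_xml_files(root)
basename(xml_files)
```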

Get Default Download Directory

Description

Get Default Download Directory

Usage

get_download_dir()

Value

assumed default download directory on the user's computer (vector of character of length one)

Examples

dir_full(get_download_dir())

Read File Metadata from YAML-File

Description

Read File Metadata from YAML-File

Usage

read_file_metadata(
  yaml_file,
  file_encoding = "UTF-8",
  out_class = c("data.frame", "list")[1]
)

Arguments

`yaml_file`	path to YAML-File containing file metadata (as saved with `kwb.file:::write_file_info_to_yaml_file`)
`file_encoding`	passed to argument `fileEncoding` of `read_yaml`
`out_class`	one of "data.frame", "list"

Value

depending on out_class, either a data frame with the following columns or a list with the following elements is returned:

file_id: clean file name given to the original file for simpler access,
original_name: original file name given by data provider,
original_folder: original path to folder in which file was provided.

Remove the Common Root Parts

Description

Remove the Common Root Parts

Usage

remove_common_root(x, n_keep = 1L, dbg = TRUE)

Arguments

`x`	list of vectors of character as returned by `strsplit` or a vector of character.
`n_keep`	minimum number of segments to be kept in any case in the returned relative paths. For example, two paths "a" and "a/b" have the common root "a". Removing this root would result in relative paths "" and "b". As this is not useful, `n_keep` is `1` by default, making sure that all paths keep at least one segment (segment "a") in the example.
`dbg`	if `TRUE` debug messages are shown

Examples

# Split paths at the slashes
absparts <- strsplit(c("a/b/c", "a/b/d", "a/b/e/f/g", "a/b/hi"), "/")

# Remove the common parts of the paths
relparts <- remove_common_root(absparts)
relparts

# The extracted root is returned in attribute "root"
attr(relparts, "root")
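
To make the role of n_keep concrete, the idea can be sketched in base R. This is an illustration of the documented behaviour under simplifying assumptions, not the package's code: the helper `strip_common_root` is hypothetical, it accepts only a list of segment vectors, and it joins the root with "/".

```r
# Hypothetical helper (not part of kwb.file)
strip_common_root <- function(parts, n_keep = 1L) {
  # At most this many leading segments may be removed so that the
  # shortest path still keeps n_keep segments
  n_max <- min(lengths(parts)) - n_keep

  # Count the leading segments that are identical in all paths
  n_common <- 0L
  while (n_common < n_max) {
    segments <- vapply(parts, `[`, character(1), n_common + 1L)
    if (length(unique(segments)) > 1L) break
    n_common <- n_common + 1L
  }

  # Drop the common segments (logical indexing also works for n_common = 0)
  result <- lapply(parts, function(p) p[seq_along(p) > n_common])

  root <- paste(parts[[1]][seq_len(n_common)], collapse = "/")
  structure(result, root = root)
}

absparts <- strsplit(c("a/b/c", "a/b/d", "a/b/e/f/g", "a/b/hi"), "/")
sketch <- strip_common_root(absparts)
attr(sketch, "root")

# Paths "a" and "a/b": with n_keep = 1 nothing is removed, ...
kept <- strip_common_root(strsplit(c("a", "a/b"), "/"), n_keep = 1L)

# ... with n_keep = 0 the first path loses all of its segments
emptied <- strip_common_root(strsplit(c("a", "a/b"), "/"), n_keep = 0L)
```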

Split Full Paths into Directory Path and Filename

Description

Split Full Paths into Directory Path and Filename

Usage

split_into_dir_and_file(paths)

Arguments

`paths`	vector of character representing full file paths

Value

data frame with columns directory and file

Examples

split_into_dir_and_file(c("path/to/file-1", "path/to/file-2"))
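
For comparison, a similar split can be obtained with base R alone. The column names follow the Value section above; whether the package function treats edge cases (for example paths without any folder, for which dirname returns ".") in the same way is not stated in this manual.

```r
paths <- c("path/to/file-1", "path/to/file-2")

# dirname() gives the folder part, basename() the file part
parts <- data.frame(
  directory = dirname(paths),
  file = basename(paths),
  stringsAsFactors = FALSE
)

parts
```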

Split Full Paths into Root, Folder, File and Extension

Description

Split Full Paths into Root, Folder, File and Extension

Usage

split_into_root_folder_file_extension(paths, n_root_parts = 0)

Arguments

`paths`	vector of character representing full file paths
`n_root_parts`	number of first path segments considered as "root"

Value

data frame with columns root, folder, file, extension, depth

Examples

paths <- c(
  "//always/the/same/root/project-1/intro.doc",
  "//always/the/same/root/project-1/logo.png",
  "//always/the/same/root/project-2/intro.txt",
  "//always/the/same/root/project-2/planning/file-1.doc",
  "//always/the/same/root/project-2/result/report.pdf"
)

split_into_root_folder_file_extension(paths)
split_into_root_folder_file_extension(paths, n_root_parts = 6)
split_into_root_folder_file_extension(paths, n_root_parts = 7)

Split Full Paths at Slashes into Parts

Description

Split Full Paths at Slashes into Parts

Usage

split_paths(paths, dbg = TRUE, use_fs = FALSE)

Arguments

`paths`	vector of character representing full file paths
`dbg`	if `TRUE` (default), a debug message is shown
`use_fs`	whether or not to simply use `path_split`. Defaults to `FALSE`

Examples

segments <- split_paths(c("path/to/file-1", "path/to/file-2"))
segments

Create Two Table Relational Database From Paths

Description

From a vector of given file paths, this function generates short and unique identifiers for files and folders. The assignments between identifiers and original paths are stored in two data frames, files and folders, which are returned in a list.

Usage

to_file_database(files, remove_common_base = TRUE)

Arguments

`files`	vector of file paths
`remove_common_base`	if `TRUE` (default) the common root of all `files` is removed before creating the database

Value

list of two data frames, files and folders

Examples

paths <- c(
  "very_long/very_ugly_path/even with spaces.doc",
  "very_long/very_ugly_path/even with spaces.docx"
)

to_file_database(paths)
to_file_database(paths, remove_common_base = FALSE)
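
The two-table idea can be illustrated with plain data frames. This is a sketch under assumptions, not the structure that to_file_database actually returns: the column names `folder_id`, `folder_path` and `file_name` as well as the identifier formats `file_<xx>` and `folder_<xx>` are chosen for illustration only.

```r
paths <- c(
  "very_long/very_ugly_path/even with spaces.doc",
  "very_long/very_ugly_path/even with spaces.docx"
)

# Table 1: one row per distinct folder, each with a short identifier
folder_paths <- unique(dirname(paths))
folders <- data.frame(
  folder_id = sprintf("folder_%02d", seq_along(folder_paths)),
  folder_path = folder_paths,
  stringsAsFactors = FALSE
)

# Table 2: one row per file, referring to its folder by identifier
files <- data.frame(
  file_id = sprintf("file_%02d", seq_along(paths)),
  file_name = basename(paths),
  folder_id = folders$folder_id[match(dirname(paths), folders$folder_path)],
  stringsAsFactors = FALSE
)

file_db <- list(files = files, folders = folders)

# Joining the two tables restores the full path of each file
restored <- merge(file_db$files, file_db$folders, by = "folder_id")
restored_paths <- file.path(restored$folder_path, restored$file_name)
restored_paths
```

Storing each long folder path only once is what keeps the files table short; a join on the folder identifier brings the full paths back when they are needed.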

Convert Long File Paths to Simple Paths

Description

Convert Long File Paths to Simple Paths

Usage

to_simple_names(paths, method = 1L, get_base = NULL, sha1_digits = 4)

Arguments

`paths`	vector of character containing file paths
`method`	`method = 1`: file names generated match the pattern `file_<xx>` with `<xx>` being an integer number of two digits. `method = 2`: file names generated match the pattern `file_<sha>` with `<sha>` being the first `sha1_digits` digits of the sha1 hash (see e.g. http://www.sha1-online.com/) of the base names of the `paths`. By default, the base name is the file name (without folder path) without extension. The base names can be determined individually by providing a function in `get_base`
`get_base`	function taking a vector of character as input and returning a vector of character as output. If not `NULL`, this function will be used to determine the base names from the `paths` when `method = 2` was specified.
`sha1_digits`	number of digits used when `method = 2` is to be applied

Value

vector of character as long as paths

Examples

paths <- c("v1_ugly_name_1.doc",  "v1_very_ugly_name.xml",
           "v2_ugly_name_1.docx", "v2_very_ugly_name.xmlx")
           
to_simple_names(paths, method = 1L)
writeLines(sort(to_simple_names(paths, method = 2L)))

# All sha1 are different because all base names (file name without extension
# by default) are different. If you want to give the same sha1 to files that 
# correspond to each other but have a different extension, set the function 
# that extracts the "base name" of the file:

get_base <- function(x) kwb.utils::removeExtension(gsub("^v\\d+_", "", x))

writeLines(sort(to_simple_names(paths, method = 2L, get_base = get_base)))

# Now the file names that have the same base name (neglecting the prefix 
# v1_ or v2_) get the same sha1 and thus appear as groups in the sorted 
# file list
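
To see why the files end up in groups, it can help to look at the base names themselves, since identical base names lead to identical sha1 values. The following base-R sketch uses tools::file_path_sans_ext as a stand-in for kwb.utils::removeExtension, assuming that both behave the same for these simple names; `get_base_sketch` is a hypothetical name.

```r
paths <- c("v1_ugly_name_1.doc",  "v1_very_ugly_name.xml",
           "v2_ugly_name_1.docx", "v2_very_ugly_name.xmlx")

# Remove the version prefix (v1_, v2_, ...) and then the file extension
get_base_sketch <- function(x) {
  tools::file_path_sans_ext(gsub("^v\\d+_", "", x))
}

base_names <- get_base_sketch(paths)
base_names

# Files that share a base name fall into the same group
groups <- split(paths, base_names)
groups
```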

Convert a Vector of Paths to a Matrix of Subfolders

Description

Convert a Vector of Paths to a Matrix of Subfolders

Usage

to_subdir_matrix(
  paths,
  fill.value = "",
  result_type = "matrix",
  dbg = FALSE,
  method = NA_integer_
)

Arguments

`paths`	vector of path strings
`fill.value`	value used to fill empty cells of the result matrix
`result_type`	one of `c("matrix", "data.frame", "list")`, specifying the type of object to be returned. Result type "list" is only implemented for `method = 2`.
`dbg`	if `TRUE` debug messages are shown
`method`	integer specifying the implementation method. Currently not used.

Value

matrix or data frame, depending on result_type

Examples

folder_matrix <- kwb.file::to_subdir_matrix(c("a1/b1/c1", "a1/b2", "a2"))

folder_matrix

dim(folder_matrix)

folder_matrix[folder_matrix[, 1] == "a1", ]
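
The role of fill.value can be illustrated with a small base-R sketch that builds such a matrix by hand. It is not the package's implementation; it only shows the padding step, assuming that paths are split at "/".

```r
paths <- c("a1/b1/c1", "a1/b2", "a2")
fill_value <- ""

# Split each path into its segments
parts <- strsplit(paths, "/")
max_depth <- max(lengths(parts))

# Pad each vector of segments to the maximum depth, then bind them as rows
padded <- lapply(parts, function(p) {
  c(p, rep(fill_value, max_depth - length(p)))
})
subdir_matrix <- do.call(rbind, padded)

subdir_matrix
dim(subdir_matrix)
```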
