Invert the Merging of two Data Frames
Description
Split a data frame z into two data frames x and y so that merge(x, y) is z.
Usage
unmerge(z, by)
Arguments
z | data frame
by | vector of names of columns in z that are used to group the rows of z so that within each group the values in these columns do not change. For each group, the columns that are constant over all rows are identified. Columns that are constant within every group appear in the data frame x, whereas the remaining columns appear in the data frame y of the returned list.
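The grouping logic described for by can be illustrated with a minimal base R sketch. The helper name unmerge_sketch is hypothetical, and the code is only an assumption about how such a split could be done. It is not the package's implementation and ignores edge cases such as missing values.

```r
# Hypothetical helper (not the package's code): sketch of the splitting
# logic described for the argument "by", using base R only.
unmerge_sketch <- function(z, by) {
  stopifnot(is.data.frame(z), all(by %in% names(z)))

  # Rows with equal values in the "by" columns fall into the same group
  group <- interaction(z[, by, drop = FALSE], drop = TRUE)

  others <- setdiff(names(z), by)

  # A column counts as constant if it has one distinct value in every group
  is_constant <- vapply(others, function(column) {
    all(tapply(z[[column]], group, function(v) length(unique(v)) == 1L))
  }, logical(1))

  list(
    # One row per group: the key columns plus the constant columns
    x = unique(z[, c(by, others[is_constant]), drop = FALSE]),
    # All rows: the key columns plus the varying columns
    y = z[, c(by, others[!is_constant]), drop = FALSE]
  )
}

# Small illustrative data set (made up for this sketch)
persons <- data.frame(
  name = c("peter", "peter", "paul"),
  age = c(42, 42, 31),
  subject = c("maths", "bio", "bio")
)

parts <- unmerge_sketch(persons, "name")
names(parts$x) # "name" "age"
names(parts$y) # "name" "subject"
```

Here age is constant for each name and therefore ends up in x, while subject varies for "peter" and therefore ends up in y. Calling merge(parts$x, parts$y) joins the two parts on name again.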
Value
list with two elements x and y, each of which is a data frame containing at least the columns given in by.
Examples
z <- data.frame(
name = c("peter", "peter", "paul", "mary", "paul", "mary"),
age = c(42, 42, 31, 28, 31, 28),
height = c(181, 181, 178, 172, 178, 172),
subject = c("maths", "bio", "bio", "bio", "chem", "maths"),
year = c(2016, 2017, 2017, 2017, 2015, 2016),
mark = c("A", "B", "B", "A", "C", "b")
)
# What fields seem to be properties of objects identified by name?
# -> Age and height are fixed properties of the persons identified by name
(result1 <- unmerge(z, "name"))
# What fields seem to be properties of objects identified by subject?
# -> It seems that each subject has been tested in its own year
(result2 <- unmerge(z, "subject"))
# Test if merge(result$x, result$y) results in z
y1 <- merge(result1$x, result1$y)
y2 <- merge(result2$x, result2$y)
columns <- sort(names(z))
identical(fullySorted(z[, columns]), fullySorted(y1[, columns])) # TRUE
identical(fullySorted(z[, columns]), fullySorted(y2[, columns])) # TRUE