extract {opm}

R Documentation

Extract aggregated values and/or metadata

Description

Extract selected aggregated and/or discretised values into a common matrix or data frame. The data-frame method of extract conducts normalisation and/or computes normalised point estimates and their confidence intervals for user-defined experimental groups; it is mainly a helper function for ci_plot. extract_columns extracts only selected metadata entries, for use as additional columns in a data frame or (after joining) as a character vector of labels.

Usage

## S4 method for signature 'MOPMX'
extract(object, as.labels,
    subset = opm_opt("curve.param"), ci = FALSE, trim = "full",
    dataframe = FALSE, as.groups = NULL, sep = " ", ...)

## S4 method for signature 'OPMS'
extract(object, as.labels,
    subset = opm_opt("curve.param"), ci = FALSE, trim = "full",
    dataframe = FALSE, as.groups = NULL, sep = " ", dups = "warn",
    exact = TRUE, strict = TRUE, full = TRUE, max = 10000L, ...)

## S4 method for signature 'data.frame'
extract(object, as.groups = TRUE,
    norm.per = c("row", "column", "none"), norm.by = TRUE, subtract = TRUE,
    direct = inherits(norm.by, "AsIs"), dups = c("warn", "error", "ignore"),
    split.at = param_names("split.at"))

## S4 method for signature 'WMD'
extract_columns(object, what, join = FALSE,
    sep = " ", dups = c("warn", "error", "ignore"), factors = TRUE,
    exact = TRUE, strict = TRUE)

## S4 method for signature 'WMDS'
extract_columns(object, what, join = FALSE,
    sep = " ", dups = c("warn", "error", "ignore"), factors = TRUE,
    exact = TRUE, strict = TRUE)

## S4 method for signature 'data.frame'
extract_columns(object, what,
    as.labels = NULL, as.groups = NULL, sep = opm_opt("comb.value.join"),
    factors = is.list(what), direct = is.list(what) || inherits(what, "AsIs"))

Arguments

`object`	`OPMS` object, `MOPMX` object or data frame, for `extract` with one column named as indicated by `split.at` (default given by `param_names("split.at")`), columns with factor variables before that column and columns with numeric vectors after that column. For `extract_columns` optionally an `OPM` object.
`as.labels`	List, character vector or formula indicating the metadata to be joined and used as row names (if `dataframe` is `FALSE`) or as additional columns (otherwise). Ignored if `NULL`. If `as.labels` is a formula and `dataframe` is `TRUE`, the pseudo-function `J` within the formula can be used to trigger combination of factors immediately after selecting them as data-frame columns, much like `as.groups`.
`subset`	Character vector. The parameter(s) to put in the matrix. One of the values of `param_names()`. Alternatively, if it is `param_names("disc.name")`, discretised data are returned, and `ci` is ignored. Can also be identical to `param_names("hours")`, which yields the overall running time (see `hours`), also ignoring `ci`.
`ci`	Logical scalar. Also return the confidence intervals?
`trim`	Character scalar. See `aggregated` for details.
`dataframe`	Logical scalar. Return data frame or matrix? In the case of the `MOPMX` method this can also be `NA` and then behaves like `TRUE` but ensures that all rows are kept.
`as.groups`	For the `OPMS` method, a list, character vector or formula indicating the metadata to be joined and either used as ‘row.groups’ attribute of the output matrix or as additional columns of the output data frame. See `heat_map` for its usage. Ignored if empty. If `as.groups` is a formula and `dataframe` is `TRUE`, the pseudo-function `J` within the formula can be used to trigger combination of factors immediately after selecting them as data-frame columns, much like `as.labels`. If `as.groups` is a logical scalar, `TRUE` yields a trivial group that contains all elements, `FALSE` yields one group per element, and `NA` yields an error. The column name in which this factor is placed if `dataframe` is `TRUE` is determined using `opm_opt("group.name")`. For the data-frame method, a logical, character or numeric vector indicating the columns (before the `split.at` column) according to which the data should be aggregated by calculating means and confidence intervals. If `FALSE`, no such aggregation takes place. If `TRUE`, all those columns are used for grouping.
`sep`	Character scalar. Used as separator between the distinct metadata entries if these are to be pasted together. `extract_columns` ignores this unless `join` is `TRUE`. The data-frame method always joins the data unless `what` is a list.
`dups`	Character scalar specifying what to do in the case of duplicate labels: either ‘warn’, ‘error’ or ‘ignore’. Ignored unless `join` is `TRUE`, and always ignored if `object` is an `OPM` object. For the data-frame method of `extract`, a character scalar defining the action to conduct if `as.groups` contains duplicates.
`exact`	Logical scalar. Passed to `metadata`.
`strict`	Logical scalar. Also passed to `metadata`.
`full`	Logical scalar indicating whether full substrate names shall be used. This is passed to `wells`, but in contrast to what `flatten` is doing the argument here refers to the generation of the column names.
`max`	Numeric scalar. Passed to `wells`.
`...`	Optional other arguments passed to `wells`.
`norm.per`	Character scalar indicating the presence and direction of a normalisation step. Possible values: ‘none’ (no normalisation); ‘row’ (normalisation per row; by default, this subtracts the mean of each plate, taken over all wells of that plate, from each of its values); ‘column’ (normalisation per column; by default, this subtracts the mean of each well, taken over all plates in which this well is present, from each of its values). This step can be further modified by the next three arguments.
`norm.by`	Vector indicating which wells (columns) or plates (rows) are used to calculate means used for the normalisation. By default, the mean is calculated over all rows or columns if normalisation is requested using `norm.per`. But if `direct` is `TRUE`, `norm.by` is directly interpreted as numeric vector used for normalisation.
`direct`	Logical scalar. For `extract`, indicating how to use `norm.by`. See there for details. For `extract_columns`, indicating whether to extract column names directly, or search for columns of one to several given classes.
`subtract`	Logical scalar indicating whether normalisation (if any) is done by subtracting or dividing.
`split.at`	Character vector defining alternative names of the column at which the data frame shall be divided. Exactly one must match.
`what`	For the `OPMS` method, a list of metadata keys to consider, or single such key; passed to `metadata`. A formula is also possible; see there for details. A peculiarity of `extract_columns` is that including `J` as a pseudo-function call in the formula triggers the combination of metadata entries to new factors immediately after selecting them, as long as `join` is `FALSE`. For the data-frame method, just the names of the columns to extract, or their indexes, as vector, if `direct` is `TRUE`. Alternatively, the name of the class to extract from the data frame to form the matrix values. In the ‘direct’ mode, `what` can also be a named list of vectors used for indexing. In that case a data frame is returned that contains the columns from `object` together with new columns that result from pasting the selected columns together. If `what` is named, its names are used as the new column names. Otherwise each name is created by joining the respective value within `what` with the `"comb.key.join"` entry of `opm_opt` as separator.
`join`	Logical scalar. Join each row together to yield a character vector? Otherwise it is just attempted to construct a data frame.
`factors`	Logical scalar determining whether strings should be converted to factors. Note that this would only affect newly created data-frame columns.
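
The interplay of `norm.per`, `norm.by` and `subtract` described above can be sketched in base R. This is only an illustration on a hypothetical toy matrix (plates as rows, wells as columns), not the code used by opm; the object names are made up for this sketch.

```r
# Hypothetical toy data: 3 plates (rows) x 4 wells (columns)
m <- matrix(c(10, 20, 30, 40,
              12, 22, 32, 42,
              14, 24, 34, 44),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("plate", 1:3), c("A01", "A02", "A03", "A04")))

# like norm.per = "row", subtract = TRUE: subtract each plate's mean
by_row <- sweep(m, 1, rowMeans(m), "-")

# like norm.per = "column", subtract = TRUE: subtract each well's mean
by_col <- sweep(m, 2, colMeans(m), "-")

# like norm.per = "row", norm.by = 1: per-plate mean over well A01 only
by_a01 <- sweep(m, 1, rowMeans(m[, 1, drop = FALSE]), "-")

# like subtract = FALSE: divide by the mean instead of subtracting it
ratio_row <- sweep(m, 1, rowMeans(m), "/")
```

After row-wise subtraction every plate has mean zero, and after row-wise division every plate has mean one, which is a quick way to check which direction was applied.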

Details

extract_columns is not normally called directly by an opm user, because extract, which uses this function, is available. It can nevertheless be used for testing the applied metadata selections beforehand.

The extract_columns data-frame method is partially trivial (extract the selected columns and join them to form a character vector or new data-frame columns), partially more useful (extract columns with data of a specified class).
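
The two modes of the data-frame method can be approximated in base R as follows. This is a rough sketch on a hypothetical data frame, assuming that class-based selection amounts to testing each column's type; it does not reproduce opm's handling of `as.labels`, `as.groups` or factors.

```r
# Hypothetical data frame with an integer, a character and a double column
x <- data.frame(a = 1:3, b = c("u", "v", "w"), c = c(0.5, 1.5, 2.5),
  stringsAsFactors = FALSE)

# 'direct' mode: pick columns by name and paste them row-wise
labels <- paste(x$a, x$b, sep = "-")

# class-based mode: keep only the integer columns and form a matrix
is_int <- vapply(x, is.integer, logical(1))
int_cols <- as.matrix(x[, is_int, drop = FALSE])
```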

Not all MOPMX objects are suitable for extract. The call will be successful only if the object contains nothing but OPMS objects, i.e. OPM objects are forbidden. But even if successful it might result in NA values within the resulting matrix or data frame. This may cause methods that call extract to fail. NA values will not occur if the set of row names created using as.labels is equal between the distinct elements of object. This also holds if dataframe is TRUE, even though in that case row names are only temporarily created.

Duplicate combinations of row and column names currently cause the MOPMX method to skip all of them except the last one if dataframe is FALSE. This should mainly affect substrates that occur in plates of distinct plate types.

Similarly, duplicate row names will cause the skipping of all but the last one. This can be circumvented by using an as.labels argument that yields unique row names. If as.labels is empty, the MOPMX method of extract will create potentially unique row names from the names if these are present but from the plate types if the ‘names’ attribute is NULL. This will not be done, and rows will neither be skipped nor reordered, if dataframe is TRUE.

Otherwise row names and names of substrate columns will be reordered (sorted). The created ‘row.groups’ attribute, if any, will be adapted accordingly. If dataframe is TRUE, the placement of the columns created by as.groups will also be as usual, but duplicates, if any, will be removed.
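
The effect of keeping only the last of several duplicate row names and then sorting rows and columns can be mimicked in base R. The following is a sketch of the described behaviour on a hypothetical matrix, not opm's actual implementation.

```r
# Hypothetical matrix with a duplicated row name ("b") and unsorted columns
m <- matrix(1:6, nrow = 3,
  dimnames = list(c("b", "a", "b"), c("A02", "A01")))

# keep only the last occurrence of each row name
keep <- !duplicated(rownames(m), fromLast = TRUE)
m2 <- m[keep, , drop = FALSE]

# sort row names and column names
m2 <- m2[order(rownames(m2)), order(colnames(m2)), drop = FALSE]
```

Here the first row named "b" is dropped in favour of the third one, which is why an as.labels argument that yields unique row names avoids any loss of rows.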

Value

Numeric matrix or data frame from extract; always a data frame for the data-frame method with the same column structure as object and, if grouping was used, a triplet structure of the rows, as indicated in the new split.at column: (i) group mean, (ii) lower and (iii) upper boundary of the group confidence interval. The data could then be visualised using ci_plot. See the examples.
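
To make the triplet structure concrete, the following base R sketch computes a group mean with lower and upper confidence limits for one hypothetical well. It assumes a t-based 95% confidence interval, which may differ from the estimator opm actually uses; `ci_triplet` is a made-up helper name.

```r
# Hypothetical values of one well, measured on four plates in two groups
values <- c(57.7, 123.5, 61.4, 55.7)
groups <- factor(c("E. coli", "E. coli", "P. aeruginosa", "P. aeruginosa"))

# Assumption: t-based confidence interval; opm's own method may differ
ci_triplet <- function(x, level = 0.95) {
  n <- length(x)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

# one column per group, rows: mean, lower, upper
result <- vapply(split(values, groups), ci_triplet, numeric(3))
```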

For the OPMS method of extract_columns, a data frame or character vector, depending on the join argument. The data-frame method of extract_columns returns a character vector or a data frame, too, but depending on the what argument.

Author(s)

Lea A.I. Vaas, Markus Goeker

Examples

## 'OPMS' method
opm_opt("curve.param") # default parameter

## [1] "A"

# generate matrix (containing the parameter given above)
(x <- extract(vaas_4, as.labels = list("Species", "Strain")))[, 1:3]

##                                A01 (Negative Control) A02 (Dextrin)
## Escherichia coli DSM18039                    57.66618     131.67996
## Escherichia coli DSM30083T                  123.45581     248.18087
## Pseudomonas aeruginosa DSM1707               61.35526      75.10225
## Pseudomonas aeruginosa 429SC1                55.74738      66.05093
##                                A03 (D-Maltose)
## Escherichia coli DSM18039             42.45742
## Escherichia coli DSM30083T           284.09938
## Pseudomonas aeruginosa DSM1707        22.37216
## Pseudomonas aeruginosa 429SC1         49.63049

stopifnot(is.matrix(x), dim(x) == c(4, 96), is.numeric(x))
# using a formula also works
(y <- extract(vaas_4, as.labels = ~ Species + Strain))[, 1:3]

##                                A01 (Negative Control) A02 (Dextrin)
## Escherichia coli DSM18039                    57.66618     131.67996
## Escherichia coli DSM30083T                  123.45581     248.18087
## Pseudomonas aeruginosa DSM1707               61.35526      75.10225
## Pseudomonas aeruginosa 429SC1                55.74738      66.05093
##                                A03 (D-Maltose)
## Escherichia coli DSM18039             42.45742
## Escherichia coli DSM30083T           284.09938
## Pseudomonas aeruginosa DSM1707        22.37216
## Pseudomonas aeruginosa 429SC1         49.63049

stopifnot(identical(x, y))

# generate data frame
(x <- extract(vaas_4, as.labels = list("Species", "Strain"),
  dataframe = TRUE))[, 1:3]

##                  Species    Strain Parameter
## 1       Escherichia coli  DSM18039         A
## 2       Escherichia coli DSM30083T         A
## 3 Pseudomonas aeruginosa   DSM1707         A
## 4 Pseudomonas aeruginosa    429SC1         A

stopifnot(is.data.frame(x), dim(x) == c(4, 99))
# using a formula
(y <- extract(vaas_4, as.labels = ~ Species + Strain,
  dataframe = TRUE))[, 1:3]

##                  Species    Strain Parameter
## 1       Escherichia coli  DSM18039         A
## 2       Escherichia coli DSM30083T         A
## 3 Pseudomonas aeruginosa   DSM1707         A
## 4 Pseudomonas aeruginosa    429SC1         A

stopifnot(identical(x, y))
# using a formula, with joining into new columns
(y <- extract(vaas_4, as.labels = ~ J(Species + Strain),
  dataframe = TRUE))[, 1:3]

##                  Species    Strain                 Species.Strain
## 1       Escherichia coli  DSM18039      Escherichia coli/DSM18039
## 2       Escherichia coli DSM30083T     Escherichia coli/DSM30083T
## 3 Pseudomonas aeruginosa   DSM1707 Pseudomonas aeruginosa/DSM1707
## 4 Pseudomonas aeruginosa    429SC1  Pseudomonas aeruginosa/429SC1

stopifnot(identical(x, y[, -3]))

# put all parameters in a single data frame
x <- lapply(param_names(), function(name) extract(vaas_4, subset = name,
  as.labels = list("Species", "Strain"), dataframe = TRUE))
x <- do.call(rbind, x)

# get discretised data
(x <- extract(vaas_4, subset = param_names("disc.name"),
  as.labels = list("Strain")))[, 1:3]

##           A01 (Negative Control) A02 (Dextrin) A03 (D-Maltose)
## DSM18039                   FALSE            NA           FALSE
## DSM30083T                     NA          TRUE            TRUE
## DSM1707                    FALSE         FALSE           FALSE
## 429SC1                     FALSE         FALSE           FALSE

stopifnot(is.matrix(x), identical(dim(x), c(4L, 96L)), is.logical(x))

## data-frame method

# extract data from OPMS-object as primary data frame
# second call to extract() then applied to this one
(x <- extract(vaas_4, as.labels = list("Species", "Strain"),
  dataframe = TRUE))[, 1:3]

##                  Species    Strain Parameter
## 1       Escherichia coli  DSM18039         A
## 2       Escherichia coli DSM30083T         A
## 3 Pseudomonas aeruginosa   DSM1707         A
## 4 Pseudomonas aeruginosa    429SC1         A

# no normalisation, but grouping for 'Species'
y <- extract(x, as.groups = "Species", norm.per = "none")
# plotting using ci_plot()
ci_plot(y[, c(1:6, 12)], legend.field = NULL, x = 350, y = 1)

# normalisation by plate means
y <- extract(x, as.groups = "Species", norm.per = "row")
# plotting using ci_plot()
ci_plot(y[, c(1:6, 12)], legend.field = NULL, x = 130, y = 1)

# normalisation by well means
y <- extract(x, as.groups = "Species", norm.per = "column")
# plotting using ci_plot()
ci_plot(y[, c(1:6, 12)], legend.field = NULL, x = 20, y = 1)

# normalisation by subtraction of the well means of well A10 only
y <- extract(x, as.groups = "Species", norm.per = "row", norm.by = 10,
  subtract = TRUE)
# plotting using ci_plot()
ci_plot(y[, c(1:6, 12)], legend.field = NULL, x = 0, y = 0)

## extract_columns()

# 'OPMS' method

# Create data frame
(x <- extract_columns(vaas_4, what = list("Species", "Strain")))

##                  Species    Strain
## 1       Escherichia coli  DSM18039
## 2       Escherichia coli DSM30083T
## 3 Pseudomonas aeruginosa   DSM1707
## 4 Pseudomonas aeruginosa    429SC1

stopifnot(is.data.frame(x), dim(x) == c(4, 2))
(y <- extract_columns(vaas_4, what = ~ Species + Strain))

##                  Species    Strain
## 1       Escherichia coli  DSM18039
## 2       Escherichia coli DSM30083T
## 3 Pseudomonas aeruginosa   DSM1707
## 4 Pseudomonas aeruginosa    429SC1

stopifnot(identical(x, y)) # same result using a formula
(y <- extract_columns(vaas_4, what = ~ J(Species + Strain)))

##                  Species    Strain                 Species.Strain
## 1       Escherichia coli  DSM18039      Escherichia coli/DSM18039
## 2       Escherichia coli DSM30083T     Escherichia coli/DSM30083T
## 3 Pseudomonas aeruginosa   DSM1707 Pseudomonas aeruginosa/DSM1707
## 4 Pseudomonas aeruginosa    429SC1  Pseudomonas aeruginosa/429SC1

stopifnot(is.data.frame(y), dim(y) == c(4, 3)) # additional column created
stopifnot(identical(x, y[, -3]))
(x <- extract_columns(vaas_4, what = TRUE)) # use logical scalar

##   Group
## 1     1
## 2     1
## 3     1
## 4     1

stopifnot(is.data.frame(x), dim(x) == c(4, 1))
(y <- extract_columns(vaas_4, what = FALSE))

##   Group
## 1     1
## 2     2
## 3     3
## 4     4

stopifnot(is.data.frame(y), dim(y) == c(4, 1), !all(y[, 1] == x[, 1]))

# Create a character vector
(x <- extract_columns(vaas_4, what = list("Species", "Strain"), join = TRUE))

## [1] "Escherichia coli DSM18039"      "Escherichia coli DSM30083T"    
## [3] "Pseudomonas aeruginosa DSM1707" "Pseudomonas aeruginosa 429SC1"

stopifnot(is.character(x), length(x) == 4L)
(x <- try(extract_columns(vaas_4, what = list("Species"), join = TRUE,
  dups = "error"), silent = TRUE)) # duplicates yield error

## [1] "Error in .local(object, ...) : duplicated label: Escherichia coli\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in .local(object, ...): duplicated label: Escherichia coli>

stopifnot(inherits(x, "try-error"))
(x <- try(extract_columns(vaas_4, what = list("Species"), join = TRUE,
  dups = "warn"), silent = TRUE)) # duplicates yield warning only

## Warning in .local(object, ...): duplicated label: Escherichia coli

## [1] "Escherichia coli"       "Escherichia coli"      
## [3] "Pseudomonas aeruginosa" "Pseudomonas aeruginosa"

stopifnot(is.character(x), length(x) == 4L)

# data-frame method, 'direct' running mode
x <- data.frame(a = 1:26, b = letters, c = LETTERS)
(y <- extract_columns(x, I(c("a", "b")), sep = "-"))

##  [1] "1-a"  "2-b"  "3-c"  "4-d"  "5-e"  "6-f"  "7-g"  "8-h"  "9-i"  "10-j"
## [11] "11-k" "12-l" "13-m" "14-n" "15-o" "16-p" "17-q" "18-r" "19-s" "20-t"
## [21] "21-u" "22-v" "23-w" "24-x" "25-y" "26-z"

stopifnot(grepl("^\\s*\\d+-[a-z]$", y)) # pasted columns 'a' and 'b'

# data-frame method, using class name
(y <- extract_columns(x, as.labels = "b", what = "integer", as.groups = "c"))

##    a
## a  1
## b  2
## c  3
## d  4
## e  5
## f  6
## g  7
## h  8
## i  9
## j 10
## k 11
## l 12
## m 13
## n 14
## o 15
## p 16
## q 17
## r 18
## s 19
## t 20
## u 21
## v 22
## w 23
## x 24
## y 25
## z 26
## attr(,"row.groups")
##  [1] A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
## Levels: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

stopifnot(is.matrix(y), dim(y) == c(26, 1), rownames(y) == x$b)
stopifnot(identical(attr(y, "row.groups"), x$c))

[Package opm version 1.3.63 Index]
