(Corresponding section in the tutorial: 3.4. Integration and manipulation of metadata)
Now we load and attach the opm functions. Note that if a package such as opm
has already been loaded, the library command does nothing. It can thus be
called at any time.
library(pkgutils)
library(opm)
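That repeated calls are harmless can be checked with a small sketch. It uses the base package stats instead of opm, purely so that it runs without additional installations; the variable names are only illustrative.

```r
# Attach a package twice and compare the search path before and after
# the second call. 'stats' ships with R, so no installation is needed.
library(stats)            # first call: attaches the package if necessary
before <- search()        # packages and environments currently attached
library(stats)            # second call: the package is already attached
after <- search()

# TRUE if the second call left the search path untouched
unchanged <- identical(before, after)
print(unchanged)
```

The same check should work for opm and pkgutils once they are installed.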
BEGIN code we only need here for users who have no PM data of their own
# Here we check whether a data object `x` has been created by reading PM data
# files. If not, we assign an existing example object to it.
#
if (!exists("x") || length(x) == 0) {
  warning("data object 'x' from part 1 is missing or empty, using 'vaas_4'")
  x <- as(vaas_4, "MOPMX") # this example object comes with the opm package
  metadata(x) <- "ID" # set unique IDs
}
## Warning: data object 'x' from part 1 is missing or empty, using 'vaas_4'
END code we only need here for users who have no PM data of their own
Here we check for the availability of metadata in our data object.
A raw representation of the metadata:
metadata(x)
## [[1]]
## [[1]][[1]]
## [[1]][[1]]$Experiment
## [1] "First replicate"
##
## [[1]][[1]]$Species
## [1] "Escherichia coli"
##
## [[1]][[1]]$Strain
## [1] "DSM18039"
##
## [[1]][[1]]$Slot
## [1] "B"
##
## [[1]][[1]]$`Plate number`
## [1] 6
##
## [[1]][[1]]$ID
## [1] 1
##
##
## [[1]][[2]]
## [[1]][[2]]$Experiment
## [1] "First replicate"
##
## [[1]][[2]]$Species
## [1] "Escherichia coli"
##
## [[1]][[2]]$Strain
## [1] "DSM30083T"
##
## [[1]][[2]]$Slot
## [1] "B"
##
## [[1]][[2]]$`Plate number`
## [1] 6
##
## [[1]][[2]]$ID
## [1] 2
##
##
## [[1]][[3]]
## [[1]][[3]]$Experiment
## [1] "First replicate"
##
## [[1]][[3]]$Species
## [1] "Pseudomonas aeruginosa"
##
## [[1]][[3]]$Strain
## [1] "DSM1707"
##
## [[1]][[3]]$Slot
## [1] "B"
##
## [[1]][[3]]$`Plate number`
## [1] 6
##
## [[1]][[3]]$ID
## [1] 3
##
##
## [[1]][[4]]
## [[1]][[4]]$Experiment
## [1] "First replicate"
##
## [[1]][[4]]$Species
## [1] "Pseudomonas aeruginosa"
##
## [[1]][[4]]$Strain
## [1] "429SC1"
##
## [[1]][[4]]$Slot
## [1] "B"
##
## [[1]][[4]]$`Plate number`
## [1] 6
##
## [[1]][[4]]$ID
## [1] 4
A nicer display as a data frame, which might contain gaps:
to_metadata(x)
##                Experiment                Species    Strain Slot
## Gen III.1 First replicate       Escherichia coli  DSM18039    B
## Gen III.2 First replicate       Escherichia coli DSM30083T    B
## Gen III.3 First replicate Pseudomonas aeruginosa   DSM1707    B
## Gen III.4 First replicate Pseudomonas aeruginosa    429SC1    B
##           Plate number ID
## Gen III.1            6  1
## Gen III.2            6  2
## Gen III.3            6  3
## Gen III.4            6  4
Note that a data frame is a kind of object frequently used in R. It is
like a rectangular matrix, but the columns can contain data of distinct
types (character, numeric, logical etc.).
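A minimal base R sketch of such an object follows. The column names and values are made up for illustration and are not taken from any PM data set; the NA entry shows what a gap looks like.

```r
# A small data frame whose columns hold different data types.
df <- data.frame(
  Strain = c("DSM1707", "429SC1"),   # character column
  `Plate number` = c(6, 6),          # numeric column
  Verified = c(TRUE, NA),            # logical column with one gap (NA)
  check.names = FALSE,               # keep the blank in 'Plate number'
  stringsAsFactors = FALSE           # keep strings as character vectors
)

# One class name per column
column_types <- vapply(df, class, character(1))
print(column_types)
print(dim(df))                       # number of rows and columns
```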
The set of all metadata entries:
metadata_chars(x, values = TRUE)
## 429SC1 B DSM1707
## "429SC1" "B" "DSM1707"
## DSM18039 DSM30083T Escherichia coli
## "DSM18039" "DSM30083T" "Escherichia coli"
## First replicate Pseudomonas aeruginosa
## "First replicate" "Pseudomonas aeruginosa"
The set of all metadata keys:
metadata_chars(x, values = FALSE)
## Experiment ID Plate number Slot Species
## "Experiment" "ID" "Plate number" "Slot" "Species"
## Strain
## "Strain"
Metadata are not set automatically after reading CSV files. The reason is
that metainformation from CSV files is usually limited and potentially
inconsistent or erroneous, depending on what has been entered at an OmniLog
instrument. It is possible with one line of code, which we show below, to set
a selection of the csv_data entries as metadata, but whether this makes
sense depends on your input files. The default mechanism adds metadata later
on. Note that the LIMS format yielded more metadata from the beginning.
The most general approach for entering metadata is to generate a template
(file), add information within R or with an external editor, and then
assign this information into the data object, using the plate identifiers
stored within the template (file).
As an example, we target the metadata entry 'Strain' to either be added
manually or computed from csv_data entries. We do nothing if it is already
present.
This code queries the metadata for the presence of a key named 'Strain'. You do not normally need that in your code, but we don't know your metadata yet.
if (!all(unlist("Strain" %k% x))) { # Metadata of all plates contain 'Strain'?
  # This creates a metadata template file with selected entries.
  collect_template(
    object = x, # take the 'CSV data' from our data object 'x'
    outfile = "template.csv", # place them in that file
    previous = NULL, # ignore existing file, if any
    selection = c(
      opm_opt("csv.selection"), # include the plate identifiers
      grep(pattern = "Strain", # and all 'CSV data' entries with 'Strain'
        x = colnames(csv_data(x)), value = TRUE)
    ),
    add.cols = "Strain" # add column named 'Strain' with empty values to fill in
  )
}
You should now edit the file template.csv with an editor or a spreadsheet
program, enter the values of interest and save it again without destroying
the format. In Excel, you might need to use Data -> Text to columns, using
the tabulator as column separator and setting all columns to data type “Text”.
This is explained in detail with screenshots in the tutorial.
if (!all(unlist("Strain" %k% x))) { # Metadata of all plates contain 'Strain'?
  # Now we add the information from that file. This has not much use as long as
  # we have not filled in our target column 'Strain'.
  x <- include_metadata(
    object = x, # overwrite existing data object: nothing gets lost
    md = "template.csv" # read from that file
  )
}
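The round trip through a template file, and why plate identifiers must survive it unchanged, can be imitated in base R. This is only a sketch under stated assumptions: the column names 'Plate' and 'Strain' and all values are hypothetical and do not reproduce the layout that collect_template writes, and the matching is done with match, not with opm.

```r
# Hypothetical template: plate identifiers plus an empty column to fill in.
template <- data.frame(
  Plate = c("P1", "P2", "P3"),
  Strain = NA_character_,
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv")
write.table(template, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Imitate an external program that fills in 'Strain' but also pads two of
# the identifiers with blanks when saving the file.
edited <- data.frame(
  Plate = c("P1 ", " P2", "P3"),
  Strain = c("DSM1707", "429SC1", "DSM18039"),
  stringsAsFactors = FALSE
)
write.table(edited, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file back, once keeping and once stripping surrounding blanks.
raw <- read.delim(path, strip.white = FALSE, stringsAsFactors = FALSE)
clean <- read.delim(path, strip.white = TRUE, stringsAsFactors = FALSE)

# Look up each original identifier in the file that was read back.
matched_raw <- match(template$Plate, raw$Plate)      # padded IDs are not found
matched_clean <- match(template$Plate, clean$Plate)  # all IDs are found
print(matched_raw)
print(matched_clean)
```

Without stripping, the two padded identifiers no longer match, so their 'Strain' values could not be assigned to any plate.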
include_metadata must identify plates correctly and uniquely in order to
assign the metadata correctly. This cannot work if the identifiers get
modified after exporting them! Potential causes for key-value mismatches are
changes that the external program makes to whitespace or to the column
separator. As remedies, strip.white has several values that are tried in
turn, and the sep argument likewise tries several values in turn.
Instead of using external software, you could also edit the metadata directly
in R. Try:
# x <- edit(x)
If 'Strain' still has not been set but is represented as NA (the
placeholder in R for data that are Not Available), we set it to 'unknown'.
The next code snippet shows metadata manipulation using formulas that contain
instructions to be applied to metadata entries. Symbols in these formulas
refer to keys in the metadata.
metadata(x) <- Strain ~ if (is.na(Strain)) "unknown" else Strain
In this way you can conduct any kind of computation within the metadata themselves.
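How a formula of this kind can be applied to metadata is sketched below in base R. This is not opm's implementation; the helper apply_rule and the example metadata are hypothetical. The left-hand side names the key to set, and the right-hand side is evaluated with the existing metadata entries as variables.

```r
# Hypothetical metadata for two plates, one list per plate.
md <- list(
  list(Strain = NA, ID = 1),
  list(Strain = "DSM1707", ID = 2)
)

# The same kind of formula as used with metadata(x) <- ... above.
rule <- Strain ~ if (is.na(Strain)) "unknown" else Strain

# Illustrative helper: evaluate the right-hand side within one metadata
# entry and store the result under the key named on the left-hand side.
apply_rule <- function(entry, rule) {
  key <- as.character(rule[[2]])                     # left-hand side
  entry[[key]] <- eval(rule[[3]], entry, baseenv())  # right-hand side
  entry
}

updated <- lapply(md, apply_rule, rule = rule)
print(vapply(updated, function(entry) entry$Strain, character(1)))
```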
A plate ID that is unique within the current session can easily be added to each plate using the following shortcut:
if (!any(unlist(x %k% "ID"))) { # if there is no ID entry at all, we set it
  metadata(x) <- "ID"
}
Using data frames, it is also possible to add metadata if the data frame
has as many rows as we have plates in the PM data object. Note that in
contrast to include_metadata, here plates are identified just by their
position in the data object.
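Positional assignment of this kind can be pictured with a base R sketch. The lists and the data frame below are hypothetical stand-ins for the metadata of three plates, and the sketch appends the new entries; it does not call opm.

```r
# Hypothetical existing metadata, one list per plate, in plate order.
plates_md <- list(list(ID = 1), list(ID = 2), list(ID = 3))

# New metadata, one row per plate, in the same order as the plates.
extra <- data.frame(
  Strain = c("DSM18039", "DSM30083T", "DSM1707"),
  Slot = "B",
  stringsAsFactors = FALSE
)

# Matching by position only makes sense if the sizes agree.
stopifnot(nrow(extra) == length(plates_md))

# Append row i of the data frame to the metadata of plate i.
combined <- Map(
  function(entry, i) c(entry, as.list(extra[i, , drop = FALSE])),
  plates_md, seq_len(nrow(extra))
)
print(names(combined[[1]]))
```

Because no identifiers are compared, a data frame whose rows are in a different order than the plates would be assigned without any warning.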
Setting certain csv_data entries as metadata would work as follows:
## appending
# metadata(x, 1) <- as.data.frame(csv_data(x,
# c("Strain Name", "Strain Number")))
## prepending
# metadata(x, -1) <- as.data.frame(csv_data(x,
# c("Strain Name", "Strain Number")))
## replacing
# metadata(x) <- as.data.frame(csv_data(x, c("Strain Name", "Strain Number")))
(This will not work with your data unless you have 'Strain Name' and 'Strain
Number' in your csv_data.)
But there is a much easier shortcut:
# metadata(x) <- TRUE # sets the default `csv_data` components as metadata
# metadata(x) <- FALSE # removes the default `csv_data` components from the
# # metadata
Here we check again for the availability of metadata in our data object. Some should be present by now.
A raw representation of the metadata:
metadata(x)
## [[1]]
## [[1]][[1]]
## [[1]][[1]]$Experiment
## [1] "First replicate"
##
## [[1]][[1]]$Species
## [1] "Escherichia coli"
##
## [[1]][[1]]$Strain
## [1] "DSM18039"
##
## [[1]][[1]]$Slot
## [1] "B"
##
## [[1]][[1]]$`Plate number`
## [1] 6
##
## [[1]][[1]]$ID
## [1] 1
##
##
## [[1]][[2]]
## [[1]][[2]]$Experiment
## [1] "First replicate"
##
## [[1]][[2]]$Species
## [1] "Escherichia coli"
##
## [[1]][[2]]$Strain
## [1] "DSM30083T"
##
## [[1]][[2]]$Slot
## [1] "B"
##
## [[1]][[2]]$`Plate number`
## [1] 6
##
## [[1]][[2]]$ID
## [1] 2
##
##
## [[1]][[3]]
## [[1]][[3]]$Experiment
## [1] "First replicate"
##
## [[1]][[3]]$Species
## [1] "Pseudomonas aeruginosa"
##
## [[1]][[3]]$Strain
## [1] "DSM1707"
##
## [[1]][[3]]$Slot
## [1] "B"
##
## [[1]][[3]]$`Plate number`
## [1] 6
##
## [[1]][[3]]$ID
## [1] 3
##
##
## [[1]][[4]]
## [[1]][[4]]$Experiment
## [1] "First replicate"
##
## [[1]][[4]]$Species
## [1] "Pseudomonas aeruginosa"
##
## [[1]][[4]]$Strain
## [1] "429SC1"
##
## [[1]][[4]]$Slot
## [1] "B"
##
## [[1]][[4]]$`Plate number`
## [1] 6
##
## [[1]][[4]]$ID
## [1] 4
A nicer display as a data frame, which might contain gaps:
to_metadata(x)
##                Experiment                Species    Strain Slot
## Gen III.1 First replicate       Escherichia coli  DSM18039    B
## Gen III.2 First replicate       Escherichia coli DSM30083T    B
## Gen III.3 First replicate Pseudomonas aeruginosa   DSM1707    B
## Gen III.4 First replicate Pseudomonas aeruginosa    429SC1    B
##           Plate number ID
## Gen III.1            6  1
## Gen III.2            6  2
## Gen III.3            6  3
## Gen III.4            6  4
The set of all metadata entries:
metadata_chars(x, values = TRUE)
## 429SC1 B DSM1707
## "429SC1" "B" "DSM1707"
## DSM18039 DSM30083T Escherichia coli
## "DSM18039" "DSM30083T" "Escherichia coli"
## First replicate Pseudomonas aeruginosa
## "First replicate" "Pseudomonas aeruginosa"
The set of all metadata keys:
metadata_chars(x, values = FALSE)
## Experiment ID Plate number Slot Species
## "Experiment" "ID" "Plate number" "Slot" "Species"
## Strain
## "Strain"
Do we have 'Strain' entries and, if so, where? Note that %k% stands for
'key': here we only search for keys.
x %k% "Strain"
## [[1]]
## [1] TRUE TRUE TRUE TRUE
x %k% "A long key that is quite unlikely to be there!"
## [[1]]
## [1] FALSE FALSE FALSE FALSE
x %k% "ID"
## [[1]]
## [1] TRUE TRUE TRUE TRUE
x %k% list(ID = 9)
## [[1]]
## [1] TRUE TRUE TRUE TRUE
Note that the last query has ignored 9 because for %k% only the keys are
relevant. Indeed, we have no ID with the value 9:
metadata(x, "ID")
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
Where do we have 'Strain' entries with a certain value? In the following
code %q% stands for 'query': here we search for keys that have a certain
value.
x %q% list(Strain = "DSM 917")
## [[1]]
## [1] FALSE FALSE FALSE FALSE
Where do we have 'Strain' entries with several possible values?
x %q% list(Strain = c("unknown", "DSM 1003"))
## [[1]]
## [1] FALSE FALSE FALSE FALSE
Where do we have 'Strain' entries and certain 'ID' entries?
x %q% list(Strain = c("unknown", "DSM 917"), ID = 5)
## [[1]]
## [1] FALSE FALSE FALSE FALSE
Such queries are very useful for selecting plates for specific analyses, which we will deal with in a later part of this workshop.
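The difference between searching for keys and searching for key-value pairs can be sketched in base R. The functions has_keys and matches_query are hypothetical helpers written for this illustration, and the plate metadata are made up; the sketch assumes that a query value matches if the stored value is one of the permitted values, as the examples above suggest.

```r
# Hypothetical metadata for four plates, one list per plate.
plates <- list(
  list(Strain = "DSM18039", ID = 1),
  list(Strain = "DSM30083T", ID = 2),
  list(Strain = "DSM1707", ID = 3),
  list(Strain = "429SC1", ID = 4)
)

# Key search: are all requested keys present? Values are ignored.
has_keys <- function(md, keys) all(keys %in% names(md))

# Query search: is each key present with one of the permitted values?
matches_query <- function(md, query) {
  all(vapply(names(query), function(key) {
    key %in% names(md) && md[[key]] %in% query[[key]]
  }, logical(1)))
}

# Only the name 'ID' matters for the key search, not the value 9.
key_hits <- vapply(plates, has_keys, logical(1), keys = names(list(ID = 9)))

# The query search also compares the values.
query_hits <- vapply(plates, matches_query, logical(1), query = list(ID = 9))
strain_hits <- vapply(plates, matches_query, logical(1),
  query = list(Strain = c("unknown", "DSM1707")))

print(key_hits)
print(query_hits)
print(strain_hits)
```

A logical vector of this kind, with one entry per plate, is what makes the results usable for subsetting.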
An alternative formula syntax can be used to search in metadata. We suggest skipping this at first. The following examples do the same as their counterparts above:
x %k% ~ Strain
## [[1]]
## [1] TRUE TRUE TRUE TRUE
x %q% ~ Strain == "DSM 917"
## [[1]]
## [1] FALSE FALSE FALSE FALSE
x %q% ~ Strain %in% c("unknown", "DSM 917")
## [[1]]
## [1] FALSE FALSE FALSE FALSE
x %q% ~ Strain %in% c("unknown", "DSM 917") & ID == 5
## [[1]]
## [1] FALSE FALSE FALSE FALSE
It is an error to apply %q% with a formula to metadata keys that are not
present. These errors can be avoided by using a list on the right-hand side
and by checking with %k% beforehand which keys are there. Also note that
metadata can be nested.
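Why a formula fails on a missing key while a list does not can be illustrated in base R. This sketch assumes that the formula is evaluated with the metadata entries as variables, which is a simplification and not necessarily how opm proceeds internally; the metadata and variable names are hypothetical.

```r
# Hypothetical metadata of one plate that lacks the key 'Strain'.
md <- list(ID = 1)

# Formula query: its expression refers to 'Strain' as a variable.
query <- ~ Strain == "DSM 917"
formula_result <- tryCatch(
  eval(query[[2]], md, baseenv()),   # 'Strain' cannot be found in 'md'
  error = function(e) paste("error:", conditionMessage(e))
)

# List query: a missing key simply yields FALSE, because the check for
# presence comes first and short-circuits the comparison of values.
list_query <- list(Strain = "DSM 917")
list_result <- all(names(list_query) %in% names(md)) &&
  md[["Strain"]] %in% list_query[["Strain"]]

print(formula_result)
print(list_result)
```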
Now proceed with part 3.