WORKSHOP PART 1: Inputting PM data

(Corresponding section in the tutorial: 3.2. Data import)


1.1

How does R work? By executing code, i.e. by calling functions, usually with some arguments. Code can be commented; everything after # on a line is a comment and will not be executed. You can also use this to activate or deactivate code:

This will be printed:

message("I like R and opm!")
## I like R and opm!

This would not be printed:

# message("I don't like R and opm!")

Some calculations

5 + 19 # result will be printed to the screen
## [1] 24
result <- 5 + 19 # will not print
result # will print: object `result` now contains 24, assigned to it using `<-`
## [1] 24
5 + 15:20 # what does this do?
## [1] 20 21 22 23 24 25

Note that in R you can pass vectors of length > 1 to most functions, and they will return a vector of the same length.
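
As a small sketch of this vectorised behaviour (the object names values, roots and n_chars are made up for illustration), the following lines apply functions to whole vectors at once:

```r
values <- c(1, 4, 9, 16)                   # a numeric vector of length 4
roots <- sqrt(values)                      # one square root per element
roots
## [1] 1 2 3 4
n_chars <- nchar(c("opm", "R", "plate"))   # one character count per string
n_chars
## [1] 3 1 5
values + 1:4                               # element-wise sum of two vectors
## [1]  2  6 12 20
```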

result # still there
## [1] 24
rm(result) # deleting it

Entering result would now yield Error: object 'result' not found.

Character strings

"I am a character string and will be printed to the screen"
## [1] "I am a character string and will be printed to the screen"

Boolean (logical) values

TRUE # this means, well, true
## [1] TRUE
FALSE # guess what
## [1] FALSE
NA # this means 'NOT AVAILABLE' (missing data)
## [1] NA

Formulas

With left-hand side:

x ~ a * y1 + b * y2 - c * y3 ^ 2
## x ~ a * y1 + b * y2 - c * y3^2
## <environment: 0x3a572e8>

Without left-hand side:

~ a * y1 + b * y2 - c * y3 ^ 2
## ~a * y1 + b * y2 - c * y3^2
## <environment: 0x3a572e8>

Formulas are a special kind of code that is not evaluated but stores symbolic representations to be interpreted later on in a specific way.
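
The following sketch shows what "not evaluated" means in practice: the formula can be stored and inspected even though none of its variables exists. The object name f is only chosen for illustration.

```r
f <- x ~ a * y1 + b * y2 - c * y3 ^ 2   # none of these variables needs to exist
class(f)
## [1] "formula"
all.vars(f)                             # the variable names stored in the formula
## [1] "x"  "a"  "y1" "b"  "y2" "c"  "y3"
length(f)                               # 3 parts: `~`, left-hand side, right-hand side
## [1] 3
length(~ a * y1)                        # 2 parts: there is no left-hand side
## [1] 2
```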

Troubleshooting

If anything fails, read the error message!
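
If you want to keep an error message for closer inspection instead of letting it stop your script, one possible approach (a sketch using base R only; the failing call is deliberately contrived) is tryCatch:

```r
outcome <- tryCatch(
  log("ten"),                               # fails: the argument is not a number
  error = function(e) conditionMessage(e)   # return the message text instead of stopping
)
outcome                                     # the text of the error message
```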


1.2

Now we load and attach the opm functions. Note that if a package such as opm has already been loaded, the library command does nothing. It can thus be called at any time.

library(pkgutils)
library(opm)

message("Welcome to R and opm!")
## Welcome to R and opm!
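
If library fails, the package is probably not installed. A hedged sketch for checking this beforehand, using only base R (the object names needed and installed are made up):

```r
needed <- c("pkgutils", "opm")
# requireNamespace() tries to load each namespace (without attaching it)
# and returns TRUE or FALSE instead of raising an error
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
installed
if (!all(installed)) {
  message("please install: ", paste(needed[!installed], collapse = ", "))
}
```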

Troubleshooting

Note that you can call

#  ?ITEM

at any time to get the documentation for the topic ITEM (replace ITEM with whatever you are interested in). The following command lists the contents of the entire opm documentation:

help(package = "opm")
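
If you do not know the exact name of a topic, base R offers some search helpers. The lines below are a small sketch; the search terms are arbitrary examples.

```r
hits <- apropos("^read\\.csv")   # names of attached objects matching a regular expression
hits
# help("message")                    # the same as ?message
# help.search("working directory")   # full-text search in the installed documentation
```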


1.3

Now please make sure you are in the right directory! You can call the getwd command to check that. To change the working directory in RStudio use: Session > Set Working Directory > To Source File Location.
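
Outside RStudio, the same can be done with code. The following sketch uses R's temporary directory merely as a stand-in for your data directory; old_dir and csv_here are illustrative names.

```r
old_dir <- getwd()       # remember where we are
setwd(tempdir())         # change to another directory (here: R's temporary one)
getwd()                  # check where we are now
setwd(old_dir)           # and go back again
# list the CSV files in the current working directory
csv_here <- list.files(pattern = "\\.csv$", ignore.case = TRUE)
length(csv_here)         # how many CSV files are there?
```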

Because it is important, let us issue a warning about the working directory:

warning("the working directory is ", getwd(), " -- correct?")
## Warning: the working directory is
## /home/goeker/Documents/SVN_opm/trunk/opm_doc/demo -- correct?

A warning is not an error; it does not stop the execution of code.
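
The difference between a warning and an error can be sketched with two made-up functions, careful and strict:

```r
careful <- function() {
  warning("this is only a warning")   # reported, but execution continues
  "finished"
}
strict <- function() {
  stop("this is an error")            # execution stops here
  "finished"                          # never reached
}
first <- suppressWarnings(careful())  # hide the warning for this demonstration
first
## [1] "finished"
second <- tryCatch(strict(), error = function(e) "stopped")
second
## [1] "stopped"
```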


1.4

We assume that only the CSV files within the working directory should be input and that the data read should be grouped by plate type (see below or consult the opm manual or tutorial for what that means). The following code fails if the directory contains unreadable CSV files that have not been deselected; in that case the include and/or exclude argument would need to be adapted.

x <- read_opm(
  names = getwd(), # search in the working directory for files of interest
  convert = "grp", # group the plates according to the plate type
  include = list("csv"), # search only for CSV files
  exclude = "*template*", # but exclude CSV files that match this pattern
                          # (we later on generate metadata template CSV files)
  demo = FALSE # read the files, do not just show the file names
)
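
To get a feeling for what such include and exclude patterns select, here is a rough base-R approximation. It is not how opm implements the selection internally; the directory, the file names and the object names are invented for this sketch.

```r
demo_dir <- file.path(tempdir(), "opm_pattern_demo")   # hypothetical scratch directory
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
  c("plate_01.csv", "md_template.csv", "notes.txt"))))
all_files <- list.files(demo_dir)
is_csv <- grepl("\\.csv$", all_files, ignore.case = TRUE)   # keep CSV files only
is_template <- grepl(glob2rx("*template*"), all_files)      # wildcard pattern as above
selected <- all_files[is_csv & !is_template]
selected
## [1] "plate_01.csv"
unlink(demo_dir, recursive = TRUE)                          # clean up
```

According to the comment in the call above, setting demo = TRUE in read_opm would show the file names without reading the files, which is a convenient way to check the patterns on your real data.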

Troubleshooting

If this fails, read the error message! The most common error we know of is trying to input CSV files that contain several plates per file. Such files need to be split beforehand, which you can do with opm itself. This works as follows:

split_files(filename, '^("Data File",|Data File)', getwd())

where filename is the name of the file to be split, provided as a character string. The newly generated files are numbered accordingly. (They are not named after any metainformation entry because there is no guarantee that it is present.)
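
The pattern passed to split_files is a regular expression; presumably it marks the lines at which a new plate starts. The sketch below tests it against three invented lines (not real instrument output) to show what it matches:

```r
section_start <- '^("Data File",|Data File)'
sample_lines <- c(
  '"Data File","plate_A"',   # quoted field followed by a comma
  'Data File,plate_B',       # unquoted variant
  'Hour,A01,A02,A03'         # some other line
)
grepl(section_start, sample_lines)
## [1]  TRUE  TRUE FALSE
```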

It is of course easier to use single-plate instead of multiple-plate files in the first place. How to do this is described in the opm tutorial.

The second frequent kind of error is attempting to read files that are CSV but do not contain PM data. This can be fixed by adapting the include and/or exclude argument of read_opm so that these files are deselected.

The opm package understands several styles of CSV and the new LIMS format.


1.5

We now have a look at the resulting object. You do not routinely need all of the following commands; some calls are just for beginners who are curious. Others tell you important features of the generated data object, however.

The following command yields MOPMX; if you are curious what that means, consult figure 4 in the tutorial.

class(x)
## [1] "MOPMX"
## attr(,"package")
## [1] "opm"

As often in R, summary is quite a useful function …

summary(x)
## [1] Length      Plate.type  Aggregated  Discretized
## <0 rows> (or 0-length row.names)
## 
## => MOPMX object with 0 element(s), details are shown above.
##  Access the elements with [[ or $ to apply specific methods.

… but in opm you can just enter the name of the object, which also shows the summary:

x
## [1] Length      Plate.type  Aggregated  Discretized
## <0 rows> (or 0-length row.names)
## 
## => MOPMX object with 0 element(s), details are shown above.
##  Access the elements with [[ or $ to apply specific methods.

Show the distinct input plate types:

names(x)
## character(0)

Names could theoretically be missing in such objects, but the next function always works to get the plate type:

plate_type(x)
## character(0)

This command checks for aggregated data (a.k.a. curve parameters); all values must be FALSE:

has_aggr(x)
## NULL

This command shows the metainformation that was directly found in the CSV files:

csv_data(x)
## NULL

The next R instruction shows the proper metadata (none there yet!):

metadata(x)
## NULL

It makes no sense to proceed if no data have been input. This happens if no suitable files are found, as in the empty object shown above.

length(x) > 0
## [1] FALSE
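
In a script, such a check could be turned into an explicit reminder. The following sketch uses an empty list called plates as a stand-in for an empty object like the one above:

```r
plates <- list()                  # stand-in for an object without any plates
proceed <- length(plates) > 0     # FALSE if nothing was read
if (!proceed) {
  message("No plates were read: check the working directory and the file patterns.")
}
```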


1.6

If you have only one plate type, you could simplify the object:

# x <- x[[1]] # fetches the 1st element, which belongs to the sole plate type
# class(x) # yields `OPMS` or `OPM`; if you are curious, consult figure 4 in
#          # the tutorial

But this is not normally necessary.
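
The difference between the access operators can be tried out on an ordinary list. This is only an analogy under the assumption that the object behaves like a list in this respect; by_type and the name TypeA are invented.

```r
by_type <- list(TypeA = 1:3)   # stand-in: one element per (hypothetical) plate type
class(by_type[1])              # single brackets keep the container
## [1] "list"
class(by_type[[1]])            # double brackets extract the element itself
## [1] "integer"
by_type$TypeA                  # access by name
## [1] 1 2 3
```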


Now proceed with part 2.