WORKSHOP PART 5: Statistical comparisons of experimental groups

(According section in the tutorial: 3.9. Statistical comparisons of group means)

5.1

Now we load and attach the opm functions. Note that if a package such as opm has already been loaded, the library command does nothing. It thus can be called at any time.

library(pkgutils)
library(opm)

Now we load a new package, which contains additional example data we need here, as we don't know yet how your own data (regarding replicates and metadata) could be used in the following analyses.

library(opmdata)

And another one on which the analysis functions depend.

library(multcomp)

## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data

5.2

Multiple Comparisons of means: The opm_mcp function allows the user to test for differences in the means of multiple groups directly in OPMS objects, obtaining the factors that determine the grouping structure from the stored metadata or the wells.

As we do not know yet how your data are structured, we here switch to a prepared data set. First we subset this large data set vaas_et_al:

vg6 <- vaas_et_al[~ Experiment == "First replicate", , "G06"]

This yields well G06 from all plates that belong to the first biological replicate (there are many technical replicates in vaas_et_al).

Now we conduct a Tukey (all-against-all) comparison of different strains and the same well. The research question is: Which strains show a significant difference in the respiration on well G06, as measured using the A parameter? (one could test the other parameters as well, of course).

tukey.g6 <- opm_mcp(vg6,
  model = ~ Strain, # design the model for model-fitting
  m.type = "aov", # which model type to be fitted
  linfct = c(Tukey = 1) # specification of the linear hypotheses to be tested
)

Plot point estimates and confidence intervals of differences.

old.mar <- par(mar = c(3, 15, 3, 2)) # adapt margins in the plot
plot(tukey.g6)

plot of chunk unnamed-chunk-6

par(old.mar) # reset to default plotting settings

Show p-values of the tests.

summary(tukey.g6)

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = Value ~ Strain, data = list(Strain = c(3L, 3L, 
## 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
## 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L), Parameter = c(1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L), Well = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Value = c(265.494747218717, 
## 262.84394152055, 252.165904613307, 267.907002191215, 258.17582071203, 
## 261.987269757641, 261.907855332673, 259.633238316584, 257.588142152675, 
## 259.376094018848, 281.983280725459, 286.436354057566, 286.657856912123, 
## 294.877361739046, 285.996732858522, 289.867428001859, 286.395969945727, 
## 286.367989656313, 284.219679424904, 290.428056342279, 283.106772145033, 
## 288.399317471037, 277.646203952046, 284.011603823149, 281.790153730848, 
## 289.57953846978, 289.195129782492, 280.384511529562, 287.024440725516, 
## 281.138608272585, 305.710912420542, 303.646930604334, 305.475948948067, 
## 300.030528559448, 293.572693121333, 311.441375705124, 302.607585169851, 
## 310.277995012454, 301.650411405993, 303.136865135743)))
## 
## Linear Hypotheses:
##                           Estimate Std. Error t value Pr(>|t|)    
## DSM1707 - 429SC1 == 0      -19.527      1.938 -10.076   <1e-04 ***
## DSM18039 - 429SC1 == 0     -43.047      1.938 -22.213   <1e-04 ***
## DSM30083T - 429SC1 == 0    -16.432      1.938  -8.479   <1e-04 ***
## DSM18039 - DSM1707 == 0    -23.520      1.938 -12.136   <1e-04 ***
## DSM30083T - DSM1707 == 0     3.095      1.938   1.597    0.393    
## DSM30083T - DSM18039 == 0   26.615      1.938  13.734   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Compare the p-values with the plot: The significant differences can easily be recognised in the plot.

Troubleshooting

Note that you need data sets which actually provide group structures for comparisons. These would be, e.g., strains measured on the same plate type in several repetitions, or one strain distinctly treated and every treatment measured in several repetitions. Of course, it is also possible to compare the wells between each other, when the plates are comparable.

But with a single value per group opm_mcp will inevitably raise an error.

5.3

Another kind of multiple comparisons are Dunnett-type comparisons of wells, i.e. one against the others. (Unfortunately, we have no all for one and one for all type of comparison …)

Prepare a data set with all plates of first replicate from the strain DSM30083T.

dsm1 <- vaas_et_al[list(Experiment = "First replicate", Strain = "DSM30083T")]

Compare each of the first ten wells against the negative control. The research question is: Which of the first nine wells of among the DSM30083T measurements show a significant respiration difference from the negative-control well A01?

mcp.a1 <- opm_mcp(dsm1[, , 1:10],
  model = ~ Well, # design the model for model-fitting
  linfct = c(`Dunnett_A01 (Negative Control)` = 1) # specify the linear
                                                  # hypotheses to be tested
                                                  # and select control group
)

Plot point estimates and confidence intervals of differences.

old.mar <- par(mar = c(3, 20, 3, 2)) # adapt margins in the plot
plot(mcp.a1)

plot of chunk unnamed-chunk-10

par(old.mar) # reset to default plotting settings

Show p-values of the tests.

summary(mcp.a1)

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: User-defined Contrasts
## 
## 
## Fit: glm(formula = Value ~ Well, data = list(Parameter = c(1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L), Well = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
## 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
## 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
## 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
## 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 
## 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 
## 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Value = c(118.832423068396, 
## 113.598496766393, 113.174505366186, 119.431871832165, 122.526461780759, 
## 123.45581090497, 108.998218309557, 126.840184710042, 114.895915863631, 
## 129.765930900503, 246.266350753565, 239.653898014723, 243.317682380835, 
## 244.416712753501, 243.519568687867, 248.180868438754, 241.635814280006, 
## 249.323866334198, 247.657016641196, 248.085915464153, 287.40781815558, 
## 279.368396830807, 286.025118668408, 282.539808641816, 288.697273170112, 
## 284.099383222784, 284.002460316249, 284.659796794183, 286.364787434245, 
## 281.932897929196, 275.709179696123, 274.694869604335, 271.915785019279, 
## 272.911016393365, 279.26889901177, 269.754751084061, 273.542102723189, 
## 274.410175419457, 278.826789244198, 272.192470568112, 175.531652614245, 
## 157.586174305715, 166.452025922472, 169.505090255204, 180.269726746184, 
## 180.753634181345, 176.858671599048, 181.280603928533, 170.42497545551, 
## 176.610789101246, 283.410759212031, 273.079753595165, 282.998241185493, 
## 282.993297533012, 287.401316320437, 287.795945751254, 281.770599446823, 
## 289.030176542383, 280.363285817365, 282.49431828556, 159.88032339611, 
## 144.327306963559, 173.967668633542, 160.53201981799, 161.560156924873, 
## 186.854401478195, 156.60080202946, 168.944017964951, 152.321237989033, 
## 161.001468903409, 135.232631835923, 118.524813603978, 140.9644365373, 
## 140.4515718416, 140.183058372675, 154.397246049565, 130.813237170178, 
## 150.672211003519, 126.370683060521, 137.928166308929, 142.412602515525, 
## 129.576882988226, 146.173312338739, 151.487561964872, 149.08112439553, 
## 159.450856567227, 142.113187226511, 159.111630899083, 141.932599324074, 
## 148.101938540033, 318.50027877219, 327.03734985265, 319.015544055815, 
## 330.170768051129, 319.295518238925, 331.922107693428, 320.606071980105, 
## 332.440569063826, 317.858408095565, 330.442956368927)))
## 
## Linear Hypotheses:
##                                                      Estimate Std. Error
## A02 (Dextrin) - A01 (Negative Control) == 0           126.054      3.213
## A03 (D-Maltose) - A01 (Negative Control) == 0         165.358      3.213
## A04 (D-Trehalose) - A01 (Negative Control) == 0       155.171      3.213
## A05 (D-Cellobiose) - A01 (Negative Control) == 0       54.375      3.213
## A06 (b-Gentiobiose) - A01 (Negative Control) == 0     163.982      3.213
## A07 (Sucrose) - A01 (Negative Control) == 0            43.447      3.213
## A08 (Turanose) - A01 (Negative Control) == 0           18.402      3.213
## A09 (Stachyose) - A01 (Negative Control) == 0          27.792      3.213
## A10 (Positive Control) - A01 (Negative Control) == 0  205.577      3.213
##                                                      z value Pr(>|z|)    
## A02 (Dextrin) - A01 (Negative Control) == 0           39.234   <1e-07 ***
## A03 (D-Maltose) - A01 (Negative Control) == 0         51.468   <1e-07 ***
## A04 (D-Trehalose) - A01 (Negative Control) == 0       48.297   <1e-07 ***
## A05 (D-Cellobiose) - A01 (Negative Control) == 0      16.924   <1e-07 ***
## A06 (b-Gentiobiose) - A01 (Negative Control) == 0     51.039   <1e-07 ***
## A07 (Sucrose) - A01 (Negative Control) == 0           13.523   <1e-07 ***
## A08 (Turanose) - A01 (Negative Control) == 0           5.728   <1e-07 ***
## A09 (Stachyose) - A01 (Negative Control) == 0          8.650   <1e-07 ***
## A10 (Positive Control) - A01 (Negative Control) == 0  63.986   <1e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

5.4

Here we describe a special feature, the choice of a reference group in Dunnett-type comparisons.

The value for the linfct argument can be constructed by typing Dunnett, plus, separated by any sign, e.g. underscore (_), the name of the level which shall serve as the reference group in the contrast set.

Next we conduct a Dunnett-type comparison with well A03 chosen as the reference group. The research question is: Which of the first nine wells of DSM30083T measurements show a significant respiration difference from the well A03?

mcp.a3 <- opm_mcp(dsm1[, , 1:10],
  model = ~ Well,  # design the model for model-fitting
  linfct = c(Dunnett_A03 = 1),  # specification of the linear
                                # hypotheses to be tested
                                # and select control group
  full = FALSE # whether full substrate names shall be used
)

Plot point estimates and confidence intervals of differences.

old.mar <- par(mar = c(3, 5, 3, 2)) # adapt margins in the plot
plot(mcp.a3)

plot of chunk unnamed-chunk-13

par(old.mar) # reset to default plotting settings

Show p-values of the tests.

summary(mcp.a3)

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: User-defined Contrasts
## 
## 
## Fit: glm(formula = Value ~ Well, data = list(Parameter = c(1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
## 1L, 1L), Well = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
## 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
## 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
## 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
## 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 
## 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 
## 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Value = c(118.832423068396, 
## 113.598496766393, 113.174505366186, 119.431871832165, 122.526461780759, 
## 123.45581090497, 108.998218309557, 126.840184710042, 114.895915863631, 
## 129.765930900503, 246.266350753565, 239.653898014723, 243.317682380835, 
## 244.416712753501, 243.519568687867, 248.180868438754, 241.635814280006, 
## 249.323866334198, 247.657016641196, 248.085915464153, 287.40781815558, 
## 279.368396830807, 286.025118668408, 282.539808641816, 288.697273170112, 
## 284.099383222784, 284.002460316249, 284.659796794183, 286.364787434245, 
## 281.932897929196, 275.709179696123, 274.694869604335, 271.915785019279, 
## 272.911016393365, 279.26889901177, 269.754751084061, 273.542102723189, 
## 274.410175419457, 278.826789244198, 272.192470568112, 175.531652614245, 
## 157.586174305715, 166.452025922472, 169.505090255204, 180.269726746184, 
## 180.753634181345, 176.858671599048, 181.280603928533, 170.42497545551, 
## 176.610789101246, 283.410759212031, 273.079753595165, 282.998241185493, 
## 282.993297533012, 287.401316320437, 287.795945751254, 281.770599446823, 
## 289.030176542383, 280.363285817365, 282.49431828556, 159.88032339611, 
## 144.327306963559, 173.967668633542, 160.53201981799, 161.560156924873, 
## 186.854401478195, 156.60080202946, 168.944017964951, 152.321237989033, 
## 161.001468903409, 135.232631835923, 118.524813603978, 140.9644365373, 
## 140.4515718416, 140.183058372675, 154.397246049565, 130.813237170178, 
## 150.672211003519, 126.370683060521, 137.928166308929, 142.412602515525, 
## 129.576882988226, 146.173312338739, 151.487561964872, 149.08112439553, 
## 159.450856567227, 142.113187226511, 159.111630899083, 141.932599324074, 
## 148.101938540033, 318.50027877219, 327.03734985265, 319.015544055815, 
## 330.170768051129, 319.295518238925, 331.922107693428, 320.606071980105, 
## 332.440569063826, 317.858408095565, 330.442956368927)))
## 
## Linear Hypotheses:
##                Estimate Std. Error z value Pr(>|z|)    
## A01 - A03 == 0 -165.358      3.213 -51.468   <0.001 ***
## A02 - A03 == 0  -39.304      3.213 -12.233   <0.001 ***
## A04 - A03 == 0  -10.187      3.213  -3.171   0.0117 *  
## A05 - A03 == 0 -110.982      3.213 -34.543   <0.001 ***
## A06 - A03 == 0   -1.376      3.213  -0.428   0.9997    
## A07 - A03 == 0 -121.911      3.213 -37.945   <0.001 ***
## A08 - A03 == 0 -146.956      3.213 -45.740   <0.001 ***
## A09 - A03 == 0 -137.566      3.213 -42.817   <0.001 ***
## A10 - A03 == 0   40.219      3.213  12.518   <0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

There are many more types of comparisons possible, but the more complicated ones also need more elaborate preparation. For ways to do this in opm see the tutorial, section 3.9.

The code prepared for the Florence 2015 opm workshop ends here. For proceeding with opm, consult the tutorial and the manual. There is much more you can do with opm.