Comparison of organisms: Does phenotypic similarity match phylogenetic similarity?

Suppose you have run Phenotype Microarray experiments for several organisms, e.g. bacterial strains, and that extensive metadata are available for these strains, such as their genetic similarity and their geographical and ecological origins.

The data set wittmann_et_al contains bacterial strains from different phylogenetic clusters (see the respective publication). For each of the strains the geographic and ecological origin is known.

In this example we compare phenotypic to phylogenetic similarity using

* graphical approaches such as heat maps

* bootstrapping to assess significance of phenotypic clusters

* multiple comparison of overall AUC values across phylogenetic clades

Author: Johannes Sikorski

Load R packages and data

library(opm)
library(opmdata)
library(pvclust)
data(wittmann_et_al)

For demonstration purposes, only the plates belonging to four of the MLST clusters are kept:

wittmann_small <- subset(wittmann_et_al,
  query = list(MLSTcluster = c("Ax1", "Ax2", "Ax4", "Ax6")))

Check the dimensions of the reduced data set (number of plates, time points and wells):

dim(wittmann_small)
## [1]  33 382  96

Display phenotypic similarity of strains using a heat map approach

heat_map(wittmann_small,
  as.labels = list("strain", "replicate", "MLSTcluster"),
  as.groups = "MLSTcluster",
  cexRow = 1.5,
  use.fun = "gplots",
  main = "Heatmap on AUC data",
  subset = "AUC",
  xlab = "Well substrates on Generation-III Biolog plate",
  ylab = "strains, replicates, and their MLST cluster affiliation")

[Figure: heat map on AUC data]
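
The idea behind such a heat map can be sketched with base R alone. The block below is only an illustration under stated assumptions: the matrix `auc`, the labels in `groups` and the shift applied to group B are simulated and are not taken from wittmann_et_al, and base `heatmap()` is used here instead of the gplots-based drawing requested through `use.fun`. It clusters the rows by Euclidean distance and checks whether the two known groups end up in separate branches.

```r
# Toy stand-in for an AUC matrix: 6 plates (rows) x 8 wells (columns).
# All values are simulated; they are NOT taken from wittmann_et_al.
set.seed(1)
groups <- rep(c("A", "B"), each = 3)  # hypothetical cluster labels
auc <- matrix(rnorm(6 * 8, mean = 100, sd = 5), nrow = 6,
  dimnames = list(paste0("plate", 1:6, "_", groups), paste0("well", 1:8)))
# Let group B respond more strongly on wells 1-4
auc[groups == "B", 1:4] <- auc[groups == "B", 1:4] + 40

# Row dendrogram: Euclidean distances between plates, then clustering
row.clust <- hclust(dist(auc, method = "euclidean"), method = "complete")
found <- cutree(row.clust, k = 2)

# Cross-tabulate the recovered clusters against the known labels
print(table(found, groups))

# Base-R heat map with one side-bar colour per known group
heatmap(auc, Rowv = as.dendrogram(row.clust), scale = "none",
  RowSideColors = c(A = "grey20", B = "grey70")[groups])
```

If phenotype follows group membership, as it does by construction in this toy matrix, each side-bar colour forms one contiguous block along the row dendrogram. Interleaved colours would indicate that phenotypic and group similarity do not match.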

Result

Are these phenotypic similarity clusters statistically robust?

x <- t(extract(wittmann_small, list("strain", "replicate", "MLSTcluster")))
x.pvc <- pvclust(x, method.dist = "euclidean", method.hclust = "ward",
  nboot = 100)
## The "ward" method has been renamed to "ward.D"; note new "ward.D2"
## Bootstrap (r = 0.5)... Done.
## Bootstrap (r = 0.59)... Done.
## Bootstrap (r = 0.7)... Done.
## Bootstrap (r = 0.79)... Done.
## Bootstrap (r = 0.9)... Done.
## Bootstrap (r = 1.0)... Done.
## Bootstrap (r = 1.09)... Done.
## Bootstrap (r = 1.2)... Done.
## Bootstrap (r = 1.29)... Done.
## Bootstrap (r = 1.4)... Done.
plot(x.pvc, hang = -1)
pvrect(x.pvc, max.only = FALSE)

[Figure: pvclust cluster dendrogram with clusters highlighted by rectangles]

Result

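According to the approximately unbiased (AU) p-values, some of the observed phenotypic similarity clusters are significantly supported. These are the clusters highlighted with rectangles: pvrect() marks clusters whose AU p-value reaches its default threshold of 95%.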
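
To make the resampling idea concrete, here is a minimal sketch of an ordinary (single-scale) bootstrap in base R. It is not the multiscale procedure that pvclust uses to derive AU values; it only estimates the simpler bootstrap probability (BP) that a given group of plates is recovered. The matrix `x.toy`, the helper functions `cluster_plates()` and `is_recovered()`, and the choice of average linkage are all assumptions made for illustration.

```r
set.seed(42)
# Simulated data in the layout pvclust expects: rows = wells, columns = plates.
n.wells <- 30
x.toy <- matrix(rnorm(n.wells * 6, sd = 5), nrow = n.wells,
  dimnames = list(NULL, paste0("p", 1:6)))
x.toy[1:15, 4:6] <- x.toy[1:15, 4:6] + 30  # plates p4-p6 share a profile

# Cluster the plates (columns) and cut the tree into k groups
cluster_plates <- function(m, k = 2) {
  cutree(hclust(dist(t(m)), method = "average"), k = k)
}

# TRUE if the plates in 'members' form exactly one group of the cut
is_recovered <- function(cut, members) {
  inside <- cut[members]
  outside <- cut[setdiff(names(cut), members)]
  length(unique(inside)) == 1 && !any(outside == inside[1])
}

target <- c("p4", "p5", "p6")
n.boot <- 200
hits <- replicate(n.boot, {
  rows <- sample(n.wells, replace = TRUE)  # resample wells with replacement
  is_recovered(cluster_plates(x.toy[rows, ]), target)
})
bp <- mean(hits)  # ordinary bootstrap probability of the target cluster
print(bp)
```

A value of `bp` close to 1 means the cluster reappears in almost every resampled data set. pvclust reports this quantity as BP next to the AU value, which it obtains by additionally varying the bootstrap sample size (the `r` values in the log above).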

Is there any significant difference in overall AUC values across strains of the phylogenetic clades?

test <- opm_mcp(wittmann_small, model = ~ MLSTcluster, m.type = "aov",
  linfct = c(Tukey = 1))
old.mar <- par(mar = c(3, 15, 3, 2)) # widen the left margin for the contrast labels
plot(test)
par(old.mar) # restore the previous margin settings

[Figure: simultaneous confidence intervals for the Tukey contrasts between MLST clusters]

The numerical output of the statistical test is obtained as follows:

mcp.summary <- summary(test)
mcp.summary$model$call <- NULL # avoid some unnecessary output
mcp.summary
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Linear Hypotheses:
##                Estimate Std. Error t value Pr(>|t|)
## Ax2 - Ax1 == 0   7.6426     6.7838   1.127    0.669
## Ax4 - Ax1 == 0   9.4230     6.7838   1.389    0.501
## Ax6 - Ax1 == 0   6.8870     7.2823   0.946    0.777
## Ax4 - Ax2 == 0   1.7803     4.9542   0.359    0.984
## Ax6 - Ax2 == 0  -0.7557     5.6175  -0.135    0.999
## Ax6 - Ax4 == 0  -2.5360     5.6175  -0.451    0.969
## (Adjusted p values reported -- single-step method)
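
For readers who want to experiment with all-pairwise comparisons without the opm data, the following sketch uses base R's `TukeyHSD()` on a one-way `aov()` fit. This is an analogous base-R technique and not the code path of `opm_mcp()`; the data frame `toy`, its clade names and its AUC values are simulated assumptions, with unequal group sizes to mimic an unbalanced design.

```r
set.seed(7)
# Simulated per-plate AUC summaries for three hypothetical clades
toy <- data.frame(
  clade = factor(rep(c("C1", "C2", "C3"), times = c(5, 8, 6))),
  auc = rnorm(19, mean = 100, sd = 10)
)

# One-way ANOVA followed by Tukey's all-pairwise comparisons
fit <- aov(auc ~ clade, data = toy)
tukey <- TukeyHSD(fit, "clade", conf.level = 0.95)
print(tukey)

# Columns: diff (estimate), lwr/upr (simultaneous CI), p adj (adjusted p)
# A contrast is significant at the 5% level only if its interval excludes 0
excludes.zero <- tukey$clade[, "lwr"] > 0 | tukey$clade[, "upr"] < 0
print(excludes.zero)
```

Each row of the result corresponds to one contrast, named like the `Ax2 - Ax1` rows above, so the two outputs can be read in the same way: an estimate of the difference in means together with a p value adjusted for the number of comparisons.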

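Result

None of the six pairwise comparisons is significant: all adjusted p values are at least 0.501, so the overall AUC values do not differ detectably between the four MLST clusters.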

Synopsis