Analysing phenotype microarray data: clustering with p-values

Using pvclust and opm to calculate a clustering with branch support values. Assessing the uncertainty of clustering results should be done for all their serious interpretations. This is one approach to assessing uncertainty; see the pvclust documentation for details.

Author: Markus Goeker

library(pvclust)
library(opm)

Generating the data matrix to be clustered

The matrix must be transposed: pvclust expects the objects to be clustered in the columns. See pvclust::pvclust and base::t for details. If you need more space for the clustering, generate shorter labels from the metadata.

x <- t(extract(vaas_4, list("Species", "Strain")))

Clustering the data

The default distance method is based on correlations, which makes not much sense for the usual application of PM data, at least not for vaas_4. (It could be preferable to Euclidean distances if the overall reaction levels were different between organisms because of experimental artefacts.) Several other distance measures could be applicable, however, as well as clustering algorithms other than the default average-linkage clustering.

x.pvc <- pvclust(x, method.dist = "euclidean")
## Bootstrap (r = 0.5)... Done.
## Bootstrap (r = 0.59)... Done.
## Bootstrap (r = 0.7)... Done.
## Bootstrap (r = 0.79)... Done.
## Bootstrap (r = 0.9)... Done.
## Bootstrap (r = 1.0)... Done.
## Bootstrap (r = 1.09)... Done.
## Bootstrap (r = 1.2)... Done.
## Bootstrap (r = 1.29)... Done.
## Bootstrap (r = 1.4)... Done.

Plotting the clustering

The two kinds of support values are visible on the branches. We see that E. coli is comparatively well differentiated from P. aeruginosa, but only if standard bootstrapping is considered. Calling pvclust::pvrect highlights clusters with high support.

plot(x.pvc, hang = -1)
pvrect(x.pvc, pv = "bp")

plot of chunk unnamed-chunk-4

For real analyses, omit pv = "bp".

Tidying up

detach("package:pvclust", unload = TRUE)