Distance programs
Programs designed and (except for dist.pl) implemented by M. Goeker for conversion
between (host and associate) character and distance data and for related tasks. If
you use any of the programs in a publication, please cite this web page. Each
of the software zip files contains three executables (for Linux, Mac OS X,
and Windows).
Data and distance formats
- Examples of the formats used by the programs
are provided. As a rule, the programs listed below
can print their output in the popular NEXUS format used by
programs such as PAUP* and MacClade. However, they cannot
read NEXUS-formatted files. The readme.txt file distributed
together with the examples describes how to export data from PAUP* in
a form that the programs can read.
Computation of distance quality statistics
- dist_stats: Computes distance quality scores for
both entire matrices and individual taxa. Some of the scores have not
been thoroughly tested; it is thus strongly recommended to rely on
Delta values only, which were first described by
Holland et al.
However, almost all scores are described in our
poster,
which also assesses the relative performance of the scores using two
whole organelle genome datasets.
Auch et al. and
Goeker and Grimm have
applied these scores to determine the best distance functions.
- model2dist.nex: Batch file for PAUP* to compute
distance matrices along with likelihood scores. Whereas the scores can
be used as input for Posada's famous modeltest
program, delta values can be computed from the distance matrices
with dist_stats as an alternative means to find the best model for
phylogenetic analysis based on pairwise distances (e.g., Neighbour
Joining). This technique was suggested by
Holland et al.
and was applied by
Goeker et al.
However, in contrast to the application of modeltest, the performance of
this model selection method has not yet been examined by extensive
simulation or empirical studies.
- runds.sh: Shell script facilitating the analysis of the
model2dist.nex output.
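For readers unfamiliar with Delta values, the following Python sketch illustrates the quartet-based definition: for each set of four taxa, the three possible pairwise distance sums are sorted, and delta is (largest - middle) / (largest - smallest), which is 0 for perfectly tree-like distances. This is not the dist_stats source code. The function names, the nested-dictionary input, and the plain averaging over quartets are assumptions made for illustration; dist_stats may differ in input format and in details.

```python
from itertools import combinations


def quartet_delta(d, x, y, u, v):
    """Delta for one quartet; d is a symmetric nested dict of distances."""
    # The three ways of splitting four taxa into two pairs.
    m1, m2, m3 = sorted([
        d[x][y] + d[u][v],
        d[x][u] + d[y][v],
        d[x][v] + d[y][u],
    ])
    if m3 == m1:
        # All three sums are equal; treat the quartet as tree-like.
        return 0.0
    return (m3 - m2) / (m3 - m1)


def delta_scores(d):
    """Return (mean delta over all quartets, mean delta per taxon)."""
    taxa = sorted(d)
    if len(taxa) < 4:
        raise ValueError("at least four taxa are needed to form a quartet")
    totals = {t: 0.0 for t in taxa}
    counts = {t: 0 for t in taxa}
    quartet_sum = 0.0
    n_quartets = 0
    for quartet in combinations(taxa, 4):
        delta = quartet_delta(d, *quartet)
        quartet_sum += delta
        n_quartets += 1
        # Each taxon's score averages the quartets it takes part in.
        for t in quartet:
            totals[t] += delta
            counts[t] += 1
    per_taxon = {t: totals[t] / counts[t] for t in taxa}
    return quartet_sum / n_quartets, per_taxon


if __name__ == "__main__":
    # Hypothetical four-taxon matrix, not taken from the example files.
    pairs = {("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 2,
             ("B", "D"): 3, ("A", "D"): 3, ("B", "C"): 3}
    dist = {}
    for (a, b), value in pairs.items():
        dist.setdefault(a, {})[b] = value
        dist.setdefault(b, {})[a] = value
    print(delta_scores(dist))
```

Under this definition, lower values indicate more tree-like distances, so comparing the mean delta across the matrices obtained under different models is one way to read the model selection idea described above.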
Computation of distances
- eukdis: Computes distances from different kinds of
character data with different formulae (default is Euclidean distances
from quantitative characters). An older version was
used by
Goeker and Grimm.
- logdis: A distance conversion tool.
- pbc: Computation of host distances from associate
distances using two different methods, which are described in
Goeker and Grimm.
See gbk2fas for extracting
host-associate information from GenBank flat files.
- dist.pl: Perl script designed by M. Goeker and
implemented by C. Oertel for computation of patristic (path-length)
distances from trees and for averaging of distance matrices.
- newick.tcl: Tcl script that extracts different types of
information from trees. Can also be used to calculate patristic
distances.
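As an illustration of what patristic (path-length) distances are, the sketch below sums branch lengths along the unique path between each pair of leaves. It is written in Python and is unrelated to the actual code of dist.pl or newick.tcl. The edge-list input, the node names, and the function name are assumptions chosen to keep the example short; the real scripts read trees from files.

```python
from collections import defaultdict


def patristic_distances(edges):
    """Pairwise path-length distances between the leaves of a tree.

    edges: iterable of (node_a, node_b, branch_length) tuples describing
    a tree. Leaves are taken to be the nodes with exactly one neighbour.
    """
    adjacency = defaultdict(list)
    for a, b, length in edges:
        adjacency[a].append((b, length))
        adjacency[b].append((a, length))
    leaves = sorted(n for n, nbrs in adjacency.items() if len(nbrs) == 1)

    result = {}
    for start in leaves:
        # Depth-first walk; in a tree each node is reached by one path only.
        dist = {start: 0.0}
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour, length in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + length
                    stack.append(neighbour)
        result[start] = {leaf: dist[leaf] for leaf in leaves}
    return result


if __name__ == "__main__":
    # Hypothetical tree ((A:1,B:2):0.5,(C:1.5,D:3)) as an edge list;
    # n1 and n2 are made-up labels for the two internal nodes.
    example = [("A", "n1", 1.0), ("B", "n1", 2.0), ("n1", "n2", 0.5),
               ("C", "n2", 1.5), ("D", "n2", 3.0)]
    for leaf, row in patristic_distances(example).items():
        print(leaf, row)
```

Averaging several distance matrices, the other task mentioned for dist.pl, would then amount to taking the mean of corresponding entries across matrices that share the same taxa.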
Conversion of characters
- g2cef: Computation of host characters from associate
characters using five different methods, which are described in
Goeker and Grimm.
The eukdis program (see above) can be
used to compute distances from the converted character matrices.
See gbk2fas for extracting
host-associate information from GenBank flat files.
- perma: Simple program to randomly permute columns of
a tab-separated text file a specified number of times. Can be used in
a UNIX pipeline to calculate permutational statistics.
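The kind of column permutation described for perma can be sketched in Python as follows. This is not the perma program itself. It assumes that "permuting columns" means reordering whole columns, with the same random order applied to every row of a replicate; the function names and the seed parameter are hypothetical and only serve the illustration.

```python
import random


def permute_columns(rows, rng):
    """Return a copy of rows with the columns in a random order."""
    order = list(range(len(rows[0])))
    rng.shuffle(order)
    # The same column order is applied to every row.
    return [[row[i] for i in order] for row in rows]


def permuted_replicates(text, n_replicates, seed=None):
    """Yield n_replicates column-permuted versions of tab-separated text."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    if not rows:
        return
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same number of columns")
    rng = random.Random(seed)
    for _ in range(n_replicates):
        permuted = permute_columns(rows, rng)
        yield "\n".join("\t".join(row) for row in permuted)


if __name__ == "__main__":
    # Hypothetical input table; the seed only makes the run repeatable.
    table = "a\tb\tc\n1\t2\t3\n4\t5\t6"
    for replicate in permuted_replicates(table, 2, seed=1):
        print(replicate)
        print()
```

In a pipeline, each replicate would be passed to the program that computes the statistic of interest, and the resulting values compared with the value obtained from the unpermuted file.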
Questions?
If you have any questions regarding these programs, send an e-mail
to support [at] goeker [dot] org. Please include the name of the program
in the subject line.