Phylogeny scripts
Scripts designed and implemented by M. Goeker for the
Tuebingen 2007/2008 phylogeny courses.
The scripts are distributed under the terms of the
GNU General Public License, version 2.
If you use any of them in a publication, please cite this web page.
Sequence format conversion
- gbk2fas.sed: Convert GenBank flatfile to FASTA
format. FASTA sequence labels will consist of organism name and
accession number.
- gbf2fas.awk: Convert FASTA files downloaded from
GenBank into a cleaner FASTA format. The headers are as above, but the
organism name must be guessed, because it is not always included in
GenBank's FASTA titles.
- nex2epf.sed: Convert NEXUS to extended PHYLIP
format. Should work with non-interleaved NEXUS exported by PAUP*.
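As a rough illustration of what a GenBank-to-FASTA conversion involves, here is a minimal Python sketch. It is not the code of gbk2fas.sed; the function name genbank_to_fasta and the label layout (organism, underscore, accession, with spaces replaced by underscores) are assumptions made for this example, and only the ACCESSION, ORGANISM and ORIGIN parts of each record are read.

```python
import re


def genbank_to_fasta(text):
    """Convert GenBank flatfile text to FASTA text (simplified sketch).

    Assumed label layout: organism name and accession number joined by
    an underscore, with spaces replaced by underscores.
    """
    records = []
    organism = accession = None
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            # End of record: emit it if any sequence lines were collected.
            if seq_parts:
                label = f"{organism or 'unknown'}_{accession or 'unknown'}"
                label = label.replace(" ", "_")
                records.append(f">{label}\n{''.join(seq_parts)}")
            organism = accession = None
            seq_parts = []
            in_origin = False
        elif in_origin:
            # Sequence lines: drop position numbers and whitespace.
            seq_parts.append(re.sub(r"[\d\s]", "", line))
        elif line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                accession = fields[1]
        elif line.strip().startswith("ORGANISM"):
            organism = line.strip()[len("ORGANISM"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    return "\n".join(records) + ("\n" if records else "")
```

Calling genbank_to_fasta on the contents of a flatfile returns one FASTA record per GenBank record; records without a sequence section are skipped.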
Data cleaning
- dufa.awk: Remove duplicate sequences from unaligned
FASTA files.
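The idea of removing duplicates from an unaligned FASTA file can be sketched in Python as follows. This is not the logic of dufa.awk, whose exact definition of "duplicate" is not stated here; the sketch assumes that two records are duplicates when their sequences are identical ignoring case, and that the first occurrence is the one to keep. The function name remove_duplicate_sequences is made up for the example.

```python
def remove_duplicate_sequences(fasta_text):
    """Return FASTA text with later copies of identical sequences removed.

    Assumption: sequences are compared case-insensitively and the first
    record carrying a given sequence is kept.
    """
    records = []
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    seen = set()
    kept = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            kept.append(f">{header}\n{seq}")
    return "\n".join(kept) + ("\n" if kept else "")
```

Sequences that were wrapped over several lines are joined before comparison, so differences in line wrapping do not hide a duplicate.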
Sequence alignment
- dia2nex.sh: Shell wrapper script for the DIALIGN
alignment program. The most useful feature is the conversion of
DIALIGN's alignment quality scores into character sets for use in
PAUP*, so that you can find and automatically exclude the least
reliably aligned columns. Have a look at the
example file. If you use this feature,
please cite that paper.
Warning: The Macintosh version of DIALIGN seems to contain a
bug: in some sequences, the first character is replaced by the NUL
character.
- gblocks2nexus.sh: Shell wrapper for Gblocks.
Excluded alignment columns are not physically removed from the file,
but excluded using the corresponding PAUP* command in the output
NEXUS file.
- aop.sh: Shell wrapper script for the POA alignment
program. Manages POA runs, automatically provides a scoring matrix,
cleans FASTA files before and after the run, and computes pair-wise
similarity scores with four different methods.
- blosum80_ext_tru.mat: Scoring matrix
used by aop.sh. It is identical to blosum80_trunc.mat distributed
with POA but also includes the nucleotide ambiguity codes.
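Both dia2nex.sh and gblocks2nexus.sh leave the alignment intact and mark unreliable columns for exclusion in PAUP*. The Python sketch below shows one way such an exclusion could be written out; it is not taken from either script. The per-column scores, the threshold, the charset name low_quality and both function names are hypothetical, and the generated block should be checked against the PAUP* manual before use.

```python
def columns_to_ranges(columns):
    """Collapse 1-based column numbers into a NEXUS-style list, e.g. '1-3 7'."""
    ranges = []
    start = prev = None
    for col in sorted(set(columns)):
        if start is None:
            start = prev = col
        elif col == prev + 1:
            prev = col
        else:
            ranges.append((start, prev))
            start = prev = col
    if start is not None:
        ranges.append((start, prev))
    return " ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def exclusion_block(scores, threshold, name="low_quality"):
    """Build NEXUS text that defines and excludes low-scoring columns.

    scores: one (hypothetical) quality score per alignment column.
    Columns scoring below the threshold go into the character set.
    """
    low = [i for i, s in enumerate(scores, start=1) if s < threshold]
    if not low:
        return ""
    return (
        "begin sets;\n"
        f"  charset {name} = {columns_to_ranges(low)};\n"
        "end;\n"
        "begin paup;\n"
        f"  exclude {name};\n"
        "end;\n"
    )
```

Keeping the columns in the file and excluding them by name means the exclusion can later be undone or changed without rebuilding the alignment.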
Working with PAUP*
- nexus.pats: NEXUS syntax highlighting for the NEdit
editor. Includes all commands, command options, and command option
arguments mentioned in the PAUP* 4b10 manual.
- bremer.tcl: Script to compute Bremer and partitioned
Bremer support values in conjunction with PAUP*.
Miscellanea
- clean_raxml.sh: Clean up after a RAxML run (extract the
globally best tree and collect the best trees from each run in a single file).
- bash.zip: Bash configuration files. Unpack the archive
and put the files in your home directory.