fastphylo is a software project containing implementations of the algorithms "Fast Computation of Distance Estimators" and "Fast Neighbor Joining". The software was published in the journal BMC Bioinformatics in 2013 and is licensed under the MIT license.
The primary URL for this document is http://fastphylo.sourceforge.net.
BibTeX
	@Article{24255987,
	AUTHOR = {Khan, Mehmood and Elias, Isaac and Sjolund, Erik and Nylander, Kristina and Guimera, 
	Roman and Schobesberger, Richard and Schmitzberger, Peter and Lagergren, Jens and Arvestad, Lars},
	TITLE = {Fastphylo: Fast tools for phylogenetics},
	JOURNAL = {BMC Bioinformatics},
	VOLUME = {14},
	YEAR = {2013},
	NUMBER = {1},
	PAGES = {334},
	URL = {http://www.biomedcentral.com/1471-2105/14/334},
	DOI = {10.1186/1471-2105-14-334},
	PubMedID = {24255987},
	ISSN = {1471-2105},
	}
	
Isaac Elias and Jens Lagergren published the algorithm in the journal BMC Bioinformatics in 2007.
BibTeX
@Article{EliasLagergren_fastdist,
  author =      {Isaac Elias and Jens Lagergren},
  title =	{Fast Computation of Distance Estimators},
  journal =	{BMC Bioinformatics},
  year =        {2007},
  pages =       {89},
  volume =      {8}
}
Background: Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix, containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast, e.g., the famous and popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in time O(n³). Unfortunately, the fastest practical algorithms known for computing the distance matrix from n sequences of length l take time proportional to l·n². Since the sequence length typically is much larger than the number of taxa, the distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in reconstruction of large phylogenies or in applications where many trees have to be reconstructed, e.g., bootstrapping and genome wide applications.
Results: We give an advanced algorithm for computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences which contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods.
Conclusions: Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input and, hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.
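The Background paragraph above puts the cost of the straightforward approach at time proportional to l·n²: every one of the n(n−1)/2 pairs is compared site by site. The sketch below illustrates one generic way of lowering the constant factor, bit-parallel comparison, and then applies the Kimura two-parameter (K2P) correction. It is an illustration under stated assumptions, not the algorithm from the paper: the 2-bit encoding, the function names and the use of Python integers as bit vectors are our own choices, and gaps and ambiguity symbols are not handled.

```python
import math
from typing import Tuple

# Hypothetical 2-bit encoding (our choice, not fastdist's): bit R marks purines
# (A, G); bit K separates the two bases within a class (A, C -> 0; G, T -> 1).
_ENCODING = {"A": (1, 0), "G": (1, 1), "C": (0, 0), "T": (0, 1)}

Packed = Tuple[int, int, int]  # (R bit vector, K bit vector, length)


def pack(seq: str) -> Packed:
    """Pack a DNA string into two bit vectors stored as Python integers."""
    r = k = 0
    for i, base in enumerate(seq.upper()):
        rb, kb = _ENCODING[base]  # raises KeyError for gaps and ambiguity symbols
        r |= rb << i
        k |= kb << i
    return r, k, len(seq)


def popcount(x: int) -> int:
    return bin(x).count("1")


def count_events(a: Packed, b: Packed) -> Tuple[int, int]:
    """Return (transitions, transversions) using whole-vector bit operations."""
    ra, ka, la = a
    rb, kb, lb = b
    if la != lb:
        raise ValueError("sequences must be aligned to the same length")
    transversions = ra ^ rb                   # purine <-> pyrimidine changes
    transitions = (ka ^ kb) & ~transversions  # same class, different base
    return popcount(transitions), popcount(transversions)


def k2p_distance(a: Packed, b: Packed) -> float:
    """Kimura two-parameter distance from the transition/transversion counts."""
    ts, tv = count_events(a, b)
    p, q = ts / a[2], tv / a[2]
    arg1, arg2 = 1.0 - 2.0 * p - q, 1.0 - 2.0 * q
    if arg1 <= 0.0 or arg2 <= 0.0:
        return math.inf  # saturated pair: the formula is undefined
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)
```

For example, pack("AACCGGTT") and pack("GACTGCTA") differ by two transitions and two transversions, so count_events returns (2, 2). In C or C++ the same operations would run on fixed-width machine words, so each instruction handles many sites at once; in Python the gain comes mainly from replacing the per-site loop with a few big-integer operations.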
Supplementary Material - Fast Computation of Distance Estimators. Contains additional figures for the tests run on the ambiguity approaches. (PDF)
Simulated Test Data for Ambiguities (Tar archive)
Biological Test Data (Tar archive)
Command file used for running Paup (Nexus file)
Isaac Elias and Jens Lagergren published the algorithm in the book "Proc. of the 32nd International Colloquium on Automata, Languages and Programming ({ICALP}'05)" in 2005.
BibTeX
@InProceedings{ICALP05:EliasLagergren_FNJ,
  author =      {Isaac Elias and Jens Lagergren},
  title =	{Fast Neighbor Joining},
  booktitle =	{Proc. of the 32nd International Colloquium on Automata, 
                Languages and Programming ({ICALP}'05)},
  pages =	{1263--1274},
  year =	{2005},
  volume =	{3580},
  series =	{Lecture Notes in Computer Science},
  month =	{July},
  publisher =	{Springer-Verlag},
  ISBN =	{3-540-27580-0},
}
Reconstructing the evolutionary history of a set of species is a fundamental problem in biology, and methods for solving this problem are gauged by two characteristics: accuracy and efficiency. Neighbor Joining (NJ) is a so-called distance-based method that, thanks to its good accuracy and speed, has been embraced by the phylogeny community. It takes the distances between n taxa and produces in Θ(n³) time a phylogenetic tree, i.e., a tree which aims to describe the evolutionary history of the taxa. In addition to performing well in practice, the NJ algorithm has optimal reconstruction radius.
The contribution of this paper is twofold: (1) we present an algorithm called Fast Neighbor Joining (FNJ) with optimal reconstruction radius and optimal run time complexity O(n²) and (2) we present a greatly simplified proof for the correctness of NJ. Initial experiments show that FNJ in practice has almost the same accuracy as NJ, indicating that the property of optimal reconstruction radius has great importance to their good performance. Moreover, we show how improved running time can be achieved for computing the so-called correction formulas.
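For reference, here is a compact sketch of classic Neighbor Joining, the Θ(n³) algorithm described above; it is not FNJ, which the paper shows can run in O(n²) time. Each iteration picks the pair minimising Q(a, b) = (r − 2)·d(a, b) − R_a − R_b, where r is the current number of nodes and R_a is the row sum of a, joins the pair into a new node and updates the distances. The function name and the use of Newick strings as node labels are illustrative choices.

```python
from typing import Dict, List


def neighbor_joining(names: List[str], d: List[List[float]]) -> str:
    """Classic O(n^3) Neighbor Joining; returns an unrooted tree in Newick format."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Distances keyed by node label so that joined nodes can be added easily.
    dist: Dict[str, Dict[str, float]] = {
        a: {b: d[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 3:
        r = len(nodes)
        row_sum = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Pick the pair minimising Q(a, b) = (r - 2) d(a, b) - R_a - R_b.
        _, a, b = min(
            ((r - 2) * dist[a][b] - row_sum[a] - row_sum[b], a, b)
            for k, a in enumerate(nodes)
            for b in nodes[k + 1:]
        )
        # Branch lengths from the new internal node u to a and b.
        la = 0.5 * dist[a][b] + (row_sum[a] - row_sum[b]) / (2 * (r - 2))
        lb = dist[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        dist[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                dist[u][c] = dist[c][u] = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three nodes at a single central vertex.
    a, b, c = nodes
    la = 0.5 * (dist[a][b] + dist[a][c] - dist[b][c])
    lb = 0.5 * (dist[a][b] + dist[b][c] - dist[a][c])
    lc = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"
```

The O(n³) cost comes from recomputing the row sums and scanning all O(n²) pairs in each of the n − 3 iterations. On the additive matrix of the tree ((A:1,B:2):1,C:3,D:4) the sketch recovers the tree exactly.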
Download the software from the SourceForge project page. The latest version of fastphylo is 1.0.1.
To install fastphylo on Ubuntu or Debian, first download fastphylo-1.0.1.deb, then log in as root and run:
# dpkg -i fastphylo-1.0.1.deb
To install fastphylo on CentOS or Fedora, first download fastphylo-1.0.1.Linux.rpm, then log in as root and run:
# yum localinstall fastphylo-1.0.1.Linux.rpm
To install fastphylo on Mac OS X v10.6.8 (Snow Leopard) on a Mac with an Intel CPU, first download fastphylo-1.0.0-MacOSX10.6.8.tar.gz and then run:
$ tar xfz fastphylo-1.0.0-MacOSX10.6.8.tar.gz
To install fastphylo on Mac OS X v10.4 (Tiger) on a Mac with an Intel CPU, first download fastphylo-1.0.0-MacOSX10.4.tar.gz and then run:
$ tar xfz fastphylo-1.0.0-MacOSX10.4.tar.gz
To build fastphylo on Unix (e.g. Linux, Mac OS X) you need to have the build prerequisites installed.
On Ubuntu, you can install the prerequisites with the following commands:
sudo apt-get install cmake
sudo apt-get install libxml2 libxml2-dev
sudo apt-get install -y autotools-dev g++ build-essential openmpi1.6 libopenmpi1.6-dbg
sudo apt-get install libcr-dev mpich2 mpich2-doc
sudo apt-get install libblas-dev libblas-doc liblapack-dev liblapack-doc
You can download the source code using svn:
svn checkout svn://svn.code.sf.net/p/fastphylo/code/trunk fastphylo-code
If you have the fastphylo source code in the directory /tmp/source and want to install fastphylo into the directory /tmp/install, first run cmake, then make, and then make install:
$ mkdir /tmp/build
$ cd /tmp/build
$ cmake -DCMAKE_INSTALL_PREFIX=/tmp/install /tmp/source && make && make install
-- A library with BLAS API found.
-- A library with BLAS API found.
-- A library with LAPACK API found.
-- A library with BLAS API found.
-- A library with BLAS API found.
-- A library with LAPACK API found.
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/build
Scanning dependencies of target fastphylo
[ 1%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/BitVector.cpp.o
[ 2%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/Exception.cpp.o
[ 3%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/InitAndPrintOn_utils.cpp.o
[ 4%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/Object.cpp.o
[ 5%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/Sequence.cpp.o
[ 7%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/SequenceTree.cpp.o
[ 8%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/SequenceTree_MostParsimonious.cpp.o
[ 9%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/Simulator.cpp.o
[ 10%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/arg_utils_ext.cpp.o
[ 11%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/file_utils.cpp.o
[ 13%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/stl_utils.cpp.o
[ 14%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/DNA_b128/DNA_b128_String.cpp.o
[ 15%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/DNA_b128/Sequences2DistanceMatrix.cpp.o
[ 16%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/distance_methods/LeastSquaresFit.cpp.o
[ 17%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/distance_methods/NeighborJoining.cpp.o
[ 19%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/sequence_likelihood/Kimura2parameter.cpp.o
[ 20%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/sequence_likelihood/TamuraNei.cpp.o
[ 21%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/sequence_likelihood/ambiguity_nucleotide.cpp.o
[ 22%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/sequence_likelihood/dna_pairwise_sequence_likelihood.cpp.o
[ 23%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/sequence_likelihood/string_compare.cpp.o
[ 25%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/DistanceMatrix.cpp.o
[ 26%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/FloatDistanceMatrix.cpp.o
[ 27%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/DistanceRow.cpp.o
[ 28%] Building C object src/c++/CMakeFiles/fastphylo.dir/arg_utils.c.o
cc1: warning: command line option "-fno-default-inline" is valid for C++/ObjC++ but not for C
[ 29%] Building C object src/c++/CMakeFiles/fastphylo.dir/std_c_utils.c.o
cc1: warning: command line option "-fno-default-inline" is valid for C++/ObjC++ but not for C
[ 30%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/xml_output_global.cpp.o
[ 32%] Building C object src/c++/CMakeFiles/fastphylo.dir/DNA_b128/sse2_wrapper.c.o
[ 33%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/DNA_b128/computeTAMURANEIDistance_DNA_b128_String.cpp.o
[ 34%] Building CXX object src/c++/CMakeFiles/fastphylo.dir/DNA_b128/computeDistance_DNA_b128_String.cpp.o
Linking CXX static library libfastphylo.a
[ 34%] Built target fastphylo
[ 35%] Generating programs/fastdist/gengetopt/fastdist_gengetopt.c, programs/fastdist/gengetopt/fastdist_gengetopt.h
Scanning dependencies of target fastdist
[ 36%] Building CXX object src/c++/CMakeFiles/fastdist.dir/programs/fastdist/main.cpp.o
[ 38%] Building CXX object src/c++/CMakeFiles/fastdist.dir/programs/fastdist/PhylipMaInputStream.cpp.o
[ 39%] Building CXX object src/c++/CMakeFiles/fastdist.dir/programs/fastdist/FastaInputStream.cpp.o
[ 40%] Building CXX object src/c++/CMakeFiles/fastdist.dir/programs/fastdist/DataOutputStream.cpp.o
[ 41%] Building CXX object src/c++/CMakeFiles/fastdist.dir/programs/fastdist/XmlOutputStream.cpp.o
[ 42%] Building CXX object src/c++/CMakeFiles/fastdist.dir/programs/fastdist/PhylipDmOutputStream.cpp.o
[ 44%] Building CXX object src/c++/CMakeFiles/fastdist.dir/programs/fastdist/BinaryDmOutputStream.cpp.o
[ 45%] Building C object src/c++/CMakeFiles/fastdist.dir/programs/fastdist/gengetopt/fastdist_gengetopt.c.o
cc1: warning: command line option "-fno-default-inline" is valid for C++/ObjC++ but not for C
[ 46%] Building CXX object src/c++/CMakeFiles/fastdist.dir/programs/fastdist/XmlInputStream.cpp.o
Linking CXX executable fastdist
[ 48%] Built target fastdist
[ 50%] Generating programs/fastprot/gengetopt/fastprot_gengetopt.c, programs/fastprot/gengetopt/fastprot_gengetopt.h
Scanning dependencies of target fastprot
[ 51%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/main.cpp.o
[ 52%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/FastaInputStream.cpp.o
[ 53%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/DataOutputStream.cpp.o
[ 54%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/XmlOutputStream.cpp.o
[ 55%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/PhylipMaInputStream.cpp.o
[ 57%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/ProtDistCalc.cpp.o
[ 58%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/ModelMatrix.cpp.o
[ 59%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/ExpectedDistance.cpp.o
[ 60%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/Matrix.cpp.o
[ 61%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/MaximumLikelihood.cpp.o
[ 63%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/ProtSeqUtils.cpp.o
[ 64%] Building C object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/gengetopt/fastprot_gengetopt.c.o
cc1: warning: command line option "-fno-default-inline" is valid for C++/ObjC++ but not for C
[ 65%] Building CXX object src/c++/CMakeFiles/fastprot.dir/programs/fastprot/XmlInputStream.cpp.o
Linking CXX executable fastprot
[ 67%] Built target fastprot
[ 69%] Generating programs/fastprot_mpi/gengetopt/fastprot_mpi_gengetopt.c, programs/fastprot_mpi/gengetopt/fastprot_mpi_gengetopt.h
Scanning dependencies of target fastprot_mpi
[ 70%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/main.cpp.o
[ 71%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/FastaInputStream.cpp.o
[ 72%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/DataOutputStream.cpp.o
[ 73%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/XmlOutputStream.cpp.o
[ 75%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/PhylipMaInputStream.cpp.o
[ 76%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/ProtDistCalc.cpp.o
[ 77%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/ModelMatrix.cpp.o
[ 78%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/ExpectedDistance.cpp.o
[ 79%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/Matrix.cpp.o
[ 80%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/MaximumLikelihood.cpp.o
[ 82%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/ProtSeqUtils.cpp.o
[ 83%] Building C object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/gengetopt/fastprot_mpi_gengetopt.c.o
cc1: warning: command line option "-fno-default-inline" is valid for C++/ObjC++ but not for C
[ 84%] Building CXX object src/c++/CMakeFiles/fastprot_mpi.dir/programs/fastprot_mpi/XmlInputStream.cpp.o
Linking CXX executable fastprot_mpi
[ 86%] Built target fastprot_mpi
[ 88%] Generating programs/fnj/gengetopt/fnj_gengetopt.c, programs/fnj/gengetopt/fnj_gengetopt.h
Scanning dependencies of target fnj
[ 89%] Building CXX object src/c++/CMakeFiles/fnj.dir/programs/fnj/main.cpp.o
[ 90%] Building CXX object src/c++/CMakeFiles/fnj.dir/programs/fnj/DataInputStream.cpp.o
[ 91%] Building CXX object src/c++/CMakeFiles/fnj.dir/programs/fnj/DataOutputStream.cpp.o
[ 92%] Building CXX object src/c++/CMakeFiles/fnj.dir/programs/fnj/XmlOutputStream.cpp.o
[ 94%] Building CXX object src/c++/CMakeFiles/fnj.dir/programs/fnj/PhylipDmInputStream.cpp.o
[ 95%] Building CXX object src/c++/CMakeFiles/fnj.dir/programs/fnj/BinaryInputStream.cpp.o
[ 96%] Building C object src/c++/CMakeFiles/fnj.dir/programs/fnj/gengetopt/fnj_gengetopt.c.o
cc1: warning: command line option "-fno-default-inline" is valid for C++/ObjC++ but not for C
[ 97%] Building CXX object src/c++/CMakeFiles/fnj.dir/programs/fnj/XmlInputStream.cpp.o
Linking CXX executable fnj
[100%] Built target fnj
[ 34%] Built target fastphylo
[ 48%] Built target fastdist
[ 67%] Built target fastprot
[ 86%] Built target fastprot_mpi
[100%] Built target fnj
Install the project...
-- Install configuration: ""
-- Installing: /tmp/bin/fastdist
-- Removed runtime path from "/tmp/bin/fastdist"
-- Installing: /tmp/bin/fnj
-- Removed runtime path from "/tmp/bin/fnj"
-- Installing: /tmp/bin/fastprot
-- Removed runtime path from "/tmp/bin/fastprot"
-- Installing: /tmp/bin/fastprot_mpi
-- Removed runtime path from "/tmp/bin/fastprot_mpi"
If you want to build the HTML documentation (i.e. this page), pass the -DBUILD_DOCBOOK=ON option to cmake.
This section is mainly intended for package maintainers.
On a CentOS or Fedora machine, first log in as root and install the dependencies
# yum install xmlto libxml2-devel cmake gcc-c++ binutils gengetopt
Check that cmake is version 2.6 or later:
$ cmake --version
cmake version 2.6-patch 0
If it is older, you can download a cmake binary directly from www.cmake.org.
$ mkdir /tmp/build
$ cd /tmp/build
$ cmake -DCMAKE_INSTALL_PREFIX=/ -DBUILD_DOCBOOK=ON /tmp/source && make package
On a Debian or Ubuntu machine, first log in as root and install the dependencies
# apt-get install libxml2-dev cmake g++ binutils gengetopt
Check that cmake is version 2.6 or later:
$ cmake --version
cmake version 2.6-patch 0
If it is older, you can download a cmake binary directly from www.cmake.org. Now build the deb package:
$ mkdir /tmp/build
$ cd /tmp/build
$ cmake -DCMAKE_INSTALL_PREFIX=/ -DBUILD_DOCBOOK=ON /tmp/source && make package
To build the fastphylo install package for Mac OS X, you need to have installed all the dependencies mentioned in Section 3.2.2.1, “Building from source on Unix” on your Mac OS X computer.
Check that cmake is version 2.6 or later:
$ cmake --version
cmake version 2.6-patch 0
$ mkdir /tmp/build
$ cd /tmp/build
$ cmake -DSTATIC=ON -DCPACK_GENERATOR="TGZ" /tmp/source && make package
fastdist implements the algorithm Fast Computation of Distance Estimators (see Section 2.1, “Fast Computation of Distance Estimators”).
Type fastdist --help to see the command line options
[user@saturn ~]$ fastdist --help
fastdist 1.0.1
Usage: fastdist [OPTIONS]... [FILE]...
Computes distance matrices out of multialignments
  -h, --help                    Print help and exit
  -V, --version                 Print version and exit
If FILE is not specified the input is read from stdin 
  -o, --outfile=filename        output filename. If not specifed, output is
                                  written to stdout
  -I, --input-format=ENUM       input format. xml means the Fastphylo sequence
                                  XML format  (possible values="fasta",
                                  "phylip", "xml" default=`fasta')
  -e, --memory-efficient         memory efficient. Use less memory space and
                                  fast implementation. Only used with fasta and
                                  phylip format  (default=off)
  -O, --output-format=ENUM      output format. xml means the Fastphylo distance
                                  matrix XML format  (possible
                                  values="phylip", "xml", "binary"
                                  default=`xml')
  -D, --distance-function=ENUM  Distance function  (possible values="JC",
                                  "K2P", "TN93", "HAMMING" default=`K2P')
  -b, --bootstraps=INT          Bootstrap num times and create matrix for each
                                  (default=`0')
  -k, --no-incl-orig            If the distance matrix from the original
                                  sequences should not be included
                                  (default=off)
  -s, --seed=INT                Random seed. If not specified the current
                                  timestamp will be used
  -A, --no-ambiguities          Ignore ambiguities  (default=off)
  -R, --no-ambig-resolve        Specifies that ambigious symbols should not be
                                  resolved by nearest neighbor  (default=off)
  -t, --no-transprob            Specifies that the transition probabilities
                                  should not be used in the ambiguity model
                                  (default=off)
  -a, --ambiguity-frequency-model=ENUM
                                Ambiguity frequency model  (possible
                                  values="UNI", "BASE" default=`UNI')
  -T, --tstvratio=FLOAT         Transition/transvertion ratio for purine
                                  transitions ( for the TN model )
                                  (default=`2.0')
  -P, --pyrtvratio=FLOAT        Transition/transvertion ratio for  pyrimidines
                                  transitions ( for the TN model )
                                  (default=`2.0')
  -N, --no-tstvratio            If given fixed ts/tv ratios will not be used
                                  (default=off)
  -F, --fixfactor=FLOAT         Float specifying what factor to use for
                                  saturated data. If not given -1 in the entry.
                                  (default=`1')
  -r, --number-of-runs=INT      nr of runs ( datasets ) in input. This option
                                  is only used if the input format is
                                  phylip_multialignment.  (default=`1')
  -p, --print-relaxng-input     print the Relax NG schema for the XML input
                                  format ( Fastphylo sequence XML format ) and
                                  then exit  (default=off)
  -w, --print-relaxng-output    print the Relax NG schema for the XML output
                                  format ( Fastphylo distance matrix XML format
                                  ) and then exit.  (default=off)
Example usage of this program can be found at its home page
http://fastphylo.sourceforge.net/
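The -b/--bootstraps option produces one distance matrix per bootstrap replicate, and -s/--seed sets the random seed. The help text does not describe how the replicates are drawn; assuming it is the usual nonparametric bootstrap, in which alignment columns are sampled with replacement, a single replicate can be sketched as below. The function name is hypothetical, and Python's random number generator will not reproduce fastdist's replicates.

```python
import random
from typing import List, Optional


def bootstrap_replicate(seqs: List[str], seed: Optional[int] = None) -> List[str]:
    """Hypothetical helper: resample alignment columns with replacement."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    rng = random.Random(seed)  # a fixed seed makes the replicate reproducible
    columns = [rng.randrange(length) for _ in range(length)]
    # Apply the same column choice to every sequence so the rows stay aligned.
    return ["".join(s[c] for c in columns) for s in seqs]
```

Each replicate would then go through the same distance computation as the original alignment, giving the extra matrices that -b asks for.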
Table 1. fastdist input file formats
| file format | short option | description | 
|---|---|---|
| fasta format | -I fasta | Section 3.4.3, “Fasta format” | 
| phylip format | -I phylip | Section 3.4.2, “phylip format” | 
| fastphylo sequence XML format | -I xml | Section 3.4.1, “Fastphylo sequence XML format” | 
Table 2. fastdist output file formats
| file format | short option | description | 
|---|---|---|
| fastphylo distance matrix XML format | -O xml | Section 3.4.4, “Fastphylo distance matrix XML format” | 
| Binary distance matrix format | -O binary | Section 3.4.6, “Binary distance matrix format” | 
| phylip distance matrix format | -O phylip | Section 3.4.5, “Phylip distance matrix format” | 
Example 1. fastdist with input in file phylip format
We use the DNA file described in Example 13, “Example files in phylip format” as the input file. 
The file has two datasets so we pass the option -r 2 to fastdist. By default, the output is given in XML format.
[user@saturn ~]$ fastdist -I phylip seq.phylip
<?xml version="1.0"?>
<root>
 <runs>
  <run id="" dim="3">
   <identities>
    <identity name="Alpha"/>
    <identity name="Beta"/>
    <identity name="Gamma"/>
   </identities>
   <dms>
   <dm>
    <row>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.299650</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.733169</entry>
     <entry>0.309520</entry>
     <entry>0.000000</entry>
    </row>
   </dm>
   </dms>
  </run>
 </runs>
</root>
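In the XML output each <dm> stores only the lower triangle of the matrix, one <row> per taxon in the order of <identities>. A minimal sketch, using only Python's standard xml.etree.ElementTree and the element names visible in the output above, that expands every <dm> into a full symmetric matrix (the function name is hypothetical; for very large outputs the streaming approach of Example 5 is more appropriate):

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def read_distance_matrices(xml_text: str) -> List[Tuple[List[str], List[List[float]]]]:
    """Hypothetical helper: (taxon names, full symmetric matrix) for every <dm>."""
    root = ET.fromstring(xml_text)
    result = []
    for run in root.iter("run"):
        names = [ident.get("name") for ident in run.iter("identity")]
        for dm in run.iter("dm"):
            n = len(names)
            full = [[0.0] * n for _ in range(n)]
            # Row i holds the entries d(i, 0) .. d(i, i) of the lower triangle.
            for i, row in enumerate(dm.findall("row")):
                for j, entry in enumerate(row.findall("entry")):
                    full[i][j] = full[j][i] = float(entry.text)
            result.append((names, full))
    return result
```

With bootstrapping enabled, a run may contain several <dm> elements; the sketch returns one (names, matrix) pair for each of them.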
Example 2. fastdist with input in file fasta format
We use the file described in Example 14, “seq.fasta, an example file in fasta format” as the input file. By default, the output is given in XML format.
[user@saturn ~]$ fastdist -I fasta seq.fasta
<?xml version="1.0"?>
<root>
 <runs>
  <run id="" dim="3">
   <identities>
    <identity name="Alpha"/>
    <identity name="Beta"/>
    <identity name="Gamma"/>
   </identities>
   <dms>
   <dm>
    <row>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.299650</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.733169</entry>
     <entry>0.309520</entry>
     <entry>0.000000</entry>
    </row>
   </dm>
   </dms>
  </run>
 </runs>
</root>
Example 3. fastdist with input file in XML format
We use the file described in Example 12, “Example files in Fastphylo sequence XML format” containing DNA sequences as the input file.
Note: The -r option can only be used if the input is in phylip format. For XML files, fastdist computes all data sets (runs). Fasta files can only contain one data set, so the -r option does not apply to them.
[user@saturn ~]$ fastdist -I xml -O xml seq.xml
<?xml version="1.0"?>
<root>
 <runs>
  <run id="run1" dim="3">
   <identities>
    <identity name="Alpha"/>
    <identity name="Beta">
     <extrainfo myattr="" species="penguin">
          <foo bar="1"/>
        </extrainfo>
    </identity>
    <identity name="Gamma"/>
   </identities>
   <dms>
   <dm>
    <row>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.299650</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.733169</entry>
     <entry>0.309520</entry>
     <entry>0.000000</entry>
    </row>
   </dm>
   </dms>
  </run>
  <run id="run2" dim="3">
   <identities>
    <identity name="Alpha"/>
    <identity name="Beta"/>
    <identity name="Gamma"/>
   </identities>
   <dms>
   <dm>
    <row>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.299650</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.733169</entry>
     <entry>0.309520</entry>
     <entry>0.000000</entry>
    </row>
   </dm>
   </dms>
  </run>
 </runs>
</root>
Example 4. fastdist with an XML stream on stdin
If you leave out the input filename, the input is read from stdin. fastdist doesn't wait for the whole XML file to be read before it starts; it starts a computation as soon as a closing </run> tag has been read. Memory consumption does not grow over time, so the input can be arbitrarily large. A never-ending input stream only works with the fastphylo sequence XML format, because the phylip input format requires you to specify in advance how many data sets will be sent to fastdist (the -r option).
[user@saturn ~]$ cat seq.xml | fastdist -I xml -O xml
<?xml version="1.0"?>
<root>
 <runs>
  <run id="run1" dim="3">
   <identities>
    <identity name="Alpha"/>
    <identity name="Beta">
     <extrainfo myattr="" species="penguin">
          <foo bar="1"/>
        </extrainfo>
    </identity>
    <identity name="Gamma"/>
   </identities>
   <dms>
   <dm>
    <row>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.299650</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.733169</entry>
     <entry>0.309520</entry>
     <entry>0.000000</entry>
    </row>
   </dm>
   </dms>
  </run>
  <run id="run2" dim="3">
   <identities>
    <identity name="Alpha"/>
    <identity name="Beta"/>
    <identity name="Gamma"/>
   </identities>
   <dms>
   <dm>
    <row>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.299650</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.733169</entry>
     <entry>0.309520</entry>
     <entry>0.000000</entry>
    </row>
   </dm>
   </dms>
  </run>
 </runs>
</root>
Example 5. reading the fastdist XML output stream with python
If the XML output is very large, you might want to use an XML parser that doesn't hold the whole file in memory. This Python script is an example of how to do this:
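#!/usr/bin/env python3
import sys
from lxml import etree

# iterparse streams stdin and reports only completed <dm> elements
for _, element in etree.iterparse(sys.stdin.buffer, tag="dm"):
    # count the entries of this distance matrix with a value below 0.1
    print(element.xpath('count(row/entry[number(.) < 0.1])'))
    # free the processed matrix and its earlier siblings to keep memory use low
    element.clear()
    while element.getprevious() is not None:
        del element.getparent()[0]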
For each distance matrix, the script counts the number of entries with a value below 0.1:
[user@saturn ~]$ cat seq.xml | fastdist -I xml -O xml | python fastdist_lxml.py
3.0
3.0
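If lxml is not installed, a similar count can be sketched with the standard library's xml.etree.ElementTree.iterparse, clearing each <dm> once it has been counted. This alternative is our suggestion, not part of fastphylo, and the function name is hypothetical; unlike the XPath count() above, it prints integers (3 rather than 3.0).

```python
import sys
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def count_small_entries(stream: BinaryIO, threshold: float = 0.1) -> Iterator[int]:
    """Hypothetical helper: yield, per <dm>, how many entries are below threshold."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "dm":
            yield sum(1 for e in elem.iter("entry") if float(e.text) < threshold)
            elem.clear()  # drop the counted matrix's rows to save memory


if __name__ == "__main__":
    for count in count_small_entries(sys.stdin.buffer):
        print(count)
```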
      
fastprot estimates the evolutionary distance between aligned protein sequences. It implements two methods for calculating this distance: the maximum likelihood estimate of the distance and the expected distance (for details, see the paper by Agarwal and States).
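The simplest of fastprot's distance functions, JC, presumably denotes the Jukes-Cantor correction for 20 amino-acid states (an assumption; the help text below only lists the name): d = −(19/20)·ln(1 − (20/19)·p), where p is the proportion of differing sites. A minimal sketch, which also assumes that columns with a gap '-' in either sequence are skipped; the function name is hypothetical:

```python
import math


def protein_jc_distance(a: str, b: str) -> float:
    """Hypothetical helper: 20-state Jukes-Cantor distance of two aligned proteins."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where neither sequence has a gap (an assumption).
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    p = sum(1 for x, y in pairs if x != y) / len(pairs)
    arg = 1.0 - (20.0 / 19.0) * p
    if arg <= 0.0:
        return math.inf  # saturated: too many differences for the model
    return -(19.0 / 20.0) * math.log(arg)
```

Under this model the maximum likelihood estimate coincides with the closed-form correction; the richer models (WAG, JTT, and so on) and the expected distance need the numerical methods described in the paper.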
Type fastprot --help to see the command line options
[user@saturn ~]$ fastprot --help
fastprot 1.0.1
Usage: fastprot [OPTIONS]... [FILE]...
Computes distance matrices out of multialignments of protein sequences
  -h, --help                    Print help and exit
      --detailed-help           Print help, including all details and hidden
                                  options, and exit
  -V, --version                 Print version and exit
If FILE is not specified the input is read from stdin 
  -o, --outfile=filename        output filename. If not specified, output is
                                  written to stdout
  -I, --input-format=ENUM       input format. xml means the Fastphylo sequence
                                  XML format  (possible values="fasta",
                                  "phylip", "xml" default=`fasta')
  -e, --memory-efficient         memory efficient. Use less memory space and
                                  fast implementation. Only used with fasta and
                                  phylip format  (default=off)
  -O, --output-format=ENUM      output format. xml means the Fastphylo distance
                                  matrix XML format  (possible
                                  values="phylip", "xml", "binary"
                                  default=`xml')
  -b, --bootstraps=INT          Bootstrap num times and create matrix for each
                                  (default=`0')
  -k, --no-incl-orig            If the distance matrix from the original
                                  sequences should NOT be included - for
                                  bootstrapping  (default=off)
  -R, --seed=INT                Random seed. If not specified the current
                                  timestamp will be used
  -D, --distance-function=ENUM  Distance function  (possible values="ID",
                                  "JC", "JCK", "JCSS", "WAG", "JTT",
                                  "DAY", "ARVE", "MVR", "LG"
                                  default=`WAG')
  -F, --model-file=filename     Read matrix and equilibrium distribution from
                                  file, when used --distance-function is
                                  disregarded
  -i, --remove-indels           Remove gap columns. A gap is denoted by '-'.
                                  (default=off)
  -m, --maximum-likelihood      Compute a Maximum Likelihood estimate instead.
                                  Can not be used with --distance-function=ID,
                                  JC, JCK or JCSS or --sd  (default=off)
  -S, --sd                      Not yet implemented! Output a matrix with
                                  standard deviations after the distance
                                  matrix. Can not be used with
                                  --distance-function=ID, JC, JCK or JCSS or
                                  --maximum-likelihood  (default=off)
  -p, --pfam                    use a normal distribution as distance prior,
                                  estimated from Pfam 7.2  (default=off)
  -s, --speed=INT               'Speed'. High speed results in low precision,
                                  only affects ED calculations. Default is 5.
                                  Valid range is [1,10].  (possible
                                  values="1", "2", "3", "4", "5",
                                  "6", "7", "8" default=`4')
  -P, --print-relaxng-input     print the Relax NG schema for the XML input
                                  format ( Fastphylo protein sequence XML
                                  format ) and then exit  (default=off)
  -w, --print-relaxng-output    print the Relax NG schema for the XML output
                                  format ( Fastphylo distance matrix XML format
                                  ) and then exit.  (default=off)
Example usage of this program can be found at its home page
http://fastphylo.sourceforge.net/
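The -i/--remove-indels option removes gap columns, where a gap is denoted by '-'. The help text does not say whether a column is dropped when any sequence has a gap there or only when all do; the sketch below assumes the former. The function name is hypothetical.

```python
from typing import List


def remove_gap_columns(seqs: List[str], gap: str = "-") -> List[str]:
    """Hypothetical helper: drop every column in which any sequence has a gap."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    keep = [i for i in range(length) if all(s[i] != gap for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]
```

For example, the alignment MK-LV / MKALV / M-ALI becomes MLV / MLV / MLI, since columns 2 and 3 each contain a gap.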
Table 3. fastprot input file formats
| file format | short option | description | 
|---|---|---|
| fasta format | -I fasta | Section 3.4.3, “Fasta format” | 
| phylip format | -I phylip | Section 3.4.2, “phylip format” | 
| fastphylo sequence XML format | -I xml | Section 3.4.1, “Fastphylo sequence XML format” | 
Table 4. fastprot output file formats
| file format | short option | description | 
|---|---|---|
| fastphylo distance matrix XML format | -O xml | Section 3.4.4, “Fastphylo distance matrix XML format” | 
| Binary distance matrix format | -O binary | Section 3.4.6, “Binary distance matrix format” | 
| phylip distance matrix format | -O phylip | Section 3.4.5, “Phylip distance matrix format” | 
Example 6. fastprot with input file in phylip format
We use the protein sequence file described in Example 13, “Example files in phylip format” as the input file.
[user@saturn ~]$ fastprot -I phylip protein_seq.phylip -O xml
<?xml version="1.0"?>
<root>
 <runs>
  <run id="" dim="4">
   <identities>
    <identity name="Cow"/>
    <identity name="Carp"/>
    <identity name="Chicken"/>
    <identity name="Human"/>
   </identities>
   <dms>
   <dm>
    <row>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.402252</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>2.622102</entry>
     <entry>2.334973</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>2.919533</entry>
     <entry>2.733489</entry>
     <entry>0.903515</entry>
     <entry>0.000000</entry>
    </row>
   </dm>
   </dms>
  </run>
 </runs>
</root>
Example 7. fastprot with input file in fasta format
We use the file described in Example 15, “protein_seq.fasta, an example file in fasta format” as the input file. By default the output is given in XML format.
[user@saturn ~]$ fastprot -I fasta protein_seq.fasta
<?xml version="1.0"?>
<root>
 <runs>
  <run id="" dim="4">
   <identities>
    <identity name="Cow"/>
    <identity name="Carp"/>
    <identity name="Chicken"/>
    <identity name="Human"/>
   </identities>
   <dms>
   <dm>
    <row>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.402252</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>2.622102</entry>
     <entry>2.334973</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>2.919533</entry>
     <entry>2.733489</entry>
     <entry>0.903515</entry>
     <entry>0.000000</entry>
    </row>
   </dm>
   </dms>
  </run>
 </runs>
</root>
fnj implements the Fast Neighbor Joining algorithm (see Section 2.2, “Fast Neighbor Joining”).
Type fnj --help to see the command line options:
[user@saturn ~]$ fnj --help
fnj 1.0.1
Usage: fnj [OPTIONS]... [FILE]...
builds phylogenetic trees
  -h, --help                    Print help and exit
  -V, --version                 Print version and exit
  -o, --outfile=filename        output filename. If not specifed, output is
                                  written to stdout
  -I, --input-format=ENUM       input format. 'xml' means the 'Fastphylo
                                  distance matrix XML format'  (possible
                                  values="phylip", "xml", "binary"
                                  default=`xml')
  -O, --output-format=ENUM      output format. 'xml' means the 'Fastphylo tree
                                  count XML format'  (possible
                                  values="newick", "xml" default=`xml')
  -c, --print-counts            print the tree count before each the newick
                                  tree. This flag has no effect on the XML
                                  output format.  (default=off)
  -a, --analyze-run-number=INT  Determines which dataset should be analyzed
                                  with 1 being the first dataset. By default
                                  all are analyzed
  -m, --method=ENUM             reconstruction method to apply  (possible
                                  values="NJ", "FNJ", "BIONJ"
                                  default=`FNJ')
  -d, --dm-per-run=INT          nr of Distance matrices per run. Is only used
                                  if the input format is phylip  (default=`1')
  -r, --number-of-runs=INT      nr of runs. Is only used if the input format is
                                  phylip  (default=`1')
  -b, --bootstraps=INT          number of boot straps  (default=`0')
  -p, --print-relaxng-input     print the Relax NG schema for the XML input
                                  format ( Fastphylo distance matrix XML format
                                  ) and then exit  (default=off)
  -w, --print-relaxng-output    print the Relax NG schema for the XML output
                                  format ( Fastphylo tree count XML format )
                                  and then exit.  (default=off)
Example usage of this program can be found at its home page
http://fastphylo.sourceforge.net/
Table 5. fnj input file formats
| file format | short option | description | 
|---|---|---|
| fastphylo distance matrix XML format | -I xml | Section 3.4.4, “Fastphylo distance matrix XML format” | 
| Binary distance matrix format | -I binary | Section 3.4.6, “Binary distance matrix format” | 
| phylip distance matrix format | -I phylip | Section 3.4.5, “Phylip distance matrix format” | 
Table 6. fnj output file formats
| file format | short option | description | 
|---|---|---|
| fastphylo tree count XML format | -O xml | Section 3.4.7, “Fastphylo tree count XML format” | 
Example 8. fnj with input file in Phylip distance matrix format
We use the file described in Example 17, “dm.phylip, an example file in phylip distance matrix format” as the input file. The file has two datasets, so we pass the option -r 2 to fnj. By default the output is given in the "Fastphylo tree count XML format" (-O xml).
[user@saturn ~]$ fnj -r 2 -I phylip dm.phylip
<?xml version="1.0"?>
 <root>
  <runs>
   <run id="" dim="3">
    <identities>
     <identity name="Alpha"/>
     <identity name="Beta"/>
     <identity name="Gamma"/>
    </identities>
    <tree>
     <count>2</count>
     <newick-xml><branch><leaf>Gamma</leaf><leaf>Beta</leaf><leaf>Alpha</leaf></branch></newick-xml>
     <newick>(Gamma,Beta,Alpha);</newick>
    </tree>
   </run>
   <run id="" dim="3">
    <identities>
     <identity name="Alpha"/>
     <identity name="Beta"/>
     <identity name="Gamma"/>
    </identities>
   </run>
  </runs>
 </root>
Example 9. fnj with input file in XML format
We use the file described in Example 16, “dm.xml, an example file in Fastphylo distance matrix XML format” as the input file. By default the output is given in the "Fastphylo tree count XML format" (-O xml).
Note: The -r option is neither available nor needed when the input is in XML format; fnj processes all data sets (runs).
[user@saturn ~]$ fnj -I xml dm.xml
<?xml version="1.0"?>
 <root>
  <runs>
   <run id="a" dim="3">
    <identities>
     <identity name="Alpha"/>
     <identity name="Beta"/>
     <identity name="Gamma"/>
    </identities>
    <tree>
     <count>1</count>
     <newick-xml><branch><leaf>Gamma</leaf><leaf>Beta</leaf><leaf>Alpha</leaf></branch></newick-xml>
     <newick>(Gamma,Beta,Alpha);</newick>
    </tree>
   </run>
   <run id="b" dim="3">
    <identities>
     <identity name="Alpha"/>
     <identity name="Beta"/>
     <identity name="Gamma"/>
    </identities>
    <tree>
     <count>1</count>
     <newick-xml><branch><leaf>Gamma</leaf><leaf>Beta</leaf><leaf>Alpha</leaf></branch></newick-xml>
     <newick>(Gamma,Beta,Alpha);</newick>
    </tree>
   </run>
  </runs>
 </root>
Example 10. connecting fastdist to fnj with a pipe
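We use the DNA file described in Example 13, “Example files in phylip format” as the input file. The file has two data sets, and we bootstrap 3 times. First we send the data in phylip format through the pipe. Since the phylip format does not record which matrices belong to which run, fnj is told with -r 2 and -d 4 to expect 2 runs of 4 distance matrices each: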
[user@saturn ~]$ cat seq.phylip | fastdist -I phylip -O phylip -b 3 -r 2 | fnj -I phylip -O xml -r 2 -d 4
<?xml version="1.0"?>
 <root>
  <runs>
   <run id="" dim="3">
    <identities>
     <identity name="Alpha"/>
     <identity name="Beta"/>
     <identity name="Gamma"/>
    </identities>
    <tree>
     <count>8</count>
     <newick-xml><branch><leaf>Gamma</leaf><leaf>Beta</leaf><leaf>Alpha</leaf></branch></newick-xml>
     <newick>(Gamma,Beta,Alpha);</newick>
    </tree>
   </run>
   <run id="" dim="3">
    <identities>
     <identity name="Alpha"/>
     <identity name="Beta"/>
     <identity name="Gamma"/>
    </identities>
   </run>
  </runs>
 </root>
We could also send the data in XML format through the pipe:
[user@saturn ~]$ cat seq.phylip | fastdist -I phylip  -O xml -b 3 -r 2 | fnj -I xml -O xml -m FNJ
<?xml version="1.0"?>
 <root>
  <runs>
   <run id="" dim="3">
    <identities>
     <identity name="Alpha"/>
     <identity name="Beta"/>
     <identity name="Gamma"/>
    </identities>
    <tree>
     <count>4</count>
     <newick-xml><branch><leaf>Gamma</leaf><leaf>Beta</leaf><leaf>Alpha</leaf></branch></newick-xml>
     <newick>(Gamma,Beta,Alpha);</newick>
    </tree>
   </run>
   <run id="" dim="3">
    <identities>
     <identity name="Alpha"/>
     <identity name="Beta"/>
     <identity name="Gamma"/>
    </identities>
    <tree>
     <count>4</count>
     <newick-xml><branch><leaf>Gamma</leaf><leaf>Beta</leaf><leaf>Alpha</leaf></branch></newick-xml>
     <newick>(Gamma,Beta,Alpha);</newick>
    </tree>
   </run>
  </runs>
 </root>
As the XML format is more descriptive, the flags -d and -r are no longer needed by fnj.
Example 11. reading the fnj XML output stream with Python
If the XML output is very large, you might want to use an XML parser that does not hold the whole file in memory. This Python script is an example of how to do this:
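#!/usr/bin/env python3
# Print the largest tree count found in fnj XML output read from stdin.
import sys
from lxml import etree

maxcount = 0
# iterparse yields each <run> element once its closing tag has been read.
for _event, run in etree.iterparse(sys.stdin.buffer, tag="run"):
    # Runs without a <tree> element are skipped instead of raising IndexError.
    count_text = run.findtext("tree/count")
    if count_text is not None:
        maxcount = max(maxcount, int(count_text))
    # Free the processed run and earlier siblings so memory use stays flat.
    run.clear()
    while run.getprevious() is not None:
        del run.getparent()[0]
print(maxcount)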
The script prints the maximum tree count (just as an example):
[user@saturn ~]$ fnj -I xml dm.xml | python fnj_lxml.py
1
This software package handles the following file formats.
The Fastphylo sequence XML format is chosen by the option -I xml to fastdist, fastprot or fastprot_mpi. 
For instance, type fastdist --print-relaxng-input to see its Relax NG schema:
[user@saturn ~]$ fastdist --print-relaxng-input
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="root">
      <element name="runs">
        <zeroOrMore>
          <element name="run">
            <attribute name="id">
              <text/>
            </attribute>
            <oneOrMore>
              <element name="seq">
                <attribute name="seq">
                  <data type="string">
                    <param name="pattern">[acgtumrwsykvhdbnxACGTUMRWSYKVHDBNX -.?]+</param>
                  </data>
                </attribute>
                <attribute name="name">
                  <text/>
                </attribute>
                <optional>
                  <element name="extrainfo">
                    <ref name="anyContent"/>
                  </element>
                </optional>
              </element>
            </oneOrMore>
          </element>
        </zeroOrMore>
      </element>
    </element>
  </start>
  <define name="anyContent">
    <mixed>
      <zeroOrMore>
        <choice>
          <attribute>
            <anyName/>
          </attribute>
          <ref name="anyElement"/>
        </choice>
      </zeroOrMore>
    </mixed>
  </define>
  <define name="anyElement">
    <element>
      <anyName/>
      <ref name="anyContent"/>
    </element>
  </define>
</grammar>
The Relax NG schema specifies that the extrainfo element is optional and can be inserted as a child of a seq element. The extrainfo element may contain any content and is passed on to the output XML.
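When input files are produced by other tools, it can help to write them programmatically. The following Python sketch uses only the standard library's xml.etree.ElementTree to emit the root/runs/run/seq structure required by the schema above. The function name build_sequence_xml is hypothetical, extrainfo elements are not handled, and the character check is only modeled on the schema's seq pattern, assuming the space, '-', '.' and '?' are meant as literal characters. The check is meant for DNA input to fastdist: protein sequences such as those in protein_seq.xml contain letters (for example P and L) outside that pattern.

```python
import re
import xml.etree.ElementTree as ET

# Modeled on the seq attribute pattern of the input schema (assumption: the
# space, '-', '.' and '?' are literal characters).
SEQ_PATTERN = re.compile(r"[acgtumrwsykvhdbnxACGTUMRWSYKVHDBNX \-.?]+")


def build_sequence_xml(runs):
    """Return a Fastphylo sequence XML document as a string.

    `runs` maps each run id to a list of (name, sequence) pairs.
    """
    root = ET.Element("root")
    runs_el = ET.SubElement(root, "runs")
    for run_id, records in runs.items():
        run_el = ET.SubElement(runs_el, "run", id=run_id)
        for name, seq in records:
            # Reject sequences the DNA pattern would not accept.
            if not SEQ_PATTERN.fullmatch(seq):
                raise ValueError(f"{name}: unexpected character in sequence")
            ET.SubElement(run_el, "seq", name=name, seq=seq)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_sequence_xml({
        "run2": [("Alpha", "AACGTGGCCACAT"),
                 ("Beta", "AAGGTCGCCACAC"),
                 ("Gamma", "CAGTTCGCCACAA")],
    }))
```

Building the tree with ElementTree rather than string formatting ensures that names containing characters such as & or " are escaped correctly in the attributes.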
Example 12. Example files in Fastphylo sequence XML format
The example file seq.xml contains DNA sequences:
<?xml version="1.0"?>
<root>
  <runs>
    <run id="run1">
      <seq name="Alpha" seq="AACGTGGCCACAT"/>
      <seq name="Beta" seq="AAGGTCGCCACAC">
        <extrainfo myattr="" species="penguin">
          <foo bar="1"/>
        </extrainfo>
      </seq>
      <seq name="Gamma" seq="CAGTTCGCCACAA"/>
    </run>
    <run id="run2">
      <seq name="Alpha" seq="AACGTGGCCACAT"/>
      <seq name="Beta" seq="AAGGTCGCCACAC"/>
      <seq name="Gamma" seq="CAGTTCGCCACAA"/>
    </run>
  </runs>
</root>
protein_seq.xml contains protein sequences:
<?xml version="1.0"?>
<root>
  <runs>
    <run id="run1">
      <seq name="Cow" seq="MAYPMQLGFQDA"/>
      <seq name="Carp" seq="MAHPTQLGFKDA"/>
      <seq name="Chicken" seq="MALLTLMLMEKL"/>
      <seq name="Human" seq="MAHLFLTLTTKL"/>
    </run>
  </runs>
</root>
The phylip input format is chosen by the option -I phylip to fastdist or fastprot.
Example 13. Example files in phylip format
The DNA example file seq.phylip contains two datasets:
   3   13
Alpha     AAC GTGG
Beta      AAG GTCG
Gamma     CAG TTCG
          CCAC AT
          CCAC AC
          CCAC AA
   3   13
Alpha     CCACGGG
Beta      AAGGTCG
Gamma     CAGTTCG
          CGACAT
          CCACAC
          CCGCAA
The example file protein_seq.phylip contains protein sequences:
4 12
Cow MAYPMQLGFQDA
Carp MAHPTQLGFKDA
Chicken MALLTLMLMEKL
Human MAHLFLTLTTKL
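The interleaved layout of seq.phylip can be read with a short parser. This Python sketch is not fastphylo's own parser, and parse_phylip is a hypothetical name. It assumes that names contain no spaces, that the first block holds each name followed by the start of its sequence, that later blocks repeat the taxa in the same order, and that whitespace inside sequences is ignored. Under those assumptions it handles both the interleaved DNA file and the one-taxon-per-line protein file above.

```python
def parse_phylip(text):
    """Yield (names, sequences) for each data set in a phylip text.

    Assumes names without spaces; whitespace inside sequences is ignored.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    i = 0
    while i < len(lines):
        # Header line: number of taxa and sequence length.
        n, length = int(lines[i][0]), int(lines[i][1])
        i += 1
        names, seqs = [], []
        # First block: a name followed by the start of its sequence.
        for _ in range(n):
            fields = lines[i]
            names.append(fields[0])
            seqs.append("".join(fields[1:]))
            i += 1
        # Later blocks: continuation lines only, cycling through the taxa.
        k = 0
        while any(len(s) < length for s in seqs):
            seqs[k] += "".join(lines[i])
            i += 1
            k = (k + 1) % n
        yield names, seqs
```

For seq.phylip this yields two data sets; the first reproduces the sequences of run1 in seq.xml, for example AACGTGGCCACAT for Alpha.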
The Fasta input format is chosen by the option -I fasta to fastdist or fastprot.
Fasta files can contain only one data set. Read more about the Fasta format on Wikipedia.
The parser takes the whole header line as the sequence identifier name, i.e. all characters after the greater-than character (">").
Example 14. seq.fasta, an example file in fasta format
The example file seq.fasta contains DNA:
>Alpha
AAC-GTGGCCAC-AT
>Beta
AAG-GTCGCCAC-AC
>Gamma
CAG-TTCGCCAC-AA
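The naming rule above (the identifier is everything after '>') differs from parsers that stop at the first space. A minimal Python sketch of a reader that follows it; parse_fasta is a hypothetical helper, and dropping trailing whitespace from each line is an assumption.

```python
def parse_fasta(text):
    """Return (name, sequence) pairs; the name is the whole header after '>'."""
    records = []
    for line in text.splitlines():
        line = line.rstrip()  # assumption: trailing whitespace is not significant
        if not line:
            continue
        if line.startswith(">"):
            # Keep every character after '>', including spaces.
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line.strip())
        else:
            raise ValueError("sequence data before the first '>' header")
    return [(name, "".join(parts)) for name, parts in records]
```

Gap characters such as '-' are kept as-is, so the result can be passed on unchanged to tools that treat gaps themselves.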
The Fastphylo distance matrix XML format is chosen by the option -O xml to fastdist, fastprot and fastprot_mpi, and by the option -I xml to fnj.
For instance, type fastdist --print-relaxng-output to see its Relax NG schema:
[user@saturn ~]$ fastdist --print-relaxng-output
<?xml version="1.0"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="root">
      <element name="runs">
        <zeroOrMore>
          <element name="run">
            <attribute name="dim">
              <data type="integer"/>
            </attribute>
            <attribute name="id">
              <text/>
            </attribute>
            <element name="identities">
              <oneOrMore>
                <element name="identity">
                  <attribute name="name">
                    <text/>
                  </attribute>
                  <optional>
                    <element name="extrainfo">
                      <ref name="anyContent"/>
                    </element>
                  </optional>
                </element>
              </oneOrMore>
            </element>
            <element name="dms">
              <oneOrMore>
                <element name="dm">
                  <oneOrMore>
                    <element name="row">
                      <oneOrMore>
                        <element name="entry">
                          <data type="float"/>
                        </element>
                      </oneOrMore>
                    </element>
                  </oneOrMore>
                </element>
              </oneOrMore>
            </element>
          </element>
        </zeroOrMore>
      </element>
    </element>
  </start>
  <define name="anyContent">
    <mixed>
      <zeroOrMore>
        <choice>
          <attribute>
            <anyName/>
          </attribute>
          <ref name="anyElement"/>
        </choice>
      </zeroOrMore>
    </mixed>
  </define>
  <define name="anyElement">
    <element>
      <anyName/>
      <ref name="anyContent"/>
    </element>
  </define>
</grammar>
The Relax NG schema specifies that the extrainfo element is optional and can be inserted as a child of an identity element. The extrainfo element may contain any content.
Example 16. dm.xml, an example file in Fastphylo distance matrix XML format
The example file dm.xml contains:
<?xml version="1.0"?>
<root>
 <runs>
  <run id="a" dim="3">
   <identities>
    <identity name="Alpha"/>
    <identity name="Beta"/>
    <identity name="Gamma"/>
   </identities>
   <dms>
   <dm>
    <row>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.299650</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>0.733169</entry>
     <entry>0.309520</entry>
     <entry>0.000000</entry>
    </row>
   </dm>
   </dms>
  </run>
  <run id="b" dim="3">
   <identities>
    <identity name="Alpha"/>
    <identity name="Beta"/>
    <identity name="Gamma"/>
   </identities>
   <dms>
   <dm>
    <row>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>3.258005</entry>
     <entry>0.000000</entry>
    </row>
    <row>
     <entry>1.873653</entry>
     <entry>0.459840</entry>
     <entry>0.000000</entry>
    </row>
   </dm>
   </dms>
  </run>
 </runs>
</root>
The Phylip distance matrix format is chosen by the option -O phylip to fastdist or fastprot, or by the option -I phylip to fnj.
Example 17. dm.phylip, an example file in phylip distance matrix format
The example file dm.phylip contains:
    3
Alpha       0.000000  0.299650  0.733169
Beta        0.299650  0.000000  0.309520
Gamma       0.733169  0.309520  0.000000
    3
Alpha       0.000000  3.258005  1.873653
Beta        3.258005  0.000000  0.459840
Gamma       1.873653  0.459840  0.000000
It contains two data sets.
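The XML and phylip distance matrix formats hold the same numbers in different shapes: in the XML format row i lists only the entries up to and including the diagonal, while the phylip format prints the full symmetric matrix. The Python sketch below (standard library only) converts one into the other. The function names and the script name dm_to_phylip.py are hypothetical, and the column widths are chosen to mirror the dm.phylip example, not taken from fastphylo's source.

```python
import xml.etree.ElementTree as ET


def read_distance_runs(xml_text):
    """Yield (names, matrix) for every dm in Fastphylo distance matrix XML.

    Row i holds i+1 entries (lower triangle incl. diagonal), so each value is
    mirrored to fill the full symmetric matrix.
    """
    root = ET.fromstring(xml_text)
    for run in root.iter("run"):
        names = [ident.get("name") for ident in run.findall("identities/identity")]
        n = len(names)
        for dm in run.iter("dm"):
            matrix = [[0.0] * n for _ in range(n)]
            for i, row in enumerate(dm.findall("row")):
                for j, entry in enumerate(row.findall("entry")):
                    matrix[i][j] = matrix[j][i] = float(entry.text)
            yield names, matrix


def to_phylip(names, matrix):
    """Format one matrix in a layout modeled on the dm.phylip example."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        lines.append(f"{name:<10}" + "".join(f"{d:10.6f}" for d in row))
    return "\n".join(lines)


if __name__ == "__main__":
    import sys
    # Usage: python dm_to_phylip.py < dm.xml
    for names, matrix in read_distance_runs(sys.stdin.read()):
        print(to_phylip(names, matrix))
```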
The Binary distance matrix format is chosen by the option -O binary to fastdist, fastprot and fastprot_mpi, or by the option -I binary to fnj.
With the binary format option, fastphylo computes the distance matrix row by row and fills in only its upper triangle. This upper triangular distance matrix is then stored in binary form instead of plain text. The main advantages of the binary format are that it reduces disk space usage and speeds up fastphylo, since only half of the distance matrix is computed instead of the whole matrix.
A binary output file stores, in order, fastphylo's current version, the number of sequences, the accessions, and finally the rows of the upper triangular distance matrix. A colon is used as the delimiter between these components.
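Storing only half of the matrix works because distances are symmetric, d(i,j) = d(j,i), and the diagonal is zero, so n sequences need only n(n-1)/2 values. The Python sketch below shows the row-major index arithmetic for such a packed upper triangle. It illustrates the idea only: it assumes the diagonal is not stored, it does not reproduce the byte layout of fastphylo's binary files, and all function names are hypothetical.

```python
def packed_index(i, j, n):
    """Position of d(i, j), i < j, in a row-major strictly upper triangle."""
    if not 0 <= i < j < n:
        raise ValueError("need 0 <= i < j < n")
    # Rows 0..i-1 contribute (n-1) + (n-2) + ... + (n-i) entries.
    return i * (2 * n - i - 1) // 2 + (j - i - 1)


def pack_upper(matrix):
    """Flatten the strictly upper triangle of a symmetric matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def lookup(packed, i, j, n):
    """Recover d(i, j) from the packed list using symmetry and a zero diagonal."""
    if i == j:
        return 0.0
    if i > j:
        i, j = j, i
    return packed[packed_index(i, j, n)]
```

For the first data set of dm.phylip, pack_upper returns [0.29965, 0.733169, 0.30952], three values instead of nine.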
The Fastphylo tree count XML format is chosen by the option -O xml to fnj.
You can see an example of the format in Example 9, “fnj with input file in XML format”.
Type fnj --print-relaxng-output to see the format's Relax NG schema:
[user@saturn ~]$ fnj --print-relaxng-output
<?xml version="1.0"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="root">
      <element name="runs">
        <zeroOrMore>
          <element name="run">
            <attribute name="id">
              <text/>
            </attribute>
            <attribute name="dim">
              <data type="integer"/>
            </attribute>
            <element name="identities">
              <oneOrMore>
                <element name="identity">
                  <attribute name="name">
                    <text/>
                  </attribute>
                  <optional>
                    <element name="extrainfo">
                      <ref name="anyContent"/>
                    </element>
                  </optional>
                </element>
              </oneOrMore>
            </element>
            <element name="tree">
              <element name="count">
                <data type="integer"/>
              </element>
              <element name="newick-xml">
                <ref name="branch"/>
              </element>
              <element name="newick">
                <text/>
              </element>
            </element>
          </element>
        </zeroOrMore>
      </element>
    </element>
  </start>
  <define name="anyContent">
    <mixed>
      <zeroOrMore>
        <choice>
          <attribute>
            <anyName/>
          </attribute>
          <ref name="anyElement"/>
        </choice>
      </zeroOrMore>
    </mixed>
  </define>
  <define name="anyElement">
    <element>
      <anyName/>
      <ref name="anyContent"/>
    </element>
  </define>
  <define name="branch">
    <element name="branch">
      <optional>
        <attribute name="length">
          <data type="float"/>
        </attribute>
      </optional>
      <oneOrMore>
        <choice>
          <element name="leaf">
            <optional>
              <attribute name="length">
                <data type="float"/>
              </attribute>
            </optional>
            <text/>
          </element>
          <ref name="branch"/>
        </choice>
      </oneOrMore>
    </element>
  </define>
</grammar>
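Each tree element carries the same tree twice, as newick-xml and as newick. The Python sketch below shows how the branch/leaf structure defined by the schema maps to Newick text, including the optional length attributes, which are copied verbatim. It uses only xml.etree.ElementTree; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def newick_from_xml(node):
    """Convert a <branch> or <leaf> element of newick-xml to Newick text."""
    if node.tag == "leaf":
        label = (node.text or "").strip()
    else:
        # A branch holds one or more leaf or branch children.
        label = "(" + ",".join(newick_from_xml(child) for child in node) + ")"
    length = node.get("length")  # optional on both branch and leaf
    return label if length is None else f"{label}:{length}"


def tree_to_newick(newick_xml):
    """Return the Newick string for a <newick-xml> element."""
    return newick_from_xml(newick_xml.find("branch")) + ";"


if __name__ == "__main__":
    import sys
    # Print "count newick" for every tree in fnj XML output read from stdin.
    for tree in ET.parse(sys.stdin).getroot().iter("tree"):
        print(tree.findtext("count"), tree_to_newick(tree.find("newick-xml")))
```

Applied to the newick-xml element of Example 9, this yields (Gamma,Beta,Alpha);, the same string as the newick element.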