Perform Principal Coordinates AnalysisΒΆ

Section author: Cathy Lozupone

An example of how to calculate the pairwise distances for a set of sequences.

>>> from cogent import LoadSeqs
>>> from cogent.phylo import distance
>>> from cogent.cluster.metric_scaling import PCoA

Import a substitution model (or create your own)

>>> from cogent.evolve.models import HKY85

Load the alignment.

>>> al = LoadSeqs("data/test.paml")

Create a pairwise distances object calculator for the alignment, providing a substitution model instance.

>>> d = distance.EstimateDistances(al, submodel= HKY85())
>>> d.run()

Now use this matrix to perform principal coordinates analysis.

>>> PCoA_result = PCoA(d.getPairwiseDistances())
>>> print PCoA_result
======================================================================================
        Type              Label  vec_num-0  vec_num-1  vec_num-2  vec_num-3  vec_num-4
--------------------------------------------------------------------------------------
Eigenvectors          NineBande      -0.02       0.01       0.04       0.01       0.00
Eigenvectors           DogFaced      -0.04      -0.06      -0.01       0.00       0.00
Eigenvectors          HowlerMon      -0.07       0.01       0.01      -0.02       0.00
Eigenvectors              Mouse       0.20       0.01      -0.01      -0.00       0.00
Eigenvectors              Human      -0.07       0.04      -0.03       0.01       0.00
 Eigenvalues        eigenvalues       0.05       0.01       0.00       0.00      -0.00
 Eigenvalues  var explained (%)      85.71       9.60       3.73       0.95      -0.00
--------------------------------------------------------------------------------------

We can save these results to a file in a delimited format (we’ll use tab here) that can be opened up in any data analysis program, like R or Excel. Here the principal coordinates can be plotted against each other for visualization.

>>> PCoA_result.writeToFile('PCoA_results.txt',sep='\t')
>>> import os
>>> os.remove('PCoA_results.txt')