Calculate pairwise distances between sequencesΒΆ
Section author: Gavin Huttley
An example of how to calculate the pairwise distances for a set of sequences.
>>> from cogent import LoadSeqs
>>> from cogent.phylo import distance
Import a substitution model (or create your own)
>>> from cogent.evolve.models import HKY85
Load my alignment
>>> al = LoadSeqs("data/long_testseqs.fasta")
Create a pairwise distances object with your alignment and substitution model
>>> d = distance.EstimateDistances(al, submodel= HKY85())
Printing d
before execution shows its status.
>>> print d
=========================================================================
Seq1 \ Seq2 Human HowlerMon Mouse NineBande DogFaced
-------------------------------------------------------------------------
Human * Not Done Not Done Not Done Not Done
HowlerMon Not Done * Not Done Not Done Not Done
Mouse Not Done Not Done * Not Done Not Done
NineBande Not Done Not Done Not Done * Not Done
DogFaced Not Done Not Done Not Done Not Done *
-------------------------------------------------------------------------
Which in this case is to simply indicate nothing has been done.
>>> d.run()
>>> print d
=====================================================================
Seq1 \ Seq2 Human HowlerMon Mouse NineBande DogFaced
---------------------------------------------------------------------
Human * 0.0730 0.3363 0.1804 0.1972
HowlerMon 0.0730 * 0.3487 0.1865 0.2078
Mouse 0.3363 0.3487 * 0.3813 0.4022
NineBande 0.1804 0.1865 0.3813 * 0.2019
DogFaced 0.1972 0.2078 0.4022 0.2019 *
---------------------------------------------------------------------
Note that pairwise distances can be distributed for computation across multiple CPU’s. In this case, when statistics (like distances) are requested only the master CPU returns data.
We’ll write a phylip formatted distance matrix.
>>> d.writeToFile('dists_for_phylo.phylip', format="phylip")
We’ll also save the distances to file in Python’s pickle format.
>>> import cPickle
>>> f = open('dists_for_phylo.pickle', "w")
>>> cPickle.dump(d.getPairwiseDistances(), f)
>>> f.close()