Building a tree of life with PyCogent

This cookbook example runs through how to construct a tree of life from 16S rRNA sequences to test whether the three domains of life are visible as three separate clusters in a phylogenetic tree. This example covers compiling sequences, building a multiple sequence alignment, building a phylogenetic tree from that sequence alignment, and visualizing the tree.

Step 0: Set up your Python environment

For this tutorial you’ll need PyCogent (the cogent package), muscle, and FastTree installed on your system.

Start an interactive Python session by entering the following into a command terminal:

python

You should now see the Python command prompt:

>>>

Step 1: Download sequences from NCBI

Here we’ll work with archaeal, bacterial, and eukaryotic sequences obtained from NCBI using the PyCogent EUtils wrappers. Run the following commands to obtain these sequences:

from cogent.db.ncbi import EUtils
from cogent.parse.fasta import MinimalFastaParser
e = EUtils()
arc16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND archaea[orgn]']))
bac16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND bacteria[orgn]']))
euk16s = list(MinimalFastaParser(e['"small subunit rRNA"[ti] AND eukarya[orgn]']))

You can check how many sequences you obtained for each query by running:

len(arc16s)
len(bac16s)
len(euk16s)
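Each of these lists holds (label, sequence) pairs, which is the structure the later steps rely on. The following is a minimal pure-Python sketch of how FASTA text can be turned into such pairs. The parse_fasta function and the example records are hypothetical illustrations, not PyCogent's MinimalFastaParser implementation and not real NCBI output.

```python
def parse_fasta(lines):
    """Yield (label, sequence) tuples from an iterable of FASTA-formatted lines."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new header: emit the previous record, if there was one.
            if label is not None:
                yield label, ''.join(chunks)
            label, chunks = line[1:], []
        else:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if label is not None:
        yield label, ''.join(chunks)

# Hypothetical records for illustration only.
example = """>seq1 example archaeal record
ACGTACGT
ACGT
>seq2 example bacterial record
GGCCTTAA
"""
records = list(parse_fasta(example.split('\n')))
print(records)
```

Wrapping the generator in list(), as the tutorial does, lets you call len() on the result and reuse it in later steps.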

Note

In this example you’ll notice that you have relatively few sequences for each query. You’d obtain many more if you replaced "rRNA" in the query with "ribosomal RNA", but the runtime would also be significantly longer. For the purposes of this tutorial we’ll therefore stick with the queries above, which return fewer sequences.

Step 2: Load the sequences

We’ll begin by loading the sequences that have been downloaded, applying a filter to retain only those that we consider to be of good quality. Sequences of 750 bases or fewer, and sequences containing one or more N characters, will be ignored (N characters typically represent ambiguous base calls during sequencing).

First, define a function to load and filter the sequences:

from cogent.parse.fasta import MinimalFastaParser

def load_and_filter_seqs(seqs, domain_label):
    result = []
    for seq_id, seq in seqs:
        if len(seq) > 750 and seq.count('N') < 1:
            result.append((domain_label + seq_id,seq))
    return result

Next, load and filter the three sequence sets:

arc16s_filtered = load_and_filter_seqs(arc16s,'A: ')
bac16s_filtered = load_and_filter_seqs(bac16s,'B: ')
euk16s_filtered = load_and_filter_seqs(euk16s,'E: ')

len(arc16s_filtered)
len(bac16s_filtered)
len(euk16s_filtered)
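To see how the two filter conditions behave, it can help to run the same logic on a few toy records. The sketch below uses a hypothetical variant, filter_seqs, that takes the length threshold as a min_length parameter (an assumption made here so that short toy sequences can be used; the tutorial's function hardcodes 750).

```python
def filter_seqs(seqs, domain_label, min_length=750):
    """Keep sequences longer than min_length that contain no 'N' characters."""
    result = []
    for seq_id, seq in seqs:
        if len(seq) > min_length and 'N' not in seq:
            result.append((domain_label + seq_id, seq))
    return result

# Toy records for illustration only.
toy = [
    ('long_clean', 'ACGT' * 3),    # 12 bases, no N: kept
    ('long_with_n', 'ACGTN' * 3),  # long enough, but contains N: dropped
    ('short', 'ACGT'),             # 4 bases: dropped for length
]
kept = filter_seqs(toy, 'A: ', min_length=10)
print(kept)
```

Only the first record survives, and its label gains the 'A: ' domain prefix.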

Step 3: Select a random subset of the sequences

Import shuffle from the random module to extract a random collection of sequences:

from random import shuffle
shuffle(arc16s_filtered)
shuffle(bac16s_filtered)
shuffle(euk16s_filtered)

Select some random sequences from each domain. Note that only a few sequences are chosen to facilitate a quick analysis:

combined16s = arc16s_filtered[:3] + bac16s_filtered[:10] + euk16s_filtered[:6]
len(combined16s)
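Because shuffle reorders each list in place using the global random state, every run selects a different subset. If you want to be able to reproduce a particular tree, one option is to draw the subset with a seeded generator. This is a sketch only: pick_subset and the seed value are hypothetical and not part of the tutorial.

```python
import random

def pick_subset(seqs, n, seed):
    """Return n randomly chosen items from seqs, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator; the global one is untouched
    shuffled = list(seqs)      # copy, so the caller's list keeps its order
    rng.shuffle(shuffled)
    return shuffled[:n]

# Toy records for illustration only.
toy = [('seq%d' % i, 'ACGT') for i in range(20)]
first = pick_subset(toy, 5, seed=42)
second = pick_subset(toy, 5, seed=42)
print(first == second)
```

The same seed yields the same subset, so an analysis can be repeated on identical input.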

Step 4: Load the sequences into a SequenceCollection object

Use LoadSeqs to load the unaligned sequences into a SequenceCollection object. In this step we’ll also shorten the sequence names by passing a label_to_name function, which keeps only the first two |-separated fields of each label (the domain prefix added in Step 2 plus the leading sequence identifier). Shorter names make the tree easier to read in downstream steps.

from cogent import LoadSeqs, DNA
seqs = LoadSeqs(data=combined16s,moltype=DNA,aligned=False,label_to_name=lambda x: '|'.join(x.split('|')[:2]))

You can explore some properties of this sequence collection. For example, you can count how many sequences are in the sequence collection object:

seqs.getNumSeqs()
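The effect of the label_to_name function passed to LoadSeqs can be checked on a single label. The header below is hypothetical: it assumes a "gi|number|db|accession|" style label with the domain prefix from Step 2 already added, and the labels you downloaded may be laid out differently.

```python
# Same function as passed to LoadSeqs in this step.
label_to_name = lambda x: '|'.join(x.split('|')[:2])

# Hypothetical header, for illustration only.
header = 'A: gi|0000000|gb|XX000000.0| hypothetical organism 16S ribosomal RNA gene'
name = label_to_name(header)
print(name)
```

With a header of this shape the name becomes 'A: gi|0000000', that is, everything before the second | character.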

Step 5: Align the sequences using muscle

Load an aligner function, and align the sequences. Here we’ll align with muscle via the muscle application controller. The sequences will be loaded into an Alignment object called aln.

from cogent.app.muscle import align_unaligned_seqs
aln = align_unaligned_seqs(seqs,DNA)

Step 6: Build a tree from the alignment using FastTree

Load a tree-building function, and build a tree from the alignment. Here we’ll use FastTree. The tree will be stored in a PhyloNode object called tree.

from cogent.app.fasttree import build_tree_from_alignment
tree = build_tree_from_alignment(aln,DNA)

Step 7: Visualize the tree

Load a drawing function to generate a prettier picture of the tree:

from cogent.draw.dendrogram import UnrootedDendrogram
dendrogram = UnrootedDendrogram(tree)

Have a quick look at the unrooted dendrogram:

dendrogram.showFigure()

You should see something like this:

../_images/tol_not_gap_filtered.png

Figure 1: A tree of life built from 16S rRNA sequences. A: archaeal sequences; B: bacterial sequences; E: eukaryotic sequences.

Step 8: Save the tree as a PDF

Finally, you can save this tree as a PDF for sharing or later viewing:

dendrogram.drawToPDF('./tol.pdf')

You can also write the alignment and tree to fasta and newick files, respectively. You can then load these in tools such as BoulderALE (for alignment editing) or TopiaryExplorer or FigTree (for tree viewing, coloring, and layout manipulation).

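# Use with-blocks so that each file is closed as soon as it has been written.
with open('./tol.fasta','w') as fasta_f:
    fasta_f.write(aln.toFasta())

with open('./tol.tre','w') as tree_f:
    tree_f.write(tree.getNewick(with_distances=True))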

Extra credit: Alignment filtering

Filter highly gapped positions from the alignment

To try to improve the quality of the alignment and therefore the tree, it’s often a good idea to remove positions that contain a high proportion of gap characters from the alignment. These generally represent non-homologous regions of the sequence of interest, and therefore contribute little to our understanding of the evolutionary history of the sequence. These steps may result in a clearer delineation of the three domains on your tree, but the results will in part depend on the randomly chosen sequences in your alignment.

To remove positions where more than 10% of the characters are gaps, run the following command:

gap_filtered_aln = aln.omitGapPositions(allowed_gap_frac=0.10)

If you count the positions in both the full and reduced alignments you’ll see that your alignment is now a lot shorter:

len(aln)
len(gap_filtered_aln)
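Conceptually, gap filtering works column by column: compute the fraction of gap characters in each column and keep the column only if that fraction is small enough. The pure-Python sketch below illustrates the idea on a toy alignment. omit_gap_columns is a hypothetical helper, not PyCogent's omitGapPositions, and it assumes a column is kept when its gap fraction is less than or equal to the threshold; PyCogent's handling of the boundary may differ.

```python
def omit_gap_columns(aligned_seqs, allowed_gap_frac):
    """Drop columns whose fraction of '-' characters exceeds allowed_gap_frac."""
    n_seqs = len(aligned_seqs)
    length = len(aligned_seqs[0])
    keep = []
    for i in range(length):
        gaps = sum(1 for seq in aligned_seqs if seq[i] == '-')
        if gaps / float(n_seqs) <= allowed_gap_frac:
            keep.append(i)
    # Rebuild each sequence from the retained columns only.
    return [''.join(seq[i] for i in keep) for seq in aligned_seqs]

# Toy alignment: the third and sixth columns are 75% gaps.
toy_aln = ['AC-GT-',
           'ACTGT-',
           'AC-GTA',
           'AC-GT-']
filtered = omit_gap_columns(toy_aln, allowed_gap_frac=0.5)
print(filtered)
```

The two gap-rich columns are removed, so the toy alignment shrinks from six positions to four.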

Rebuild the tree and visualize the result as before:

gap_filtered_tree = build_tree_from_alignment(gap_filtered_aln,DNA)
gap_filtered_dendrogram = UnrootedDendrogram(gap_filtered_tree)
gap_filtered_dendrogram.showFigure()

Your tree should look something like this:

../_images/tol_gap_filtered.png

Figure 2: A tree of life built from 16S rRNA sequences after filtering highly gapped positions. A: archaeal sequences; B: bacterial sequences; E: eukaryotic sequences.

Filtering highly variable positions

Another issue that adds noise to alignments of distantly related sequences is highly entropic (or highly variable) positions. To filter these, we can compute the Shannon entropy, or uncertainty, of each position, and then remove the 10% most entropic positions.

First we’ll compute the Shannon entropy value for each position in the alignment and sort the values:

sorted_uncertainties = sorted(gap_filtered_aln.uncertainties())

Next we’ll find the 90th percentile by taking the value that is 90% of the way through the sorted list:

uncertain_90p = sorted_uncertainties[int(len(sorted_uncertainties)*0.9)]

Next we’ll identify and store the positions that have lower entropy than uncertain_90p:

positions_to_keep = []
for i,u in enumerate(gap_filtered_aln.uncertainties()):
    if u < uncertain_90p:
        positions_to_keep.append(i)

Then we’ll filter the alignment to contain only those positions:

entropy_gap_filtered_aln = gap_filtered_aln.takePositions(positions_to_keep)
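The same percentile-based filter can be traced by hand on a toy alignment. The sketch below computes the Shannon entropy of each column in bits and then applies the thresholding logic from the steps above. column_entropy is a hypothetical helper written for this illustration; PyCogent's uncertainties method may treat gaps or ambiguous characters differently.

```python
from math import log

def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    counts = {}
    for char in column:
        counts[char] = counts.get(char, 0) + 1
    total = float(len(column))
    return -sum((n / total) * log(n / total, 2) for n in counts.values())

# Toy alignment: the first column is invariant, the third is maximally variable.
toy_aln = ['AAAA',
           'AACA',
           'AAGC',
           'ATTC']
columns = [''.join(seq[i] for seq in toy_aln) for i in range(len(toy_aln[0]))]
entropies = [column_entropy(col) for col in columns]

# Same thresholding as in the tutorial: keep positions below the 90th percentile.
sorted_entropies = sorted(entropies)
threshold = sorted_entropies[int(len(sorted_entropies) * 0.9)]
positions_to_keep = [i for i, u in enumerate(entropies) if u < threshold]
filtered = [''.join(seq[i] for i in positions_to_keep) for seq in toy_aln]
print(positions_to_keep)
print(filtered)
```

Here the fully variable third column carries the highest entropy and is the only one removed. Note that because the comparison is strict, every position tied with the threshold value is dropped too.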

We can then rebuild and visualize the tree:

entropy_gap_filtered_tree = build_tree_from_alignment(entropy_gap_filtered_aln,DNA)
entropy_gap_filtered_dendrogram = UnrootedDendrogram(entropy_gap_filtered_tree)
entropy_gap_filtered_dendrogram.showFigure()

Your tree should look something like this:

../_images/tol_entropy_gap_filtered.png

Figure 3: A tree of life built from 16S rRNA sequences after filtering highly gapped and highly entropic positions. A: archaeal sequences; B: bacterial sequences; E: eukaryotic sequences.

While the trees in Figures 1, 2, and 3 don’t look very different, an interesting point to note is the amount of information in each:

len(aln)
len(gap_filtered_aln)
len(entropy_gap_filtered_aln)

The entropy and gap filtered alignment (entropy_gap_filtered_aln) contains approximately one quarter as many positions as the full alignment (aln), yet results in a nearly identical phylogenetic tree. This suggests that the filtered positions add very little phylogenetic information. In small alignments such as the example here this may not have a large effect on run time, but when building a tree from thousands or tens of thousands of sequences, removing gap and high entropy positions can save significant compute time and frequently improves results.

Starting with Silva sequences (to skip steps of obtaining sequences from NCBI)

The following sequences are randomly chosen from the Silva database. You can use these instead of pulling random sequences from NCBI.

fasta_str = """>AF424517 1 994 Archaea/Crenarchaeota/uncultured/uncultured
CAGCAGCCGCGGTAATACCAGCCCCCCGAGTGGTGGGGATGTTTATTTGGCCTAAAACGTCCGTAGCCAGCTCGGTAAATCTCTCGTTAAATCCAGCGTCCTAAGCGTTGGGCTGCGAGGGAGACTGCCAAGCTAGAGGGTGGGAGAGGTCAGCGGTATTTCTGGGGTAGGGGCGAAATCCATTGATCCCAGGAGGACCACCAGTGGCGAAGGCTGCTGACTAGAACACGCCTGACGGTGAGGGACGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCCAGCTGTAAACGATGCAAACTCGGTGATGCCCTGGCTTGTGGCCAGTGCAGTGCCGCAGGGAAGCCGTTAAGTTTGCCGCCTGGGAAGTACGTACGCAAGTATGAAACTTAAAGGAATTGGCGGGGGAGCACCACAAGGGGTGAAGCCTGCGGTTCAATTGGAGTCAACGCCAGAAATCTTACCCGAAGAGACAGCAGAATGAAGGTCAAGCTGGAGACTTTACCAGACAAGCTGAGAAGTGGTGCATGGCCGTCGCCAGCTCGTGCCGTGAGATGTCCTGTTAAGTCAGGTAACCAGCGAGATCCCTGCCTCTAGTTGCCACCATTACTCTCCGGAGTAGTGGGGCGAATTAGCGGGACCGCCGTAGTTAATACGGAGGAAGGAAGGGGCCACGGCAGGTCAGTATGCCCTGAAACTTTGGGGCCACACGCGGGCTGCAATGGTAACGACAATGGGTTCCGAAACCGAAAGGTGGAGGTAATCCTCAAACGTTACCACAGTTATGATTGAGGGCTGCAACTCGCCCTCATGAATATGGAATCCCTAGTAACTGCGTGTCATTATCGCGCGGTGAATACGTCCCTGCTCCTTGCACACACTGCCCGTCGAACCACCCGAATGAGGTTTGGGTGAGGAATGGTCGAATGTTGGCCGTTTCGAACCTGGGCTTCGTAAGGAGGGTTAAGTCGTAACAAGGTAACCGTA
>AF448158 1 1828 Eukarya/Metazoa/Magelona et rel.
TTGATCCTGCCAGTAGTCATATGCTTGACTCAAAGATTAAGCCATGCATGTGCAAGTACATGACTTTTTTACACACGGTGAGACCGCGAATGGCTCATTAGATCAGTCTTAGTTCCTTAGACGGAAAGTGCTACTTGGATAACTGTGGCAATTCTAGAGCTAATACGTGCACGCAAGCTCCGACCTACTGGGGAAGAGCGCAATTATTAGATCAAGACCAAACGAGTCGAAAGGCTCGAACGTCTGGTGACTCTGGATAACCTCGGGCTGACCGCACGGCCAAGAGCCGGCGGCGCATCTTTCAAGTGTCTGCCCTATCAACTTTCGATGGTATGCGATCTGCGTACCATGGTGCTTACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACCTCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCTGGCACAGGGAGGTAGTGACGAGCAATAGCGACTCGGGACTCTTTCGAGGCCTCGGGATCGGAATGAGTACAACGTAAACACTTTTGCAAGGAACAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGCTGTTGCAGTTAAAAAGCTCGTAGCTGAATCTCGGGTGCGGGCGGGCGGTCCGCCTTACAGCGTGCACTGCCCCGATCCTGATCCAACTGCCGGTATTATCTCGGGGTGCTCTTAGCTGAGTGTCTTGGGCTGGCCGGTGCTTTTACTTTGAAAAAATTAGAGTGCTCAAAGCAGGCTTCCACGCCTGAATACTATAGCATGGAATAATGGAATAAGACCTCGGTTCTATTCTGTTGGTCTCTGGAAACCAGAGGTAATGATTAAGAGGGACAGACGGGGGCATTCGTATTGCGGGGCGAGAGGTGAAATTCTTAGACCCTCGCAAGACGAACTACAGCGAAAGCATTTGCCAAGCATGTTTTCTTTAGTCAAGAACGAAAGTCAGAGGTTCGAAGACGATCAGATACCGTCCTAGTTCTGACCATAAACGATGCCGACTAGCGATGCGCGAGCGTTGGTATCTGACCTCGCGCGCAGCTCCCGGGAAACCAAAGTCTTTGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCCGGCCCGGACACTGCGAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGATAACGAACGAGACTCTAGCCTGCTAAATAGTTCGTCGACACGCGGTTGTGTCTGGCGAGGAAACTTCTTAGAGGGACAAATGGCATTTAGTCATACGAGATTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTTCGGGGCCGCACGCGCGCTACACTGAAGGAGACAGCGAGTGTCCTGACCTAGCCCGAAAGGGCCGGGCAATCTGCTGAACCTCTTTCGTGGTAGGGATTGGGGCTTGCAATTGTTCCCCATGAACCAGGAATTCCGAGTAAGCGCAGGTCACAAGCCTGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACTACCGATTGAGCGGTTCAGTGAGACCCTCGGACTTGCCCAGCAGGAGCCGGCGACGGCTCCGCGTGTGTGCGAGAAAGAATGTCGAACTGTATTGCTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGCTT
>AJ428075 1 1749 Eukarya/Viridiplantae/Streptophyta/Klebsormidiophyceae
TAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTCTAAGTATAAATTACTCTAAATGGTAAAACTGCGAATGGCTCATTAAATCAGTTATAGTTTATTTGATGATTCCTGCTACTCGGATAACCGTAGTAATTATAGAGCTAATACGTGCGCAAACGCCCGACTTCGGAAGGGCCGTATTTATTAGATAAAAGACCAACTCGGGGTTCGCCCCGAAACTTTGGTGATTCATAATGTAATCTCGGACCGCACGGCCTCGCGCCGGCGGCAAATCAATCAAATATCTGCCCTATCAACTTTCGATGGCAGGATAGTCGCCTGCCATGGTTGTAACGGGTGACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCATGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATCCTGATTCAGGGAGGTAGTGACAATAAATAACAATACCGGTCTCTTATGTGACTGGTAATTGGAATGAGCGGAACATAAATACCTTAACGAGGATCCATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTCGGGACGGAGACGTCGGTCCTCCCTCGTGGTCGATACTGACTCTCTTCCTTAATTGCCTCGAGCGCCGCCTAGTCTTCATTGCCTGGGCGCGCTACGCGGCGCCGTTACCTTGAATAAATTATGGTGTTCAAAGCAGGCTTATGCTCTGAGTACATTAGCATGGAATAACGCTATAGGACTCCGGTCCTATTACGTTGGTCTTCTGACCGGAGTAATGATTAATAGGGACAGTCGGGGGCATTCGTACTTCATCGTTAGAGGTGAAATTCTTGGATCGATGAAAGACGAACTTCTGCGAAAGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTGGGGGCGCGAAGACGATTAGATACCGTCCTAGTCCCAACCGTAAACGATGCCGACCCCGAATTGGCGCACGTATGACTTGACGTCGCCAGCGCCCGAGGAGAAATCAGAGTCTTTGGGTTCCGGGGGGAGTATGGTCGCAAGTCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGTGTGGAGCGTGCGGCTTAATTTGACTCAACGCGGGGAATCTTACCAGGTCCAGACATAGCGACGATTGACAGACTGATAGCTCTTTCTTGATCATATGGGTAGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGTTAACGAACGAGACCTCAGCTTGCTAACTAGTTGCGCGAAGATTTTCTTCGCGCACACTTCTTAGAAGGACTTTGAGCGTTTAGCTCATGGAGGTTTGAGGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACAATGATGCATTCAGCGAGCGGAATCCCTGATCGGAAACGGTCGGGCAATCTTTGAATCTTTATCGTGATGGGGATAGACCCTTGCAATTATTGGTCTCGAACGAGGAATACCTAGTAAGCGCTCGTCATCAGCGTGCGCTGACTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATAGAATGCTTCGGTGAAGCACTCGGATCGCGCCGCCGSCGGCGAAACCTCCGGGGACGGCATGAGAAGTTTGTTAAACCATATCGTTTAGAGGAAGGAGAAGTCGTAACAAGG
>AJ850036 1 1961 Eukarya/Metazoa/Arthropoda/Polyphaga/Bagous et rel.
TTGTCTCAAAGATTAAGCCATGCATGTCTCAGTACAAGCCATATTAAGGTGAAACCGCGAAAGGCTCATTAAATCAGTTATGGTTCCTTAGATCGTACCCAGGTTACTTGGATAACTGTGGTAATTCTAGAGCTAATACATGCAAACAGAGCTCCGACTGGAAACGGAAGGAGTGCTTTTATTAGATCAAAGCCAAACGGTAACTTAATGTTGTCGTACAATAATATTGTTGACTCTGAATAACTTTATGCTGATCGCATGGTCTTGCACCGGCGACGCATCTTTCAAATGTCTGCCTTATCAACTGTCGATGGTAGGTTCTGCGCCTACCATGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCCGGCACGGGGAGGTAGTGACGAAAAATAACGATACGGGACTCATCCGAGGCCCCGTAATCGGAATGAGTACACTTTAAATCCTTTAACGAGGATCAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCGGTTAAAAAGCTCGTAGTCAAATTTGTGTCTCGTGCCGCTGGTTCATCGTTCGCGGTGTTAATTGGCGTGATACGAGACGTCCTGCCGGTGGGCTTTCAGATTTTTCCGTATTTCAGGACCATAACAATTGGTTTGTATCTGTGGCGTAATACTGCAGTGCAGGGCAATTGGTTAATGAACGGTTGGTTTTTGTGCTACCCAAACTTACAATCCTGTCGCGTTGCTCTTGATTGAGTGACGAGGTGGGCCGGCACGTTTACTTTGAACAAATTAGAGTGCTTAAAGCAGGCAAAATTTCGCCTGAATATTCTGTGCATGGAATAATGGAATAGGACCTCGGTTCTATTTCGTTGGTTTTCGGAACTCCGAGGTAATGATTAATAGGAACGGATGGGGGCATTCGTATTGCGACGTTAGAGGTGAAATTCTTGGATCGTCGCAAGACGAACAGAAGCGAAAGCATTTGCCAAAAACGCTTTCATTGATCAAGAACGAAAGTTAGAGGTTCGAAGGCGATCAGATACCGCCCTAGTTCTAACCGTAAACTATGTCATCTGACGATCCGTCGACGTTCCTTTATTGACTCGACGGGCAGTTTCCGGGAAACCAAAGATTTTGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAACCTCACCAGGCCCGGACACCGGAAGGATTGACAGATTGAGAGCTCTTTCTTGATTCGGTGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGCGATTTGTCTGGTTAATTCCGATAACGAACGAGACTCTAGCCTGCTAAATAGGCGACATATGACATCGCAAAGGCCAGCCGGTTTGATTTAAAGGGTGGCGAGGTGGCGTCAAGGCGTTTATCTCGTGCTCTTGTCAGATTGTGCGCGGTTTTTACTGTCGGCGTATAAATAATTCTTCTTAGAGGGACAGGCGGCTTTTAGCCGCACGAGATTGAGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGAAGGAATCAGCGTGTCCTCCCTGGCCGAGTGGCCCGGGTAACCCGCTGAACCTCCTTCGTGCTAGGGATTGGGGCTTGCAATTGTTCCCCATGAACGAGGAATTCCCAGTAAGCGCGAGTCATAAGCTCGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACTACCGATTGAATGATTTACTGAGGTCTTCGGATCGATGCGCGATGACGTCTGACGTTGATCGATGTATCCGAGAAGATGACCAAACTTGATCATTT
>AM745254 1 1365 Archaea/Euryarchaeota/Halobacteriales/uncultured
TTCCGGTTGATCCTGCCGGACCTGACTGCTATTGGAGTAGGACTAAGTCACGCTAGTCAAAGGTGTGGAATGGAACACCTGGCGCACGGCTCAGTAACACGTAGTGAACCTACCCTAAGGACGAGGACAACCACGGGAAACTGTGGCTAATCCTCGATAGGAAATTTGGCCTGGAACGGTATCTTTCCTAAAACCGGCTCGCCGTGAGACACGGGCCTTAGGATGGCGCTGCGGCCGATTATGCTAGACGGCGGTGTAAAGGACCACCGTGGCGACGATCGGTATGGGCGATGGAAGTCGGAGCCCAGAGTCGGCTACTGAGACAAGGAGCCGAGCCTTACGAGGCTTAGCGGTCGCGAAAACTCGCCAATGCACGAAAGTGTGAGTGGGCTACTCCAAGTGTCATTCTTACGGATGACTGTCGCCCAGTTTTACAAGCTGGGAAAGGAAGGAGAGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCTTCGAGTGGTCAGGACGAATATTGGGTCTAAAGCGTTCGTAGCGGGACAAGTAGGTTCCTGGTTAAATCCGATGTCACAAGCATCGGGCTGCTGGGAATACCGCTAGTCTTGAGAGCGGGATAGGACAGGGGTAGTCTATGGGCAGGGGTGAAATCCAGTGATCCATAGGCGACCACCGATGGCGAAGGCACCTGTCTGGAACGTATCTAACCGTGATGGACGAAAGCCAGGGGAGCGACCCGGATTAGATACCCGGTTAGTCCTGGCCGTAAACGATGCCGACTAGGTGTTGCAGCGGCCAAGAGCCACTGCAGTGCCACAGTGAAGACGTTAAGTCGGCCACCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGACGGGGGCGCACCACCAGGAGTGAAGCCTGCGGTTTAATTGGATTCAACGCCGAAAAACTCACCTAAACAGACGGCAGAATGAAGCTCAAGTTAATGACTTTAGCTAACTCGCCGAGAGGAAGTGCATGGCCGTCGACAGTTCGTGCTGTGAAGTGTCTTGTTAAGTCAAGCAACGAACGAGATCCACGTCCGCAATTGCCAGCGGGTCCCTTTGGGATGCCGGGAACCTTGCGGAGACTGCTTGGTGCTAAACCAGAGGAAGGAGTGGGCAACGGCAGGTCAGTATGCTCCGATAGTTTAGGGCTACACGCGGGCTGCAATGGTCGGTACAATGGGCCGCGACCCCGAAAGGGGAAGCCAATCCCGAAAGCCGGTCTCAGTCAGGATTGGGGTTTGCAACTCAGCCCCATGAATATGGAATTCCTAGTAAACGTGTTTCATTAAGACACGTTGAATACGTCCCCGCGCCTTGTACACACCGCCCGT
>AY175392 1 1057 Archaea/Euryarchaeota/Methanomicrobiales
CCCTTTCTGGTTGATCCTGCCAGAGGCCACTGCTATCGGGGTTCGACTAAGCCATGCGAGTCGAGAGGGGTAATGCCCTCGGCGAACGGCTCAGTAACACGTGGACAACCTACCCTCAGATCTGGGATAACTCCGGGAAACTGGAGATAATACCGGATAATCCGTGAACGCTGGAATGCCTTACGGTTCAAAGCTTTAGCGTCTGAGGATGGGTCTGCGGCCGATTAGGTAGTTGCTGGGGTAACGTCCCAACAAGCCGATAATCGGTACGGGTTGTGAGAGCAAGAGCCCGGAGATGGATTCTGAGACACGAATCCAGGTCCTACGGGGCGCAGCAGGCGCGAAAACTTTACACTGCGCGAAAGCGCGATAAGGGAACCTCGAGTGCGTGCGCAATGCGTACGCTTTTCACATGCCTAAAAAGCATGTGGAATAAGAGCCGGGCAAGACCGGTGCCAGCCGCCGCGGTAACACCGGCGGCTCAAGTGGTGGCCGCTATTATTGGGCTTAAAGGGTCCGTAGCCGGACCAGTTAGTCCCTTGGGAAATCTTACGGCTTAACCGTAAGGCTGCCAATGGATACTGCTGGCCTTGGGACCGGGAGAGGCAAGAGGTACCTCAGGGGTAGGAGTGAAATCCTGTAATCCTTGAGGGACCGCCAGTGGCGAAGGCGTCTTGCTAGAACGGGTCCGACGGTGAGGGACGAAAGCTAGGGGCACGAACCGGATTAGATACCCGGGTAGTCCTAGCCGTAAACGATGCGAGCTAGGTGTCACGTGGATTGCGAATCCATGTGGTGCCGTAGGGAAACCGTGAAGCTCGCCGCCTGGGAAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAACGGGTGGAGCCTGCGGTTTAATTGGACTCAACGCCGGAAAGCTCACCGGAGACGACAGCGGGATGAGGGCCAGGCTGATGACCTTGCTAGACTAGCTGAGAGGAGGTGCATGGCCGCCGTCAGTTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCAAAGGG
>AY284588 1 1736 Eukarya/Metazoa/Nematoda/Aphelenchus et rel.
CTCAAAGATTAAGCCATGCATGTGTAAGTATAAACGATTCAATCGTGAAACCGCGAACGGCTCATTATAACAGCTATGATCTACTTGATCTTGAGAATCCTAATTGGATAACTGTAGTAATTCTAGAGCTAATACATGCATAAGAGCTCGAACCTTGCGCAAGCGGGGGAAGAGTGCATTTATTGGAAGAAGACCAGTTGTGGCTGTAAAAAGCTGCATGTCGTTGACTCGCAATAACTAAGCTGATCGCATGGCCTTGTGCCGGCGACGAGTCTTTCGAGTATCTGCCTTATCAACTTTCGACGGTAGTGTATTTGACTACCATGGTGGTGACGGGTAACGGAGGATAAGGGTTCGACTCCGGAGAAGGGGCCTGAGAAATGGCCACTACGTCTAAGGATGGCAGCAGGCGCGCAAATTACCCACTCTCGGTACGAGGAGGTAGTGACGAAAAATAACGAAGAGGTCCCCTATGGGTCTTCTATTGGAATGGGTACAATTTAAACCCTTTAACGATTAACCAAGAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTCTAAATGCATAGATACATTGCTGCGGTTAAAAAGCTCGTAGTTGGATCTGTGTTGGGGACTTGGTCCACTCTAACGGGTGGTACTTTGCTCCTTGACAATCAATGTTGGCTCACTTGGCGTAGTCTTCAGTGATTGCGTCATAGTTGGCTGACGAGTTTACTTTGAGCAAATCAGAGTGCTCCAAACAGGCGTTTACGCTTGAATGTTCGTGCATGGAATAATAGAAGAGGATTTCGGTTCTATTTTGTTGGTTTTGAGACCGAGATAATGGTTAACAGAGACAGACGGGGGCATTCGTACTTCTGCGTGAGAGGTGAAATTCTTGGACCGCAGAAAGACGCACCACAGCGAAAGCATTTGCCAAGAATGTCTTCATTAATCAAGAACGAAAGTCAGAGGATCGAAGGCGATCAGATACCGCCCTAGTTCTGACCGTAAACGATGCCAACTAGCGATCTGTCGGTGGTGTGTTTTCGCCCTGATAGGGAGCTTCCCGGAAACGAAAGTCTTCGGGTTCCGGGGGAAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAACCTCACCCGGGCCGGACACCGTAAGGATTGACAAATTGATAGCTTTTTCATGATTCGGTGGTTGGTGGTGCATGGCCGTTCTTAGTTCGTGGAGCGATTTGTCTGGTTTATTCCGATAACGAGCGAGACTCTAGCCTACTAAATAGTTGGCACATTACATTGTGCGTCCTAACTTCTTAGAGGGATTTACGGCGTATAGCCGCAAGAGAATGAGCAATAACAGGTCTGTGATGCCCTTAGATGTCCGGGGCTGCACGCGCGCTACACTGGTGAAATCAACGTGTTCTCCTATGCCGAGAGGCACTTGGGTAAACCATTGAAAATTCGCCGTGATTGGGATCGGAGATTGAAATTATTTTCCGTGAACGAGGAATTCCAAGTAAGTGCGAGTCATCAACTCGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACCCGGGACTGGGTTATTTCGAGAAATTTGAGGATTGGCTAGGTGCTTGATGCCTCCGGGTGTCATCGCCTGTCGAGAATCAACTTAATCGAGATGGCCTGAACCGGGT
>AY454558 1 1110 Archaea/Crenarchaeota/uncultured/uncultured
ACTCACTAAGAGCGAATTGGGCCTTTCGTCGCATGCTAAAAGGCCGCCATGGCCGCGGGATTGGGCACGGGGGGACGGGTTGCCGCAGGCGCGAAACCTCTGCAATAGGCGAAAGCTTGACAGGGTTACTCTGAGTGATTTCCGTTAAGGAGATCTTTTGGCACCTCTAAAAATGGTGCAGAATAAGGGGTGGGCAAGTCTGGTGTCAGCCGCCGCGGTAATACCAGCACCCCGAGTGGTCGGGACGTTTATTGGGCCTAAAGCATCCGTAGCCGGTTCTACAAGTCTTCCGTTAAATCCACCTGCTTAACAGATGGGCTGCGGAAGATACTATAGAGCTAGGAGGCGGGAGAGGCAAGCGGTACTCGATGGGTAGGGGTAAAATCCGTTGATCCATTGAAGACCACCAGTGGCGAAGGCGGCTTGCCAGAACGCGCTCGACGGTGAGGGATGAAAGCTGGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCCAGCTGTAAACGATGCAGACTCGGTGATGAGTTGGCTTCTTGCTAACTCAGTGCCGCAGGGAAGCCGTTAAGTTTGCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAAGGGGTGAAGCCTGCGGTTCAATTGGAGTCAACGCCGGAAATCTTACCGGGGGCGACAGCAGAGTGAAGGTCAAGCTGAAGACTTTACCAGACAAGCTGAGAGGAGGTGCATGGCCGTCGCCAGCTCGTGCCGTGAGGTGTCCTGTTAAGTCAGGTAACGAGCGAGATCCCTGCCTCTAGTTGCTACCATTATTCTCAGGAGTAGTGGAGCTAATTAGAGGGACCGCCGTCGCTGAGACGGAGGAAGGTGGGGGCTACGGCAGGTCAGTATGCCCCGAAACCCTCGGGCCACACGCGGGCTGCAATGGTAAGGACAATGAGTTTCAATTCCGAAAGGAGGAGGCAATCTCTAAACCTTACCACAGTTATGATTGAGGGCTGAAACTCGCCCTCATGAATATGGAATCCCTAGTAACCGCGTGTCACTATCGCGCGGTGAATACGTCCCTGCTCCTTGCACGAGTTAACCGAATCACTAGT
>DQ421767 1 1422 Bacteria/Beta Gammaproteobacteria/Gammaproteobacteria_1/Oceanospirillales_2/Marinomonas
AGCGGTAACAGGAATTAGCTTGCTAATTTGCTGACGAGCGGCGGACGGGTGAGTAACGCGTAGGAATCTGCCTGGTAGTGGGGGACAACATGTGGAAACGCATGCTAATACCGCATACGCCCTACGGGGGAAAGGAGGGGATCTTCGGACCTTTCGCTATCAGATGAGCCTGCGTGAGATTAGCTAGTTGGTGGGGTAAAGGCTCACCAAGGCGACGATCTCTAGCTGGTCTGAGAGGATGATCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTCAGTTGGGAAGATGATGACGTTACCAACAGAAGAAGCACCGGCTAAATCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGTTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGACCAGAAAGTTGGGGGTGAAATCCCGGGGCTCAACCCCGGAACGGCCTCCAAAACTCCTGGTCTTGAGTACGGCAGAGGGGGATGGAATTCCGCGTGTAGCAGTGAAATGCGTAGATATAGGAAGGAACATCAGTGGCGAAGGCGACACCCTGGACCGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCTACTAGCCGTTGGGGATTTTATTCTTAGTGGCGCAGCTAACGCGATAAGTAGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTACTCTTGACATCCAGAGAATTTAGCAGAGATGCTTTAGTGCCTTCGGGAACTCTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTATCCTTATTTGCCAGCACTTCGGGTGGGAACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGGCCCTTACGAGTAGGGCTACACACGTGCTACAATGGCGTATACAGAGGGCCGCAAGACCGCGAGGTGGAGCAAATCCCAAAAAGTACGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTTGATTGCTCCAGAAGTAGCTAGCTTAACCTTCGGGATGGCGGTTACCACGGAGTGGTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCTAGG
>DQ628981 1 1786 Eukarya/Rhodophyta et al./Rhodophyta/Florideophyceae/Corallinales
CACCTGGTTGATCCTGCCAGTGGTATATGCTTGTCTCAAAGACTAAGCCATGCAAGTCTAAGTATAAGTTATTCTTACGACAAAACTGCGAATGGCTCGGTAAAACAGCAATAATTTCTTCAGTGATGATTTTACTCACGGATAACCGTAGTAATTCTAGAGCTAATACGTGCAAATTAAAGCAATGACCGCAAGGCCAGCGCTGTGCCGTTTAGATAACAACACCATCATTTGGTGATTCATAATCGTCTTTCTGATCGCTTCGTGCGACACACTGTTCAAATTTCTGACCTATCAACTTTCGATGGTAAGGTAGTGTCTTACCATGGTTATGACGGGTAACGGACCGTGGGTGCGGGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGTAAATTACCCAATCCAGACACTGGGAGGTAGTGACAAGAAATATCAATGGGGGAACTGTAAAGTTCTTCCAATTGGAATGAGATCGAGCTAAATAGCCAAATCGAGAATCCAGCAGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCTGTAAGCGTATACCAAAGTTGTTGCACTTAAAACGCTCGTAGTCGGACATTGGTAGTTCCGGGAGTGTGCGCGTCGTGTGCATGCTCTGCGGGACTGCCTTTCGTGGAGTTGTCGGAGGGATGAAGCATTTTAATTAATGAACGTCCACCGCGCCCACTTTTTACTGTGAGAAAATCAGAGTGCTCAAAGCAGGCAATTGCCGTGAATGTATTAGCATGGAATAATAGAATAGGACTCGTTTCTATTTTGTTGGTTTGTTGGGAATGAGTAATGATTAAGAGGGACAGTTGGGGGCATTTGTATTACGAGGCTAGAGGTGAAATTCTTAGATTCTCGTAAGACAAACTGCTGCGAAAGCGTCTGCCAAGGATGTTTTCATTGATCAAGAACGAAAGTAAGGGGATCGAAGACGATCAGATACCGTCGTAGTCTTTACTATAAACGATGAGAACTAGGGATCGGGCGAGGCATTACGATGACCCGCCCGGCACCTTCCGCGAAAGCAAAGTGTTTGCTTTCTGGGGGGAGTATGGTCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCATCACCGGGTGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTTACCAGGTCAGGACATAGTGAGGATGAACAGATTGAGAGCTCTTTTTTGATTCTATGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGTGATTTGTCTGGTTAATTCCGTTAACGAGCGAGACCTGGGCGTGCTAACTAGGAGAGGCTACACTCGTGGTAGTTTTCGACTTCTTAGACGGACTGGTGGCGTCTAGCCACCGGAAGCTCCAGGCAATAACAGGTCTGAGATGCCCTTAGATGTTCTGGGCCGCACGCGTGCTACACTGAGTAATTCAATGGGTAAGGGAACACGAAAGTGCGACCTAATCTTGAAATTTGCTCGTGATGGGGATCGACGGTTGCAATTTTCCGTCGTGAACGAGGAATACCTTGTAGGCGCGTGTCATCATCACGCGCCGAATACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATTGAGTGATCCGGTGAGGCTCTGGGACCTGAGCGGAAAGAGCGTTTCGCTTGTTCTGCTTGGGAAACTTGGTCGAACCTTATCATTTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGCTA
>EF406474 1 1502 Bacteria/Firmicutes/Clostridiales/Ruminococcus et rel./Papillibacter et rel./Oscillospira
TAGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAGCACCCTTGAAGGAGTTTTCGGACAACGGATAGGAATGCTTAGTGGCGGACTGGTGAGTAACGCGTGAGGAACCTGCCTTCCAGAGGGGGACAACAGTTGGAAACGACTGCTAATACCGCATGACGCATTGGTGTCGCATGGCACTGATGTCAAAGATTTATCGCTGGAAGATGGCCTCGCGTCTGATTAGCTAGTTGGTGAGGTAACGGCCCACCAAGGCGACGATCAGTAGCCGGACTGAGAGGTTGGCCGGCCACATTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGGCAATGGACGCAAGTCTGACCCAGCAACGCCGCGTGAAGGAAGAAGGCTTTCGGGTTGTAAACTTCTTTTAAGGGGGAAGAGCAGAAGACGGTACCCCTTGAATAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTACTGGGTGTAAAGGGCGTGCAGCCGGAGAGACAAGTCAGATGTGAAATCCACGGGCTCAACCCGTGAACTGCATTTGAAACTGTTTCCCTTGAGTGTCGGAGAGGTAATCGGAATTCCTTGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGGATTACTGGACGATAACTGACGGTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGATCGATACTAGGTGTGCGGGGACTGACCCCCTGCGTGCCGGAGTTAACACAATAAGTATCGCACCTGGGGAGTACGATCGCAAGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGATTATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGGCTTGACATCCTACTAACGAAGTAGAGATACATTAGGTGCCCTTCGGGACAAGAGAGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCTTCAGTAGCCAGCAGGTAAAGCCGGGCACTCTGGAGAGACTGCCGGGGATAACCCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGCTACACACGTGCTACAATGGCGTAAACAGAGGGAAGCGAGCCCGCGAGGGGGAGCAAATCCCAAAAATAACGTCCCAGTTCGGATTGTAGTCTGCAACCCGACTACATGAAGCTGGAATCGCTAGTAATCGCGGATCAGAATGCCGCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCATGGGAGTCGGAAATGCCCGAAGTCTGTGACCCAACCGCAAGGAGGGAGCAGCCGAAGGCAGGTCGGATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAA
>EF516988 1 1782 Bacteria/Firmicutes/Bacillales Mollicutes/Staphylococcaceae/Staphylococcus/Staphylococcus aureus et rel./Staphylococcus aureus et rel./Staphylococcus warneri
GTACCGCTTTGGAGCCTCTCGAGTTTGATCCTGGCTCAGGAGGTCCTAACAAGGTAACCAGTATTGGATCCCCTAGAGTTTGATCCCGGCCCCTAAAGTTTGAACAAAGTCCAGGAAATTGGGGCCCCTACAGTTTAATCTCTTTTGCTTCATGGTAAAAAACTGAAAGACGGTTTCGGCTGTCGCTATTTGATGGGCCCGCGGCGCATTAGCTAGTTGGTGAGGTAACGGCTCACCAAGGCGACGATGCGTAGCCCACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGGCGAAAGCCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTAAGGGAAGAACAAGTACAGTAGTAACTGGCTGTACCTTGACGGTACCTTATTAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGTCCTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGGGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCAAGTGTAGCGGTGAAATGCGTAGAGATTTGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGAGAGTACGGTCGCAGGACTGAAACTCAAAAGAATTTGACGGGGGGCTCCTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGGGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCAAGTGTAGCGGTGAAATGCGTAGAGATTTGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCCGTTGACCACTGTAGAGATATAGTTTCCCCTTCGGGGGCAACGGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGATCTTAGTTGCCATCATTTAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGACGATACAAACGGTTGCCAACTCGCGAGAGGGAGGTATCCGATAAAGTCGTTCTCAGTTCGGATTGTTGGCCCCAACTCGCGTACGTGAAACCAGAATAACCAGTAATGGCTCCTCAGCATTTTGATCCGGGCTCGTTAAGTGGTAACAAGGTAACCGCTATTGGATCCTTAGAGTTTGATCCGGCTCAGGAAGTCGTAACAAGGTAACCAGTATGGTCCTCTAGAG
>EF551905 1 1203 Bacteria/Beta Gammaproteobacteria/Xanthomonadales
GATAGCGGCGCGATTCGCCCTTCCTACGGGGGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCAGATCCAGCCATGCCGCGTGGGTGAAGAAGGCCTTCGGGTTGTAAAGCCCTTTTGTTGGGAAAGAAAGACGTCCGGCTAATACCCGGATGGAATGACGGTACCCAAAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTACTCGGAATTACTGGGCGTAAAGGGTGCGTAGGTGGTTCGTTAAGTCTGATGTGAAAGCCCTGGGCTCAACCTGGGAATTGCATTGGATACTGGCGAGCTGGAGTGCGGTAGAGGGTAGTGGAATTCCCGGTGTAGCAGTGAAATGCGTAGATATCGGGAGGAACATCCGTGGCGAAGGCGACTACCTGGACCAGCACTGACACTGAGGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGCGAACTGGATGTTGGGTTCAATCAGGAACTCAGTATCGAAGCTAACGCGTTAAGTTCGCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGTATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGCCTTGACATGTCGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCGAACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCTTAGTTGCCAGCACGTAATGGTGGGAACTCTAAGGAGACCGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTACTACAATGGGAAAGGACAGAGGGCTGCGAACCCGCGAGGGCAAGCCAATCCCAGAAACCTTTCTCCCAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGCAGATCAGCATTGCTGCGGTGAATACGTTTCCGGTCTTGTACAACACCGCCCGTCACACCATGGGAGTGGGTGCCACCAGAAGTAGCTAGACTACGTTCGGGAGACCGTTACCCACGGTTGAATTCATGGACTTGGGGTGAGTCCGTAAACAGGGTTACCCCCG
>EU132755 1 1345 Bacteria/Actinobacteria/CMN et rel./CMN/Pseudonocardiaceae_3/Pseudonocardia aurantiaca et rel./Pseudonocardia aurantiaca et rel.
GAACGCTTGACGGCGTGCTTACACATGCAAGTCGAACGGGCCATTGCTCTTCGGGGTGGTGGTTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCTCGGCTTCGGGATAAGCCTGGGAAACTGGGTCTAATACCGGATATTCACATCTTGTTGCATGGTGGGGTGTGGAAAGGGTTTCTGGCTGGGGATGGGCTCGCGGCCTATCAGCTTGTTGGTGGGGTGATGGCCTACCAAGGCGGTGACGGGTAGCCGGCCTGAGAGGGCGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCGCAATGGGCGGAAGCCTGACGCAGCGACGCCGCGTGGGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGCCCCGACGAAGCGAAAGTGACGGTAGGGGTAGAAGAAGCGCCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGAGCTCGTAGGCGGTTTGTCGCGTCTTCCGTGAAAACTGGGGGCTTAACTTCCAGCTTGCGGTGGATACGGGCTGACTGGAGTGCGGCAGGGGAGACTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCGGTGGCGAAGGCGGGTCTCTGGGCCGTTACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGTTGGGCGCTAGGTGTGGGGGACTTTCCACGTTCTCCGTGCCGTAGCTAACGCATTAAGCGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGTGGCTTAATTCGATGCAACGCGAAGAACCTTACCTGGGTTTGACATGCGCGGTAATCCTGTAGAGATACAGGGTCCTTCGGGGCCGTGTACAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTTCCATGTTGCCAGCACGTGATGGTGGGGACTCATGGGAGACTGCCGGGGTCAACTCGGAGGAAGGTAGGGATGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTGCACACATGCTACAATGGCTCATACAGAGGGCTGCGATGCTGTGAGGCTGAGCGAATCCCTTAAAGTGAGTCTCAGTTCGGATCGGGGTCTGCAACTCGACCCCGTGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGATACGTTCCCGGGCATTGCACTCA
>EU570118 1 1433 Archaea/Euryarchaeota/Thermoplasmatales/uncultured
CGGTTGATCCTGCCGGCGCTCACCGCTCTTGGAATCCGATTAAGCCATGTGAGTCGAGAGGGTTCGGCCCTCGGCAAACTGCTCAGTAACACGTGGATAACCTAACCTAAGGTGGGAGATAATCTCGGAAAACTGAGGCTAATATCCCATAGACCTTGATGACTGGAATGTTTTGAGGTTTAAAGTTACGACGCCTTAGGATGGGTCTGCGGCCTATCAGGTTGTAGTTAGTGTAAAGGACTAACTAGCCGACGACGGGTACGGGCCATGGGAGTGGTTGCCCGGAGATGGACTCTGAGACACGAGTCCAGGCCCTACGGGGCGCAGCAGGCGCGAAAACTTTGCAATGCGCGAAAGCGCGACAAGGGGATTCCAAGTGCATGCACTAAGTGTATGCTTTTCGTGAGTGTAAAAAGCTCACGGAATAAGGGCTGGGTAAGACTGGTGCCAGCCGCCGCGGTAATACCAGCGGCCCTAGTGGTGATCGTTTTTATTGGGCCTAAAGCGTCCGTAGCCGGTTCGGTAAATCTCTGGGTAAATCGTTGGGCTTAACCCAACGAATTCTGGGGAGACTGCCGAACTTGGGACCGGGAGAGGTCGGAGGTACTCCAGGGGTAGGGGTGAAATCCTGTAATCCTTGGGGGACCACCGGTGGCGAAAGCGTCCGACCAGAACGGGTCCGACGGTAAGGGACGAAGCCCTGGGTCGCGAACCGGATTAGATACCCGGGTAGTCCAGGGTGTAAACGCTGTGCGCTTGGTGTAGGGGGTCCTACGAGGGCATCCTGTGCCGGAGAGAAGTTGTTAAGCGCACCGCCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGGCGGGGGAGCACAGCAACGGGAGGAGCGTGCGGTTTAATTGGATTCAACGCCGGAAAACTCACCAGGGGCGACTGCCACATGAAGATCAAGCTGATGACTTTATCTGATTGGTAGAGAGGTGGTGCATGGCCGTCGTCAGTTCGTACCGTAGGGCGTTCTGTTAAGTCAGATAACGAACGAGACCCTTGCCCTTAATTGCCATGTTTCCCTCCGGGGGAACGGTACTTTAAGGGGACCGCTGGTGCAAAATCAGAGGAAGGGAAGGGCAACGGTAGGTCAGTATGCCCCGAATCCCCTGGGCAACACGCGCGCTACAAAGGCCGGGACAAAGGGTTCCGACACCGAGAGGTGAAGGTAATCCCGAAACCTGTCCGTAGTTCGGATCGAGGGCTGCAACCCGCCCTCGTGAAGCTGGATTCCGTAGTAATCGCAGATCAACATCCTGCGGTGAATATGCCCCTGCTCCTTGCACACACCGCCCGTCAAACCATCCGAGTGGAGTTTCGATGAGGGTGGGATTCTTGTCCTTCTCAAATCGCGATTTCGCAAGGAGGGTTAAGTCGTAACAAGGTAACC"""

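# Needed if you skipped Steps 1-4 and are starting here.
from cogent import LoadSeqs, DNA

def label_to_name(x):
    # Labels look like 'AF424517 1 994 Archaea/Crenarchaeota/...'
    fields = x.split()
    # Name each sequence '<domain>: <accession>'
    return '%s: %s' % (fields[3].split('/')[0], fields[0])

seqs = LoadSeqs(data=fasta_str.split('\n'),moltype=DNA,aligned=False,label_to_name=label_to_name)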
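As a quick check of what label_to_name produces, you can apply it to two of the Silva labels above. This sketch only repeats the function and runs it on labels copied from fasta_str; it does not require PyCogent.

```python
def label_to_name(x):
    fields = x.split()
    return '%s: %s' % (fields[3].split('/')[0], fields[0])

# Labels copied from the Silva records above (without the leading '>').
archaeal = 'AF424517 1 994 Archaea/Crenarchaeota/uncultured/uncultured'
bacterial = 'DQ421767 1 1422 Bacteria/Beta Gammaproteobacteria/Gammaproteobacteria_1/Oceanospirillales_2/Marinomonas'

names = [label_to_name(archaeal), label_to_name(bacterial)]
print(names)
```

The taxonomy string is the fourth whitespace-separated field, so the function still recovers the domain when a later taxon name contains spaces, as in the bacterial label.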

Now pick up with Step 5: Align the sequences using muscle above.