Using the MolType and Sequence objects

MolType

A MolType provides services for resolving ambiguity codes and for finding the correct ambiguity code when recoding. It also maintains the mappings between different kinds of alphabets, sequences and alignments.
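
As a rough illustration of what resolving an ambiguity and finding the correct ambiguity mean, the sketch below uses a small hand-written table with three IUPAC RNA codes. The names RNA_AMBIGUITIES, SYMBOL_FOR, resolve and ambiguity_for are hypothetical and are not part of cogent; the real tables (e.g. IUPAC_RNA_ambiguities) cover the full IUPAC code set.

```python
# Hypothetical, cut-down ambiguity table for illustration only.
# This is NOT the cogent API; cogent's own tables are more complete.
RNA_AMBIGUITIES = {
    'R': frozenset('AG'),    # purine
    'Y': frozenset('CU'),    # pyrimidine
    'N': frozenset('ACGU'),  # any base
}

# Canonical bases resolve to themselves.
for base in 'ACGU':
    RNA_AMBIGUITIES[base] = frozenset(base)

# Inverse mapping: from a set of bases back to the symbol that encodes it.
SYMBOL_FOR = dict((bases, symbol) for symbol, bases in RNA_AMBIGUITIES.items())


def resolve(symbol):
    """Return the set of canonical bases that a symbol can stand for."""
    return RNA_AMBIGUITIES[symbol]


def ambiguity_for(bases):
    """Return the single symbol that encodes exactly this set of bases."""
    return SYMBOL_FOR[frozenset(bases)]
```

Here resolve('R') gives the set containing A and G, and ambiguity_for('GA') maps that set back to 'R', which is the kind of lookup needed when recoding.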

One issue with MolTypes is that they need to know about Sequence, Alphabet, and other objects while, at the same time, those objects need to know about their MolType. It is thus essential that the connection between these other types and the MolType can be made after the objects are created.
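
The mutual dependency described above can be pictured with a minimal sketch in which the sequence class starts out without a molecular type and the link is made only when the molecular type is constructed. The classes SketchSequence, SketchRnaSequence and SketchMolType are hypothetical stand-ins and do not reproduce how cogent wires its objects together.

```python
class SketchSequence(object):
    """Hypothetical stand-in for a sequence class (not cogent's)."""

    MolType = None  # unknown at class-definition time, connected later

    def __init__(self, data):
        self.data = data

    def moltype_label(self):
        # The sequence reaches its molecular type through the late-bound link.
        return self.MolType.label


class SketchRnaSequence(SketchSequence):
    """Separate subclass so the link does not leak into the base class."""


class SketchMolType(object):
    """Hypothetical stand-in for a molecular type (not cogent's)."""

    def __init__(self, sequence_class, label):
        self.sequence_class = sequence_class
        self.label = label
        # Make the connection after both objects exist: the sequence
        # class now knows which molecular type it belongs to.
        sequence_class.MolType = self

    def make_sequence(self, data):
        return self.sequence_class(data)


rna = SketchMolType(SketchRnaSequence, 'rna')
seq = rna.make_sequence('ACGU')
```

After construction, seq.MolType is the rna object and seq.moltype_label() returns 'rna', while SketchSequence.MolType is still None because only the subclass was connected.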

Setting up a MolType object with an RNA sequence

>>> from cogent.core.moltype import MolType, IUPAC_RNA_chars,\
...   IUPAC_RNA_ambiguities, RnaStandardPairs, RnaMW,\
...   IUPAC_RNA_ambiguities_complements
>>> from cogent.core.sequence import NucleicAcidSequence
>>> testrnaseq = 'ACGUACGUACGUACGU'
>>> RnaMolType = MolType(
...     Sequence = NucleicAcidSequence(testrnaseq),
...     motifset = IUPAC_RNA_chars,
...     Ambiguities = IUPAC_RNA_ambiguities,
...     label = "rna_with_lowercase",
...     MWCalculator = RnaMW,
...     Complements = IUPAC_RNA_ambiguities_complements,
...     Pairs = RnaStandardPairs,
...     add_lower=True,
...     preserve_existing_moltypes=True,
...     make_alphabet_group=True,
...     )

Setting up a MolType object with a DNA sequence

>>> from cogent.core.moltype import MolType, IUPAC_DNA_chars,\
...   IUPAC_DNA_ambiguities, DnaMW, IUPAC_DNA_ambiguities_complements,\
...   DnaStandardPairs
>>> testdnaseq = 'ACGTACGTACGTACGT'
>>> DnaMolType = MolType(
...     Sequence = NucleicAcidSequence(testdnaseq),
...     motifset = IUPAC_DNA_chars,
...     Ambiguities = IUPAC_DNA_ambiguities,
...     label = "dna_with_lowercase",
...     MWCalculator = DnaMW,
...     Complements = IUPAC_DNA_ambiguities_complements,
...     Pairs = DnaStandardPairs,
...     add_lower=True,
...     preserve_existing_moltypes=True,
...     make_alphabet_group=True,
...     )

Setting up a DNA MolType object allowing . as gaps

>>> from cogent.core import moltype as mt
>>> DNAgapped = mt.MolType(Sequence=mt.DnaSequence,
...                        motifset=mt.IUPAC_DNA_chars,
...                        Ambiguities=mt.IUPAC_DNA_ambiguities,
...                        Complements=mt.IUPAC_DNA_ambiguities_complements,
...                        Pairs = mt.DnaStandardPairs,
...                        Gaps='.')
>>> seq = DNAgapped.makeSequence('ACG.')

Setting up a MolType object with a protein sequence

>>> from cogent.core.moltype import MolType, IUPAC_PROTEIN_chars,\
...   IUPAC_PROTEIN_ambiguities, ProteinMW
>>> from cogent.core.sequence import ProteinSequence, ModelProteinSequence
>>> protstr = 'TEST'
>>> ProteinMolType = MolType(
...     Sequence = ProteinSequence(protstr),
...     motifset = IUPAC_PROTEIN_chars,
...     Ambiguities = IUPAC_PROTEIN_ambiguities,
...     MWCalculator = ProteinMW,
...     make_alphabet_group=True,
...     ModelSeq = ModelProteinSequence,
...     label = "protein")
>>> protseq = ProteinMolType.Sequence

Verify sequences

>>> rnastr = 'ACGUACGUACGUACGU'
>>> dnastr = 'ACGTACGTACGTACGT'
>>> RnaMolType.isValid(rnastr)
True
>>> RnaMolType.isValid(dnastr)
False
>>> RnaMolType.isValid(NucleicAcidSequence(dnastr).toRna())
True
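
The results above follow from the alphabet: a DNA string contains T, which is not an RNA character, and converting it to RNA replaces each T with U. A minimal sketch of that logic, assuming only the four canonical RNA bases (no ambiguity codes, gaps or lowercase handling, all of which the real MolType supports), might look like this. The names RNA_CHARS, is_valid_rna and dna_to_rna are hypothetical.

```python
# Assumption: only the canonical RNA bases count as valid in this sketch.
RNA_CHARS = frozenset('ACGU')


def is_valid_rna(seq):
    """True if every character of seq is a canonical RNA base."""
    return all(ch in RNA_CHARS for ch in seq)


def dna_to_rna(seq):
    """Return seq with every T replaced by U (uppercase only)."""
    return seq.replace('T', 'U')
```

With these helpers, is_valid_rna('ACGTACGTACGTACGT') is False, while is_valid_rna(dna_to_rna('ACGTACGTACGTACGT')) is True.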

Sequence

The sequence module (cogent.core.sequence) contains classes that represent biological sequence data. These provide generic biological sequence manipulation functions, plus functions that are critical for the evolve module calculations.

Warning

Do not import sequence classes directly! You are expected to access them through MolType objects. The most common molecular types, DNA, RNA and PROTEIN, are provided as top-level imports in cogent (e.g. cogent.DNA). Sequence classes depend on information from the MolType that is only available after MolType has been imported. Sequences are intended to be immutable. For performance reasons the code does not enforce this, so do not alter the MolType or the sequence data after creation.

More detailed usage of sequence objects can be found in DNA and RNA sequences.