Structural data

Section author: Kristian Rother, Patrick Yannul

Protein structures

Reading Protein structures

Retrieve a structure from PDB

>>> from cogent.db.pdb import Pdb
>>> p = Pdb()
>>> pdb_file = p['4tsv']
>>> pdb = pdb_file.read()
>>> len(pdb)
134136

This example will retrieve the structure as a PDB file string.

Parse a PDB file

>>> from cogent.parse.pdb import PDBParser
>>> struc = PDBParser(open('data/4TSV.pdb'))
>>> struc
<Structure id=4TSV>

Parse a PDB entry directly from the web

>>> from cogent.parse.pdb import PDBParser
>>> struc = PDBParser(p['4tsv'])

Accessing PDB header information

>>> struc.header['id']
'4TSV'
>>> struc.header['resolution']
'1.80'
>>> struc.header['r_free']
'0.262'
>>> struc.header['space_group']
'H 3'

Useful methods to access Structure objects

How to access all atoms, residues etc via a dictionary

The table property of a structure returns a two-dimensional dictionary containing all atoms. The keys are 1) the entity level (any of ‘A’,’R’,’C’,’M’) and 2) the combined IDs of Structure, Model, Chain, Residue, Atom as a tuple.

>>> struc.table['A'][('4TSV', 0, 'A', ('HIS', 73, ' '), ('O', ' '))]
<Atom ('O', ' ')>

Calculate the center of mass of a model or chain

>>> model.coords
array([ 146.66615752,   35.08673503,   -3.60735847])
>>> chain.coords
array([ 146.66615752,   35.08673503,   -3.60735847])

How to get a list of all residues in a chain?

>>> chain.values()[0]
<Residue ILE resseq=154 icode= >

How to get a list of all atoms in a chain?

>>> resi.values()[0]
<Atom ('N', ' ')>

Constructing structures

How to create a new entity?

Structure/Model/Chain/Residue/Atom objects can be created as follows:

>>> from cogent.core.entity import Structure,Model,Chain,Residue,Atom
>>> from numpy import array
>>> s = Structure('my_struc')
>>> m = Model((0),)
>>> c = Chain(('A'),)
>>> r = Residue(('ALA', 1, ' ',),False,' ')
>>> a = Atom(('C  ',' ',), 'C', 1, array([0.0,0.0,0.0]), 1.0, 0.0, 'C')

How to add entities to each other?

>>> s.addChild(m)
>>> m.addChild(c)
>>> c.addChild(r)
>>> r.addChild(a)
>>> s.setTable(force=True)
>>> s.table
{'A': {('my_struc', 0, 'A', ('ALA', 1, ' '), ('C  ', ' ')): <Atom ('C  ', ' ')>}, 'C': {('my_struc', 0, 'A'): <Chain id=A>}, 'R': {('my_struc', 0, 'A', ('ALA', 1, ' ')): <Residue ALA resseq=1 icode= >}, 'M': {('my_struc', 0): <Model id=0>}}

How to remove a residue from a chain?

>>> c.delChild(r.id)
>>> s.table
{'A': {('my_struc', 0, 'A', ('ALA', 1, ' '), ...

Geometrical analyses

Calculating euclidean distances between atoms

>>> from cogent.maths.geometry import distance
>>> atom1 = resi[('N', ' '),]
>>> atom2 = resi[('CA', ' '),]
>>> distance(atom1.coords, atom2.coords)
1.4691967192993618

Calculating euclidean distances between coordinates

>>> from numpy import array
>>> from cogent.maths.geometry import distance
>>> a1 = array([1.0, 2.0, 3.0])
>>> a2 = array([1.0, 4.0, 9.0])
>>> distance(a1,a2)
6.324...

Calculating flat angles from atoms

>>> from cogent.struct.dihedral import angle
>>> atom3 = resi[('C', ' '),]
>>> a12 = atom2.coords-atom1.coords
>>> a23 = atom2.coords-atom3.coords
>>> angle(a12,a23)
1.856818...

Calculates the angle in radians.

Calculating flat angles from coordinates

>>> from cogent.struct.dihedral import angle
>>> a1 = array([0.0, 0.0, 1.0])
>>> a2 = array([0.0, 0.0, 0.0])
>>> a3 = array([0.0, 1.0, 0.0])
>>> a12 = a2-a1
>>> a23 = a2-a3
>>> angle(a12,a23)
1.5707963267948966

Calculates the angle in radians.

Calculating dihedral angles from atoms

>>> from cogent.struct.dihedral import dihedral
>>> atom4 = resi[('CG1', ' '),]
>>> dihedral(atom1.coords,atom2.coords,atom3.coords, atom4.coords)
259.49277688244217

Calculates the torsion in degrees.

Calculating dihedral angles from coordinates

>>> from cogent.struct.dihedral import dihedral
>>> a1 = array([0.0, 0.0, 1.0])
>>> a2 = array([0.0, 0.0, 0.0])
>>> a3 = array([0.0, 1.0, 0.0])
>>> a4 = array([1.0, 1.0, 0.0])
>>> dihedral(a1,a2,a3,a4)
90.0

Calculates the torsion in degrees.

Other stuff

How to count the atoms in a structure?

>>> len(struc.table['A'].values())
1187

How to iterate over chains in canonical PDB order?

In PDB, the chain with space as ID comes last, the others in alphabetical order.

>>> for chain in model.sortedvalues():
...     print chain
<Chain id=A>

How to iterate over chains in alphabetical order?

If you want the chains in purely alphabetical order:

>>> keys = model.keys()
>>> keys.sort()
>>> for chain in [model[id] for id in keys]:
...     print chain
<Chain id=A>

How to iterate over all residues in a chain?

>>> residues = [resi for resi in chain.values()]
>>> len(residues)
218

How to remove all water molecules from a structure

>>> water = [r for r in struc.table['R'].values() if r.name == 'H_HOH']
>>> for resi in water:
...     resi.parent.delChild(resi.id)
>>> struc.setTable(force=True)
>>> len(struc.table['A'].values())
1117
>>> residues = [resi for resi in chain.values()]
>>> len(residues)
148