Cynthia Hsu: Interpreting the Output of the Undertaker Protein-Folding Algorithm

Student's Name: Cynthia Hsu
Advisor's Name: Kevin Karplus
Home University: University of California, Berkeley
Attachments: CynthiaHsu_report.pdf (117.78 KB), CynthiaHsu_poster.pdf (1.48 MB)
Year: 2006

The Critical Assessment of Protein Structure Prediction (CASP) is a community-wide experiment that compares the success of different protein structure prediction protocols. Entries fall into two categories: automatic server predictions, in which algorithms iteratively search through databases and generate models from the results, and "hand" predictions, in which a team of researchers analyzes the output of the automatic servers and modifies the models accordingly. For the experiment, the sequences of a hundred proteins whose structures had recently been solved by crystallography were released to the CASP teams several weeks before the structures themselves were made public. From these sequences alone, each team attempted to predict the three-dimensional structure. This year's experiment was the seventh in the biennial CASP series.

The SAM-T06 team, headed by Professor Kevin Karplus at the University of California, Santa Cruz, submitted two sets of predictions: those generated automatically by their server and those hand-modified by the team. My role was to help hand-modify target proteins after initial models had been generated by the Undertaker algorithm and the SAM-T06 server, specifically the comparative modeling targets (>55% sequence identity) and the fold-recognition targets (>40% sequence identity). This involved examining the targets with local structure alphabets and scripts, changing the weights on the cost function, adding and modifying sheet, helix, and distance constraints, and sometimes isolating portions of the amino acid sequence and modeling them as subdomains.
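To make the idea of reweighting a cost function concrete, here is a minimal Python sketch, assuming a model's score is simply a weighted sum of independent cost terms. The term names (`clashes`, `burial`, `distance_constraints`), their values and weights, and the `Model`, `total_cost`, and `best_model` helpers are hypothetical illustrations, not Undertaker's actual cost-function terms or interface. The sketch shows only how raising the weight on one term can change which candidate model scores best.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """A candidate structure summarized by its raw (unweighted) cost terms."""
    name: str
    terms: dict[str, float]


def total_cost(model: Model, weights: dict[str, float]) -> float:
    """Weighted sum of the model's cost terms; terms with no weight contribute 0."""
    return sum(weights.get(term, 0.0) * value for term, value in model.terms.items())


def best_model(models: list[Model], weights: dict[str, float]) -> Model:
    """Return the candidate with the lowest weighted cost."""
    return min(models, key=lambda m: total_cost(m, weights))


# Hypothetical term names and values; not Undertaker's real cost-function terms.
# "burial" stands in for a penalty on burying residues that prefer to be exposed.
candidates = [
    Model("A", {"clashes": 1.0, "burial": 5.0, "distance_constraints": 2.0}),
    Model("B", {"clashes": 3.0, "burial": 1.0, "distance_constraints": 1.0}),
]

default_weights = {"clashes": 1.0, "burial": 0.2, "distance_constraints": 1.0}
burial_heavy = {"clashes": 1.0, "burial": 2.0, "distance_constraints": 1.0}

print(best_model(candidates, default_weights).name)  # A
print(best_model(candidates, burial_heavy).name)     # B
```

With the default weights, candidate A costs 4.0 and B about 4.2, so A is preferred; raising the burial weight to 2.0 makes A cost 13.0 and B 6.0, so B wins instead.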

One example of a target that was improved by modeling subdomains was T0358. In the initial model, the algorithm buried residues that should have been exposed. Removing the histidine tag and modeling the remaining sequence as a subdomain allowed the protein to be rearranged independently of the tag; the rearranged model aligned with more homologous proteins and was an improvement, after which the histidine tag was added back to the end. Another target, T0385, was a four-helix bundle. By measuring distances in closely related proteins and adding distance constraints between residues on different helices as scaffolding, it was possible to produce several different alignments, each yielding a four-helix bundle with slightly different end-loop behavior.
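A distance constraint used as scaffolding can be sketched as a flat-bottom penalty: zero while two residues lie within an allowed distance range, and growing quadratically outside it. The Python below is a minimal illustration under stated assumptions: each residue is represented by a single coordinate (for example, its C-alpha atom), the allowed range is the distance measured in a related structure plus or minus a hypothetical 1.0 Å tolerance, and the helper names and residue numbers are invented for the example rather than taken from T0385 or from Undertaker's constraint format.

```python
import math

# A residue position, e.g. its C-alpha atom (an assumption for this sketch).
Coord = tuple[float, float, float]


def constraint_penalty(d: float, lo: float, hi: float) -> float:
    """Flat-bottom quadratic penalty: zero inside [lo, hi], squared violation outside."""
    if d < lo:
        return (lo - d) ** 2
    if d > hi:
        return (d - hi) ** 2
    return 0.0


def constraints_from_homolog(
    homolog: dict[int, Coord],
    pairs: list[tuple[int, int]],
    tolerance: float = 1.0,  # hypothetical tolerance in angstroms
) -> list[tuple[int, int, float, float]]:
    """Turn distances measured in a related structure into (i, j, lo, hi) constraints."""
    constraints = []
    for i, j in pairs:
        d = math.dist(homolog[i], homolog[j])
        constraints.append((i, j, max(0.0, d - tolerance), d + tolerance))
    return constraints


def scaffold_cost(model: dict[int, Coord], constraints) -> float:
    """Sum of penalties over all inter-helix distance constraints."""
    return sum(
        constraint_penalty(math.dist(model[i], model[j]), lo, hi)
        for i, j, lo, hi in constraints
    )


# Hypothetical residues 10 and 40, lying on two different helices of a homolog.
homolog = {10: (0.0, 0.0, 0.0), 40: (10.0, 0.0, 0.0)}
cons = constraints_from_homolog(homolog, [(10, 40)])

stretched = {10: (0.0, 0.0, 0.0), 40: (13.0, 0.0, 0.0)}
close_fit = {10: (0.0, 0.0, 0.0), 40: (10.5, 0.0, 0.0)}
print(scaffold_cost(stretched, cons))  # 4.0
print(scaffold_cost(close_fit, cons))  # 0.0
```

Here the homolog places the two residues 10 Å apart, giving an allowed range of 9 to 11 Å; a model that spreads them to 13 Å pays a penalty of 4.0, while one at 10.5 Å pays nothing. The flat bottom leaves the helices free to shift slightly, which is one way several alternative bundles could all satisfy the same scaffold.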