Christoph Rau: Using Conserved Elements to Assist in Gene Annotation

Student's Name: 
Christoph Rau
Advisor's Name: 
Todd Lowe
Home University: 
Harvey Mudd
Attachments: 
rau_poster.pdf (333.42 KB)
rau_report.pdf (164.34 KB)
Year: 
2006

Christoph Rau's project started from the observation that annotating and analyzing a genome by hand takes a prohibitively long time. From the beginnings of genomic research, computers have been used to assist in the automated discovery and annotation of genes. However, analysis beyond this point requires the creation of novel algorithms or actual experimentation on the organism itself. As more is learned about how genes function and more motifs are identified, more accurate algorithms can be developed to further reduce the time a researcher spends annotating, leaving more time for the actual analysis of the genome.

Two examples of useful gene classes to identify are highly expressed genes, which tend to be important for the organism’s survival, and ‘alien’ genes, which have been inserted into the genome at some point from an outside source. In both cases, the genes can be identified by codon usage analysis, which compares the codon frequencies of candidate genes with those of genes with known expression levels. Depending on the candidate’s similarity to these reference genes, it is categorized as highly expressed, alien, or neither.
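The codon usage comparison described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the project: the distance measure (sum of absolute frequency differences), the function names, the cutoffs `near` and `far`, and the rule that a gene far from both reference profiles counts as alien are all assumptions made for the example.

```python
from collections import Counter


def codon_frequencies(seq):
    """Return the relative frequency of each codon in an in-frame coding sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def profile_distance(p, q):
    """Sum of absolute frequency differences: 0 for identical profiles, 2 for disjoint ones."""
    return sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))


def classify(candidate, high_ref, genome_ref, near=0.5, far=1.5):
    """Label a candidate gene by comparing its codon usage to two reference sets.

    high_ref:   concatenated sequence of genes known to be highly expressed
    genome_ref: concatenated sequence of typical genes from the same genome
    near, far:  hypothetical distance cutoffs chosen only for this sketch
    """
    cand = codon_frequencies(candidate)
    d_high = profile_distance(cand, codon_frequencies(high_ref))
    d_genome = profile_distance(cand, codon_frequencies(genome_ref))
    if d_high <= near and d_high < d_genome:
        return "highly expressed"
    if d_high >= far and d_genome >= far:
        return "alien"  # assumed rule: unlike both reference profiles
    return "neither"
```

In practice the reference profiles would be built from many genes and the cutoffs tuned on genes with known expression levels; the toy thresholds here only show the shape of the decision.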

Another interesting class of molecules is the small nucleolar RNAs (snoRNAs). SnoRNAs are RNA molecules that act on other RNAs to target them for modification. C/D box snoRNAs have a very specific structure (PHOTO: snoRNA), and Todd Lowe had developed a program to scan individual genomes for potential snoRNAs. Unfortunately, the program returned large numbers of false positives, so an additional algorithm was developed to refine the results. Because important features are known to be conserved between genomes, the algorithm compares snoRNA candidates found in one genome to those found in closely related genomes. Sequences that appear in multiple genomes are considered ‘probable snoRNAs.’ These methods will assist in the annotation and discovery of new genes and snoRNA targets.
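The cross-genome filter described above might look roughly like the following Python sketch. The summary does not say how similarity between candidates was measured, so this example substitutes the standard library's `difflib.SequenceMatcher` ratio as a stand-in for a proper sequence alignment; the function names, the `min_ratio` cutoff, and the `min_other_genomes` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def similar(a, b, min_ratio=0.8):
    """True if two candidate sequences are similar enough to count as conserved.

    Uses difflib's matching ratio as a simple stand-in for sequence alignment.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio() >= min_ratio


def probable_snornas(candidates_by_genome, query_genome,
                     min_other_genomes=1, min_ratio=0.8):
    """Keep candidates from query_genome that also appear in related genomes.

    candidates_by_genome: dict mapping genome name -> list of candidate sequences
    min_other_genomes:    how many other genomes must contain a similar candidate
    """
    probable = []
    for cand in candidates_by_genome[query_genome]:
        # Count the other genomes holding at least one similar candidate.
        supporting = sum(
            1
            for genome, others in candidates_by_genome.items()
            if genome != query_genome
            and any(similar(cand, other, min_ratio) for other in others)
        )
        if supporting >= min_other_genomes:
            probable.append(cand)
    return probable
```

Raising `min_other_genomes` trades sensitivity for fewer false positives, which matches the stated goal of refining the output of the original snoRNA search program.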