The computational prediction of molecular structure and function is critical to resolving the many unknown roles played by a growing collection of functional RNA transcripts. Our limited understanding of these molecules impedes progress on questions about important biological systems, such as those involving disease prevention, food security, and environmental sustainability. Laboratory processes used to identify structure and function are cumbersome and expensive. Faster and cheaper computational methods that have proven effective for protein prediction have been inaccurate when applied to RNA sequences, with their tighter tolerances and more varied secondary structures. This inaccuracy may be related to the limited volume of suitable training inputs currently available for computational predictions of functional RNA. Comparable protein prediction algorithms often rely on similar inputs in their initial stages, where those inputs limit the search space of subsequent molecular dynamics analyses. This proposal uses commonly available experimental data from a model system of ribozymes both to contribute unique structural information to the pool of available predictive inputs and to demonstrate how such data can be used to improve predictive accuracy for these molecules.
Self-cleaving ribozymes are a class of functional RNAs with properties useful to the study of RNA structure and function. Their unique combination of manageably sized sequences, well-documented structures, and easily measured functions makes them an ideal model for exploring relationships between sequence mutations and the resulting structural and functional variation. These relationships are important inputs to models that predict the existence of novel RNA molecules, as well as to the development of bio-engineered RNA molecules. Using extensive data sets containing the functional consequences of mutations for five different self-cleaving ribozymes, I will develop analytic approaches that add to our understanding of RNA structures and functions. First, I will show how the cleavage activity for a set of mutated sequences can be used to reveal structural information within these self-cleaving ribozymes. Next, I will use time-series data to expand the dynamic range of detectable catalytic rates for two ribozymes (hammerhead and twister), with the goal of guiding proper parameter selection to improve structural identification and functional predictions. Finally, I will employ machine learning algorithms to make accurate predictions of the effects of higher-order combinations of mutations for the CPEB3 ribozyme. The data structures and analysis approaches developed will improve future high-throughput experimental approaches aimed at understanding RNA sequence-to-function relationships.
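To illustrate why time-series data can widen the range of detectable catalytic rates, the following is a minimal sketch and not the method used in this work. It assumes single-exponential (first-order) cleavage kinetics, f(t) = 1 - exp(-k t), which the text above does not specify. The function name `estimate_rate_constant`, the saturation threshold `max_fraction`, the time points, and the example rate constants are all hypothetical.

```python
import numpy as np

def estimate_rate_constant(times, fraction_cleaved, max_fraction=0.95):
    """Estimate a first-order rate constant k, assuming f(t) = 1 - exp(-k t).

    Points at or above `max_fraction` are dropped because a nearly
    saturated measurement carries little information about k.
    """
    t = np.asarray(times, dtype=float)
    f = np.asarray(fraction_cleaved, dtype=float)
    # Keep only informative points: positive time and below saturation.
    keep = (t > 0) & (f >= 0) & (f < max_fraction)
    if not keep.any():
        return float("nan")
    t = t[keep]
    y = -np.log1p(-f[keep])  # -ln(1 - f) equals k * t under the assumed model
    # Least-squares slope of y against t for a line through the origin.
    return float(np.sum(t * y) / np.sum(t * t))

# Hypothetical time points (minutes) and two synthetic variants.
times = np.array([1.0, 5.0, 30.0, 120.0])
fast = 1.0 - np.exp(-0.5 * times)    # saturates at the late time points
slow = 1.0 - np.exp(-0.002 * times)  # barely cleaved at the early time points

k_fast = estimate_rate_constant(times, fast)
k_slow = estimate_rate_constant(times, slow)
print(k_fast, k_slow)
```

In this sketch the fast variant is resolved by the early time points and the slow variant by the late ones, so a single series covers rates that no individual time point could distinguish. Real measurements would also need noise handling and possibly a plateau term, both of which are omitted here.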