Algorithms for prediction of RNA secondary structure: coronavirus pseudoknots via Shapify & CParty
Date
2024-01-30
Authors
Trinity, Luke
Abstract
RNA molecules play a vital role in cellular processes, and many possess functional structures. Because experimental methods for detecting RNA structure are complex, computational tools that predict RNA structure formation are invaluable for building comprehensive knowledge. We seek to predict RNA structure algorithmically, with a focus on the following concepts from the literature: (1) Minimum Free Energy (MFE) methods, (2) the hierarchical folding hypothesis, and (3) partition function ensemble approaches. The MFE framework is an RNA folding hypothesis stating that each RNA molecule folds into the structure with the minimum free energy. In conjunction with MFE, we employ the biologically motivated hierarchical folding hypothesis, which states that an RNA molecule first folds into an initial structure, after which a subsequent folding may occur that lowers the structure's free energy. The accuracy of MFE and hierarchical folding methods can be improved by effectively incorporating known RNA structure information such as experimental reactivity data. We introduce Shapify, an algorithm that incorporates experimental data into hierarchical RNA folding prediction. Shapify receives SHAPE data as input to guide RNA structure prediction, allowing multiple experimental results to be unified to determine structure-function patterns. Shapify runs in O(N^3) time, where N is the RNA sequence length, enabling faster prediction than other methods that also handle a complex RNA structure class.
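To give a feel for what a cubic-time dynamic program over an RNA sequence looks like, the following is a minimal sketch of the classic Nussinov-style recursion. It is not Shapify: it maximizes the number of nested base pairs rather than minimizing free energy, it ignores SHAPE data, and it cannot represent pseudoknots. The function name `max_base_pairs` and the `min_loop` parameter are illustrative assumptions.

```python
def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs.

    A stand-in for energy minimization, not the Shapify algorithm.
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):        # O(N) subsequence lengths
        for i in range(n - span):              # O(N) start positions
            j = i + span
            score = best[i][j - 1]             # case: position j is unpaired
            for k in range(i, j - min_loop):   # O(N) partners for j
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]
```

The three nested loops over `span`, `i`, and `k` are what give the O(N^3) running time; energy-based methods typically keep the same loop structure while replacing the pair count with loop energies.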
We then consider the partition function model, based on the MFE approach, where we compute the sum of Boltzmann-weighted free energies over all possible RNA structures in the ensemble at equilibrium. The likelihood of any particular RNA structure occurring can then be determined from the Boltzmann weight of that structure relative to the total for the ensemble. Currently, partition function methods are restricted to predicting a limited set of RNA structures, because existing algorithms that allow complex RNA structures are too slow, with at best O(N^5) time complexity. We introduce CParty, an O(N^3) time partition function algorithm that includes complex RNA structures in the ensemble. Developing CParty's recursive decomposition schemes and integrating them into the implementation was non-trivial. By providing an input structure to CParty, we compute a "conditional" partition function, enabling probabilistic calculation that advances understanding of RNA structure formation patterns.
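The idea of a partition function, and of conditioning it on an input structure, can be illustrated by brute-force enumeration over a tiny hand-written ensemble. This is only a sketch under stated assumptions: the structures and energies below are made up, the helper names are hypothetical, and explicit enumeration is exactly what CParty's recursions are designed to avoid.

```python
import math

# Gas constant (kcal/(mol*K)) times temperature (37 C in kelvin).
RT = 0.0019872 * 310.15

def partition_function(ensemble, required_pairs=frozenset(), rt=RT):
    """Sum Boltzmann factors exp(-E/RT) over the ensemble.

    ensemble: list of (frozenset of base pairs, free energy in kcal/mol).
    required_pairs: if non-empty, only structures containing all of these
    pairs are counted, giving a 'conditional' partition function.
    """
    return sum(math.exp(-energy / rt)
               for pairs, energy in ensemble
               if required_pairs <= pairs)

def structure_probability(pairs, energy, ensemble, rt=RT):
    """Equilibrium probability of one structure in the full ensemble."""
    return math.exp(-energy / rt) / partition_function(ensemble, rt=rt)

# Hypothetical toy ensemble; energies are invented for illustration only.
toy_ensemble = [
    (frozenset(), 0.0),                      # open chain
    (frozenset({(0, 9)}), -1.0),             # one base pair
    (frozenset({(0, 9), (1, 8)}), -3.0),     # two stacked pairs
]

z_full = partition_function(toy_ensemble)
z_cond = partition_function(toy_ensemble, required_pairs=frozenset({(0, 9)}))
```

Here `z_cond / z_full` is the probability that the ensemble contains the pair `(0, 9)`, which is the kind of quantity a conditional partition function makes available.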
In this dissertation, we (1) incorporate partial RNA structure information into hierarchical secondary structure prediction via Shapify to understand important secondary structure motifs affecting viral function, (2) design and implement CParty, a conditional partition function algorithm to handle complex RNA structures, and (3) apply these and other related algorithms to provide RNA structural information for COVID-19 therapeutic targets. Here, we pinpoint key secondary structure folding motifs in our quest to predict functional RNA structures. Our hierarchical folding algorithms push the frontier of prediction accuracy for functional RNA secondary structures, contributing to coronavirus treatments.
Keywords
RNA structure prediction, Free energy, Partition function, Pseudoknots, RNA structure, SARS-CoV-2, Coronaviruses, Viral structure, SARS coronavirus