
%\emph{ WE SHOULD HAVE A SECTION EXPLAINING THE POWER OF USING MULTIPLE TRACES TO DERIVE A CONSENSUS BAND ANNOTATION-- RHIJU}

%\subsection{Use of RNA secondary structure information for band annotation}
%\emph{ I WOULD DE-EMPHASIZE THIS -- PERHAPS NOT EVN MAKE IT A SEPARATE SECTION. WHAT IF YO UHAVE AN RNA WITH UNKNWON SECONDARY STRUCTURE? OR WITH MULTIPLE SECONDARY STRUCTURES?}
%In the band annotation procedure for an RNA sequence, the proposed method constructs the band prediction matrix $\mathbf{P}$ based on the RNA's secondary structure predicted by the Vienna RNA package~\citep{hofacker2003vienna}. Although the secondary structure prediction software typically matches most experimental profiles closely, there may exist cases (\eg, complicated pseudoknots) in which the prediction quality is low or even fails. In such cases, using secondary structure information may lead to incorrect band annotation, but through preliminary filtering we can reduce the possibility of inaccurate annotation.

The proposed method for band annotation is unique in its ability to take into account all available CE profiles; prior methods (such as those available in QuShape and FAST) annotate a single profile at a time, with a reference profile if needed. We attribute the robustness of the proposed method primarily to this capability to integrate information across profiles. The method does require an accurate alignment of all profiles prior to band annotation. Our prior work \citep{Yoon2011} described a different dynamic programming algorithm that accomplishes this preceding alignment based on standards co-loaded with each sample. In well over 100 data sets analyzed here, we saw only one case (the L-21 ScaI group I intron) in which inter-profile alignment was problematic and required manual intervention. Our alignment and annotation results therefore confirm that all steps of RNA structure mapping CE analysis, including alignment and annotation, can now be routinely achieved through automated algorithms.

To flag cases with uncertain automated band annotation, we have introduced the $\escore$-score for reliability estimation. In our experience, for any CE data set, band annotations with $\escore > 0.97$ are almost always reliable and can be safely adopted for the final steps of band quantitation, whereas results with $\escore \le 0.97$ are less likely to be reliable. Informally, we have encountered data sets in which even expert annotation is ambiguous and has required special additional experiments (such as co-loading sequencing ladders in the same color as the sample) to resolve \citep{tian2014nature}. This suggests that automated band annotation cannot improve much further; a more valuable development would be reliability estimates for specific subsets of bands rather than a single global number. An additional useful development would be the use of known band intensities, based on prior experiments or on base pair probability estimates, rather than coarse predictions for profiles based on sequence, modifier, and a single secondary structure.

The proposed algorithm has $O(NK)$ time and space complexity, and the practical time demand of band annotation was reasonable in our experiments. The proposed method was implemented in the MATLAB programming environment (The MathWorks, http://www.mathworks.com). Under the experimental setup used (sequential execution on an Intel Core i5-4570 processor with 8~GB of main memory), the total time required to annotate bands in all 95 data sets did not exceed 4~min (per data set: mean 2.2837~s; median 2.2707~s).
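
As a rough consistency check on these timings, and assuming that the reported per-data-set mean holds across all 95 data sets, the total sequential run time can be estimated as
\[
95 \times 2.2837~\mathrm{s} \approx 217~\mathrm{s} \approx 3.6~\mathrm{min},
\]
which agrees with the stated upper bound of 4~min.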