Techniques for analyzing high throughput molecular biology data




Lu, Linghong




The application of ultrahigh-field Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) to identify and quantify metabolites is relatively new. An important feature of FTICR-MS metabolomics data is their high percentage of missing values. In this thesis, missing-value analysis showed that missing-value percentages reached up to 50%. It also showed that the control treatment, NaOH.ww, had the highest missing-value percentage among the treatments in the aqueous FTICR-MS data sets. A simulation study of the FTICR-MS data compared three methods for selecting differentially expressed masses: the Kruskal-Wallis test, and the MTP and Limma functions in Bioconductor, an open-source project that facilitates the analysis of high-throughput data. MTP was sensitive to variation among treatments, while the Kruskal-Wallis test was relatively conservative in detecting it. As a result, MTP had a much higher false positive rate than the Kruskal-Wallis test. Limma's sensitivity and false positive rate fell between those of the Kruskal-Wallis test and MTP. Data sets with missing values were also simulated to assess the performance of imputation methods. This study showed that variation among treatments diminished or disappeared after imputation, but no new differentially expressed masses were created, which gave us confidence in using imputation methods. A summary of the analysis results for some of the frogSCOPE data sets is given in the last chapter as an illustration.
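The abstract does not say how missing values were counted or how the tests were run. The following is therefore a minimal Python sketch of two computations it describes: the per-treatment missing-value percentage, and the Kruskal-Wallis H statistic for a single mass. It rests on assumed conventions that do not come from the thesis: missing intensities are encoded as `None`, the data are laid out as masses by samples, and each sample has one treatment label. The function names are hypothetical, this is not the thesis's code, and it does not reproduce the Bioconductor MTP or Limma analyses.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional, Sequence


def missing_percentage_by_treatment(
    intensities: Sequence[Sequence[Optional[float]]],
    treatments: Sequence[str],
) -> dict[str, float]:
    """Percentage of missing (None) entries per treatment (hypothetical helper).

    intensities[i][j] is the intensity of mass i in sample j;
    treatments[j] is the treatment label of sample j (assumed layout).
    """
    missing: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for row in intensities:
        for value, label in zip(row, treatments):
            total[label] += 1
            if value is None:
                missing[label] += 1
    return {label: 100.0 * missing[label] / total[label] for label in total}


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    # Pool all observations, remembering which group each came from.
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values pooled[i..j] and give each the average rank.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j + 2) / 2.0  # mean of 1-based ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(grp) for r, grp in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    # If every value is tied, H is undefined; this sketch returns 0.0.
    return h / correction if correction > 0 else 0.0


def kruskal_wallis_for_mass(
    row: Sequence[Optional[float]], treatments: Sequence[str]
) -> float:
    """H for one mass, dropping missing values (assumed handling, not imputation)."""
    groups: dict[str, list[float]] = defaultdict(list)
    for value, label in zip(row, treatments):
        if value is not None:
            groups[label].append(value)
    return kruskal_wallis_h(list(groups.values()))
```

For example, with two treatments whose observed intensities are `[1, 2, 3]` and `[4, 5, 6]`, `kruskal_wallis_h` returns 27/7 (about 3.857). In practice H is compared against a chi-square distribution with (number of treatments − 1) degrees of freedom, and that p-value step is omitted here. Dropping missing values before testing is only one possible choice. The imputation methods studied in the thesis would replace them instead.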



high throughput biology data, statistical analysis, differential expression