Evaluation of network inference algorithms and their effects on network analysis for the study of small metabolomic data sets

Date

2022-05-24

Authors

Greenyer, Haley

Abstract

Motivation: Alzheimer’s Disease (AD) is a highly prevalent neurodegenerative disease that causes gradual cognitive decline. Evidence has recently mounted in the literature for the role of metabolic dysfunction in AD, and metabolomic data have therefore been increasingly used in AD studies. Metabolomic disease studies often suffer from small sample sizes and inflated false discovery rates. It is therefore of great importance to identify the algorithms best suited for the inference of metabolic networks from small-cohort disease studies. For future benchmarking, and for the development of new metabolic network inference methods, it is similarly important to identify appropriate performance measures for small sample sizes.

Results: The performances of 13 different network inference algorithms, including correlation-based, regression-based, information-theoretic, and hybrid methods, were assessed through benchmarking and structural network analyses. Benchmarking was performed on simulated data with known structures across six sample sizes using three different summative performance measures: area under the Receiver Operating Characteristic Curve, area under the Precision-Recall Curve, and Matthews Correlation Coefficient. Structural analyses commonly applied in disease studies, including betweenness, closeness, and eigenvector centrality, were applied to simulated data. Differential network analysis was additionally applied to experimental AD data. Based on the performance measure benchmarking and network analysis results, I identified Probabilistic Context Likelihood Relatedness of Correlation with Biweight Midcorrelation (PCLRCb), a novel variation of the PCLRC algorithm, to be best suited for the prediction of metabolic networks from small-cohort disease studies. Additionally, I identified Matthews Correlation Coefficient as the best measure with which to evaluate the performance of metabolic network inference methods across small sample sizes.
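To make the Matthews Correlation Coefficient concrete, the following is a minimal sketch of how an inferred network could be scored against a known structure. It is an illustration only and not the code used in this work: the function names `edge_confusion` and `matthews_corrcoef`, the toy four-node networks, and the convention of returning 0 when the denominator is zero are all assumptions made for the example.

```python
import math


def edge_confusion(true_adj, pred_adj):
    """Count TP, FP, TN, FN over the upper triangle of two
    undirected adjacency matrices (hypothetical helper)."""
    n = len(true_adj)
    tp = fp = tn = fn = 0
    for i in range(n):
        for j in range(i + 1, n):  # each undirected edge counted once
            is_true = bool(true_adj[i][j])
            is_pred = bool(pred_adj[i][j])
            if is_true and is_pred:
                tp += 1
            elif is_pred:
                fp += 1
            elif is_true:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Assumption: return 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example (assumed data): true edges 0-1, 1-2, 2-3;
# predicted edges 0-1, 1-2, 0-3.
true_adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
pred_adj = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]

counts = edge_confusion(true_adj, pred_adj)
mcc = matthews_corrcoef(*counts)
print(counts, round(mcc, 3))
```

Because the coefficient uses all four cells of the confusion matrix, it accounts for true negatives as well as true positives, which matters when the true network is sparse and absent edges far outnumber present ones.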

Keywords

Alzheimer's, Metabolomics, Differential Network Analysis, sample size, network inference, mouse model
