Learning COVID-19 network from literature databases using core decomposition

dc.contributor.author: Guo, Yang
dc.contributor.supervisor: Zhang, Xuekui
dc.contributor.supervisor: Xing, Li
dc.date.accessioned: 2021-07-22T19:46:03Z
dc.date.available: 2021-07-22T19:46:03Z
dc.date.copyright: 2021 [en_US]
dc.date.issued: 2021-07-22
dc.degree.department: Department of Mathematics and Statistics
dc.degree.level: Master of Science M.Sc. [en_US]
dc.description.abstract: The SARS-CoV-2 coronavirus is responsible for millions of deaths around the world. To contribute to the understanding of SARS-CoV-2 and to generate new hypotheses about SARS-CoV-2 and human protein interactions, we use the information-rich Biomine probabilistic database to extend the experimentally identified SARS-CoV-2-human protein-protein interaction (PPI) network in silico. We generate an extended network by integrating information from the Biomine database with the PPI network. To generate novel hypotheses, we focus on the high-connectivity sub-communities of the extended network that overlap most with the PPI network. To this end, we propose a new data analysis pipeline that efficiently computes the core decomposition of the extended network and identifies dense subgraphs. We then evaluate the identified dense subgraphs and the generated hypotheses in three contexts: literature validation for uncovered virus targeting genes and proteins, gene function enrichment analysis on subgraphs, and literature support on drug repurposing for identified tissues and diseases related to COVID-19. Most of the generated hypotheses are proteins and their encoding genes, and we rank them by their connections to known PPI network nodes. In addition, we compile a comprehensive list of novel genes and proteins potentially related to COVID-19, as well as novel diseases that might be comorbidities. Together with the generated hypotheses, our results provide novel knowledge relevant to COVID-19 for further validation. [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/13166
dc.language: English [eng]
dc.language.iso: en [en_US]
dc.rights: Available to the World Wide Web [en_US]
dc.subject: COVID-19 [en_US]
dc.subject: Core Decomposition [en_US]
dc.subject: Network Inference [en_US]
dc.subject: Data Mining [en_US]
dc.subject: Graph Theory [en_US]
dc.title: Learning COVID-19 network from literature databases using core decomposition [en_US]
dc.type: Thesis [en_US]
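
Illustrative note: the abstract describes computing a core decomposition of the extended network, keeping the dense subgraphs, and ranking candidate nodes by their connections to known PPI nodes. The sketch below shows one way these steps could look on a toy graph. It is not the thesis pipeline: the function names, the toy node labels, and the choice to rank by a simple count of links are assumptions made for illustration. It also treats the graph as unweighted, whereas Biomine edges carry probabilities, and it uses a simple quadratic-time peeling loop rather than an efficient implementation.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected, unweighted graph.

    Peeling: repeatedly remove a node of minimum remaining degree. The
    largest such degree seen so far is the core number of the removed node.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Ties are broken by node label so the run is deterministic.
        node = min(remaining, key=lambda n: (degree[n], n))
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def densest_core(core):
    """Return the set of nodes with the highest core number."""
    top = max(core.values())
    return {node for node, c in core.items() if c == top}


def rank_by_known_links(edges, known, candidates):
    """Order candidates by how many edges link them to known nodes.

    Hypothetical ranking rule: most links first, ties broken by label.
    """
    counts = {c: 0 for c in candidates}
    for u, v in edges:
        if u in counts and v in known:
            counts[u] += 1
        if v in counts and u in known:
            counts[v] += 1
    return sorted(counts, key=lambda c: (-counts[c], c))


# Toy graph with placeholder labels (not real genes or proteins):
# A, B, C, D form a 4-clique; E and F hang off it as a chain.
TOY_EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"), ("C", "D"),
    ("D", "E"), ("E", "F"),
]

if __name__ == "__main__":
    cores = core_numbers(TOY_EDGES)
    print(cores)
    print(sorted(densest_core(cores)))
    print(rank_by_known_links(TOY_EDGES, {"A", "B", "E"}, ["C", "D", "F"]))
```

On the toy graph, the 4-clique is the innermost core and the chain nodes fall in the 1-core, which is the sense in which core decomposition separates dense sub-communities from the periphery.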

Files

Original bundle
Name: Guo_Yang_MSc_2021.pdf
Size: 7.34 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2 KB
Format: Item-specific license agreed upon to submission