Iimi: A novel automated workflow for plant virus diagnostics from high-throughput sequencing data
dc.contributor.author | Ning, Haochen | |
dc.contributor.supervisor | Zhang, Xuekui | |
dc.date.accessioned | 2023-08-31T22:38:02Z | |
dc.date.copyright | 2023 | en_US |
dc.date.issued | 2023-08-31 | |
dc.degree.department | Department of Mathematics and Statistics | en_US |
dc.degree.level | Master of Science M.Sc. | en_US |
dc.description.abstract | Several workflows have been developed for the diagnostic testing of plant viruses using high-throughput sequencing methods. Most of these workflows require considerable expertise and input from the analyst to perform the analysis and interpret the data when deciding on a plant’s disease status. The most common detection methods use workflows based on de novo assembly and/or read mapping. Existing virus detection software mainly uses simple deterministic rules for decision-making, requiring a certain level of understanding of virology when interpreting the results. This can result in inconsistencies in data interpretation between analysts, which can have serious ramifications. To combat these challenges, we developed an automated workflow using machine-learning methods, decreasing human interaction while increasing recall, precision, and consistency. Our workflow involves sequence data mapping, feature extraction, and machine-learning model training. Using real data, we compared the performance of our method with other popular approaches and showed that our approach increases recall and precision while decreasing the detection time for most types of sequencing data. | en_US |
dc.description.embargo | 2025-08-18 | |
dc.description.scholarlevel | Graduate | en_US |
dc.identifier.uri | http://hdl.handle.net/1828/15329 | |
dc.language | English | eng |
dc.language.iso | en | en_US |
dc.rights | Available to the World Wide Web | en_US |
dc.subject | machine learning | en_US |
dc.subject | virus diagnostics | en_US |
dc.title | Iimi: A novel automated workflow for plant virus diagnostics from high-throughput sequencing data | en_US |
dc.type | Thesis | en_US |