Deep learning for promoter recognition: a robust testing methodology

dc.contributor.author: Perez Martell, Raul Ivan
dc.contributor.supervisor: Stege, Ulrike
dc.date.accessioned: 2020-04-30T04:09:33Z
dc.date.available: 2020-04-30T04:09:33Z
dc.date.copyright: 2020 [en_US]
dc.date.issued: 2020-04-29
dc.degree.department: Department of Computer Science [en_US]
dc.degree.level: Master of Science M.Sc. [en_US]
dc.description.abstract: Understanding DNA sequences has been an ongoing endeavour within bioinformatics research. Recognizing the functionality of DNA sequences is a non-trivial and complex task that can bring insights into understanding DNA. In this thesis, we study deep learning models for recognizing gene-regulating regions of DNA, more specifically promoters. We first consider DNA modelling as a language by training natural language processing models to recognize promoters. Afterwards, we delve into current models from the literature to learn how they achieve their results. Previous works have focused on limited curated datasets to both train and evaluate their models using cross-validation, obtaining high-performing results across a variety of metrics. We implement and compare three models from the literature against each other, using their datasets interchangeably throughout the comparison tests. This highlights shortcomings within the training and testing datasets for these models, prompting us to create a robust promoter recognition testing dataset and to develop a testing methodology that creates a wide variety of testing datasets for promoter recognition. We then test the models from the literature with the newly created datasets and highlight considerations to take into account when choosing a training dataset. To help others avoid such issues in the future, we open-source our findings and testing methodology. [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/11701
dc.language: English [eng]
dc.language.iso: en [en_US]
dc.rights: Available to the World Wide Web [en_US]
dc.subject: Testing Methodology [en_US]
dc.subject: Deep Learning [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Promoter Recognition [en_US]
dc.title: Deep learning for promoter recognition: a robust testing methodology [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name: Perez_Ivan_MASc_2020.pdf
Size: 8.11 MB
Format: Adobe Portable Document Format
Description: Thesis document