Positive unlabeled learning applications in music and healthcare

dc.contributor.authorArjannikov, Tom
dc.contributor.supervisorTzanetakis, George
dc.contributor.supervisorYi, Kwang Moo
dc.date.accessioned2021-09-10T23:21:26Z
dc.date.available2021-09-10T23:21:26Z
dc.date.copyright2021en_US
dc.date.issued2021-09-10
dc.degree.departmentDepartment of Computer Science
dc.degree.levelDoctor of Philosophy Ph.D.en_US
dc.description.abstractThe supervised and semi-supervised machine learning paradigms hinge on the idea that the training data is labeled. The label quality is often brought into question, and problems related to noisy, inaccurate, or missing labels are studied. One of these is an interesting and prevalent problem in the semi-supervised classification area where only some positive labels are known. At the same time, the remaining and often the majority of the available data is unlabeled, i.e., there are no negative examples. Known as Positive-Unlabeled (PU) learning, this problem has been identified with increasing frequency across many disciplines, including but not limited to health science, biology, bioinformatics, geoscience, physics, business, and politics. Also, there are several closely related machine learning problems, such as cost-sensitive learning and mixture proportion estimation. This dissertation explores the PU learning problem from the perspective of density estimation and proposes a new modular method compatible with the relabeling framework that is common in PU learning literature. This approach is compared with two existing algorithms throughout the manuscript, one from a seminal work by Elkan and Noto and a current state-of-the-art algorithm by Ivanov. Furthermore, this thesis identifies two machine learning application domains that can benefit from PU learning approaches, which were not previously seen that way: predicting length of stay in hospitals and automatic music tagging. Experimental results with multiple synthetic and real-world datasets from different application domains validate the proposed approach. Accurately predicting the in-hospital length of stay (LOS) at the time of admission can positively impact healthcare metrics, particularly in novel response scenarios such as the Covid-19 pandemic. During the regular steady-state operation, traditional classification algorithms can be used for this purpose to inform planning and resource management. However, when there are sudden changes to the admission and patient statistics, such as during the onset of a pandemic, these approaches break down because reliable training data becomes available only gradually over time. This thesis demonstrates the effectiveness of PU learning approaches in such situations through experiments by simulating the positive-unlabeled scenario using two fully-labeled publicly available LOS datasets. Music auto-tagging systems are typically trained using tag labels provided by human listeners. In many cases, this labeling is weak, which means that the provided tags are valid for the associated tracks, but there can be tracks for which a tag would be valid but not present. This situation is analogous to PU learning with the additional complication of being a multi-label scenario. Experimental results on publicly available music datasets with tags representing three different labeling paradigms demonstrate the effectiveness of PU learning techniques in recovering the missing labels and improving auto-tagger performance.en_US
dc.description.scholarlevelGraduateen_US
dc.identifier.urihttp://hdl.handle.net/1828/13376
dc.languageEnglisheng
dc.language.isoenen_US
dc.rightsAvailable to the World Wide Weben_US
dc.subjectmachine learningen_US
dc.subjectclassificationen_US
dc.subjectsemi-supervised learningen_US
dc.subjectpositive-unlabeled learningen_US
dc.subjecthealthcareen_US
dc.subjectlength of stayen_US
dc.subjectmusic information retrievalen_US
dc.subjectauto-taggingen_US
dc.titlePositive unlabeled learning applications in music and healthcareen_US
dc.typeThesisen_US

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Arjannikov_Tom_PhD_2021.pdf
Size:
2.27 MB
Format:
Adobe Portable Document Format
Description:
Dissertation
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
2 KB
Format:
Item-specific license agreed upon to submission
Description: