Exploring Design Discussions With Semi-Supervised Topic Modelling

dc.contributor.author: Lasrado, Roshan N.
dc.contributor.supervisor: Ernst, Neil
dc.date.accessioned: 2022-08-11T16:35:02Z
dc.date.available: 2022-08-11T16:35:02Z
dc.date.copyright: 2022 (en_US)
dc.date.issued: 2022-08-11
dc.degree.department: Department of Computer Science (en_US)
dc.degree.level: Master of Science M.Sc. (en_US)
dc.description.abstract: Stack Overflow is a rich source of questions and answers (discussions) about software development. One topic of discussion is software design, such as the correct use of design patterns or best practices in data access. Since design is a more abstract topic in software engineering, researchers have long sought to characterize and model design knowledge. However, these approaches typically require significant expert input to contextualize the abstract design information. In this study, we explore how combining expert input with Stack Overflow might serve as an effective way to identify design topics. Being able to identify and classify this design knowledge would enable its discovery and sharing, helping developers better leverage Stack Overflow for crowd-sourcing their design decisions. We first perform inductive coding of design-tagged Stack Overflow questions and answers to identify the design concepts that developers discuss. We report on areas where inter-rater agreement was a challenge, including abstraction levels. Since inductive coding is expensive, we apply a semi-supervised approach, Anchored CorEx. We find that it outperforms LDA and offers superior interpretability and the ability to incorporate expert domain knowledge. We leverage Anchored CorEx to identify how design is discussed on Stack Overflow and leveraged in GitHub projects. We conclude by describing how our experience with Anchored CorEx leads us to believe that approaches combining domain knowledge and scalability are key for analyzing large SE text repositories. (en_US)
dc.description.scholarlevel: Graduate (en_US)
dc.identifier.uri: http://hdl.handle.net/1828/14092
dc.language: English (eng)
dc.language.iso: en (en_US)
dc.rights: Available to the World Wide Web (en_US)
dc.subject: design discussions (en_US)
dc.subject: semi-supervised topic modelling (en_US)
dc.subject: design mining (en_US)
dc.title: Exploring Design Discussions With Semi-Supervised Topic Modelling (en_US)
dc.type: Thesis (en_US)

Files

Original bundle
Name: Lasrado_Roshan_MSc_2022.pdf
Size: 707.14 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2 KB
Format: Item-specific license agreed upon to submission