Conclusion stability for natural language based mining of design discussions

dc.contributor.author: Mahadi, Alvi
dc.contributor.supervisor: Ernst, Neil A.
dc.date.accessioned: 2021-02-12T05:11:37Z
dc.date.available: 2021-02-12T05:11:37Z
dc.date.copyright: 2021 (en_US)
dc.date.issued: 2021-02-11
dc.degree.department: Department of Computer Science
dc.degree.level: Master of Science M.Sc. (en_US)
dc.description.abstract: Developer discussions range from in-person hallway chats to comment chains on bug reports. Being able to identify discussions that touch on software design would be helpful when documenting and refactoring software. Design mining is the application of machine learning techniques to correctly label a given discussion artifact, such as a pull request, as pertaining (or not) to design. In this work we demonstrate a simple example of how design mining works. We first replicate an existing state-of-the-art design mining study to show how conclusion stability is poor on different artifact types and different projects. Then we introduce two techniques, augmentation and context specificity, that greatly improve the conclusion stability and cross-project relevance of design mining. Our new approach achieves an AUC-ROC of 0.88 on the within-dataset classification task and 0.84 on the cross-dataset classification task. (en_US)
dc.description.scholarlevel: Graduate (en_US)
dc.identifier.bibliographicCitation: A. Mahadi, K. Tongay and N. A. Ernst, "Cross-Dataset Design Discussion Mining," 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), London, ON, Canada, 2020, pp. 149-160. doi: 10.1109/SANER48275.2020.9054792 (en_US)
dc.identifier.uri: http://hdl.handle.net/1828/12672
dc.language: English (eng)
dc.language.iso: en (en_US)
dc.rights: Available to the World Wide Web (en_US)
dc.subject: design mining (en_US)
dc.subject: augmentation (en_US)
dc.subject: context specificity (en_US)
dc.title: Conclusion stability for natural language based mining of design discussions (en_US)
dc.type: Thesis (en_US)

Files

Original bundle
Name: Mahadi_Alvi_MSc_2021.pdf
Size: 1.71 MB
Format: Adobe Portable Document Format
Description: Main thesis file
License bundle
Name: license.txt
Size: 2 KB
Format: Item-specific license agreed upon to submission