GitHub Issue Label Clustering by Weighted Overlap Coefficient
| dc.contributor.author | Li, Yunlong | |
| dc.contributor.supervisor | Damian, Daniela | |
| dc.date.accessioned | 2017-05-01T17:34:42Z | |
| dc.date.available | 2017-05-01T17:34:42Z | |
| dc.date.copyright | 2017 | en_US |
| dc.date.issued | 2017-05-01 | |
| dc.degree.department | Department of Computer Science | |
| dc.degree.level | Master of Science M.Sc. | en_US |
| dc.description.abstract | GitHub labels are designed to help people classify and recognize issues. When naming a label, people may use different word forms (e.g., bug, Bug, bugs) to express the same meaning, which makes managing issue labels on GitHub a challenging task. Clustering morphologically synonymous labels makes issues easier to manage and completes part of the data preprocessing needed for research on automatic labeling. String similarity calculation is the key part of the clustering algorithm. In this project, a weighted overlap coefficient is proposed as the string similarity measure for clustering labels. The 200 most frequently used labels are selected as the experimental data. Preliminary results show that the new method improves on the original overlap coefficient, producing a 4.43% higher F-measure, and that 92.42% of the experimental labels are correctly clustered. | en_US |
| dc.description.scholarlevel | Graduate | en_US |
| dc.identifier.uri | http://hdl.handle.net/1828/8039 | |
| dc.language.iso | en | en_US |
| dc.rights | Available to the World Wide Web | en_US |
| dc.subject | GitHub issue label | en_US |
| dc.subject | string similarity metric | en_US |
| dc.subject | overlap coefficient | en_US |
| dc.subject | clustering | en_US |
| dc.title | GitHub Issue Label Clustering by Weighted Overlap Coefficient | en_US |
| dc.type | project | en_US |
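
The abstract names the overlap coefficient as the baseline string similarity measure. As a rough illustration only, the standard (unweighted) overlap coefficient between two sets is |A ∩ B| / min(|A|, |B|). The sketch below applies it to label strings under assumptions that are not taken from this record: labels are lowercased and split into character bigrams, and the function names `char_bigrams` and `overlap_coefficient` are hypothetical. The project's weighting scheme is not described here, so it is not reproduced.

```python
from __future__ import annotations


def char_bigrams(label: str) -> set[str]:
    """Split a label into a set of lowercase character bigrams.

    Assumption for illustration: the record does not say how labels
    are tokenized, so character bigrams are used here.
    """
    text = label.strip().lower()
    if len(text) < 2:
        # A one-character label is kept as a single token; empty gives an empty set.
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def overlap_coefficient(a: str, b: str) -> float:
    """Standard overlap coefficient: |A & B| / min(|A|, |B|)."""
    set_a, set_b = char_bigrams(a), char_bigrams(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


if __name__ == "__main__":
    for left, right in [("bug", "Bug"), ("bug", "bugs"), ("bug", "feature")]:
        print(left, right, overlap_coefficient(left, right))
```

Because the denominator is the size of the smaller set, a label whose tokens are fully contained in another label's tokens scores 1.0, which is why variants such as "bug" and "bugs" are treated as near-identical by this measure.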