GitHub Issue Label Clustering by Weighted Overlap Coefficient

dc.contributor.author: Li, Yunlong
dc.contributor.supervisor: Damian, Daniela
dc.date.accessioned: 2017-05-01T17:34:42Z
dc.date.available: 2017-05-01T17:34:42Z
dc.date.copyright: 2017
dc.date.issued: 2017-05-01
dc.degree.department: Department of Computer Science
dc.degree.level: Master of Science M.Sc.
dc.description.abstract: GitHub labels are designed to help people classify and recognize different issues. When naming a label, people may use different word forms (e.g., bug, Bug, bugs) to express the same meaning, which makes managing issue labels in GitHub a challenging task. Clustering morphologically synonymous labels makes issues easier to manage and completes some of the data preprocessing work needed for automatic labeling research. String similarity calculation is the key part of the clustering algorithm. In this project, a weighted overlap coefficient is proposed as a string similarity measure for clustering the labels. The 200 most frequently used labels are selected as the experimental data for analysis. Preliminary results show that the new method improves on the original overlap coefficient, producing a 4.43% higher F-measure, and that 92.42% of the experimental labels are correctly clustered.
dc.description.scholarlevel: Graduate
dc.identifier.uri: http://hdl.handle.net/1828/8039
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: GitHub issue label
dc.subject: string similarity metric
dc.subject: overlap coefficient
dc.subject: clustering
dc.title: GitHub Issue Label Clustering by Weighted Overlap Coefficient
dc.type: project
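
For orientation, the baseline measure named in the abstract can be sketched as follows. This is a minimal illustration of the standard (unweighted) overlap coefficient, |A ∩ B| / min(|A|, |B|), and not the weighted variant proposed in the project, whose weighting scheme is not described in this record. The lowercasing, the character-bigram tokenization, the greedy clustering loop, the 0.8 threshold, and all function names are assumptions made for the example.

```python
def bigrams(label: str) -> set[str]:
    """Character bigrams of a lowercased label (an assumed tokenization)."""
    text = label.lower().strip()
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def overlap_coefficient(a: str, b: str) -> float:
    """Standard overlap coefficient: |A & B| / min(|A|, |B|)."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


def cluster_labels(labels, threshold=0.8):
    """Greedy single-pass clustering (hypothetical, for illustration only).

    Each label joins the first cluster whose first member is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for label in labels:
        for cluster in clusters:
            if overlap_coefficient(label, cluster[0]) >= threshold:
                cluster.append(label)
                break
        else:
            clusters.append([label])
    return clusters


if __name__ == "__main__":
    print(cluster_labels(["bug", "Bug", "bugs", "feature"]))
```

Under these assumptions, the variants bug, Bug, and bugs from the abstract fall into one cluster, because the bigram set of the shorter label is fully contained in that of the longer one.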

Files

Original bundle
Name: Li_Yunlong_MSc_2017.pdf
Size: 806.75 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.74 KB
Format: Item-specific license agreed upon to submission