Mining GitHub Issues for Bugs, Feature Requests and Questions
Date
2021-12-14
Authors
Jokhio, Marvi
Abstract
The maintenance and success of software projects depend heavily on up-to-date, bug-free code. Issue tracking systems (ITS) play an important role in processing the hundreds of new issues that large software projects receive daily, but effective processing and triaging requires each issue to carry an accurate label indicating its type (e.g., bug, feature, or question). Labelling is time-consuming and tedious, and therefore calls for automated solutions. Automatic classification of issues is challenging, however, because issue text is semantically ambiguous and contains code, links, package and method names, commands, and so on.
In this work, we propose supervised and unsupervised techniques for mining GitHub issues using their text only. With the supervised machine learning technique, we show that our model can classify issues into the bug, feature, and question classes with an AUC score of 86.7%. We also propose a technique that extracts topics from GitHub issues using Latent Dirichlet Allocation (LDA) to analyze the types of development issues that developers face.
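As a rough illustration of what text-only supervised issue classification can look like, the sketch below trains a small multinomial naive Bayes classifier on a handful of made-up issue titles. The abstract does not name the model or preprocessing used in the thesis, so the classifier choice, the `TRAIN` data, and the helper names `tokenize`, `train`, and `predict` are all assumptions made for this example. URLs and inline code are collapsed into placeholder tokens, which is one common way to reduce the noise described above.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy data; not taken from the thesis's dataset.
TRAIN = [
    ("App crashes with a null pointer exception on startup", "bug"),
    ("Error thrown when saving file, stack trace attached", "bug"),
    ("Please add support for dark mode", "feature"),
    ("It would be nice to add an export option", "feature"),
    ("How do I configure the proxy settings?", "question"),
    ("What is the recommended way to install this?", "question"),
]


def tokenize(text):
    """Lowercase word tokens; inline code and URLs become placeholders."""
    text = re.sub(r"`[^`]*`", " CODE ", text)       # inline code spans
    text = re.sub(r"https?://\S+", " URL ", text)   # links
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        n_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (n_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "The app crashes with an exception"))
print(predict(model, "How do I install this?"))
```

A realistic setup would train on thousands of labelled issues and report a held-out metric such as AUC; the toy data here only demonstrates the mechanics.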
Keywords
GitHub Issues, Topic Modeling, Machine Learning, Software Bugs, Software Feature Requests, Text Mining