Mining GitHub Issues for Bugs, Feature Requests and Questions




Jokhio, Marvi




The maintenance and success of software projects depend heavily on up-to-date, bug-free code. Issue tracking systems (ITS) help large software projects process hundreds of new issues every day, but effective processing and triaging requires that each issue carry an accurate label identifying its type (e.g., bug, feature, or question). Labelling is time-consuming and tedious, and therefore calls for automated solutions. Automatic classification of issues is challenging because issue text is semantically ambiguous and contains code, links, package and method names, commands, and so on. In this work, we propose supervised and unsupervised mining techniques for GitHub issues that use text only. With the supervised machine learning technique, we show that our model can classify issues into the bug, feature, and question classes with an AUC score of 86.7%. We also propose a technique that extracts topics from GitHub issues using Latent Dirichlet Allocation (LDA) to analyze the types of development issues that developers face.
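
To illustrate why code, links, and commands make issue text hard to mine, the following is a minimal sketch of one possible text-only preprocessing step. It is an assumption for illustration, not the method used in this work: the function name `normalize_issue_text` and the placeholder tokens `CODEBLOCK`, `CODE`, and `URL` are hypothetical.

```python
import re

# Hypothetical patterns and placeholder tokens; the abstract does not
# describe the preprocessing actually used in this work.
FENCED_CODE = re.compile(r"```.*?```", re.DOTALL)  # multi-line fenced code blocks
INLINE_CODE = re.compile(r"`[^`\n]+`")             # inline code spans
URL = re.compile(r"https?://\S+")                  # http(s) links


def normalize_issue_text(text: str) -> str:
    """Lowercase the text, replace code and links with placeholder
    tokens, and collapse runs of whitespace into single spaces."""
    text = text.lower()
    text = FENCED_CODE.sub(" CODEBLOCK ", text)
    text = INLINE_CODE.sub(" CODE ", text)
    text = URL.sub(" URL ", text)
    return re.sub(r"\s+", " ", text).strip()


issue = "App crashes on `foo.bar()` see https://example.com/x\n```\ntrace\n```"
print(normalize_issue_text(issue))
```

Replacing such fragments with a small set of tokens keeps the signal that an issue contains code or a link while removing project-specific identifiers that a text classifier or topic model could not generalize from.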



GitHub Issues, Topic Modeling, Machine Learning, Software Bugs, Software Feature Requests, Text Mining