Cost-aware triage ranking algorithms for bug reporting systems

Abstract

Bug triaging, the task of deciding who should fix a given bug, has been studied actively. However, existing work does not consider that the cost of the same bug varies across developers with diverse backgrounds and experience. In contrast, we argue that the "cost" of a bug can be low for one developer and high for another. Based on this view, we study an automatic triaging system that considers both accuracy and cost. Our preliminary solution, CosTriage, models developer-specific experience and estimated cost for each bug category, where categories are obtained from topic modeling, and assigns each bug to a developer who not only can fix it but is also expected to fix it quickly. For developer-specific cost modeling, we draw on recommender-system work that estimates user-specific ratings of items, e.g., movies. From this perspective, existing triaging work, which categorizes bugs and assigns them to developers experienced in the category, amounts to content-based recommendation (CBR). However, CBR is well known to cause overspecialization, because it recommends only the types of bugs that each developer has solved before. This problem is critical because experienced developers can become overloaded with bugs they are reluctant to fix, even though there are other categories they could fix faster. CosTriage instead adopts content-boosted collaborative filtering (CBCF), which considers not only similar bugs (content-based) but also similar developers (collaborative) when estimating developer-specific cost. In this paper, we extend CosTriage to cover two special scenarios. First, a bug may have no textual report (e.g., a crash report), or its textual report may lack any topic word (e.g., 1,957 of 48,424 Mozilla reports). Second, developer profiles may change over time. To handle these scenarios, we extend CosTriage to support non-textual descriptions and dynamic profiles; we denote the extended system CosTriage+.
Our experimental evaluation shows that our solution reduces cost by 30% without seriously compromising accuracy, compared with a baseline that considers accuracy only.
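
To make the accuracy/cost trade-off concrete, the following is a minimal sketch of a cost-aware ranking, not the CosTriage implementation. It assumes that each developer already has an accuracy score and an estimated fixing cost for the bug, and that the two are merged by a weighted linear combination. The function name `rank_developers`, the weight `alpha`, the normalization, and all numbers are hypothetical choices made for illustration.

```python
from typing import Dict, List


def rank_developers(
    accuracy: Dict[str, float],
    cost: Dict[str, float],
    alpha: float = 0.7,
) -> List[str]:
    """Rank developers by a weighted blend of accuracy and inverse cost.

    Assumption (not from the paper): both terms are scaled by their
    maximum over developers and combined linearly with weight `alpha`.
    Accuracy scores and costs are assumed to be strictly positive.
    """
    max_acc = max(accuracy.values())
    # Lower cost is better, so score each developer by 1 / cost.
    speed = {dev: 1.0 / cost[dev] for dev in accuracy}
    max_speed = max(speed.values())
    merged = {
        dev: alpha * accuracy[dev] / max_acc
        + (1 - alpha) * speed[dev] / max_speed
        for dev in accuracy
    }
    # Highest merged score first.
    return sorted(merged, key=merged.get, reverse=True)


# Hypothetical scores for one bug: likelihood of a correct fix and
# estimated days to fix, per developer.
accuracy = {"alice": 0.9, "bob": 0.8, "carol": 0.3}
cost_days = {"alice": 20.0, "bob": 4.0, "carol": 2.0}

accuracy_only = rank_developers(accuracy, cost_days, alpha=1.0)
cost_aware = rank_developers(accuracy, cost_days, alpha=0.5)
print(accuracy_only)  # ['alice', 'bob', 'carol']
print(cost_aware)     # ['bob', 'carol', 'alice']
```

With `alpha=1.0` the ranking reduces to an accuracy-only baseline. Lowering `alpha` moves developers who are expected to fix the bug quickly toward the top, at the price of some accuracy.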