HINDBR: Heterogeneous Information Network Based Duplicate Bug Report Prediction

Abstract

Duplicate bug reports often exist in bug tracking systems (BTSs). Almost all the existing approaches for automatically detecting duplicate bug reports are based on text similarity. A recent study found that such approaches may become ineffective in detecting duplicates in bug reports submitted after the just-in-time (JIT) retrieval, which is now a built-in feature of modern BTSs (e.g., Bugzilla). This is mainly because the embedded JIT feature suggests possible duplicates in a bug database when a bug reporter types in the new summary field, therefore minimizing the submission of textually similar reports. Although JIT filtering seems effective, a number of bug report duplicates remain undetected. Our hypothesis is that we can detect them using a semantic similarity-based approach. This paper presents HINDBR, a novel deep neural network (DNN) that accurately detects semantically similar duplicate bug reports using a heterogeneous information network (HIN). Instead of matching text similarity alone, HINDBR embeds semantic relations of bug reports into a low-dimensional embedding space where two duplicate bug reports represented by two vectors are close to each other in the latent space. Results show that HINDBR is effective.

Publication
2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE)
Date
Links