An Empirical Study of Regression Bug Chains in Linux

Abstract

A regression bug is a bug that causes a software feature that previously worked correctly to stop working after a certain commit. This paper presents a systematic study of regression bug chains, an important but unexplored phenomenon of regression bugs. Our study is based on the observation that a commit c1, which fixes a regression bug b1, may accidentally introduce another regression bug b2. Likewise, a commit c2 repairing b2 may cause another regression bug b3, resulting in a bug chain, i.e., b1 → c1 → b2 → c2 → b3. We conducted a large-scale study by collecting 1579 regression bugs and 2630 commits from 57 Linux versions (from 2.6.12 to 4.9). The relationships between regression bugs and commits are modeled as a directed bipartite network. Our major contributions and findings are fourfold: 1) a novel concept of regression bug chains and their formulation; 2) compared to an isolated regression bug, a bug on a regression bug chain is much more difficult to repair, costing 2.4× more fixing time and involving 1.3× more developers and 2.8× more comments; 3) 85.8% of bugs on the chains in Linux reside in Drivers, ACPI, Platform Specific/Hardware, and Power Management; and 4) 83% of the chains affect only a single Linux subsystem, while 68% of the chains propagate across Linux versions.
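
The bug chain b1 → c1 → b2 → c2 → b3 described in the abstract can be illustrated with a small sketch of a directed bipartite network. This is only an illustration under stated assumptions, not the procedure used in the paper: the function names `build_network` and `find_chains`, the identifiers `b1`..`b4` and `c1`..`c4`, and the choice to report maximal alternating paths containing at least two bugs are all hypothetical.

```python
from collections import defaultdict


def build_network(fixes, introduces):
    """Build the two edge sets of a directed bipartite bug/commit network.

    fixes:      (bug, commit) pairs, meaning the commit repairs the bug.
    introduces: (commit, bug) pairs, meaning the commit caused the bug.
    """
    fixed_by = defaultdict(list)  # bug -> commits that repair it
    causes = defaultdict(list)    # commit -> bugs it introduces
    for bug, commit in fixes:
        fixed_by[bug].append(commit)
    for commit, bug in introduces:
        causes[commit].append(bug)
    return fixed_by, causes


def find_chains(fixed_by, causes):
    """Return maximal alternating paths bug -> commit -> bug -> ...

    Only paths with at least two bugs are reported; a bug whose fix
    introduces nothing is treated as isolated (an assumption of this sketch).
    """
    fixing_commits = {c for commits in fixed_by.values() for c in commits}
    # Bugs introduced by a fixing commit cannot be the head of a chain.
    introduced = {
        b for c, bugs in causes.items() if c in fixing_commits for b in bugs
    }
    all_bugs = set(fixed_by) | {b for bugs in causes.values() for b in bugs}
    heads = sorted(all_bugs - introduced)

    chains = []

    def walk(bug, path):
        extended = False
        for commit in fixed_by.get(bug, []):
            for nxt in causes.get(commit, []):
                if nxt in path:  # guard against cycles in noisy data
                    continue
                extended = True
                walk(nxt, path + [commit, nxt])
        if not extended and len(path) >= 3:
            chains.append(path)

    for head in heads:
        walk(head, [head])
    return chains


# Hypothetical data: b1 is fixed by c1, which introduces b2, and so on.
# b4 is an isolated regression bug whose fix c4 introduces nothing.
fixes = [("b1", "c1"), ("b2", "c2"), ("b4", "c4")]
introduces = [("c1", "b2"), ("c2", "b3")]

fixed_by, causes = build_network(fixes, introduces)
chains = find_chains(fixed_by, causes)
for chain in chains:
    print(" -> ".join(chain))
```

With this toy input, the only chain found is the one from the abstract, while `b4` is left out as an isolated bug. Storing the two edge directions in separate maps keeps the bipartite structure explicit: every step in a chain alternates between a "fixed by" edge and an "introduces" edge.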

Publication
IEEE Transactions on Reliability