CoREBench

CoREBench: Realistic, complex regression errors

Why do we need it?

In software engineering research, we need benchmarks to evaluate and compare novel regression testing, debugging, and repair techniques, yet actual regression errors seem to be unavailable. The most popular error benchmarks, Siemens and SIR, contain mostly manually seeded regression errors: developers were asked to change the given programs slightly so that they contain errors of varying detectability, i.e., errors that are more or less difficult to expose. However, in our recent study we find that such seeded regression errors are significantly less complex than actual regression errors, i.e., they require significantly less substantial fixes. This poses a threat to the validity of studies based on seeded errors. We propose CoREBench for the study of complex regression errors.

[Read the Paper] [Download CoREBench] [How to Cite]

What is it?

CoREBench is a collection of 70 realistically Complex Regression Errors that were systematically extracted from the repositories and bug reports of four open-source software projects: Make, Grep, Findutils, and Coreutils. These projects are well-tested, well-maintained, and widely deployed open-source programs for which the complete version history and all bug reports are publicly accessible. The test suites for all four projects provide about 8,000 test cases, while further test cases can be generated as text input (command line) for the standardized and specified program interfaces. For each regression error, we determined the commit that introduced the error, the commit that fixed it, and a validating test case that passes on the versions before the error was introduced and after the error was fixed but fails in between.

What do we know about it?

Through a systematic analysis of 4 x 1000 recent source code commits, we have identified and validated 70 regression errors (incl. six segmentation faults) that were introduced by 57 different commits. Once introduced, 12% of the errors are fixed within a week, while half remain undetected and uncorrected for more than nine months and up to 8.5 years. Eleven errors were fixed incorrectly: the error was indeed removed in the fixed version, yet up to three new errors were introduced that required further fixes. About one third of the errors were introduced by changes not to the program's behavior but to non-functional properties such as performance, memory consumption, or APIs. In some cases one error would supersede another error such that the latter was not observable for as long as the former remained unfixed. For instance, find.66c536bb supersedes find.dbcb10e9.
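
That several errors can share one introducing commit is easy to check mechanically. The following is a minimal sketch, assuming a hypothetical whitespace-separated listing with one error per line (Error-ID, introducing commit, fixing commit). The function names and the listing format are illustrative and not part of CoREBench; the three sample rows are copied from the coreutils table on this page.

```shell
#!/bin/sh
# Count how many distinct commits introduced the errors in a listing.
# Input (stdin): one error per line as
#   <error-id> <introducing-commit> <fixing-commit>
# This listing format is an assumption made for the sketch.
count_introducing_commits() {
  awk 'NF >= 2 && !seen[$2]++ { n++ } END { print n + 0 }'
}

# Sample rows from the coreutils table: two errors share commit ae494d4b.
sample_errors() {
  cat <<'EOF'
core.d461bfd2 ae494d4b d461bfd2
core.61de57cd ae494d4b 61de57cd
core.5ee7d8f5 0928c241 5ee7d8f5
EOF
}

# Three errors, but only two distinct introducing commits.
sample_errors | count_introducing_commits
```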

Using CoREBench to study Error Complexity

Using CoREBench and our novel error complexity metric, we can answer refined research questions, such as:

* What is the root cause of a complex error? If an error requires a substantial fix, can we assume that there is just one faulty statement causing the error? Are the faults of complex errors localizable? The answers may have implications for the performance of (statistical) debugging techniques.
* Are test suites adequate to expose complex errors? Some widely used metrics of test suite adequacy, such as statement or branch coverage, rest on the implicit assumption that errors are often simple, i.e., that the fault is localizable within some branch or statement that is covered. Now we may be able to investigate the effectiveness of coverage-adequate test suites w.r.t. a varying degree of error complexity and may develop more sophisticated adequacy criteria that account for complex errors. Moreover, for the study of the relationship between simple and complex errors (i.e., the coupling effect), we can take error complexity as an ordinal rather than a dichotomous measure.
* How do we repair complex errors? By definition, the fix of a complex error is more substantial than that of a simple error. The research community has made significant progress in understanding the automated repair of (simple) localizable errors. Now we may be able to evaluate the efficiency of such repair techniques w.r.t. a varying degree of complexity of the repaired errors.

Complex Regression Errors in CoREBench

Note that the version BEFORE the regression-introducing commit does not contain the error while the version AFTER the regression-introducing commit and BEFORE the regression-fixing commit does contain the error.
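
The four versions of interest around an error can be written as Git revisions. The following is a minimal sketch, assuming that `<commit>~1` (the first parent) is the version just before a commit. That holds for linear history but is only an approximation around merge commits. The function name `versions_for_error` is hypothetical and not part of CoREBench.

```shell
#!/bin/sh
# Print the four revisions around one regression error, one per line,
# each followed by a short description.
# Assumption: "<commit>~1" is the version immediately before <commit>.
# Usage: versions_for_error <introducing-commit> <fixing-commit>
versions_for_error() {
  intro="$1"
  fix="$2"
  printf '%s %s\n' "${intro}~1" "error-free (before introducing commit)"
  printf '%s %s\n' "${intro}"   "erroneous (after introducing commit)"
  printf '%s %s\n' "${fix}~1"   "erroneous (before fixing commit)"
  printf '%s %s\n' "${fix}"     "error-free (after fixing commit)"
}

# Example: error core.5ee7d8f5 (introduced by 0928c241, fixed by 5ee7d8f5).
versions_for_error 0928c241 5ee7d8f5
```

The first field of each printed line can be passed to `git checkout` inside the corresponding project repository.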

coreutils (txt)

No.   Error-ID       Error-Introd. Commit  Error-Fixing Commit  Error Complexity  Bug Report
(1)   core.5ee7d8f5  0928c241              5ee7d8f5             2                 http://bugs.gnu.org/9308
(2)   core.51a8f707  20c0b870              51a8f707             0                 http://bugs.gnu.org/14530
(3)   core.b54b47f9  3e466ad0              b54b47f9             2                 http://bugs.gnu.org/13127#61
(4)   core.8f976798  d461bfd2              8f976798             18                http://lists.gnu.org/.../2013-04/msg00003.html
(5)   core.d461bfd2  ae494d4b              d461bfd2             13                http://lists.gnu.org/.../2013-04/msg00003.html
(6)   core.be7932e8  ec48bead              be7932e8             6                 http://bugs.gnu.org/13627
(7)   core.2238ab57  77f89d01              2238ab57             10                http://bugs.gnu.org/13525
(8)   core.76f606a9  3786fb6d              76f606a9             3                 http://bugs.gnu.org/13227
(9)   core.64d4a280  6c5f11fb              64d4a280             4                 http://bugs.gnu.org/12959
(10)  core.62543570  24ebca61              62543570             2                 http://bugs.gnu.org/13119
(11)  core.ec48bead  2e636af1              ec48bead             6                 http://bugs.gnu.org/13098
(12)  core.06aeeecb  7380cf79              06aeeecb             7                 http://bugs.gnu.org/12966
(13)  core.a04ddb8d  84457c49              a04ddb8d             7                 http://bugs.gnu.org/12445
(14)  core.6124a384  bcb9078e              6124a384             2                 http://bugs.gnu.org/11453
(15)  core.f7f398a1  cfe1040c              f7f398a1             3                 http://bugs.gnu.org/10967
(16)  core.61de57cd  ae494d4b              61de57cd             7                 http://bugs.gnu.org/8544
(17)  core.2e636af1  7380cf79              2e636af1             2                 http://bugs.gnu.org/7993
(18)  core.b8108fd2  6c5f11fb              b8108fd2             2                 http://lists.gnu.org/.../2008-02/msg00187.html
(19)  core.a860ca32  86e4b778              a860ca32             16                http://lists.gnu.org/.../2007-10/msg00237.html
(20)  core.86e4b778  6c5f11fb              86e4b778             19                http://lists.gnu.org/.../2007-07/msg00055.html
(21)  core.a6a447fc  ae571715              a6a447fc             7                 http://lists.gnu.org/.../2007-05/msg00195.html
(22)  core.6fc0ccf7  7eff5901              6fc0ccf7             1                 http://lists.gnu.org/.../2005-05/msg00189.html



findutils (txt)

No.   Error-ID       Error-Introd. Commit  Error-Fixing Commit  Error Complexity  Bug Report
(23)  find.183115d0  e6680237              183115d0             20                http://savannah.gnu.org/bugs/?34976
(24)  find.e6680237  7dc70069              e6680237             1                 http://savannah.gnu.org/bugs/?29949
(25)  find.e1d0a991  f0759ab8              e1d0a991             39                http://savannah.gnu.org/bugs/?27563
(26)  find.93623752  e8bd5a2c              93623752             47                http://savannah.gnu.org/bugs/?28824
(27)  find.b445af98  f4d8c73d              b445af98             12                http://savannah.gnu.org/bugs/?25359
(28)  find.c8491c11  864b25ed              c8491c11             8                 http://savannah.gnu.org/bugs/?24169
(29)  find.f7197f3a  71f10368              f7197f3a             2                 http://savannah.gnu.org/bugs/?23663
(30)  find.dbcb10e9  daf7f100              dbcb10e9             1                 http://savannah.gnu.org/bugs/?20139
(31)  find.66c536bb  e8bd5a2c              66c536bb             6                 http://savannah.gnu.org/bugs/?20005
(32)  find.24bf33c0  b46b0d89              24bf33c0             4                 http://savannah.gnu.org/bugs/?19605
(33)  find.091557f6  b46b0d89              091557f6             5                 http://savannah.gnu.org/bugs/?19613
(34)  find.07b941b1  84aef0ea              07b941b1             1                 http://savannah.gnu.org/bugs/?17490
(35)  find.24e2271e  f0759ab8              24e2271e             2                 http://savannah.gnu.org/bugs/?18222
(36)  find.ff248a20  2d428f84              ff248a20             35                http://savannah.gnu.org/bugs/?13381
(37)  find.6e4cecb6  85653349              6e4cecb6             32                http://savannah.gnu.org/bugs/?12181


grep (txt)

No.   Error-ID       Error-Introd. Commit  Error-Fixing Commit  Error Complexity  Bug Report
(38)  grep.c96b0f2c  c32c0421              c96b0f2c             3                 http://lists.gnu.org/.../2012-08/msg00012.html
(39)  grep.2be0c659  70e23616              2be0c659             9                 http://lists.gnu.org/.../2012-06/msg00001.html
(40)  grep.074842d3  7aa698d3              074842d3             7                 http://lists.gnu.org/.../2010-12/msg00009.html
(41)  grep.7aa698d3  70e23616              7aa698d3             40                http://lists.gnu.org/.../2012-06/msg00001.html
(42)  grep.3c3bdace  62458291              3c3bdace             4                 http://savannah.gnu.org/bugs/?33547
(43)  grep.c1cb19fe  0d38a8bb              c1cb19fe             3                 http://lists.gnu.org/.../2010-03/msg00449.html
(44)  grep.8f08d8e2  8a025cf8              8f08d8e2             2                 http://lists.gnu.org/.../2010-03/msg00443.html
(45)  grep.3220317a  8f9106c4              3220317a             2                 http://lists.gnu.org/.../2010-03/msg00395.html
(46)  grep.55cf7b6a  75bc6fb1              55cf7b6a             7                 http://bugs.debian.org/...bug=669084
(47)  grep.9c45c193  c4e8205b              9c45c193             7                 http://savannah.gnu.org/bugs/?29876
(48)  grep.58195fab  bf3bd92c              58195fab             29                http://lists.gnu.org/.../2012-04/msg00056.html
(49)  grep.5fa8c7c9  6d952bee              5fa8c7c9             4                 http://lists.gnu.org/.../2010-03/msg00586.html
(50)  grep.6d952bee  db9d6340              6d952bee             4                 http://lists.gnu.org/.../2010-03/msg00533.html
(51)  grep.db9d6340  deccad69              db9d6340             5                 http://lists.gnu.org/.../2010-03/msg00579.html
(52)  grep.54d55bba  401d8194              54d55bba             2                 http://lists.gnu.org/.../2010-03/msg00477.html


make (txt)

No.   Error-ID       Error-Introd. Commit  Error-Fixing Commit  Error Complexity  Bug Report
(53)  make.fd30db12  0afbbf85              fd30db12             8                 http://savannah.gnu.org/bugs/?31155
(54)  make.0a81d50d  036760a9              0a81d50d             2                 http://savannah.gnu.org/bugs/?39203
(55)  make.40a49f24  0afbbf85              40a49f24             26                http://savannah.gnu.org/bugs/?39310
(56)  make.686a74bf  d4ee0012              686a74bf             1                 http://savannah.gnu.org/bugs/?38051
(57)  make.fbe5b2c9  19b6504f              fbe5b2c9             2                 http://savannah.gnu.org/bugs/?30653
(58)  make.88f1bc8b  391456aa              88f1bc8b             7                 http://savannah.gnu.org/bugs/?16670
(59)  make.3b1432d8  c7b469f0              3b1432d8             8                 http://savannah.gnu.org/bugs/?30748
(60)  make.5acda13a  8f30b688              5acda13a             28                http://savannah.gnu.org/bugs/?30612
(61)  make.036760a9  d4ee0012              036760a9             1                 http://savannah.gnu.org/bugs/?30723
(62)  make.fc644b4c  8f30b688              fc644b4c             36                http://savannah.gnu.org/bugs/?28525
(63)  make.d65b267e  0afbbf85              d65b267e             377
(64)  make.0afbbf85  659fc6b5              0afbbf85             370               http://savannah.gnu.org/bugs/?18622
(65)  make.22886f8a  cb2f2002              22886f8a             97                http://savannah.gnu.org/bugs/?16053
(66)  make.0e6c4f5b  2860d3b2              0e6c4f5b             2                 http://savannah.gnu.org/bugs/?13022
(67)  make.e4372262  cb2f2002              e4372262             4                 http://savannah.gnu.org/bugs/?13881
(68)  make.4923580e  659fc6b5              4923580e             21                http://savannah.gnu.org/bugs/?12320
(69)  make.d584d0c1  73e7767f              d584d0c1             9                 http://savannah.gnu.org/bugs/?12267
(70)  make.2860d3b2  73e7767f              2860d3b2             5                 http://savannah.gnu.org/bugs/?12202
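
In the tables above, every Error-ID appears to consist of a project prefix (core, find, grep, or make) followed by a dot and the error-fixing commit. The following is a minimal sketch that splits an Error-ID on that assumption. The function names are hypothetical and not part of the CoREBench scripts.

```shell
#!/bin/sh
# Split an Error-ID such as "find.66c536bb" into its two parts.
# Assumption: every Error-ID has the form "<prefix>.<fixing-commit>".
error_prefix()     { printf '%s\n' "${1%%.*}"; }
error_fix_commit() { printf '%s\n' "${1#*.}"; }

# Map the prefix to the project name used in the table headings.
error_project() {
  case "$(error_prefix "$1")" in
    core) echo coreutils ;;
    find) echo findutils ;;
    grep) echo grep ;;
    make) echo make ;;
    *)    echo "unknown prefix in Error-ID: $1" >&2; return 1 ;;
  esac
}

error_project find.66c536bb      # prints: findutils
error_fix_commit find.66c536bb   # prints: 66c536bb
```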

Download and Install CoREBench

* The Dockerfile, test scripts, and installation scripts are available on GitHub.

git clone https://github.com/mboehme/corebench.git

* Download and install Docker, then pull the CoREBench images.

docker pull mboehme/corebench
docker pull mboehme/corebenchx

* Execute ./run.sh to start the Docker container with a shared directory /shared.
* Connect to the Docker container.
** Desktop. Install VNC and connect to <docker-ip>:5900 (password: corebench).
** Terminal. Execute ./run corebench to open the container for find, grep, and make, and ./run corebenchx to open the container for coreutils.
* Find the scripts in directory /root/corebench and the repositories in /root/corerepo.
* Implement analysis.sh as your analysis script.

* Run ./executeTests.sh -test-all [core|find|grep|make|all] /root/corerepo
     to execute, for each error, the validating test case
     1) on the version BEFORE the regression-INTRODUCING commit (should PASS)
     2) on the version AFTER the regression-INTRODUCING commit (should FAIL)
     3) on the version BEFORE the regression-FIXING commit (should FAIL)
     4) on the version AFTER the regression-FIXING commit (should PASS)
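
The expected outcome of the four steps can be checked with a small helper. The following is a minimal sketch, assuming the four verdicts for one error are already available as the strings PASS or FAIL. The function name `is_validated_regression` is hypothetical, and the sketch does not parse the output of executeTests.sh, whose format is not described here.

```shell
#!/bin/sh
# Check the verdicts observed for one error against the expected pattern.
# Arguments: four verdicts (PASS or FAIL) in the order of steps 1) to 4).
# Returns 0 if the pattern is PASS FAIL FAIL PASS, 1 if it is not,
# and 2 if the number of arguments is wrong.
is_validated_regression() {
  [ "$#" -eq 4 ] || { echo "expected four verdicts" >&2; return 2; }
  [ "$1 $2 $3 $4" = "PASS FAIL FAIL PASS" ]
}

if is_validated_regression PASS FAIL FAIL PASS; then
  echo "regression validated"
fi
if ! is_validated_regression PASS PASS FAIL PASS; then
  echo "unexpected verdict pattern"
fi
```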

Note: Use the folder /shared for scripts and other data you would like to maintain. All other data is lost in the event that the container is shut down. For instance, you can copy the folder /root/corebench to /shared, modify analysis.sh and execute ./executeTests.sh from /shared.

Download more Test Cases for CoREBench

* For debugging and repair, a much larger number of (auto-generated) test cases with oracles may be necessary.
* Such test cases are available on GitHub.

git clone https://github.com/thierry-tct/Tests_CPA_ICSE.git

* Credits: Titcheu Chekam Thierry, Mike Papadakis, Yves Le Traon and Mark Harman
* The authors note that the test case oracles were added manually, and the test cases are written for the patched versions.
* Read how these test cases were generated:
"Empirical Study on Mutation, Statement and Branch Coverage Fault Revelation that Avoids the Unreliable Clean Program Assumption", Titcheu Chekam Thierry, Mike Papadakis, Yves Le Traon and Mark Harman, in the Proceedings of the 39th International Conference on Software Engineering (ICSE'17), 2017.

Download Cyclomatic Change Complexity Tool

* Download cycc.tar.gz.
Requires a Linux environment, Git, OCaml, and CIL (see README.cycc).
* Run make
* Run ./csgconstruct --git <git-directory> --diff <diff-tool>
This computes the Cyclomatic Change Complexity (CyCC) of the most recent code commit in <git-directory>.

National University of Singapore
