Experiments of An Analytical and Experimental Comparison of CSP Extensions and Tools

This webpage is used to give more details of the experiments related to paper An Analytical and Experimental Comparison of CSP Extensions and Tools, whose technical report is available here. The experiments are comparing three model checkers FDR, ProB, and the CSP module of PAT model checker. All the source code of the experiment (CSP_M and CSP# models) in this paper is available here.

We evaluate the efficiency of FDR, ProB and PAT by verifying nine benchmark systems. All the experiments are run on a PC with Intel® CPU E6550@2.33GHz and 4GB RAM, on a 32-bit Linux OS for FDR (version 2.91) and ProB (version 1.3.5-beta1, build 10762), a 32-bit Windows OS for PAT (version 3.4.3, build 20102). They are shown as follows:

· Experiment on solving puzzles

· Experiment on shared variables

· Experiment on LTL model checking

· Experiment on battery monitor case study

This page is maintained by Ling Shi.

Please contact us if you have any question about this webpage or these experiments. We will reply your enquiries as soon as possible.

Experiment on refinement checking

The verified models in this experiment include readers/writers problem, dinning philosopher problem (DP), and Milner’s cyclic scheduler (MCS) ^[1].

Explanations of models:

· Readers/writers

This problem deals with situations in which many threads must access the same shared memory at one time, some reading and some writing, with the natural constraint that no process may access the share for reading or writing while another process is in the act of writing to it. In particular, it is allowed for two readers to access the share at the same time. In this model, a controller is used to guarantee the correct coordination among multiple readers/writers. The verified property is trace refinement (P [T= S).

· Dinning philosopher

N philosophers sit around a table. Each philosopher can eat the meal if and only if he/she picks forks on his/her right and left sides, and both folks will be released after the philosopher finishes. This model is deadlock free and the property is failure refinement (P [F= S).

· Milner’s cyclic scheduler

This scheduling algorithm is described by Milner in 1989. N processes are activated in a cyclic manner: process i activates process i + 1 for i < N - 1 and process N - 1 activates process 0. Moreover, a process may never be re-activated before it has terminated. The property is failures/divergence refinement (P [FD= S), namely, the implemented scheduler follows a cyclic way.

Experiment results:

The table below shows the verification result. Column State shows the number of visited states, and column Time(s) records running time for the verification in seconds. Value “-” in a cell denotes that the experiment is aborted due to either memory overflow or execution time exceeding two hours. Value “N.A.” means no models for the tool.

Model	N	Property	FDR		ProB		PAT
Model	N	Property	State	Time(s)	State	Time(s)	State	Time(s)
RW	6	P [T= S	8	0.024	61365	125.94	9	0.04
RW	200	P [T= S	202	1.434	-	-	203	0.11
RW	500	P [T= S	502	19.651	-	-	503	0.057
RW	1000	P [T= S	1002	156.162	-	-	1003	0.108
DP	6	P [F= S	1	0.006	14510	82.42	1762	0.174
DP	8	P [F= S	1	0.071	-	-	22362	2.995
DP	12	P [F= S	1	0.104	-	-	-	-
MCS	20	P [FD= S	40	0.043	-	-	60	0.114
MCS	50	P [FD= S	100	0.086	-	-	150	0.143
MCS	100	P [FD= S	200	0.246	-	-	300	0.53

Discussion:

For readers/writers (R/W) models, although FDR applies some dedicated compression techniques, PAT has better performance. For dining philosopher (DP) models, FDR performs extremely well because of the strategy discussed in ^[2]. However, other experiments show that this strategy may not be as efficient for other models. For MCS, PAT is comparable to FDR in terms of the number of states per second. FDR processes the LTS by applying its compression methods, whereas PAT applies a simple reduction method, i.e., using the keyword atomic to give higher priority to local events which are not synchronized, not updating any variable and not mentioned in the property.

Experiment on solving puzzles

The experiment compares the performance of three model checkers on solving puzzles, inspired by work in ^[3]. The CSP_M and CSP# models for these puzzles make the best use of their modeling power: CSP# specifies the puzzles using shared variables, which are solved by PAT through reachability analysis, whereas CSP_M models the puzzles using multi-way synchronization, which are solved by FDR and ProB through trace refinement. In addition, FDR simulates a bounded DFS algorithm by searching the divergence of a new system, in order to find a smaller counterexample. The new system P’, like a watchdog, can only perform up to N events of the target implementation process P, and then performs an infinite number of events ^[3]. This approach can be used provided that the target process P is loop-free. Column FDR-Div records the results of states and time using this algorithm; value N.A. means there is no model with divergence checking to solve the puzzle.

Explanations of models:

· Peg solitaire game

Peg solitaire game is a board game for one player involving movement of pegs on a board with holes. The rule is to jump pegs by clicking and dragging an adjoining peg to the open hole on the other side. Each time a peg is jumped, it is automatically removed. You can only remove pegs by jumping them. The best solution is that this peg is in the center.

· Knight’s tour

The Knight's tour is a mathematical problem involving a knight on a chessboard. The knight is placed on the empty board and, moving according to the rules of chess, must visit each square exactly once. A knight's tour is called a closed tour if the knight ends on a square attacking the square from which it began (so that it may tour the board again immediately with the same path). Otherwise the tour is open.

· The tower of Hanoi puzzle

This puzzle consists of three rods and a number of disks of different sizes which can slide onto any rod. The puzzle starts with the disks neatly stacked in order of size on one rod, the smallest at the top, thus making a conical shape. The puzzle is solved when disks are moved to the target rod where disks are stacked in order of size. Information of disk order on each rod is denoted by a process parameter whose value is a sequence in CSP_M and by a shared array in CSP#.

Experiment results:

Model	N	FDR		FDR-Div		ProB		PAT
Model	N	State	Time(s)	State	Time(s)	State	Time(s)	State	Time(s)
Solitaire	26	4048216	46.303	1	0.169	-	-	11950	5.356
Solitaire	29	28249254	387.737	1	0.217	-	-	104395	54.681
Solitaire	32	-	-	1	5.318	-	-	10955	5.301
Solitaire	35	-	-	1	377.297	-	-	443230	279.454
Knight	5	508450	3.522	1	0.037	-	-	4256	0.29
Knight	6	-	-	1	15.399	-	-	129269	9.143
Knight	7	-	-	1	94.713	-	-	77238	6.754
Hanoi	6	729	0.052	N.A.	N.A.	1667	57.84	5775	0.416
Hanoi	7	2187	0.086	N.A.	N.A.	4969	196.5	92680	6.837
Hanoi	8	6561	0.181	N.A.	N.A.	14853	660.59	150918	11.524

Discussion:

From the above table, we can observe that the divergence checking approach can be used in the solitaire and chess knight tour models. However, this approach cannot always significantly improve the performance, because it depends on the searching order. Moreover, it is costly to check if a system is loop-free or not, which is the premise for applying this approach. PAT solves the two puzzles in a reasonable time, and it is faster in the knight example than FDR and FDR-Div. For the Hanoi puzzle, FDR has a better performance because the compression techniques it uses can effectively reduce the state space.

Experiment on shared variables

The experiment explores the performance of FDR and PAT on verifying two models which involve shared variables, namely, concurrent stack and the Peterson algorithm.

Explanations of models:

· Concurrent stack

A concurrent stack allows multiple readers to access the shared stack at the same time, but only one writer to update the value; readers cannot access the shared variable in the latter case.

· Peterson’s algorithm

This concurrent programming algorithm is designed for mutual exclusion that allows two processes to share a single-use resource without conflicts with only shared memory for communication. The algorithm can be generalized for more than two processes, as modeled in our experiment. The property is mutual exclusion, which is specified by refinement checking in CSP_M model and reachability analysis in CSP# model. Note that we obtain the CSP_M model for the Peterson algorithm from the shared variable analyzer (SVA) ^[4], a FDR front-end which can convert programs (like C programs) with shared variables into CSP_M models. To be fair, the CSP# model is specified at the same level of granularity as the CSP_M model

Experiment results:

In the following table, ConcurrentStack*2 in the Model column means that the stack size is 2.

Model	N	Property	FDR		PAT
Model	N	Property	State	Time(s)	State	Time(s)
Concurrent Stack*2	3	P [=T S	453465	3.833	10860	1.023
Concurrent Stack*2	4	P [=T S	-	-	189920	75.915
Concurrent Stack*2	5	P [=T S	-	-	693828	293.382
Peterson	3	mutual exclusion	1011	1.192	3257	0.105
Peterson	4	mutual exclusion	105493	20.067	104686	3.776
Peterson	5	mutual exclusion	14810779	387.645	5722863	294.005

Discussion:

The results of concurrent stack example show that PAT performs better than FDR for checking trace refinement (P[T=S), and this is because PAT uses DFS with anti-chain algorithm in the trace refinement. This algorithm is effective when the specification is non-deterministic. The results of the Peterson algorithm show that PAT performs better. This is because local events associated as atomic statements in CSP# reduce the states significantly, whereas CSPM model defines additional events to represent reading/writing operations of shared variables. Although these additional events can be hidden as internal events to apply existing compression techniques in FDR, the effect is minor because the type range of reading/writing channels and operations over different variables can easily lead to state space explosion

Experiment on LTL model checking

This experiment explores the performance on verifying LTL properties. We adopt the approach proposed by Lowe ^[5] to construct a CSP_M process for the LTL formula and use FDR to perform the refusal refinement checking. As this approach cannot deal with operator eventually (⃟), we ignore the checking of property ⃞⃟eat.0 in the dining philosopher example using FDR.

Experiment results:

Model	N	Property	Result	FDR		ProB		PAT
Model	N	Property	Result	State	Time(s)	State	Time(s)	State	Time(s)
RW	6	⃞!error	true	8	0.023	122722	104.8	15	0.059
RW	200	⃞!error	true	202	1.455	-	-	403	0.086
RW	500	⃞!error	true	502	19.901	-	-	1003	0.071
RW	1000	⃞!error	true	1002	154.33	-	-	2003	0.148
DP	6	⃞⃟eat.0	false	N.A.	N.A.	2420	1.11	166	0.019
DP	8	⃞⃟eat.0	false	N.A.	N.A.	13312	1.75	256	0.024
DP	10	⃞⃟eat.0	false	N.A.	N.A.	-	-	460	0.049

Discussion:

The table indicates that PAT performs better than FDR and ProB. Notice that property ⃞⃟eat.0 can be verified to be true using PAT under the strong or global fairness assumption.

Experiment on battery monitor case study

We conduct a case study on translating CSPM model to CSP# through a real world problem, namely, the battery monitor component of the elevator control system described in ^[6]: the CSP_M specification is translated from the battery monitor Simulink diagram, and the CSP# model is translated from the CSPM model. Then we analyze the Simulink diagram, using the Simulink simulator, to determine which output the battery monitor should produce for every given input. Thus, to assess if both models describe the same behaviour, we compose each one with parallel observers. We noticed that both CSP_M and CSP# models provide the same output value for all relevant scenarios. Besides this analysis, we also checked basic properties like deadlock freeness and divergence freeness.

Explanations of models:

This model specifies a battery monitor module in power monitor control systems designed as a Simulink diagram. The Simulink system specification is a collection of 7 blocks. Each block is free to perform its exclusive events, and must synchronize with common events. In each period, when a sampleStep event is communicated, the block starts to read its inputs in any order. When all inputs are read, it outputs a value based on its function until another sampleStep event is communicated. When a sampleStep event does not happen, then it outputs a value considering the last read inputs.

Experiment results:

The comparison is performed as a controlled experiment and we run each assertion 30 times. We apply common statistics testing methods (particularly, Shapiro Wilk and Mann-Whitney U) to experiment data.

Model	Result	FDR		PAT
Model	Result	State	Time(s)	State	Time(s)
Deadlock-free	true	2700	0.276	2700	0.748
Livelock-free	true	2700	0.286	2700	0.466

Discussion:

From the table, we can observe that the visited states in FDR and PAT are the same, and performance of these two tools is similar using the same verification algorithm. Note that the time for deadlock-freeness property in PAT consists of time for deadlock and nonterminating checking. We have not included ProB in this comparison as it is unable to recognize the CSP_M syntax for this example.

References

[1] R. Milner. Communication and concurrency. Prentice-Hall, Inc., 1989.

[2] A. W. Roscoe, P. H. B. Gardiner, M. Goldsmith, J. R. Hulance, D. M. Jackson, and J. B. Scattergood. Hierarchical compression for model-checking csp or how to check 10²⁰ dining philosophers for deadlock. In TACAS, pages 133–152, 1995.

[3] H. Palikareva, J. Ouaknine, and A. W. Roscoe. Faster fdr counterexample generation using sat-solving. Electronic Communications of the EASST, 23, September 2009.

[4] A. W. Roscoe. Understanding Concurrent Systems. Springer-Verlag New York, Inc., 2010.

[5] G. Lowe. Specification of communicating processes: temporal logic versus refusals-based refinement. Form. Asp. Comput., 20(3):277–294, May 2008.

[6] J. Jesus, A. Mota, A. Sampaio, and L. Grijo. Architectural verification of control systems using csp. In ICFEM, pages 323–339, 2011.