CloneDifferentiator is a semantic differencing tool for software clones. It complements clone detection with program differencing in order to characterize clones. It captures the semantic information of clones from Program Dependence Graphs (PDGs) and uses graph matching techniques to compute a matched set and an unmatched set of PDG nodes. Based on the matching results, it characterizes clones precisely in terms of categories of semantic differences.
We have integrated CloneDifferentiator with our previous clone detection and visualization tool, Clone Analyzer. Once the clone detection tool provides the details of the clones, we use intra-method PDGs to capture their semantic information. A PDG is an intermediate program model that encodes both the data and control dependences between program statements. Given a clone pair or clone class detected by Clone Analyzer (any other clone detection tool would work as well), we use WALA, a static analysis library for Java bytecode, to generate the PDG of the containing method.
To apply optimization and abstraction techniques suited to the cloning domain, CloneDifferentiator configures GenericDiff to compare the PDGs generated from the clone instances. GenericDiff is a general framework for model comparison. It casts the problem of comparing two models as the problem of recognizing the Maximum Common Subgraph of two Typed Attributed Graphs (TAGs). Given two PDGs, PDG1 and PDG2, GenericDiff parses them into typed attributed graphs, TAG1 and TAG2. The type attribute of each graph node represents the type of the corresponding SSA statement, and the type attribute of each graph edge represents either a control or a data dependence.
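To make the TAG idea concrete, here is a minimal sketch in Java. It is not GenericDiff's algorithm: instead of recognizing a Maximum Common Subgraph, it greedily pairs nodes that have the same type and ignores edges entirely. All class and method names (TagSketch, pairByType, and so on) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a typed attributed graph (TAG) with a naive node pairing. All names are hypothetical. */
final class TagSketch {

    /** Edge types named in the text: control or data dependence. */
    enum EdgeType { CONTROL, DATA }

    /** A node whose type stands for the kind of SSA statement, e.g. "INVOKE". */
    static final class Node {
        final int id;
        final String type;
        Node(int id, String type) { this.id = id; this.type = type; }
    }

    /** A directed, typed dependence edge between two nodes. */
    static final class Edge {
        final Node from;
        final Node to;
        final EdgeType type;
        Edge(Node from, Node to, EdgeType type) { this.from = from; this.to = to; this.type = type; }
    }

    static final class Graph {
        final List<Node> nodes = new ArrayList<>();
        final List<Edge> edges = new ArrayList<>();

        Node addNode(String type) {
            Node n = new Node(nodes.size(), type);
            nodes.add(n);
            return n;
        }

        void addEdge(Node from, Node to, EdgeType type) {
            edges.add(new Edge(from, to, type));
        }
    }

    /** Matched node pairs plus the nodes left over on each side. */
    static final class MatchResult {
        final List<Node[]> matched = new ArrayList<>();
        final List<Node> unmatchedLeft = new ArrayList<>();
        final List<Node> unmatchedRight = new ArrayList<>();
    }

    /** Greedy pairing by node type only; an assumption-laden stand-in for real graph matching. */
    static MatchResult pairByType(Graph g1, Graph g2) {
        MatchResult result = new MatchResult();
        List<Node> remaining = new ArrayList<>(g2.nodes);
        for (Node a : g1.nodes) {
            Node partner = null;
            for (Node b : remaining) {
                if (b.type.equals(a.type)) {
                    partner = b;
                    break;
                }
            }
            if (partner == null) {
                result.unmatchedLeft.add(a);
            } else {
                remaining.remove(partner);
                result.matched.add(new Node[] {a, partner});
            }
        }
        result.unmatchedRight.addAll(remaining);
        return result;
    }
}
```

A real matcher would also have to respect edge types and node attributes, which is what makes the Maximum Common Subgraph formulation considerably harder than this type-only pairing.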
To produce a readable and accurate clone analysis report, we define node and property changes for the nodes in PDGs. Based on the different types of node/property change, we summarize the comparison results into the following patterns of semantic differences: differential properties, additional branch(es), partially matched branch(es), additional operation(s), unmatched operation pair(s), additional block(s) and unmatched block pair(s).
1. Windows platform with .NET Framework 3.5 or above installed.
2. MySQL server 5.0 or above installed, together with the MySQL ODBC Connector.
3. The Eclipse platform 3.6 or above with AspectJ Visualizer Plugin 1.6 (part of AJDT) or above.
You can download the zipped source code and run the plugin according to the instructions in the user manual. Currently, we release only the source code version of CloneDifferentiator. Later, we will provide a runnable JAR plugin version.
Here is an example from JavaIO, which we presented in our poster at the International Workshop on Software Clones 2011.
There are two ways to conduct clone detection and comparison. First, you can open the “Clone Analyzer” perspective (red marker 1) to activate the configuration wizard for Clone Analyzer and customize parameters such as the minimum token length for clone detection. Second, you can right-click a Java project in the Eclipse Package Explorer view to pop up a context menu. Click the menu item “Clone Detect and SemanticDiff” (red marker 2) to get the clone detection results, which you can then use for further clone comparison. You can also right-click one or more Java “src” folders, Java source packages, or Java source files within one Java project to start the clone detection. For example, to find and compare only the clones in class ObjectInputStream and class ObjectOutputStream, select these two files in the package view, right-click to pop up the context menu, and choose the menu item “Clone Detect and SemanticDiff”. The detection results are presented in a table view, where you can select a clone class or clone pair to compare.

Suppose the clone detection tool reports the pair of gapped clones shown in the following figure. Understanding the semantic differences between the clones contained in the methods readProxyDesc(boolean) and readNonProxyDesc(boolean) in class java.io.ObjectInputStream is interesting and helpful for maintaining this clone pair. Apart from the gap of several different statements, this pair of gapped clones has some minor, easy-to-miss changes in the constants used and the methods called.
Using the types of semantic differences defined in the last paragraph of the Introduction, our tool reports the differencing result for the above clone pair with various highlighting colors in the comparison editor, as shown in the figure below: differential properties (red background) at line 1; additional operation(s) (underlined in yellow and italicized) for the clone gap between line 6 and line 7; additional branch(es) (orangish foreground) for the clone gap between line 6 and line 7; and unmatched operation(s) (blue background) between line 6 and line 7 and at line 9. The reported semantic differences are reasonable. At line 1, the two constants TC_PROXYCLASSDESC and TC_CLASSDESC refer to different constant integer values. Between line 6 and line 7 in readProxyDesc(boolean) there is a clone gap, namely a for loop statement, which is reported as an additional branch. In contrast, there is an additional operation between line 6 and line 7 in readNonProxyDesc(boolean), which is due to the clone gap: some additional assignments. Note that our tool also reports a pair of unmatched operations: readUTF() and the constructor of StreamCorruptedException(). At line 9, the two methods have similar but semantically different function calls, namely initProxy() and initNonProxy(), which are reported as an unmatched operation. We have also defined, and can highlight, other types of semantic differences not shown in this example: partially matched branch(es) (green foreground), additional block(s) (underlined in dark blue and italicized) and unmatched block pair(s) (purple background). More examples with other semantic difference types can be found in the Examples download.
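The mapping from difference categories to highlighting styles described above can be summarized as a small lookup structure. The Java enum below is only an illustrative sketch: the category names and highlight descriptions are taken from the text, but the enum itself (DifferenceCategory and its highlight() accessor) is hypothetical and is not the tool's actual code.

```java
/** Difference categories and their editor highlights as listed in the text; the enum is hypothetical. */
enum DifferenceCategory {
    DIFFERENTIAL_PROPERTIES("red background"),
    ADDITIONAL_OPERATION("yellow underline, italic"),
    ADDITIONAL_BRANCH("orangish foreground"),
    UNMATCHED_OPERATION_PAIR("blue background"),
    PARTIALLY_MATCHED_BRANCH("green foreground"),
    ADDITIONAL_BLOCK("dark blue underline, italic"),
    UNMATCHED_BLOCK_PAIR("purple background");

    private final String highlight;

    DifferenceCategory(String highlight) {
        this.highlight = highlight;
    }

    /** Human-readable description of the highlight style used in the comparison editor. */
    String highlight() {
        return highlight;
    }
}
```

For instance, DifferenceCategory.ADDITIONAL_BRANCH.highlight() returns the style used for the for loop gap in readProxyDesc(boolean).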
The current release integrates 8 built-in queries to aid refactoring work based on our tool. Across all the compared clone pairs/classes, these queries are designed to retrieve the candidate clones that are related to a certain type of maintenance task.
a. Clones irrelevant to OO refactorings
A clone pair containing at least two additional blocks or at least one unmatched block pair is considered hard to handle with traditional OO refactorings.
b. Clones relevant to duplicate code
Clones are considered candidates for duplicate code if they satisfy one of the following conditions: 1) They have no semantic difference in terms of their PDGs. 2) One clone's PDG is “part of” the other's; such relations often reveal code duplication. 3) They have only differential properties; such clones are often duplicate code.
c. Clones due to inconsistent programming styles
Clones that contain partially matched branch(es) and/or additional branch(es) may indicate different programming styles, such as a null check for some variables. These inconsistent programming styles sometimes even lead to potential bugs.
d. Clones relevant to the generic types
Unmatched typecasting pair(s) (typecasting is one type of operation, covering the PDG node types NEW, INSTANCEOF and CAST) indicate program logic that creates, checks or converts different object/data types. Such creating, checking or converting activities might be handled by generic types.
e. Clones relevant to parallel inheritance hierarchies
Unmatched method-invocation/field-access pair(s) (method invocation and field access are two types of operations, covering the PDG node types INVOKE, FGET and FPUT) are usually attributed to calls to different methods that do not share the same superclass. This phenomenon is caused by existing parallel inheritance hierarchies. The Traits technique may mitigate the difficulty.
f. Clones relevant to Parameterized Unit Test (PUT) patterns: Seed values
These cloned test methods contain only differential-operand or differential-value properties and/or unmatched constant pair(s) (a constant is one type of operation, corresponding to the PDG node type CONSPARAMETER).
g. Clones relevant to PUT patterns: State machine
These cloned test methods contain repetitive additional operation(s) and unmatched method-invocation pair(s). These methods are good candidates for the state machine pattern.
h. Clones relevant to PUT patterns: Assume and assert invariants
These clone pairs contain at least two method-invocation pairs of assertxxx() methods with matched expressions.
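Several of the queries above are simple conditions over the counts of each difference category in a compared clone pair. The following Java sketch expresses queries a, b (conditions 1 and 3 only) and c as predicates. It is an illustration under assumptions, not the tool's implementation: the DiffSummary class, its fields and the method names are all hypothetical, and the “part of” condition of query b is omitted because it needs the PDGs themselves rather than counts.

```java
/** Sketch of three built-in queries as predicates over difference counts. All names are hypothetical. */
final class CloneQuerySketch {

    /** Hypothetical per-clone-pair summary: how many differences of each category were reported. */
    static final class DiffSummary {
        int differentialProperties;
        int additionalBranches;
        int partiallyMatchedBranches;
        int additionalOperations;
        int unmatchedOperationPairs;
        int additionalBlocks;
        int unmatchedBlockPairs;

        /** Count of all differences other than differential properties. */
        int nonPropertyDifferences() {
            return additionalBranches + partiallyMatchedBranches + additionalOperations
                    + unmatchedOperationPairs + additionalBlocks + unmatchedBlockPairs;
        }
    }

    /** Query a: at least two additional blocks or at least one unmatched block pair. */
    static boolean hardForOoRefactoring(DiffSummary d) {
        return d.additionalBlocks >= 2 || d.unmatchedBlockPairs >= 1;
    }

    /** Query b, conditions 1 and 3: no differences at all, or only differential properties. */
    static boolean duplicateCodeCandidate(DiffSummary d) {
        return d.nonPropertyDifferences() == 0;
    }

    /** Query c: partially matched and/or additional branches are present. */
    static boolean inconsistentStyleCandidate(DiffSummary d) {
        return d.partiallyMatchedBranches > 0 || d.additionalBranches > 0;
    }
}
```

Keeping each query as an independent predicate over a summary object means new queries can be added without touching the comparison step.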
The user manual describes how to generate the answers to these queries and how to import the results into Excel for further analysis. We plan to implement an interface that lets users customize the queries. The current version provides only the filter function and these 8 built-in queries.
Note that:
The statistics for JavaIO are based on the clones detected with a minimum token length of 30.
The statistics for JDT_Test_Model are based on the clones detected with a minimum token length of 60.
CloneDifferentiator was developed at the National University of Singapore. For more information and technical support on CloneDifferentiator, please contact the developers: