Design-based methods
• Idea: capture syntactic and semantic flow rather than token identity (for source code)
• Replace variable names with IDs correlated with the symbol table and data type
• Decompose each program p into regions of
  ◦ sequential statements
  ◦ conditionals
  ◦ looping blocks – recurse on these
• Calculate similarity from the root node downwards
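
The steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the method of any particular tool. It assumes Python source as input and uses the standard `ast` module (Python 3.9+ for `ast.unparse`). The names `Renamer`, `decompose`, `region_tree`, and `similarity` are hypothetical. Two simplifications are assumed: variable IDs are assigned by order of first appearance only (Python has no static types, so the data-type part is omitted), and regions are aligned by position rather than by a proper tree-matching algorithm.

```python
import ast
from difflib import SequenceMatcher


class Renamer(ast.NodeTransformer):
    """Replace each identifier with an ID based on order of first appearance."""

    def __init__(self):
        self.ids = {}

    def visit_Name(self, node):
        node.id = self.ids.setdefault(node.id, f"v{len(self.ids)}")
        return node


def decompose(stmts):
    """Split a statement list into (kind, payload) regions.

    "seq" regions hold normalized statement strings; "cond" and "loop"
    regions hold the recursively decomposed regions of their bodies.
    """
    regions = []
    for stmt in stmts:
        if isinstance(stmt, (ast.For, ast.While)):
            regions.append(("loop", decompose(stmt.body)))
        elif isinstance(stmt, ast.If):
            regions.append(("cond", decompose(stmt.body + stmt.orelse)))
        else:
            # Merge consecutive simple statements into one sequential region.
            if not regions or regions[-1][0] != "seq":
                regions.append(("seq", []))
            regions[-1][1].append(ast.unparse(stmt))
    return regions


def region_tree(source):
    """Normalize identifiers, then decompose the top-level statements."""
    tree = Renamer().visit(ast.parse(source))
    return decompose(tree.body)


def similarity(a, b):
    """Compare two region lists from the root downwards; score is in [0, 1]."""
    if not a and not b:
        return 1.0
    total = 0.0
    for (kind_a, body_a), (kind_b, body_b) in zip(a, b):
        if kind_a != kind_b:
            continue  # regions of different kinds contribute nothing
        if kind_a == "seq":
            total += SequenceMatcher(None, body_a, body_b).ratio()
        else:
            total += similarity(body_a, body_b)  # recurse into nested regions
    return total / max(len(a), len(b))


prog_a = """
total = 0
for x in data:
    if x > 0:
        total = total + x
print(total)
"""

prog_b = """
s = 0
for item in values:
    if item > 0:
        s = s + item
print(s)
"""

print(similarity(region_tree(prog_a), region_tree(prog_b)))  # prints 1.0
```

Because the two sample programs differ only in variable names, they normalize to the same region tree and score 1.0, which is the point of comparing structure rather than token identity.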