11 Oct 2005
CS 5244 - Computational Document Analysis
Design-based methods
• Idea (for source code): capture syntactic and semantic flow rather than token identity
• Replace variable names with IDs tied to the symbol table and data type
• Decompose each program p into regions of:
  ◦ sequential statements
  ◦ conditionals
  ◦ looping blocks (recurse on these)
• Calculate similarity from the root node downwards
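The steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the method of any particular system. Assumptions: it targets Python source through the standard `ast` module, numbers names by order of first appearance (no symbol table or data types are used), ignores loop headers and condition expressions, and pairs child regions by position. The names `program_tree`, `regions`, and `similarity` are hypothetical.

```python
import ast
from difflib import SequenceMatcher


class _Renamer(ast.NodeTransformer):
    """Replace every name with an ID numbered by first appearance.

    Assumption: a real system would also consult the symbol table and
    data types; this sketch does not. Built-in names are renamed too.
    """

    def __init__(self):
        self.ids = {}

    def visit_Name(self, node):
        node.id = self.ids.setdefault(node.id, f"v{len(self.ids)}")
        return node


def regions(stmts):
    """Split a statement list into ("seq" | "cond" | "loop", payload) regions."""
    out, run = [], []
    for s in stmts:
        if isinstance(s, (ast.If, ast.For, ast.While)):
            if run:  # close the current run of sequential statements
                out.append(("seq", run))
                run = []
            kind = "cond" if isinstance(s, ast.If) else "loop"
            # Recurse into the block bodies (headers are ignored here).
            out.append((kind, regions(s.body) + regions(s.orelse)))
        else:
            run.append(ast.dump(s))
    if run:
        out.append(("seq", run))
    return out


def program_tree(source):
    """Parse source, normalize names, and return the root region."""
    tree = _Renamer().visit(ast.parse(source))
    return ("root", regions(tree.body))


def similarity(a, b):
    """Compare two region trees from the root downwards; result in [0, 1]."""
    kind_a, kids_a = a
    kind_b, kids_b = b
    if kind_a != kind_b:
        return 0.0
    if kind_a == "seq":  # leaf region: compare normalized statements
        return SequenceMatcher(None, kids_a, kids_b).ratio()
    if not kids_a and not kids_b:
        return 1.0
    paired = sum(similarity(x, y) for x, y in zip(kids_a, kids_b))
    # Unpaired children count as zero similarity.
    return paired / max(len(kids_a), len(kids_b))
```

Under these assumptions, two programs that differ only in variable names produce identical region trees, while a change in block structure lowers the score at the level where the trees diverge.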