Supervised Categorization of Javascript using Program Analysis Features

Wei Lu and Min-Yen Kan

AIRS 2005 (Jeju Island, Korea)


Evaluation on All Components

• Metric and context-sensitive features alone create poor classifiers

• Coupled with lexical features, they reduce errors by over 50%

• A careful study of language features helps categorization

• Syntactic features are noisy

• Ensemble method: Many weak classifiers → Good classifier



	Using lexical analysis alone removes 17% of the baseline's errors. This indicates that a careful study of language features helps categorization.
	As we found earlier, metric or context-sensitive features alone do not perform well, but coupling them with lexical features improves overall performance. With the final feature set, we perform the categorization task with an accuracy of 93.95%, a 52% error reduction.
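The error-reduction figures above are relative to the baseline's error rate, not absolute accuracy gains. As a minimal sketch of that arithmetic, the Python snippet below computes a relative error reduction and back-calculates the baseline accuracy implied by the two reported numbers (93.95% and 52%). The function names are hypothetical, and the implied baseline of roughly 87.4% is derived here under the assumption that both figures refer to the same baseline; it is not a value reported on this slide.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Fraction of the baseline's errors removed (accuracies in percent)."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc, reduction):
    """Baseline accuracy consistent with a new accuracy and a relative reduction."""
    new_err = 100.0 - new_acc
    return 100.0 - new_err / (1.0 - reduction)


# Figures reported in the notes above.
final_acc = 93.95   # accuracy with the final feature set, in percent
reduction = 0.52    # reported relative error reduction

# Assumption: both figures are measured against the same baseline.
baseline_acc = implied_baseline_accuracy(final_acc, reduction)
print(f"implied baseline accuracy: {baseline_acc:.2f}%")
print(f"check: {relative_error_reduction(baseline_acc, final_acc):.2f}")
```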
	Our experiments also showed that syntactic features are noisy when coupled with other features, so we excluded this feature set from subsequent evaluations.
	As we can see, we have many weak classifiers that cannot perform well alone, but combining them yields a good classifier. This is the idea behind ensemble methods.
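The notes do not say how the individual classifiers are combined, so the following is only an illustrative sketch of the ensemble idea using simple majority voting, one common combination scheme. The category labels, the classifier names, and the toy predictions are all hypothetical, chosen so that each weak classifier errs on different examples.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among one prediction per classifier."""
    return Counter(predictions).most_common(1)[0][0]


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical gold labels for six scripts (illustrative categories only).
gold = ["ad", "menu", "form", "ad", "menu", "form"]

# Hypothetical outputs of three weak classifiers; each is wrong on two
# examples, and no two classifiers are wrong on the same example.
weak_outputs = {
    "lexical": ["menu", "form", "form", "ad", "menu", "form"],
    "metric":  ["ad", "menu", "ad", "menu", "menu", "form"],
    "context": ["ad", "menu", "form", "ad", "form", "ad"],
}

# Combine the classifiers example by example.
ensemble = [majority_vote(votes) for votes in zip(*weak_outputs.values())]

for name, preds in weak_outputs.items():
    print(f"{name}: {accuracy(preds, gold):.2f}")
print(f"ensemble: {accuracy(ensemble, gold):.2f}")
```

Voting helps here only because the toy classifiers make uncorrelated mistakes; classifiers that fail on the same examples would gain little from being combined this way.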