17 Mar 2004
CS 3243 - Learning
41
Overfitting
• Better training performance = better test performance?
• Nope. Why?
  1. Hypothesis is too specific to the training data
  2. Hypothesis models noise in the training data
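
A toy illustration of the two reasons above, offered as a sketch rather than anything from the lecture itself. The data set, the two flipped labels, and the names `memorizer` and `threshold` are all assumptions made up for this example. The memorizing hypothesis reproduces every training label, including the noisy ones, while the simple hypothesis ignores them.

```python
# Hypothetical 1-D data: the noise-free concept is "x >= 10".
def true_label(x):
    return 1 if x >= 10 else 0

# Training set with two deliberately flipped (noisy) labels at x = 3 and x = 15.
train_xs = list(range(20))
noisy_points = {3, 15}
train_ys = [1 - true_label(x) if x in noisy_points else true_label(x)
            for x in train_xs]

# Test set: nearby points with clean labels.
test_xs = [x + 0.25 for x in range(20)]
test_ys = [true_label(x) for x in test_xs]

def memorizer(x):
    """Overly specific hypothesis: copy the label of the nearest training point."""
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ys[nearest]

def threshold(x):
    """Simple hypothesis: a single cut at x = 10."""
    return 1 if x >= 10 else 0

def accuracy(predict, xs, ys):
    """Fraction of points on which the hypothesis agrees with the label."""
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)

results = {
    "memorizer": (accuracy(memorizer, train_xs, train_ys),
                  accuracy(memorizer, test_xs, test_ys)),
    "threshold": (accuracy(threshold, train_xs, train_ys),
                  accuracy(threshold, test_xs, test_ys)),
}
for name, (train_acc, test_acc) in results.items():
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f}")
```

In this constructed case the memorizer scores higher on the training set and lower on the test set than the threshold rule, because it has fitted the two noisy labels.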
• Pruning
  ◦ Keep complexity of hypothesis low
  ◦ Stop splitting when:
    1. IC below a threshold
    2. Too few data points in node
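
The two stopping rules above could be sketched as follows. This assumes that IC is measured as the entropy-based information gain of a candidate split, which the slide does not spell out. The function names and the thresholds `min_gain` and `min_points` are hypothetical values chosen only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    remainder = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - remainder

def should_stop(parent, children, min_gain=0.05, min_points=5):
    """Return True if either stopping rule applies (thresholds are assumed)."""
    if len(parent) < min_points:      # rule 2: too few data points in node
        return True
    # rule 1: gain of the candidate split is below the threshold
    return information_gain(parent, children) < min_gain

# A node with 6 "yes" and 6 "no" examples and two candidate splits.
labels = ["yes"] * 6 + ["no"] * 6
perfect_split = [["yes"] * 6, ["no"] * 6]
useless_split = [["yes"] * 3 + ["no"] * 3, ["yes"] * 3 + ["no"] * 3]
tiny_node = ["yes", "no", "yes"]

print(should_stop(labels, perfect_split))                 # False: split is informative
print(should_stop(labels, useless_split))                 # True: split gains nothing
print(should_stop(tiny_node, [["yes", "yes"], ["no"]]))   # True: node is too small
```

Raising `min_gain` or `min_points` stops splitting earlier and so keeps the tree smaller, which is the sense in which these rules keep hypothesis complexity low.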
[Figure: Precision (0% to 100%) vs. DT size, with one curve for train performance and one for test performance]