Overfitting
• Better training performance = better test performance?
• Nope. Why?
    1. Hypothesis too specific
    2. Models noise
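
A minimal sketch of the two reasons above, on a made-up toy dataset (all data, names, and the 6.5 cutoff are illustrative assumptions, not from the slides). The true concept is "label 1 when x > 6", and one training point (x = 3) carries a noisy label. A 1-nearest-neighbor hypothesis memorizes every training point, including the noisy one; a single threshold rule is much less specific.

```python
# Toy 1-D data (hypothetical): true concept is "x > 6", with one noisy label at x = 3.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0), (6, 0),
         (7, 1), (8, 1), (9, 1), (10, 1), (11, 1), (12, 1)]
# Held-out points labeled by the true concept, with no noise.
test = [(0.8, 0), (2.9, 0), (3.2, 0), (4.6, 0), (5.9, 0),
        (7.3, 1), (8.8, 1), (11.4, 1)]


def nearest_neighbor(x):
    """Very specific hypothesis: copy the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def threshold_rule(x):
    """Simple hypothesis: a single cutoff (6.5 is an assumed value)."""
    return int(x > 6.5)


def accuracy(hypothesis, data):
    """Fraction of points in data that the hypothesis labels correctly."""
    return sum(hypothesis(x) == y for x, y in data) / len(data)


for name, h in [("1-NN", nearest_neighbor), ("threshold", threshold_rule)]:
    print(name, "train:", accuracy(h, train), "test:", accuracy(h, test))
```

On this toy data the memorizing hypothesis is perfect on the training set but worse on the test set, because it reproduces the noisy label near x = 3. The threshold rule misses that one training point and does better on held-out data.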
• Pruning
    ◦ Keep complexity of hypothesis low
    ◦ Stop splitting when:
        1. IC below a threshold
        2. Too few data points in node
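
The two stopping rules above can be sketched as follows. This is only an illustration under stated assumptions: "IC" is interpreted here as the entropy-based information gain of the best split, the feature is a single number, and the names min_gain and min_points are hypothetical parameters, not from the slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(xs, ys):
    """Return (gain, threshold) for the best split 'x <= threshold' on a 1-D feature."""
    base = entropy(ys)
    best = (0.0, None)
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        gain = base - remainder
        if gain > best[0]:
            best = (gain, t)
    return best


def build_tree(xs, ys, min_gain=0.1, min_points=4):
    """Grow a tree; a leaf is a class label, an inner node is (threshold, left, right)."""
    majority = Counter(ys).most_common(1)[0][0]
    if len(ys) < min_points:          # rule 2: too few data points in node
        return majority
    gain, t = best_split(xs, ys)
    if t is None or gain < min_gain:  # rule 1: gain (assumed meaning of IC) below threshold
        return majority
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            build_tree([x for x, _ in left], [y for _, y in left], min_gain, min_points),
            build_tree([x for x, _ in right], [y for _, y in right], min_gain, min_points))


def count_leaves(tree):
    """Number of leaves, used here as a rough measure of hypothesis complexity."""
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])


# Toy data (hypothetical): true concept "x > 6", one noisy label at x = 3.
xs = list(range(1, 13))
ys = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
unpruned = build_tree(xs, ys, min_gain=0.0, min_points=1)
pruned = build_tree(xs, ys, min_gain=0.25, min_points=4)
print(count_leaves(unpruned), count_leaves(pruned))
```

With the stopping rules effectively disabled, the tree keeps splitting until it isolates the noisy point. With the thresholds chosen here, it stops after the first split and stays small. The threshold values are arbitrary for this example and would normally be tuned on held-out data.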
[Figure: train performance vs. test performance]