Bias-Variance Tradeoff

After applying a model to a problem, a natural question is whether the model is any good. One way to measure this is to ask how it would perform across many different situations, i.e., its expected generalization error. We want our model to accurately predict outcomes for previously unseen data, not just the data that we've already received. We will first focus on the bias-variance tradeoff.
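For squared-error loss, this expected generalization error has a standard decomposition that makes the tradeoff precise. Writing $f$ for the true function, $\hat{f}$ for the model learned from a randomly drawn training set, and $\sigma^2$ for the irreducible noise in the labels:

```latex
\mathbb{E}\!\left[\big(y - \hat{f}(x)\big)^{2}\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^{2}}_{\text{bias}^{2}}
  + \underbrace{\mathbb{E}\!\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^{2}\right]}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{irreducible noise}}
```

The first two terms are what the rest of this post is about: bias measures how far the average prediction is from the truth, and variance measures how much the prediction fluctuates from one training set to the next.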

When I think of someone who is biased, I picture a person who doesn't bother to learn the details and just generalizes without a second thought. A high-bias algorithm behaves the same way: it ignores the fine details and only captures a general idea of the data. Because it learns too little from the data, it underfits; another way of saying this is that it offers too simple a solution to the problem. In the example below, we fit a straight line when the data actually follows a curve.
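As a minimal sketch of that straight-line-on-a-curve situation, here is an illustrative example in Python. The quadratic ground truth and every setting here are my own assumptions, not taken from the original figure:

```python
# A minimal sketch of underfitting: fitting a straight line to data that
# follows a curve. The quadratic ground truth and all settings here are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x**2 + rng.normal(scale=1.0, size=x.shape)  # curved data with noise

# A degree-1 polynomial (a straight line) cannot capture the curvature:
# it learns too little from the data, i.e., high bias.
line = np.polynomial.Polynomial.fit(x, y, deg=1)
print("straight line, training MSE:", np.mean((line(x) - y) ** 2))

# A degree-2 polynomial matches the underlying shape and fits far better.
curve = np.polynomial.Polynomial.fit(x, y, deg=2)
print("quadratic,     training MSE:", np.mean((curve(x) - y) ** 2))
```

No amount of extra data fixes the straight line's error here; its functional form simply cannot bend to follow the curve.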

On the other end of the spectrum is high variance. When I first learned about variance in 8th grade, it was always in relation to standard deviation: the square root of the variance is the standard deviation, and you generally want the standard deviation to be small. We want the same for our models, though here variance means something more specific: how much the model's predictions would change if it were trained on a different sample of data. Mirroring the high-bias person, a high-variance person is someone who fixates on insignificant details and doesn't see the bigger picture. In the example above, the high-variance model memorizes its training points and has no good way of generalizing to future unseen data.
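One way to see high variance directly is to compare training and test error for a very flexible model. A rough sketch, again under an assumed quadratic ground truth and illustrative settings of my own:

```python
# A rough sketch of high variance as overfitting: a very flexible model
# memorizes noise in its training sample and generalizes poorly.
# The ground truth and every setting here are assumptions.
import numpy as np

rng = np.random.default_rng(1)

def make_data(n=30):
    x = rng.uniform(-3, 3, n)
    y = x**2 + rng.normal(scale=1.0, size=n)
    return x, y

x_train, y_train = make_data()
x_test, y_test = make_data()  # fresh, unseen data from the same process

for deg in (2, 15):
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=deg)
    train_mse = np.mean((model(x_train) - y_train) ** 2)
    test_mse = np.mean((model(x_test) - y_test) ** 2)
    print(f"degree {deg:2d}: train MSE {train_mse:6.2f}, test MSE {test_mse:6.2f}")
```

The degree-15 fit chases the noise in its particular training sample, so its advantage on training data does not carry over to the unseen test data.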

Ideally, we want a model that has low bias and low variance. In the chart with the targets, high variance shows up as a wide scatter of shots and high bias as shots that land away from the bullseye. A high-bias, low-variance model produces a tight cluster that misses the target; a low-bias, high-variance model is centered on the target, but its individual predictions are spread widely around it.
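The target chart can be reproduced numerically: refit the model on many independently drawn training sets and treat each refit's prediction at a fixed point as one dart throw. A sketch under the same assumed setup as above:

```python
# A sketch of the target chart in code: refit on many independent training
# sets; the offset of the prediction cloud's center is bias, its spread is
# variance. All settings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

def true_f(x):
    return x**2

x0, n_sets = 1.5, 500  # fixed query point, number of "throws"

for deg in (1, 2, 9):
    preds = np.empty(n_sets)
    for i in range(n_sets):
        x = rng.uniform(-3, 3, 40)
        y = true_f(x) + rng.normal(scale=1.0, size=40)
        preds[i] = np.polynomial.Polynomial.fit(x, y, deg=deg)(x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2
    print(f"degree {deg}: bias^2 = {bias_sq:.3f}, variance = {preds.var():.3f}")
```

The too-simple model lands in a tight cluster away from the bullseye, while the too-flexible one is centered correctly but scattered; the middle model keeps both numbers small.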