# Bias Vs Variance

August 10, 2020


**Bias and variance** are concepts deep-rooted in statistics and essential for data scientists. A significant reason to understand these terms is that the right balance between them is essential to constructing machine-learning algorithms that produce accurate results. Every algorithm exhibits some degree of both bias and variance.

We define bias as the difference between the average prediction of our model and the actual value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the relationship; it produces high error on both training and test data.

Variance is the variability of the model's prediction for a given data point; it tells us the spread of our predictions. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before. As a result, such models perform well on training data but have high error rates on test data.
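These two definitions can be made concrete with a small simulation: repeatedly draw training sets, fit the same over-simple model, and look at the average and spread of its predictions at one test point. The ground-truth function, the mean-only model, and all constants below are illustrative assumptions, not from the text.

```python
import random

random.seed(0)

def true_f(x):
    return x * x  # assumed ground-truth function for the example

def sample_train(n=20):
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return xs, [true_f(x) + random.gauss(0, 0.1) for x in xs]

# A deliberately over-simple model: always predict the training-set mean of y.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

x0 = 0.9                       # fixed test point
preds = []
for _ in range(500):           # many independent training sets
    xs, ys = sample_train()
    preds.append(fit_mean(xs, ys)(x0))

avg = sum(preds) / len(preds)
bias = avg - true_f(x0)        # average prediction minus the true value
variance = sum((p - avg) ** 2 for p in preds) / len(preds)
print(f"bias ~ {bias:.3f}, variance ~ {variance:.4f}")
```

Because the model ignores x entirely, its average prediction at x0 is far from the truth (large bias), while its predictions barely change between training sets (small variance).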

In supervised learning, a model that fails to capture the underlying pattern of the data is said to be underfitting. These models usually have high bias and low variance. Underfitting can occur when there is not enough data to build an accurate model, or when we try to fit a linear model to nonlinear data. Such models are too simple to capture complex patterns in the data; linear and logistic regression are typical examples.
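As an illustration, the sketch below fits an ordinary least-squares line to data generated from a quadratic function; the straight line cannot express the curvature, so its training error stays well above the noise level. The data-generating function and noise level are assumptions made for the example.

```python
import random

random.seed(1)

# Nonlinear ground truth fitted with a straight line: underfitting.
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [x * x + random.gauss(0, 0.05) for x in xs]

# Ordinary least-squares line y = a + b*x (closed-form solution)
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

train_mse = sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / n
print(f"training MSE of the linear fit: {train_mse:.3f}")
```

The noise variance is only 0.05² = 0.0025, yet the line's training error is dozens of times larger: the error comes from the model being too simple, not from the noise.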

In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern of the data. It typically occurs when we train a model for too long on a noisy dataset. These models have low bias and high variance, and they tend to be very complex; decision trees, for example, are prone to overfitting.

**High Variance**

**Symptoms are as follows:**

- Training error is much lower than test error
- Training error is lower than ε (the desired error rate)
- Test error is above ε

**Remedies are as follows:**

- Add more training data
- Reduce model complexity, since complex models are prone to high variance
- Bagging (will be covered later in the course)
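A rough sketch of why bagging reduces variance: a 1-nearest-neighbour predictor (low bias, high variance) is fitted either once per training set or as an average over bootstrap resamples, and the spread of its predictions across independent training sets is compared. The predictor choice, functions, and constants here are illustrative assumptions.

```python
import random

random.seed(2)

def true_f(x):
    return x * x  # assumed ground-truth function

def sample_train(n=30):
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, true_f(x) + random.gauss(0, 0.3)) for x in xs]

def predict_1nn(train, x):
    # 1-nearest-neighbour: a low-bias, high-variance predictor
    return min(train, key=lambda p: abs(p[0] - x))[1]

def predict_bagged(train, x, rounds=25):
    # Bagging: average 1-NN fits over bootstrap resamples of the training set
    preds = []
    for _ in range(rounds):
        boot = [random.choice(train) for _ in train]
        preds.append(predict_1nn(boot, x))
    return sum(preds) / len(preds)

x0 = 0.5
single, bagged = [], []
for _ in range(200):           # spread across independent training sets
    train = sample_train()
    single.append(predict_1nn(train, x0))
    bagged.append(predict_bagged(train, x0))

def variance(vs):
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

print(f"variance, single 1-NN: {variance(single):.4f}")
print(f"variance, bagged 1-NN: {variance(bagged):.4f}")
```

Averaging over bootstrap resamples smooths out the dependence on any single noisy training point, so the bagged predictor varies less from one training set to the next.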
**High Bias**

Unlike the first regime, the second regime indicates high bias: the model used is not expressive enough to produce accurate predictions.

**Symptoms are as follows:**

- Training error is higher than ε

**Remedies are as follows:**

- Use a more complex model (e.g., kernelize, use nonlinear models)
- Add features
- Boosting
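To illustrate the "add features" remedy, the sketch below fits a one-parameter linear model `y = b*x` to quadratic data, then adds a squared feature; the training error drops to roughly the noise level once the model family can express the true pattern. The data-generating process and the one-parameter fits are assumptions for the example.

```python
import random

random.seed(3)

xs = [random.uniform(-1, 1) for _ in range(300)]
ys = [x * x + random.gauss(0, 0.05) for x in xs]

# Linear feature only: y = b*x (high bias on this data)
b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
mse_linear = sum((b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Add a squared feature: y = c*x**2 (the model family now matches the data)
c = sum(x * x * y for x, y in zip(xs, ys)) / sum(x ** 4 for x in xs)
mse_quadratic = sum((c * x * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print(f"training MSE, linear feature only: {mse_linear:.4f}")
print(f"training MSE, with squared feature: {mse_quadratic:.4f}")
```

Adding the right feature removes the bias without adding model complexity everywhere; the same idea underlies kernelization, which adds many such features implicitly.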