Bias Vs Variance


Bias and variance are concepts deep-rooted in statistics and essential for data scientists. A
significant reason to understand these terms is that striking the right balance between them is
essential to building machine-learning models that produce accurate results. Every algorithm
exhibits some degree of bias and some degree of variance.

We define bias as the difference between the average prediction of our model and the actual
value we are trying to predict. A model with high bias pays very little attention to the training
data and oversimplifies, which leads to high error on both training and test data.
Variance is the variability of the model's prediction for a given data point; it tells us how
spread out our predictions are. A model with high variance pays a lot of attention to the
training data and does not generalize to data it hasn't seen before. As a result, such models
perform well on training data but have high error rates on test data.
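These definitions can be made concrete with a small simulation. The sketch below is a toy example of my own (it assumes a sine-wave ground truth and a deliberately simple straight-line model, neither of which comes from the text): it refits the model on many freshly drawn training sets and measures bias and variance of the prediction at one point.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed ground-truth function for this toy example.
    return np.sin(x)

x0 = 1.0                      # point at which we measure bias and variance
n_repeats, n_samples, noise = 200, 30, 0.3

preds = []
for _ in range(n_repeats):
    # Draw a fresh noisy training set each time.
    x = rng.uniform(0, 2 * np.pi, n_samples)
    y = true_fn(x) + rng.normal(0, noise, n_samples)
    # Fit a deliberately simple model: a straight line (degree-1 polynomial).
    coeffs = np.polyfit(x, y, 1)
    preds.append(np.polyval(coeffs, x0))

preds = np.array(preds)
bias = preds.mean() - true_fn(x0)   # average prediction minus the true value
variance = preds.var()              # spread of predictions across training sets
print(f"bias={bias:.3f}, variance={variance:.3f}")
```

Because a straight line can never follow the curve of a sine wave, the bias term stays clearly away from zero no matter how many training sets we draw, which is exactly the "high bias" behaviour described above.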

In supervised learning, a model that fails to capture the underlying pattern of the data is said
to be underfitting. These models usually have high bias and low variance. Underfitting can occur
when there is not enough data to build an accurate model, or when we try to fit a linear model
to nonlinear data. Such models are too simple to capture complex patterns in the data; linear
and logistic regression are common examples.

In supervised learning, overfitting happens when our model captures the noise along with the
underlying pattern of the data. It typically happens when we train a model for too long on a
noisy dataset. These models have low bias and high variance. Very complex models, such as
decision trees, are prone to overfitting.
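The two failure modes can be seen side by side in a small experiment. The following sketch is a toy setup of my own (it again assumes noisy sine-wave data): a straight line underfits, a degree-15 polynomial overfits, and comparing training and test error shows the two signatures.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy nonlinear data: a line underfits it, a high-degree polynomial overfits it.
x_train = rng.uniform(0, 2 * np.pi, 20)
y_train = np.sin(x_train) + rng.normal(0, 0.2, 20)
x_test = rng.uniform(0, 2 * np.pi, 200)
y_test = np.sin(x_test) + rng.normal(0, 0.2, 200)

def train_test_mse(degree):
    # Fit a polynomial of the given degree; return (train MSE, test MSE).
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

under_train, under_test = train_test_mse(1)   # high bias: both errors high
over_train, over_test = train_test_mse(15)    # high variance: train low, test high
print(f"underfit: train={under_train:.3f} test={under_test:.3f}")
print(f"overfit:  train={over_train:.3f} test={over_test:.3f}")
```

The degree-15 fit has a much lower training error than the line but a much larger gap between training and test error, which is the overfitting pattern described above.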

High Variance
Symptoms are as follows:

  1. Training error is much lower than test error
  2. Training error is lower than ϵ (the desired error level)
  3. Test error is above ϵ

Remedies are as follows:

  1. Add more training data
  2. Reduce model complexity; complex models are prone to high variance
  3. Bagging (will be covered later in the course)
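As a preview of the bagging remedy listed above, here is a minimal sketch of my own (not from the course): it takes a deliberately unstable learner, one-nearest-neighbour regression on assumed noisy sine data, averages it over bootstrap resamples, and measures how much the prediction variance drops across independent training sets.

```python
import numpy as np

rng = np.random.default_rng(2)

def one_nn(x, y, x0):
    # An unstable, high-variance learner: predict the y of the nearest x.
    return y[np.argmin(np.abs(x - x0))]

x0, n_repeats, n_boot = 3.0, 100, 25
single_preds, bagged_preds = [], []
for _ in range(n_repeats):               # fresh training set each repeat
    x = rng.uniform(0, 2 * np.pi, 40)
    y = np.sin(x) + rng.normal(0, 0.3, 40)
    single_preds.append(one_nn(x, y, x0))
    # Bagging: average the same learner over bootstrap resamples of the data.
    boots = [one_nn(x[i], y[i], x0)
             for i in (rng.integers(0, len(x), len(x)) for _ in range(n_boot))]
    bagged_preds.append(np.mean(boots))

var_single = np.var(single_preds)
var_bagged = np.var(bagged_preds)
print(f"variance single={var_single:.4f}  bagged={var_bagged:.4f}")
```

Averaging over bootstrap resamples smooths out the learner's sensitivity to individual points, so the bagged predictor's variance comes out noticeably lower than the single learner's.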

High Bias
Unlike the first regime, the second regime indicates high bias: the model used is not robust
enough to produce an accurate prediction.
Symptoms are as follows:

  1. Training error is higher than ϵ

Remedies are as follows:

  1. Use a more complex model (e.g. kernelize, use nonlinear models)
  2. Add features
  3. Boosting
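To make the boosting remedy concrete, here is a minimal toy implementation of my own (not the course's material) of gradient boosting for squared loss: regression stumps are fitted one after another to the residuals of the current model, so even a very high-bias base learner can drive the training error down.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 2 * np.pi, 200)
y = np.sin(x)                          # noiseless toy target

def fit_stump(x, residual):
    # Best single-threshold regression stump: a constant on each side of t.
    best = None
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left = residual[x <= t].mean()
        right = residual[x > t].mean()
        sse = np.sum((residual - np.where(x <= t, left, right)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, left, right)
    _, t, left, right = best
    return lambda xq: np.where(xq <= t, left, right)

# Gradient boosting for squared loss: each stump is fitted to the residuals
# of the ensemble built so far, then added with a shrinkage factor.
lr, n_rounds = 0.5, 50
pred = np.zeros_like(y)
mse_history = []
for _ in range(n_rounds):
    stump = fit_stump(x, y - pred)
    pred += lr * stump(x)
    mse_history.append(np.mean((y - pred) ** 2))
print(f"MSE after round 1: {mse_history[0]:.3f}, after round {n_rounds}: {mse_history[-1]:.4f}")
```

A single stump is a very crude (high-bias) model, but because each new stump targets what the ensemble still gets wrong, the combined model's training error shrinks round after round.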
