Overfitting and Underfitting in Machine Learning
Overfitting
Overfitting occurs when a machine learning model fits the training data too closely. It learns even the smallest details, including noise and irrelevant patterns. As a result, an overfitted model performs very well on the training data but poorly on new, unseen test data because it fails to generalize.
1. Example of Overfitting
Imagine you are a teacher preparing a student for an exam. You focus so much on last year’s exam questions that the student memorizes them word for word. When new questions appear in the actual exam, the student struggles to answer them because they only memorized the specific old questions.
This is what overfitting is in machine learning. The model gets too focused on the details and noise of the training data, making it unable to generalize well to new, unseen data.
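A small sketch can make this concrete. The snippet below (a minimal illustration, not from the original post) fits noisy samples of a simple linear relationship with two polynomial models: a degree-1 model that matches the true pattern, and a degree-9 model with enough flexibility to memorize the noise. The complex model achieves a near-zero training error but a worse error on held-out test points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple linear relationship: y = 2x + noise
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, size=x_train.size)
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2 * x_test + rng.normal(0, 0.2, size=x_test.size)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, deg=1)    # matches the true pattern
complex_ = np.polyfit(x_train, y_train, deg=9)  # memorizes the noise

print("simple:  train", mse(simple, x_train, y_train),
      "test", mse(simple, x_test, y_test))
print("complex: train", mse(complex_, x_train, y_train),
      "test", mse(complex_, x_test, y_test))
```

The degree-9 polynomial passes almost exactly through every training point, so its training error is tiny, but it oscillates between those points and does worse on the test data — the numeric signature of overfitting.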
---
Underfitting
Underfitting happens when the model fails to learn enough from the training data. This means it cannot fully understand the patterns in the training data, leading to poor performance on both the training and new test data. Underfitting typically occurs when the model is too simple or lacks sufficient features.
2. Example of Underfitting
Now, consider another student you’re preparing, but this time you only teach them basic concepts and give them very few practice questions. The student is not fully prepared for the exam, especially the more complex questions, because they never learned enough in the first place.
This is underfitting: the model is too simple to capture the underlying patterns. Thus, to avoid both overfitting and underfitting, it is essential to train the model with the right amount of data and to balance its complexity so that it performs well on both the training data and new, unseen data.
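One common way to find that balance is to hold out a validation set and pick the model complexity that generalizes best. The sketch below (an illustrative example with made-up data, not from the original post) fits polynomials of three different degrees to noisy quadratic data and compares their validation errors: the too-simple degree-1 model underfits, while an appropriate degree scores lower on the held-out points.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a quadratic relationship: y = x^2 + noise
x = rng.uniform(-1, 1, 60)
y = x**2 + rng.normal(0, 0.1, size=x.size)

# Hold out part of the data to measure generalization
x_train, y_train = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]

def val_mse(degree):
    """Fit a polynomial of the given degree, score it on the validation set."""
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    return float(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))

errors = {d: val_mse(d) for d in (1, 2, 10)}  # too simple, balanced, too complex
best = min(errors, key=errors.get)
print("validation errors:", errors, "-> pick degree", best)
```

The degree-1 model cannot bend to follow the quadratic curve, so its validation error stays high no matter how long it trains — the numeric signature of underfitting. Choosing the degree with the lowest validation error is one simple way to balance complexity.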