Regression metrics in machine learning
Regression metrics help us evaluate the performance of regression models in machine learning. For beginners, understanding these metrics is important for model selection and optimization. In this article, we will focus on five important regression metrics: MAE, MSE, RMSE, R² score, and adjusted R² score.
Each section is written in list format for better clarity and understanding.
1. Mean Absolute Error (MAE)
MAE calculates the average of absolute differences between predicted and actual values.
Formula:
MAE = (1/n) · Σ |yᵢ − ŷᵢ|
where n is the number of observations, yᵢ the actual value, and ŷᵢ the predicted value.
Important points:
1. Easy to understand: MAE is easy to understand and calculate.
2. Same unit as the target variable: The errors are in the same unit as the target variable.
3. Less sensitive to outliers: Large errors affect MAE less than they affect MSE.
Use cases:
When you need a simple and descriptive metric for error measurement.
Python code:
from sklearn.metrics import mean_absolute_error

# Actual and predicted values
y_true = [50, 60, 70, 80, 90]
y_pred = [48, 62, 69, 78, 91]

# Calculate the MAE
mae = mean_absolute_error(y_true, y_pred)
print("Mean Absolute Error (MAE):", mae)
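To see point 3 in action, here is a quick sketch (with made-up numbers) of how MAE and MSE react differently when a single prediction misses badly:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [50, 60, 70, 80, 90]
y_pred_outlier = [48, 62, 69, 78, 140]  # last prediction is off by 50

mae = mean_absolute_error(y_true, y_pred_outlier)
mse = mean_squared_error(y_true, y_pred_outlier)
print("MAE:", mae)  # 11.4 — the miss contributes linearly
print("MSE:", mse)  # 502.6 — the squared miss (2500) dominates
```

The single outlier roughly doubles MAE but inflates MSE by two orders of magnitude, which is why MAE is preferred when the data contains occasional extreme values.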
2. Mean Squared Error (MSE)
MSE calculates the average of the squared differences between predicted and actual values.
Formula:
MSE = (1/n) · Σ (yᵢ − ŷᵢ)²
where n is the number of observations, yᵢ the actual value, and ŷᵢ the predicted value.
Important points:
1. Penalizes large errors: Squaring the errors amplifies their impact.
2. Common training objective: Widely used as the loss function for model training.
3. Units are squared: Errors are in squared units of the target variable, which can be difficult to interpret.
Use cases:
Useful when you want to penalize large errors more heavily.
Python code:
from sklearn.metrics import mean_squared_error

# Calculate the MSE (reuses y_true and y_pred from the MAE example)
mse = mean_squared_error(y_true, y_pred)
print("Mean Squared Error (MSE):", mse)
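The sklearn call above can be checked by hand against the formula. This minimal sketch computes the average of the squared differences directly with NumPy, using the same example values:

```python
import numpy as np

y_true = np.array([50, 60, 70, 80, 90])
y_pred = np.array([48, 62, 69, 78, 91])

# MSE by hand: mean of the squared differences
# errors are 2, -2, 1, 2, -1 → squares 4, 4, 1, 4, 1 → mean 2.8
mse_manual = np.mean((y_true - y_pred) ** 2)
print("MSE (manual):", mse_manual)  # 2.8
```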
3. Root Mean Squared Error (RMSE)
Description:
RMSE is the square root of MSE and provides a more descriptive error metric.
Important points
1. Same unit as the target variable: Easier to interpret than MSE.
2. Sensitive to outliers: Like MSE, RMSE penalizes large errors.
Use cases:
When you need an interpretable error measure that considers large deviations.
Python code:
import numpy as np

# Calculate the RMSE (reuses mse from the MSE example)
rmse = np.sqrt(mse)
print("Root Mean Squared Error (RMSE):", rmse)
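As a self-contained sanity check, the sketch below computes RMSE twice on the same example values: once as the square root of sklearn's MSE, and once fully by hand. Both should agree:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [50, 60, 70, 80, 90]
y_pred = [48, 62, 69, 78, 91]

# Route 1: square root of sklearn's MSE
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# Route 2: entirely by hand from the definition
diffs = np.array(y_true) - np.array(y_pred)
rmse_manual = np.sqrt(np.mean(diffs ** 2))

print(rmse, rmse_manual)  # both ≈ 1.673 (√2.8)
```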
4. R-squared (R²) score
R² measures how much variance in the target variable is explained by the model.
Formula:
R² = 1 − SS_res / SS_tot = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²
where ȳ is the mean of the actual values.
Important points:
1. Range: R² is at most 1 (a perfect fit); for reasonable models it falls between 0 and 1.
2. Negative values: A negative R² indicates the model is worse than simply predicting the mean.
3. Explains variance: Higher values mean the model explains more variance.
Use cases:
Estimate the overall goodness of fit of the regression model.
Python code:
from sklearn.metrics import r2_score

# Calculate the R² score
r2 = r2_score(y_true, y_pred)
print("R-Squared (R²) score:", r2)
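Point 2 above is worth seeing concretely. In this sketch (with deliberately bad, made-up predictions), the model's squared errors exceed the variance around the mean, so R² goes negative:

```python
from sklearn.metrics import r2_score

y_true = [50, 60, 70, 80, 90]
# Hypothetical predictions that are worse than always guessing the mean (70)
y_bad = [90, 80, 70, 60, 50]

# SS_res = 4000, SS_tot = 1000, so R² = 1 − 4000/1000 = −3.0
r2_bad = r2_score(y_true, y_bad)
print("R² for a bad model:", r2_bad)  # -3.0
```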
5. Adjusted R-Square
Description:
Adjusted R² corrects the R² value for the number of predictors in the model.
Formula:
Adjusted R² = 1 − (1 − R²) · (n − 1) / (n − p − 1)
n: number of observations
p: number of predictors
Important points:
1. Better for multiple predictors: Penalizes models with irrelevant features.
2. Can decrease: Unlike R², adjusted R² can decrease when adding unrelated predictors.
Use cases:
Comparing models with different numbers of predictors.
Python code:
# Function to calculate the adjusted R²
def adjusted_r2(r2, n, p):
    return 1 - ((1 - r2) * (n - 1) / (n - p - 1))

# Example calculation (reuses r2 from the R² example)
n = len(y_true)
p = 1  # number of predictors
adj_r2 = adjusted_r2(r2, n, p)
print("Adjusted R-squared:", adj_r2)
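Point 2 above (adjusted R² can fall when irrelevant predictors are added) can be illustrated with the same helper and some hypothetical numbers: suppose adding three unrelated features nudges R² up only slightly, from 0.900 to 0.905, on 20 observations:

```python
# Same helper as above, repeated so the sketch is self-contained
def adjusted_r2(r2, n, p):
    return 1 - ((1 - r2) * (n - 1) / (n - p - 1))

n = 20  # hypothetical number of observations

adj_small = adjusted_r2(0.900, n, 2)  # 2 predictors  → ≈ 0.8882
adj_large = adjusted_r2(0.905, n, 5)  # 5 predictors  → ≈ 0.8711
print(adj_small, adj_large)
```

Even though plain R² rose, adjusted R² dropped, correctly signaling that the three extra predictors did not earn their keep.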
Comparison of metrics
1. MAE: same unit as the target; less sensitive to outliers; simple to interpret.
2. MSE: squared units; heavily penalizes large errors; common training objective.
3. RMSE: same unit as the target; penalizes large errors; more interpretable than MSE.
4. R²: unitless; fraction of variance explained; can be negative for poor models.
5. Adjusted R²: unitless; penalizes irrelevant predictors; suited to comparing models with different numbers of features.
Conclusion
Understanding these regression metrics helps build, evaluate, and compare models effectively. Each metric serves a specific purpose:
1. Use MAE for simple and robust error measurement.
2. Opt for MSE or RMSE when it is important to penalize large errors.
3. Evaluate the overall fit of the model using R².
4. Prefer adjusted R² for models with multiple features.
These metrics are fundamental to any data scientist or machine learning engineer aiming to build accurate and reliable regression models.