Regression metrics in machine learning

Regression metrics help us evaluate the performance of regression models in machine learning. For beginners, understanding these metrics is important for model selection and optimization. In this article, we will focus on five important regression metrics: MAE, MSE, RMSE, R² score, and adjusted R² score.

  Each section is written in list format for better clarity and understanding.

  1. Mean Absolute Error (MAE)

  MAE calculates the average of absolute differences between predicted and actual values.

Formula:

MAE = (1/n) × Σ |yᵢ − ŷᵢ|

where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of observations.


  Important points:

  1. Easy to understand: MAE is easy to understand and calculate.

  2. Same unit as the target variable: The errors are in the same unit as the target variable.

3. Less sensitive to outliers: Large errors do not affect MAE as much as they affect MSE.

  Use cases:

When you need a simple, interpretable measure of average error.

  Python code:

from sklearn.metrics import mean_absolute_error

# Actual and predicted values
y_true = [50, 60, 70, 80, 90]
y_pred = [48, 62, 69, 78, 91]

# Calculate the MAE
mae = mean_absolute_error(y_true, y_pred)
print("Mean Absolute Error (MAE):", mae)
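To make the outlier point concrete, here is a small sketch comparing how MAE and MSE react when one prediction goes badly wrong (the outlier value 140 is made up for illustration, not part of the article's dataset):

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [50, 60, 70, 80, 90]
y_clean = [48, 62, 69, 78, 91]     # small errors everywhere
y_outlier = [48, 62, 69, 78, 140]  # one large error of 50

# MAE grows linearly with the outlier's error...
mae_clean = mean_absolute_error(y_true, y_clean)      # 1.6
mae_out = mean_absolute_error(y_true, y_outlier)      # 11.4

# ...while MSE grows with its square, so it jumps far more.
mse_clean = mean_squared_error(y_true, y_clean)       # 2.8
mse_out = mean_squared_error(y_true, y_outlier)       # 502.6

print(mae_clean, mae_out)
print(mse_clean, mse_out)
```

A single bad prediction roughly multiplied MAE by 7 but MSE by about 180, which is exactly the sensitivity difference described above.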

  2. Mean Squared Error (MSE)

  MSE calculates the average of the squared differences between predicted and actual values.

Formula:

MSE = (1/n) × Σ (yᵢ − ŷᵢ)²

where yᵢ and ŷᵢ are the actual and predicted values and n is the number of observations.


  Important points:

1. Penalizes large errors: Squaring the errors amplifies the impact of big mistakes.

  2. Common optimization objective: MSE is widely used as a loss function during model training.

  3. Units are squared: Errors are in squared units of the target variable, which can be difficult to interpret.

  Use cases:

Useful when you want to penalize large errors heavily.

  Python code:

from sklearn.metrics import mean_squared_error

# Calculate the MSE (y_true and y_pred as defined above)
mse = mean_squared_error(y_true, y_pred)
print("Mean Squared Error (MSE):", mse)
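As a sanity check on the formula above, MSE can also be computed by hand and compared with scikit-learn's result (same toy values as before):

```python
from sklearn.metrics import mean_squared_error

y_true = [50, 60, 70, 80, 90]
y_pred = [48, 62, 69, 78, 91]

# Average of squared differences, straight from the formula
manual_mse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)
sklearn_mse = mean_squared_error(y_true, y_pred)

print(manual_mse)   # 2.8
print(sklearn_mse)  # 2.8
```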

  3. Root Mean Squared Error (RMSE)

  Description:

RMSE is the square root of MSE and provides a more interpretable error metric.

Important points:

1. Same unit as the target variable: Easier to interpret than MSE.

  2. Sensitive to outliers: Like MSE, RMSE penalizes large errors.

  Use cases:

  When you need an interpretable error measure that considers large deviations.

  Python code:

import numpy as np

  # Calculate the RMSE

  rmse = np.sqrt(mse)

  print("Root Mean Squared Error (RMSE):", rmse)
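Continuing the same example end to end, RMSE is simply the square root of MSE, which brings the error back into the target's original units:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [50, 60, 70, 80, 90]
y_pred = [48, 62, 69, 78, 91]

mse = mean_squared_error(y_true, y_pred)  # 2.8, in squared units
rmse = np.sqrt(mse)                       # ~1.673, in the same units as y
print(rmse)
```

Because 1.673 is in the same units as y (unlike 2.8, which is in squared units), it can be read directly as "the typical prediction is off by about 1.7".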

  4. R-squared (R²) score

  R² measures how much variance in the target variable is explained by the model.

Formula:

R² = 1 − (Σ (yᵢ − ŷᵢ)²) / (Σ (yᵢ − ȳ)²)

where ȳ is the mean of the actual values.


  Important points:

1. Range: R² is typically between 0 and 1, with 1 being a perfect fit.

  2. Negative values: A negative R² indicates the model performs worse than simply predicting the mean.

  3. Explains variance: Higher values mean the model explains more variance.

  Use cases:

  Estimate the overall goodness of fit of the regression model.

  Python code:

from sklearn.metrics import r2_score

# Calculate the R² score
r2 = r2_score(y_true, y_pred)
print("R-Squared (R²) score:", r2)
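The negative-R² case mentioned above is easy to reproduce: a model whose predictions are worse than simply predicting the mean of y_true gets a negative score (the anti-correlated predictions below are made up for illustration):

```python
from sklearn.metrics import r2_score

y_true = [50, 60, 70, 80, 90]    # mean is 70
bad_pred = [90, 80, 70, 60, 50]  # anti-correlated predictions

# Predicting the mean (70) everywhere would give R² = 0;
# these predictions are even worse, so R² goes negative.
score = r2_score(y_true, bad_pred)
print(score)  # -3.0
```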

5. Adjusted R-squared (adjusted R²)

  Description:

Adjusted R² corrects the R² value for the number of predictors in the model.

Formula:

Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − p − 1)

n: number of observations

  p: number of predictors

  Important points:

  1. Better for multiple predictors: Penalizes models with irrelevant features.

  2. Can decrease: Unlike R², adjusted R² can decrease when adding unrelated predictors.

  Use cases:

Comparing models with different numbers of predictors.

  Python code:

# Function to calculate the adjusted R²
def adjusted_r2(r2, n, p):
    return 1 - ((1 - r2) * (n - 1) / (n - p - 1))

# Example calculation (r2 and y_true as defined above)
n = len(y_true)
p = 1  # Number of predictors

adj_r2 = adjusted_r2(r2, n, p)
print("Adjusted R-squared:", adj_r2)
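To see the penalty at work, hold the fit quality fixed and increase only the number of predictors p; the adjusted value drops each time (the helper is redefined here so the sketch is self-contained, and R² = 0.9 with n = 20 are made-up illustrative numbers):

```python
def adjusted_r2(r2, n, p):
    # Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same fit quality (R² = 0.9) on 20 observations,
# but attributed to more and more predictors:
vals = {p: adjusted_r2(0.9, 20, p) for p in (1, 3, 5)}
for p, v in vals.items():
    print(p, round(v, 3))  # 0.894, 0.881, 0.864
```

This is the behavior noted in the important points: a model that needs more features to reach the same R² scores lower.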

Conclusion

  Understanding these regression metrics helps build, evaluate, and compare models effectively. Each metric serves a specific purpose:

  1. Use MAE for simple and robust error measurement.

  2. Opt for MSE or RMSE when it is important to penalize large errors.

3. Evaluate the overall fit of the model using R².

4. Prefer adjusted R² for models with multiple features.

  These metrics are fundamental to any data scientist or machine learning engineer aiming to build accurate and reliable regression models.
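Putting it all together, all five metrics can be computed for the article's toy dataset in one short script (p = 1 predictor is assumed for the adjusted R²):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [50, 60, 70, 80, 90]
y_pred = [48, 62, 69, 78, 91]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

n, p = len(y_true), 1  # assuming a single predictor
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"MAE:         {mae:.3f}")     # 1.600
print(f"MSE:         {mse:.3f}")     # 2.800
print(f"RMSE:        {rmse:.3f}")    # 1.673
print(f"R²:          {r2:.3f}")      # 0.986
print(f"Adjusted R²: {adj_r2:.3f}")  # 0.981
```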
