How to Predict Which Statistical Learning Method Is Best?

Kashika Yadav
Sep 19, 2021 · 3 min read

The answer is simple: apply each method in turn, measure its accuracy, and pick the method with the highest accuracy.

There are many statistical tools, ranging from less flexible methods such as the lasso and linear regression, to more flexible ones like GAMs (generalized additive models), to fully non-linear methods like bagging, boosting, and support vector machines. The problem lies in deciding which statistical method to use to compute accurate predictions. And sometimes the goal is not accurate predictions at all, but correct inferences.

As flexibility increases, prediction accuracy tends to increase while interpretability, our ability to draw inferences, decreases.

What is the difference between prediction and inference?

1. Prediction means computing f̂(x0): you care only about the output the fitted function produces when you feed an input into it.

2. Inference means you want to understand the relationships between the predictors and the response.

3. For instance, if we estimate the sales of a grocery store from raw-material expenses, worker expenses, and so on, the sales figure is the prediction, while questions like the following are inferences:

  • Which expense drives the most sales?
  • If we increase the worker expense, will it boost sales or not?
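The distinction can be sketched in code. Below is a minimal illustration with NumPy, using made-up grocery numbers (the data, coefficients, and expense ranges are all hypothetical): evaluating the fitted function at a new point is prediction, while inspecting the fitted coefficients is inference.

```python
import numpy as np

# Hypothetical grocery data: columns are raw-material expense and
# worker expense, target is sales. All numbers are made up.
rng = np.random.default_rng(0)
X = rng.uniform(10, 100, size=(50, 2))            # [raw_material, worker]
true_coefs = np.array([2.0, 0.5])                 # assumed true relationship
y = X @ true_coefs + 30 + rng.normal(0, 5, 50)    # sales with noise

# Fit ordinary least squares with an intercept column.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Prediction: the output f-hat(x0) for a new expense vector x0.
x0 = np.array([1.0, 60.0, 40.0])
print("predicted sales:", x0 @ beta)

# Inference: the coefficients tell us how each expense relates to
# sales, e.g. which expense drives the most sales per unit spent.
print("effect of raw-material expense:", beta[1])
print("effect of worker expense:", beta[2])
```

Here the same fitted model serves both goals; the difference is only in which part of it we look at.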

Back to the problem: which statistical method is best for a model? The answer is never a single method. Every data set is different, and different statistical methods will give different results; our job is to work out which one is best.

Now, how do we measure the accuracy of a model and find the best statistical method for a given problem statement?

The answer is the mean squared error (MSE), the average of the squared differences between the predicted responses and the true responses.

MSE = Ave((y − f̂(x))²), i.e. (1/n) Σ (yᵢ − f̂(xᵢ))². MSE on the test set is a better accuracy metric than MSE on the training set, but since most of the time we do not have the true responses for test data, we compute MSE on the training data. Keep in mind that our real goal is a low MSE on test data.
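As a small sketch of the two quantities (synthetic data, and a cubic polynomial standing in for f̂), training and test MSE can be computed like this:

```python
import numpy as np

# Synthetic example: the true function and noise level are made up.
rng = np.random.default_rng(1)

def true_f(x):
    return np.sin(x)

x_train = rng.uniform(0, 3, 40)
y_train = true_f(x_train) + rng.normal(0, 0.3, 40)
x_test = rng.uniform(0, 3, 40)
y_test = true_f(x_test) + rng.normal(0, 0.3, 40)

# A cubic polynomial fit as a stand-in for f-hat.
f_hat = np.poly1d(np.polyfit(x_train, y_train, deg=3))

def mse(y_true, y_pred):
    # Ave((y - f_hat(x))^2): mean squared prediction error.
    return float(np.mean((y_true - y_pred) ** 2))

train_mse = mse(y_train, f_hat(x_train))
test_mse = mse(y_test, f_hat(x_test))
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

The training MSE uses the same data the model was fitted on, which is why it is an optimistic estimate of the test MSE we actually care about.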

As flexibility increases, the MSE on the training set decreases steadily, but on the test set it decreases and then increases.

Look at the graph above: the grey line is for training data and the red line is for test data. MSE decreases continually for training data, but for test data it decreases and then increases. The orange dot marks a linear regression model, while the blue and green dots mark more flexible models such as smoothing spline fits.

We can clearly see that the blue point is the best choice, since it has low MSE on both the test and training data.
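The shape of that graph can be reproduced with a simple sweep. Here polynomials of increasing degree stand in for increasingly flexible methods; the data and true function are synthetic choices for illustration:

```python
import numpy as np

# Flexibility sweep: higher polynomial degree = more flexible model.
rng = np.random.default_rng(2)
n = 60
f = lambda x: x * np.sin(x)       # arbitrary non-linear truth
x_train = rng.uniform(-3, 3, n)
y_train = f(x_train) + rng.normal(0, 0.5, n)
x_test = rng.uniform(-3, 3, n)
y_test = f(x_test) + rng.normal(0, 0.5, n)

results = {}
for deg in (1, 3, 6, 9):
    p = np.poly1d(np.polyfit(x_train, y_train, deg))
    train_mse = float(np.mean((y_train - p(x_train)) ** 2))
    test_mse = float(np.mean((y_test - p(x_test)) ** 2))
    results[deg] = (train_mse, test_mse)
    print(f"degree {deg}: train MSE {train_mse:.3f}, "
          f"test MSE {test_mse:.3f}")
# Train MSE can only go down as degree grows, since each higher-degree
# fit contains the lower-degree ones; test MSE typically traces a U.
```

Running this shows the training MSE falling monotonically with degree while the test MSE bottoms out at an intermediate flexibility, which is exactly the blue-dot sweet spot in the figure.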

The bias-variance trade-off

E(y0 − f̂(x0))² = Var(f̂(x0)) + [Bias(f̂(x0))]² + Var(ε)

When we expand the expected test MSE, it decomposes into a variance term and a squared-bias term (plus the irreducible noise Var(ε)). For a low MSE we need both low bias and low variance. Variance increases with increased flexibility, while bias decreases with increased flexibility. This is called the bias-variance trade-off.
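The decomposition can be checked by simulation: repeatedly redraw the training set, refit, and look at the spread and the systematic offset of f̂(x0). The true function, noise level, and model degree below are all made-up choices for the sketch.

```python
import numpy as np

# Simulate E(y0 - f_hat(x0))^2 = Var(f_hat) + Bias^2 + Var(eps).
rng = np.random.default_rng(3)
f = lambda x: np.sin(2 * x)   # assumed true function
sigma = 0.3                   # noise std, so Var(eps) = sigma^2
x0 = 1.0                      # query point
deg = 2                       # flexibility knob: polynomial degree

preds = []
for _ in range(2000):
    x = rng.uniform(0, 3, 30)
    y = f(x) + rng.normal(0, sigma, 30)
    preds.append(np.poly1d(np.polyfit(x, y, deg))(x0))
preds = np.array(preds)

variance = float(preds.var())                     # Var(f_hat(x0))
bias_sq = float((preds.mean() - f(x0)) ** 2)      # Bias(f_hat(x0))^2
decomposed = variance + bias_sq + sigma**2

# Direct Monte Carlo estimate of E(y0 - f_hat(x0))^2 for comparison.
y0 = f(x0) + rng.normal(0, sigma, len(preds))
direct = float(np.mean((y0 - preds) ** 2))
print(f"Var = {variance:.3f}, Bias^2 = {bias_sq:.3f}, "
      f"Var(eps) = {sigma**2:.3f}")
print(f"decomposed = {decomposed:.3f}, direct = {direct:.3f}")
```

The two totals agree up to Monte Carlo error, and raising `deg` shifts the balance from the bias term toward the variance term.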

The graphs above are for the three previous models: linear regression and smoothing spline fits 1 and 2. The red line is MSE, blue is bias, orange is variance, and the dashed line is Var(ε).

In all three cases bias decreases and variance increases with increasing flexibility, but the rates differ. In the left-most graph, bias decreases while variance increases steadily. In the center, there is only a small decrease in bias while variance increases, since the true f is close to linear. In the right-most, the true f is non-linear, hence there is a dramatic decline in bias and little increase in variance.

Hence, in conclusion, to pick the best model apply these three checks:

  • Ask yourself whether you need accuracy in predictions or accuracy in inferences, and choose the model accordingly.
  • Choose the model with low MSE (or low loss/error) on both the training and test data.
  • Choose the model that keeps bias, variance, and MSE low as flexibility increases.
