

The main approaches for stepwise regression are:

- Forward selection, which involves starting with no variables in the model, testing the addition of each variable using a chosen model fit criterion, adding the variable (if any) whose inclusion gives the most statistically significant improvement of the fit, and repeating this process until none improves the model to a statistically significant extent.
- Backward elimination, which involves starting with all candidate variables, testing the deletion of each variable using a chosen model fit criterion, deleting the variable (if any) whose loss gives the most statistically insignificant deterioration of the model fit, and repeating this process until no further variables can be deleted without a statistically significant loss of fit.
- Bidirectional elimination, a combination of the above, testing at each step for variables to be included or excluded.

A widely used algorithm was first proposed by Efroymson (1960). This is an automatic procedure for statistical model selection in cases where there is a large number of potential explanatory variables and no underlying theory on which to base the model selection. The procedure is used primarily in regression analysis, though the basic approach is applicable in many forms of model selection. It is a variation on forward selection: at each stage in the process, after a new variable is added, a test is made to check whether some of the previously added variables can be deleted without appreciably increasing the residual sum of squares (RSS). The procedure terminates when the fit measure is (locally) maximized, or when the available improvement falls below some critical value. A minimal code sketch of this procedure is given at the end of this section.

One of the main issues with stepwise regression is that it searches a large space of possible models, and hence is prone to overfitting the data: the selected model will often fit much better in sample than it does on new out-of-sample data. Extreme cases have been noted where models have achieved statistical significance working on random numbers. This problem can be mitigated if the criterion for adding (or deleting) a variable is stiff enough. The key line in the sand is at what can be thought of as the Bonferroni point: namely, how significant the best spurious variable should be expected to be by chance alone. On a t-statistic scale, this occurs at about √(2 log p), where p is the number of candidate predictors; a cutoff at this level keeps the selection within about a 2 log p factor of the best possible risk.
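To give a sense of scale for that threshold, the short simulation below is an illustrative sketch rather than anything from the original sources: it approximates the t-statistics of p pure-noise candidate variables by standard normals, and the repetition count and seed are arbitrary choices. It compares √(2 log p) with the typical size of the most significant spurious statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
for p in (10, 100, 1000, 10000):
    # Largest |z| among p pure-noise statistics, averaged over 2000 repetitions.
    max_abs = np.abs(rng.normal(size=(2000, p))).max(axis=1)
    print(f"p={p:>6}  sqrt(2 log p) = {np.sqrt(2 * np.log(p)):.2f}   "
          f"average max |z| = {max_abs.mean():.2f}")
```

As p grows, the most significant chance variable drifts toward the √(2 log p) level, which is why a fixed, lenient entry criterion lets more and more spurious variables in as the candidate pool gets larger.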

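The Efroymson-style procedure described above can be sketched in a few lines. The following is a minimal illustration only: it assumes ordinary least squares fits and classical F-to-enter / F-to-remove thresholds (the values 4.0 and 3.9 are conventional-looking defaults chosen here for illustration), and the function names and toy data are hypothetical rather than taken from any particular library.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on X[:, cols] plus an intercept."""
    Z = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return float(resid @ resid)

def partial_f(rss_small, rss_big, df_resid_big):
    """F-statistic for one extra parameter: (drop in RSS) / (residual mean square)."""
    return (rss_small - rss_big) / (rss_big / df_resid_big)

def stepwise(X, y, f_enter=4.0, f_remove=3.9):
    """Forward steps with a deletion check; keeping f_remove below f_enter avoids cycling."""
    n, p = X.shape
    selected = []
    while True:
        # Forward step: add the candidate with the largest partial F-statistic,
        # provided it clears the entry threshold; otherwise stop.
        rss_cur = rss(X, y, selected)
        scores = {j: partial_f(rss_cur, rss(X, y, selected + [j]), n - len(selected) - 2)
                  for j in range(p) if j not in selected}
        if not scores or max(scores.values()) < f_enter:
            break
        selected.append(max(scores, key=scores.get))
        # Backward check: delete any variable whose removal would barely
        # increase the residual sum of squares.
        while len(selected) > 1:
            rss_full = rss(X, y, selected)
            drops = {i: partial_f(rss(X, y, [k for k in selected if k != i]),
                                  rss_full, n - len(selected) - 1)
                     for i in selected}
            worst = min(drops, key=drops.get)
            if drops[worst] >= f_remove:
                break
            selected.remove(worst)
    return sorted(selected)

# Toy usage: three informative predictors hidden among twenty noise columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 23))
y = 2.0 * X[:, 0] - 1.5 * X[:, 5] + X[:, 9] + rng.normal(size=200)
print(stepwise(X, y))   # typically recovers columns 0, 5 and 9
```

With the lenient entry threshold of about 4, a spurious noise column will occasionally slip in; raising the entry criterion toward the Bonferroni point discussed above (an F-to-enter of roughly 2 log p, i.e. the square of the √(2 log p) t-cutoff) makes such chance inclusions rare.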