Memo: Heuristic approach to solving collinearity


The following passage is from Applied Predictive Modeling (book by Max Kuhn and Kjell Johnson):

A less theoretical, more heuristic approach to dealing with this issue is
to remove the minimum number of predictors to ensure that all pairwise
correlations are below a certain threshold. While this method only identifies
collinearities in two dimensions, it can have a significantly positive effect on
the performance of some models.
The algorithm is as follows:
1. Calculate the correlation matrix of the predictors.
2. Determine the two predictors associated with the largest absolute pairwise
correlation (call them predictors A and B).
3. Determine the average correlation between A and the other variables.
Do the same for predictor B.
4. If A has a larger average correlation, remove it; otherwise, remove
predictor B.
5. Repeat Steps 2–4 until no absolute correlations are above the threshold.
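
The steps above can be sketched in a few lines of plain Python. This is only an illustration, not the book's own code: the function names `pearson` and `drop_correlated`, the dict-of-lists input format, and the default threshold of 0.75 are my own assumptions. I also read "average correlation" in Step 3 as the mean absolute correlation with the other remaining predictors, and I assume no predictor is constant (a constant column would make the correlation undefined).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither column is constant


def drop_correlated(columns, threshold=0.75):
    """Return the predictor names to remove, in removal order.

    columns: dict mapping predictor name -> list of numeric values
             (hypothetical input format chosen for this sketch).
    """
    names = list(columns)
    # Step 1: absolute pairwise correlations between all predictors.
    corr = {
        a: {b: abs(pearson(columns[a], columns[b])) for b in names if b != a}
        for a in names
    }
    kept = list(names)
    removed = []

    def mean_corr(p):
        # Step 3: mean absolute correlation of p with the other kept predictors.
        others = [q for q in kept if q != p]
        return sum(corr[p][q] for q in others) / len(others)

    while len(kept) > 1:
        # Step 2: the remaining pair with the largest absolute correlation.
        best, a, b = max(
            (corr[p][q], p, q)
            for i, p in enumerate(kept)
            for q in kept[i + 1:]
        )
        # Step 5: stop once no pair exceeds the threshold.
        if best <= threshold:
            break
        # Step 4: drop whichever of the two is more correlated on average.
        drop = a if mean_corr(a) > mean_corr(b) else b
        kept.remove(drop)
        removed.append(drop)
    return removed


data = {
    "a": [1, 2, 3, 4, 5],
    "b": [1, 2, 3, 4, 6],
    "c": [2, 1, 2, 1, 2],
}
print(drop_correlated(data, threshold=0.75))
```

In this toy data, "a" and "b" are almost perfectly correlated while "c" is nearly uncorrelated with both, so exactly one of "a" and "b" is removed. Because "b" has the slightly larger average correlation with the rest, it is the one dropped.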

The idea is to first remove the predictors that have the most correlated relationships.

Because there has been a lot of interest in collinearity problems, I will update this memo with more topics, including L1 and L2 regularization, dropout, and other methods for dealing with collinearity.