Centering To Correct For Multicollinearity Due To Polynomials, Interactions
I was going through my notes from when I was teaching econometrics five years ago (feels like yesterday) and I noticed I had an early draft of a book about econometrics. I must have forgotten about it, which is not a surprise for somebody with ADHD. Speaking of which, my answer may go into serious detail. That is the hyperfocus part that comes with having ADHD. My apologies in advance!
It was not a super technical one. Obviously the book didn't happen, and I'll leave that to more skillful econometricians... but I suppose I could put out some of the material I was collecting. In this version, I took a lot from Clyde Schechter's answer to a question on Statalist, as well as some from Paul Allison's and Arthur Goldberger's texts (also suggested on the same thread, which is linked at the bottom).
The goal is a simple remedy for the multicollinearity that often appears after adding interaction or polynomial terms (polynomials are self‑interactions). To motivate the fix, we’ll start from first principles.
What is multicollinearity?
One of the fundamental assumptions in simple linear regression (SLR) is that there is sample variation in X. This should be a pretty easy assumption to accept, for both mathematical and practical reasons.
Mathematically, in SLR, we define our coefficient on X as: $$ \hat\beta_1 = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^n (X_i - \bar X)^2}. $$ Since we can't divide by zero, we have to have variation in X. This is borne of mathematical necessity. (But also, it wouldn't make any sense or be of any use if X never varied... why would we even be looking for the effect of X on Y if X never changed?)
In MLR, we face a similar mathematical reason for an assumption: no perfect multicollinearity, meaning X must have full rank. In other words, our individual x columns cannot be perfect linear combinations of the others.
This is the MLR equivalent of the SLR sample-variation assumption, for the same mathematical reason. If we have a model:
$$Y = X\beta + u$$ where \(X\) is the \(n \times k\) regressor matrix (including a column of 1s for the intercept).
The OLS estimator is: $$ \hat\beta = (X'X)^{-1}X'Y $$
So, while in SLR, we need $\sum_{i=1}^n (X_i - \bar X)^2$ to not be zero, we need the determinant of $X'X$ to not be zero for MLR, because we cannot invert $X'X$ otherwise. When would it be zero? When $X$ is not full rank.
Note that this also continues our requirement to have variation in X. Say your model has:
- An intercept column (the intercept is always a vector of ones): $$ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} $$
- Some X-variable with no variation: $$ \begin{bmatrix} 5 \\ 5 \\ 5 \end{bmatrix} $$
Matrix $X$ would therefore be:
$$ X = \begin{bmatrix} 1 & 5 \\ 1 & 5 \\ 1 & 5 \end{bmatrix} $$
We can see that Column 2 = 5 × Column 1, so it is a linear combination of column 1. The determinant of $X'X$ would be 0 and we could not invert it.
The TLDR English translation: we need the columns of X to not be perfect linear combinations of one another.
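Here is a minimal Stata sketch of what happens when this assumption fails (the variable name no_variation is made up for illustration). Stata detects that a constant regressor is perfectly collinear with the intercept and omits it:
sysuse auto, clear
generate no_variation = 5      // a "regressor" with no variation at all
regress price no_variation     // Stata notes: no_variation omitted because of collinearity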
Close but not perfect
What if we have variables that are highly correlated, but not perfect? What problems might we have?
When predictors overlap heavily, the model struggles to decide how to assign the shared variation in the outcome to each predictor. That shows up as:
- Unstable coefficients (high variance / sensitivity). Small changes to the data or the model (e.g., adding or removing a control) can cause large swings in the estimated coefficients, even flipping signs. This doesn't mean the model is "wrong"... it means the unique part of each predictor is weakly identified.
- Large standard errors and wide confidence intervals. The overlap among predictors inflates the variance of the coefficient estimates. You see larger SEs, wider CIs, and weaker t-stats, so genuinely important predictors can look "insignificant."
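The "inflated variance" claim can be made precise with the standard textbook formula (under homoskedasticity) for the sampling variance of a slope coefficient in MLR:
$$ \operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)}, $$
where $SST_j = \sum_{i=1}^n (X_{ij} - \bar X_j)^2$ and $R_j^2$ is the R-squared from regressing $X_j$ on all the other regressors. As $R_j^2$ approaches 1 (near-perfect collinearity), the denominator shrinks and the variance blows up; the factor $1/(1 - R_j^2)$ is exactly the VIF that comes up later.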
I understand these two things to be related in the sense that our estimator remains unbiased in the presence of this near-extreme multicollinearity. In Clyde's comment on Statalist, he brings up the idea of a coefficient estimate with a large sampling variance of 16 (thus a standard deviation of 4) and a true value of 2. Well...
$$ P(X \le 0) = \int_{-\infty}^{0} \frac{1}{4\sqrt{2\pi}} e^{-\frac{(x-2)^2}{32}} dx \approx 31\% $$
Zero is only half a standard deviation below the mean (2 - 4/2 = 0), so despite knowing that the true value is 2, such a large standard deviation means we have roughly a 31% chance of our estimated coefficient coming out non-positive.
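A quick sanity check on that probability in Stata, using only the standard normal CDF (no data needed):
display normal((0 - 2)/4)    // ≈ .3085, about a 31% chance of a non-positive estimate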
Since the issue with the coefficient is not bias but larger standard errors, we may be OK if the standard errors are still small enough to reject, so no big deal. But what if they aren't? Can we do ANYTHING?
Well, maybe. It depends on the source of the multicollinearity. There are two common ways it shows up:
- Structural (model-induced) multicollinearity. Created by the model itself when you add polynomial terms (e.g., X^2) or interactions (e.g., X*Z). These new variables are built from the originals, so they're naturally highly correlated with them.
- Data (source) multicollinearity. Predictors are highly correlated in the raw data because they measure the same concept (e.g., "years experience" and "age") or are mechanically linked (e.g., Celsius and Fahrenheit together with an intercept create perfect multicollinearity). In the perfect case, OLS can't estimate the model; software drops a variable or fails.
For structural multicollinearity, there is a simple fix...
Centering vs. Standardizing
Centering
- Create a centered version of each variable that enters the polynomial or interaction. Example: $X\_centered_i = X_i - \bar{X}$
- What it helps with: the multicollinearity caused by interactions/polynomials.
- Interpretation: coefficients stay in the original units. The intercept now means "the expected Y when the (centered) predictors are at their means," which is far more sensible than "when X = 0."
- When to prefer: most of the time, if you want plain-English, per-unit coefficients.
Standardizing
- Do: $X\_std_i = \dfrac{X_i - \bar{X}}{\mathrm{sd}(X)}$ (see the short sketch after this list)
- What it adds: puts predictors on a common scale so coefficients are “per 1 SD change.”
- Tradeoff: you lose the “per unit” interpretation, which can be less intuitive for audiences.
- When to prefer: comparing effect sizes across variables with different units/scales or when your model really benefits from everything being on the same scale.
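A minimal Stata sketch of both transformations, assuming a continuous regressor named x (the variable name is hypothetical):
quietly summarize x
generate x_centered = x - r(mean)    // centering: subtract the sample mean
egen x_std = std(x)                  // standardizing: (x - mean)/sd via egen's std() function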
Important limits
- Neither centering nor standardizing magically fixes all multicollinearity. If two raw predictors are almost duplicates, you still have a modeling problem. Centering helps specifically with the structure-induced collinearity from X*Z, X^2, etc.
What actually happens to your model
- With uncentered X, Z, and X*Z, you often see high p-values, and measures that check for multicollinearity, like the Variance Inflation Factor (VIF), are high. A common rule of thumb is that any VIF over 10 is a problem, but this is not agreed upon; some go all the way down to 2.5. You can interpret a VIF as the factor by which a coefficient's variance is inflated relative to what it would be with no multicollinearity. For example, a coefficient with a VIF of 5 has 5x the variance it would have without multicollinearity (see the by-hand VIF calculation after this list).
- After centering:
  - VIF should decrease.
  - Collinear coefficients may change (those that are not collinear should not change).
  - It's common that a term that looked "not significant" becomes significant after centering because you reduced the noise (variance inflation), not because you cherry-picked a result.
This is simply cleaning up a numerical side effect of how interaction/polynomial terms are constructed.
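To check the VIF interpretation by hand, regress one predictor on the other(s) and compute $1/(1 - R_j^2)$. A minimal Stata sketch with the auto data, mirroring the weight example below (weight_sq is a variable name made up here):
sysuse auto, clear
generate weight_sq = weight^2
regress weight weight_sq      // auxiliary regression of one regressor on the other
display 1/(1 - e(r2))         // by-hand VIF for weight; should match estat vif after the full model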
For sources other than structural:
- Delete one or more variables from the model. If the correlation is really high, why include both? There may be a reason, but this needs to be thought through. And if you do keep them, which do you keep? Shouldn't make too much of a difference either way, but you should be able to defend your choice.
- Combine the collinear variables into an index.
- The easy way to do this is to average them if they all share the same scale and units; otherwise, standardize them first, then average (see the sketch after this list).
- Use regularization
- Somebody can feel free to prove me wrong, but at current, I am not a fan of regularization with economics as I think theory should guide what you put in the model.
- Use Principal Components Analysis to replace collinear variables with orthogonal (uncorrelated) components (also sketched below).
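A minimal Stata sketch of the index and PCA ideas, using the auto data; treating weight and length as the "nearly duplicate" predictors is purely for illustration:
sysuse auto, clear
* Index: standardize the collinear variables, then average them
egen z_weight = std(weight)
egen z_length = std(length)
generate size_index = (z_weight + z_length)/2
regress price size_index
* PCA: replace the collinear pair with orthogonal components
pca weight length
predict pc1 pc2, score       // component scores are uncorrelated by construction
regress price pc1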
Some examples:
Stata code:
*******************************************************
* Example: Centering, polynomials, and multicollinearity
* Dataset: Stata auto data
*******************************************************
sysuse auto, clear
*******************************************************
* 1. Polynomial regression WITHOUT centering
* Model: price = β0 + β1*weight + β2*weight^2 + u
* This creates structural multicollinearity:
* weight and weight^2 are highly correlated.
*******************************************************
reg price c.weight##c.weight
/*
. reg price c.weight##c.weight
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(2, 71) = 23.09
Model | 250285462 2 125142731 Prob > F = 0.0000
Residual | 384779934 71 5419435.69 R-squared = 0.3941
-------------+---------------------------------- Adj R-squared = 0.3770
Total | 635065396 73 8699525.97 Root MSE = 2328
-----------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------+----------------------------------------------------------------
weight | -7.273097 2.691747 -2.70 0.009 -12.64029 -1.905906
|
c.weight#c.weight | .0015142 .0004337 3.49 0.001 .0006494 .002379
|
_cons | 13418.8 3997.822 3.36 0.001 5447.372 21390.23
-----------------------------------------------------------------------------------
Note:
- Strong curvature (quadratic term significant).
- This specification tends to create high multicollinearity between weight and weight^2.
*/
*******************************************************
* 2. Check VIFs for the uncentered polynomial model
* We expect very large VIFs for weight and weight^2
* because of their near-linear dependence.
*******************************************************
estat vif
/*
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
weight | 58.95 0.016963
c.weight#|
c.weight | 58.95 0.016963
-------------+----------------------
Mean VIF | 58.95
Rule of thumb: VIF > 10 suggests problematic multicollinearity.
Here, VIF ≈ 59 — an example of structural multicollinearity from the polynomial.
*/
*******************************************************
* 3. Center weight around its mean
* Define c_weight = weight - mean(weight)
* This keeps the same information but shifts the zero point.
*******************************************************
quietly summarize weight
local average_weight = r(mean)
generate c_weight = weight - `average_weight'
*******************************************************
* 4. Polynomial regression WITH centered weight
* Model: price = β0 + β1*c_weight + β2*c_weight^2 + u
*
* Important facts:
* - The quadratic coefficient is IDENTICAL to the uncentered model.
* - The main effect coefficient changes.
* - The intercept changes a lot.
* - Fit (R², residuals, SSE, Root MSE) is unchanged.
*******************************************************
reg price c.c_weight##c.c_weight
/*
. reg price c.c_weight##c.c_weight
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(2, 71) = 23.09
Model | 250285462 2 125142731 Prob > F = 0.0000
Residual | 384779934 71 5419435.69 R-squared = 0.3941
-------------+---------------------------------- Adj R-squared = 0.3770
Total | 635065396 73 8699525.97 Root MSE = 2328
---------------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------------+----------------------------------------------------------------
c_weight | 1.870939 .3540693 5.28 0.000 1.164945 2.576933
|
c.c_weight#c.c_weight | .0015142 .0004337 3.49 0.001 .0006494 .002379
|
_cons | 5263.004 374.2033 14.06 0.000 4516.864 6009.144
---------------------------------------------------------------------------------------
Note:
- Quadratic term (.0015142) is exactly the same as in the uncentered model.
- Only the linear term and intercept change.
- R-squared and Root MSE match the uncentered specification.
*/
*******************************************************
* 5. Simple linear regression: price on weight
* Centering a single regressor does NOT change the slope,
* only the intercept.
*******************************************************
reg price weight
reg price c_weight
/*
(Outputs omitted to save space; key point:)
- The slope on weight and on c_weight is identical.
- Only the intercept differs when you center.
- This is a general property of SLR: centering does not affect the slope.
*/
*******************************************************
* 6. Add a random covariate u
* Show that centering weight does NOT affect:
* - its slope
* - the slope on u
* Only the intercept changes.
*******************************************************
generate u = runiform()
reg price weight u
/*
. reg price weight u
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(2, 71) = 15.52
Model | 193183874 2 96591937.2 Prob > F = 0.0000
Residual | 441881522 71 6223683.4 R-squared = 0.3042
-------------+---------------------------------- Adj R-squared = 0.2846
Total | 635065396 73 8699525.97 Root MSE = 2494.7
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | 2.040029 .3757081 5.43 0.000 1.290888 2.789169
u | 1292.857 1078.113 1.20 0.234 -856.839 3442.553
_cons | -687.4048 1301.212 -0.53 0.599 -3281.947 1907.137
------------------------------------------------------------------------------
*/
reg price c_weight u
/*
. reg price c_weight u
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(2, 71) = 15.52
Model | 193183874 2 96591937 Prob > F = 0.0000
Residual | 441881522 71 6223683.41 R-squared = 0.3042
-------------+---------------------------------- Adj R-squared = 0.2846
Total | 635065396 73 8699525.97 Root MSE = 2494.7
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
c_weight | 2.040029 .3757081 5.43 0.000 1.290888 2.789169
u | 1292.857 1078.113 1.20 0.234 -856.8391 3442.553
_cons | 5472.379 646.4874 8.46 0.000 4183.319 6761.438
------------------------------------------------------------------------------
Note:
- Slope on weight vs c_weight is the same (2.040...).
- Slope on u is unchanged.
- Intercept shifts, as expected when you shift the origin of weight.
*/
*******************************************************
* 7. Polynomial model with an additional covariate (length),
* BEFORE centering weight.
*******************************************************
reg price c.weight##c.weight length
/*
. reg price c.weight##c.weight length
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(3, 70) = 17.11
Model | 268671332 3 89557110.5 Prob > F = 0.0000
Residual | 366394065 70 5234200.92 R-squared = 0.4231
-------------+---------------------------------- Adj R-squared = 0.3983
Total | 635065396 73 8699525.97 Root MSE = 2287.8
-----------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------+----------------------------------------------------------------
weight | -4.175976 3.119071 -1.34 0.185 -10.39677 2.044815
|
c.weight#c.weight | .0013255 .000438 3.03 0.003 .000452 .0021991
|
length | -71.44619 38.12082 -1.87 0.065 -147.4758 4.58338
_cons | 19326.45 5037.056 3.84 0.000 9280.36 29372.54
-----------------------------------------------------------------------------------
*/
*******************************************************
* 8. Polynomial model with length, AFTER centering weight.
* Important pattern:
* - c_weight (main effect) changes.
* - c_weight#c_weight (quadratic) is unchanged.
* - length coefficient is unchanged.
* - intercept changes.
* - overall fit (R², SSE) is identical.
*******************************************************
reg price c.c_weight##c.c_weight length
/*
. reg price c.c_weight##c.c_weight length
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(3, 70) = 17.11
Model | 268671332 3 89557110.6 Prob > F = 0.0000
Residual | 366394064 70 5234200.92 R-squared = 0.4231
-------------+---------------------------------- Adj R-squared = 0.3983
Total | 635065396 73 8699525.97 Root MSE = 2287.8
---------------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------------+----------------------------------------------------------------
c_weight | 3.828901 1.101116 3.48 0.001 1.632794 6.025008
|
c.c_weight#c.c_weight | .0013255 .000438 3.03 0.003 .000452 .0021991
|
length | -71.44619 38.12082 -1.87 0.065 -147.4758 4.583379
_cons | 18802.46 7233.465 2.60 0.011 4375.771 33229.15
---------------------------------------------------------------------------------------
Summary of this comparison:
- Quadratic term is identical before/after centering.
- length coefficient is identical before/after centering.
- Only the main effect of weight (now c_weight) and the intercept move.
- R-squared and model fit are unchanged; centering changes interpretation,
not the underlying fit.
*/
My sources
- Statalist: Multicollinearity and coefficient bias (thread)
- Newsom (PSU): Centering predictors handout (PDF)
- rforhr.com: Centering vs. standardizing overview
- Paul D. Allison, Multiple Regression: A Primer (book)
- YouTube: Video explanation related to centering/multicollinearity