Re: interaction term in linear regression with a dummy coded predictor
- From: Old Mac User <chendrixstats@xxxxxxxxx>
- Date: Mon, 16 Mar 2009 07:11:10 -0700 (PDT)
On Mar 15, 11:25 pm, Ray Koopman <koop...@xxxxxx> wrote:
I agree that there are people who would misinterpret the uncentered
results in your example as implying that they should drop Pressure,
but they would be ignoring one of the rules of thumb of regression-
model building: "Any model that contains a product term must also
contain all the factors of that product."
Both you and Ken have made a good point... a point I intended to
mention this morning. Will do that in a moment.
In the meantime I fully agree with both of you that when an
interaction is honestly needed and is in a model, discussing the slope
of the response vs. X1 or response vs. X2 is not a reasonable thing to
do. Obviously, in order to discuss "the effect of X1 upon the
response" we must specify the value of X2 etc.
Here's where I'm coming from. There are now thousands upon thousands
of people who are "using regression" in diverse ways. Their knowledge
is meager. Few have been burned enough times to make them cautious.
With or without formal university-level training in the fine points of
multiple regression, there is an awful tendency to run numbers through
software and report the results. Even then, those who read the
reports are subject to the same human failings... they don't ask
enough questions.
So one of the "protective rules" says if there's an interaction in the
model then include both of the corresponding main effects. This
implies "regardless of their significance". Surely if the t-ratio for
the interaction is (say) double-digits and the t-ratios for the main
effects are near 0.5 or so, we have room for some discussion about
this. First, a very low t-ratio implies the standard error of that
model coefficient is very large relative to the value of the
coefficient. Expand this out as the variance of a predicted value and
we'll see that the variance of predictions is "large", and inflated by
the uncertainty in the coefficients for those two main effects. So
when I come to a case of a "very significant" interaction (I'm talking about
a 2-factor interaction here) and trivial main effects I teach... "It's
then time to turn off the computer... take a long walk... kick a tree
or two... and grieve. Grieve because you should be using a different
model and you probably don't have the proper data for moving on to
that model. Hence you need more data and you need to think about what
kind of data you need."
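Here is a minimal sketch that puts numbers on that variance argument. It rests on two assumptions of mine that are not part of the discussion above: a balanced 2x2 design in coded units (x = -1 or +1) with r replicates per corner, and a known error variance sigma^2. With orthogonal columns, each coefficient's variance adds directly into the variance of a prediction. So carrying the two main-effect terms doubles the prediction variance at a corner compared with an intercept-plus-interaction model.

```python
import numpy as np

CORNERS = [(-1, -1), (1, -1), (-1, 1), (1, 1)]  # coded 2x2 design

def coded_design(r, with_main_effects):
    """Model matrix for r replicates per corner, with or without x1 and x2."""
    rows = []
    for x1, x2 in CORNERS:
        for _ in range(r):
            if with_main_effects:
                rows.append([1.0, x1, x2, x1 * x2])
            else:
                rows.append([1.0, x1 * x2])
    return np.array(rows, dtype=float)

def prediction_variance(X, x0, sigma2=1.0):
    """OLS prediction variance at x0: sigma^2 * x0' (X'X)^{-1} x0."""
    return sigma2 * (x0 @ np.linalg.inv(X.T @ X) @ x0)

r = 5  # hypothetical number of replicates per corner
v_full = prediction_variance(coded_design(r, True), np.array([1.0, 1.0, 1.0, 1.0]))
v_inter = prediction_variance(coded_design(r, False), np.array([1.0, 1.0]))
# At the (+1, +1) corner: v_full is about 0.2 and v_inter about 0.1 (sigma^2 = 1).
```

In practice the two models would also give different residual variance estimates. The sketch holds sigma^2 fixed to isolate the effect of the extra terms.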
To see this, let's consider a small example.
X1 X2 Y
1000 40 20
2000 40 50
1000 80 50
2000 80 20
Consider the values of the response Y to be averages of
appropriate amounts of replicated data.
What's the "effect" of X1? The average of all the data when
X1 is "high" (X1 = 2000) is (50 + 20)/2 = 35. The average
of all the data when X1 is "low" (X1 = 1000) is also (50 + 20)/2
or 35. So the average effect of X1 is the difference of
averages, or 35 - 35 = 0.
The same is true for the effect of X2. Its average effect is 0.
But notice that increasing X1 at the low level of X2 increases
the response from 20 to 50... and increasing X1 at the high level
of X2 decreases the response from 50 to 20. There is a
red-hot interaction.
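The arithmetic in the last few paragraphs can be checked in a few lines of Python. This is only a sketch; the dictionary layout and function names are my own. The interaction is reported the usual two-level-factorial way: half the difference between the X1 effect at high X2 and the X1 effect at low X2.

```python
# The 2x2 table of averaged responses: (X1, X2) -> Y
table = {(1000, 40): 20, (2000, 40): 50, (1000, 80): 50, (2000, 80): 20}

def mean(values):
    return sum(values) / len(values)

def main_effect(table, factor):
    """Average response at the high level minus average at the low level."""
    levels = sorted({key[factor] for key in table})
    low, high = levels[0], levels[-1]
    at_high = [y for key, y in table.items() if key[factor] == high]
    at_low = [y for key, y in table.items() if key[factor] == low]
    return mean(at_high) - mean(at_low)

def x1_effect_at(table, x2):
    """Change in Y when X1 goes from 1000 to 2000 with X2 held at x2."""
    return table[(2000, x2)] - table[(1000, x2)]

effect_x1 = main_effect(table, 0)     # 0.0
effect_x2 = main_effect(table, 1)     # 0.0
slope_low = x1_effect_at(table, 40)   # +30
slope_high = x1_effect_at(table, 80)  # -30
# Interaction: half the difference between the two conditional X1 effects.
interaction = (slope_high - slope_low) / 2  # -30.0
```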
If we run these data through a regression analysis we'll learn
that the effects of X1 and X2 are both "zero" and that there's
a "large" interaction effect. (Of course we'd do this with the
replicated data, not with the averaged values shown in this small
table).
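Here's a sketch of that regression on the four averages, using numpy. X1 and X2 are centered and scaled to coded units (-1, +1). With four points and four coefficients the model is saturated: it fits exactly and leaves no residual degrees of freedom, so there are no t-ratios to look at. That's why the replicated data are needed for the real analysis. The point here is only the coefficient values.

```python
import numpy as np

X1 = np.array([1000.0, 2000.0, 1000.0, 2000.0])
X2 = np.array([40.0, 40.0, 80.0, 80.0])
Y = np.array([20.0, 50.0, 50.0, 20.0])

# Center and scale to coded units: x = (X - midpoint) / half-range -> +/-1
x1 = (X1 - 1500.0) / 500.0
x2 = (X2 - 60.0) / 20.0

# Columns: intercept, x1, x2, x1*x2 (orthogonal in this balanced design)
A = np.column_stack([np.ones(4), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
b0, b1, b2, b12 = coef  # about 35, 0, 0, -15
```

The coded interaction coefficient of -15 means that moving x1 across its two-unit range changes Y by +30 at low X2 and by -30 at high X2. That is the 20-to-50 and 50-to-20 pattern in the table.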
Suggestion: Draw some axes. Assign X1 to the horizontal and
X2 to the vertical. Add some tick marks and write 1000, 2000,
40, and 80 at the appropriate places along the axes. Now write
the values of the responses in their appropriate positions
at the "corners" of this experimental space.
This is an extreme case of a "standalone interaction". Main effects
are "zero" with a large interaction. The story I'm about to tell will
apply just as well to weaker versions of a standalone interaction (non-
significant main effects and a very important interaction) but it's
easier to tell with this extreme example.
Returning to the graphic, let's draw some lines of constant
response... a contour map. That's easy in this case. Draw a line
from the "20" in the lower-left corner to the "20" in the upper right
corner. The implication is that a model built from these data
(It will be of the form Y = b0 + b12*X1*X2, hopefully with X1 & X2
properly centered) will predict "20" at all points on that line.
Now draw a line parallel to that line, passing it through the "50" in
the upper left corner... and another parallel line through the "50" in
the lower right corner. Add more parallel lines between these if you
wish. At this point you have a contour map and it suggests a classic
"valley": to use the familiar "elevation contours" analogy, mountains
with a valley between them running at an angle to the axes. So far so
good.
Except for this. There's an alternative interpretation of these
data!! Start all over again but this time draw a line through the
"50" in the upper left corner and the "50" in the lower right corner.
Then draw two more parallel lines, one through the "20" in the lower
left corner and one through the "20" in the upper right corner. Now we have a classic
"ridge"... an elongated "mountain" sloping down to valleys.
The net from this is... anytime we have a standalone interaction
(an interaction with trivial "main effects") we are dealing with ambiguous
data. There are two distinctly different interpretations of these
data. Let's not get lost in the issues about including "weak" main
effects in the model (when the main effects are trivial but non-
zero). Either way, we have ambiguous data. So what should we do?
Turn off the computer... take a walk... kick a couple of trees and
grieve. Do not continue trying to "analyze" or "interpret" the data
and for Pete's sake don't start writing a report!! Grieve, and
consider what to do next.
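The two rival readings described above can be written down as explicit surfaces. The quadratic forms below are my own hypothetical choices, in coded units with x1 and x2 running from -1 to +1; the discussion doesn't commit to any particular functional form. Each surface matches the four corner averages exactly, yet the two disagree completely at the centroid.

```python
# Two hypothetical quadratic surfaces in coded units (x1, x2 in [-1, 1]).
def valley(x1, x2):
    # Floor of 20 along the diagonal x1 == x2 (through the two "20" corners);
    # contours are lines x1 - x2 = constant.
    return 20.0 + 7.5 * (x1 - x2) ** 2

def ridge(x1, x2):
    # Crest of 50 along the diagonal x1 == -x2 (through the two "50" corners);
    # contours are lines x1 + x2 = constant.
    return 50.0 - 7.5 * (x1 + x2) ** 2

# Lower-left, lower-right, upper-left, upper-right corners of the sketch.
corners = {(-1, -1): 20.0, (1, -1): 50.0, (-1, 1): 50.0, (1, 1): 20.0}

valley_fits = all(valley(a, b) == y for (a, b), y in corners.items())  # True
ridge_fits = all(ridge(a, b) == y for (a, b), y in corners.items())    # True
at_centroid = (valley(0, 0), ridge(0, 0))                              # (20.0, 50.0)
```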
If you are with me this far you probably realize that the way to
discriminate between those two rival models is to get some data at the
centroid. Suppose we do that and the average at the centroid is near
"20". Ah... that confirms the valley. Or if the average of
appropriate data at the centroid is near "50", that confirms the
ridge.
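Below is a toy version of that decision rule, applied to a handful of hypothetical replicate observations at the centroid. The candidate values are 20 for the valley and 50 for the ridge. The third candidate is 35, which is what the product-only model (Y = 35 - 15*x1*x2 in coded units, the least-squares fit to the four averages) predicts at the centroid. Picking the nearest value ignores noise. A real analysis would compare the centroid average against its standard error (a center-point curvature check) before declaring a winner.

```python
import statistics

# Predicted centroid response (coded x1 = x2 = 0) under each reading.
CANDIDATES = {
    "valley": 20.0,
    "ridge": 50.0,
    "product model (saddle)": 35.0,  # Y = 35 - 15*x1*x2 at x1 = x2 = 0
}

def classify_centroid(observations):
    """Return (nearest candidate, centroid mean) for replicate centroid data.

    A naive nearest-value rule; it does not account for noise.
    """
    m = statistics.mean(observations)
    best = min(CANDIDATES, key=lambda name: abs(CANDIDATES[name] - m))
    return best, m

label, centroid_mean = classify_centroid([19.2, 21.5, 20.4])  # hypothetical data
```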
It might be smart to get data at the centroid in the first place...
before getting into this stage of grief... "just in case".
Of course that depends on the situation... the cost of getting data
and the logistics of getting data. But imagine having to go back to
"the boss" saying "we need just a little more data". Try explaining
that to one "not skilled in the art".
Conclusion: Anytime we have a standalone interaction we have ambiguous
data. That's not good.
Consider what happens with a lot of commercial software that's created
with a lot of whistles and bells to make life "easy" for anyone and
everyone. Run these same data through certain brands of software and
it will (1) build the model (remember, it will have just the
interaction term or, in the case of weak non-zero main effects, those
terms as well) and (2) immediately create a contour
map. A "real" contour map. What will it draw?
The valley or the ridge? Neither!! It will draw and present to the
"analyst" a classic "saddle point". If you've not seen one of these,
they are a sight to behold, most likely with a lot of colored bands
and really quite beautiful. At which point my "fraud-detecting radar
system" sounds off. So the innocent analyst writes this up in a
report and makes a presentation with beautiful overheads that have the
colorful saddle point... and his story will be "this thing is really
complicated... isn't it a good thing you hired me to explain it."
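Why a saddle? In coded units, a least-squares fit to the four averages is Y = 35 - 15*x1*x2. This sketch (numpy assumed) checks the shape of that surface from its matrix of second derivatives. One negative and one positive eigenvalue is the signature of a saddle. The predicted value at the centroid is 35, which is neither 20 nor 50.

```python
import numpy as np

# Product-only model in coded units: Y = b0 + b12*x1*x2
b0, b12 = 35.0, -15.0

def y_hat(x1, x2):
    return b0 + b12 * x1 * x2

# Hessian of Y: d2Y/dx1^2 = d2Y/dx2^2 = 0, d2Y/dx1dx2 = b12.
H = np.array([[0.0, b12], [b12, 0.0]])
eigenvalues = np.linalg.eigvalsh(H)  # ascending: [-15, 15]
is_saddle = eigenvalues[0] < 0 < eigenvalues[-1]

# Y falls toward 35 along the diagonal through the "20" corners and
# rises from 35 along the diagonal through the "50" corners.
ts = (-1.0, -0.5, 0.0, 0.5, 1.0)
along_20s = [y_hat(t, t) for t in ts]   # 20, 31.25, 35, 31.25, 20
along_50s = [y_hat(t, -t) for t in ts]  # 50, 38.75, 35, 38.75, 50
```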
Do you think I'm kidding? Not at all. I've seen this movie many
times. Most of the time the presenter is looking at me for
confirmation (after all, he wants praise and confirmation from an
expert) while I'm bending down to pick up imaginary papers off the
floor. Or, ready to leave and go to the nearest vomitory. What will
happen to this guy if/when I try to explain to him (or her) that the
data is ambiguous? Will they break my sword, paint a yellow stripe
down my back, and push me out the gate never to return... for having
taught them about designed experiments?
Folks, this is a problem. Anytime I see a "standalone" interaction
(and that includes an interaction with weak main effects on board) I
cringe. Someone out there may be using the published model. It's
neither beast nor fowl. It will indeed predict high values at the
appropriate corners and low values at the other corners. But it is
devoid of any meaning whatsoever other than "it fits the data".
(I gag when I hear "it fits the data", too.)
Going back for a moment, if we resolve whether it's a valley or a
ridge our work may not yet be done. A glance at either of these
two outcomes indicates that there is curvature due to either X1 or X2
or to both of them. That, too, may need resolution with still more
experiments. Or... in a worst case... if the average at the centroid
turns out to be near "35", then we have another burden to bear. But at
least we won't end up saying silly things in front of the boss and his
boss as well.
I have seen silly presentations and reports of the kind mentioned here
again and again. My mantra is "If there's a standalone interaction
then turn off the computer... etc." Some people have come back to me
to say "thanks".
So I'm saying I don't think that stewing over including or not
including the "weak" (near-zero) main effects is worth the effort. In
situations of this sort the data is ambiguous. We do indeed need a
different model, and sometimes "which model" is elusive. In some
instances I've managed to "interpret" situations of this sort in my
fields of chemistry and engineering... even physics and electronics...
and thereby resolve it without getting more data other than to confirm
my homemade model.
Now for a comment on "reversed signs". The first time I saw a living
example of that was when visiting a "statistics group" at a certain
company... a long time ago in a faraway place. They proudly showed me
their crowning achievement... an expensive set of data acquired from a
designed experiment in a large production facility. I looked at
diagrams of the plant hardware and what it was doing and started
inquiring about "effects" in the model. The model coefficient for one
of those effects was simply "backward" from what any engineer would
expect (none of the participants were engineers or chemists or
physicists, etc.) I questioned it... surely it's a typo. "No", they
said, and showed me listings from a computer. Then came the bad news.
Their analysis had been presented to the plant manager and his direct
reports (the plant was 800 miles from their home base) and the plant
manager questioned that model coefficient and announced to the world
"the sign on it is backward". The presenter defended it and defended
it and waved computer listings and pushed them over to the manager.
The irritated manager said "impossible" and got up and left the room.
That one model coefficient was a measure of the lifeblood of the
operation. The magnitude of it largely determined their energy costs
(and this was 'way back in 1964) and everybody in the room knew which
sign that coefficient should... must... have. That was the end of the
presentation. And this was their
"crowning achievement"? And some were prepared to go to Brazil to
repeat similar experiments in another plant. And I was there
interviewing for a job!!
This was the first of many. You commented on the house-value
example. The first defense, I believe, is to center the variables so
that the average effects of the primary variables (slopes of the
lines... model coefficients) should at least have the proper signs.
Fail to center and there can be a reversal of signs. The correlation
matrix alone will speak the truth to this degree... the correlation
coefficients will have the proper signs in that rightmost column. The
second defense (and a very important one) is to compare what the model
is teaching vs. what we know about our base technology, whether it be
engineering, electronics, brain science, or geology. If we are in
unfamiliar territory then by centering variables we will almost surely
be headed in the right direction. But I like to go to a subject-matter
expert and show them what I have found before going too much further.
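Here's a small sketch of how failing to center can flip a sign. The data are a hypothetical variant of the 2x2 table: made-up numbers, not from any real study, in which X1 has a positive average effect and there's a positive X1*X2 interaction. The uncentered X1 coefficient is the slope at X2 = 0, far outside the data, and it comes out negative. The centered coefficient is the slope at the average X2. It agrees in sign with both the average effect and the correlation of X1 with Y.

```python
import numpy as np

# Hypothetical variant of the 2x2 table (made-up numbers for illustration).
X1 = np.array([1000.0, 2000.0, 1000.0, 2000.0])
X2 = np.array([40.0, 40.0, 80.0, 80.0])
Y = np.array([45.0, 25.0, 15.0, 55.0])

def fit(a, b, y):
    """Least-squares coefficients for y = c0 + c1*a + c2*b + c12*a*b."""
    A = np.column_stack([np.ones_like(a), a, b, a * b])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

raw = fit(X1, X2, Y)                               # X1 coefficient about -0.08
centered = fit(X1 - X1.mean(), X2 - X2.mean(), Y)  # X1 coefficient about +0.01

# Average effect of X1: mean Y at X1 = 2000 minus mean Y at X1 = 1000.
avg_effect = Y[X1 == 2000].mean() - Y[X1 == 1000].mean()  # +10 per 1000 units

# "Rightmost column" entry: correlation of X1 with the response.
r_x1_y = np.corrcoef(X1, Y)[0, 1]  # positive, matching the centered sign
```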
"You'll never know how deep the puddle is until you step in it."
Or, as Warren Buffett recently said, "You can't tell who's swimming
naked until the tide goes out." (Which it did last fall, in case you
didn't notice.)
Hey, folks. I'm not a genius. Most of what I know I learned the hard
way. Some from bad experience. Some from some very smart and sometimes
difficult managers. A lot from teaching and being confronted with
questions I'd never thought about.
One last thought on this. It's easy to see what's happening when there
are just two variables X1 and X2 and a simple graphic. The problems
multiply when there are several variables and we can't "see" what we
are doing. That's one reason I want the correlation matrix and will
not proceed without it.
One more thought (I'm sounding like Columbo now): I've learned that
asking "experts" to tell me "Will there be interactions in these
data? And if so, where?" is worse than a waste of time. Even after
taking multiple courses in designed experiments, analysis of data,
etc. many still cannot give me a simple statement of what an
interaction is. I've even interviewed PhD statisticians for a
position in my group... asked them to tell me in simple English
"what's an interaction"... and they went off into a swamp in a sea of
math never to return. They would never make it in an organization
where they had to explain to clients the meaning of their data. So
what answers do I get if I ask "will there be any interactions in
these data"? More or less... "of course... this thing is so complex
that I'm one of the few who can understand it... it will be loaded
with interactions." Their basic understanding of interactions is
"complexity". So I caution students... be very careful with that word
"interactions". Say it again and again and managers think "this damn
thing is complex and we don't want to deal with complexity." That can
be a career-withering event. OMU
That's just the rule of
thumb; the actual rule is more complicated. It distinguishes between
variables with a real zero ("ratio-scale variables") and variables
with an arbitrary zero ("interval-scale variables"), and asks whether
the form of the model will change if we add arbitrary constants to
those variables whose zero is arbitrary. If the form changes, then any
term that was affected must not be dropped. In the social sciences,
where most of the variables have arbitrary zeros, the rule of thumb
usually suffices. In the physical sciences the actual rule should
probably be used. You don't say what units Temperature and Pressure
are in, but if Temperature is in C or F, whose zeros are arbitrary,
then you have to ask what happens if you add a constant to
Temperature. If Temp*Press is in the model, then adding a constant to
Temperature will create a term that looks like a Pressure main effect,
so you must keep Pressure in the model. (Do the algebra.)
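"Do the algebra" can also be checked numerically. In this sketch (hypothetical, noise-free data; numpy assumed) the response depends only on the Temperature*Pressure product when Temperature is in degrees C. Refitting the same data with Temperature in kelvin produces a Pressure main effect of -b12*273.15. That is exactly what the expansion b12*(T_K - 273.15)*P = b12*T_K*P - 273.15*b12*P predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
temp_c = rng.uniform(20.0, 80.0, size=50)  # hypothetical temperatures, deg C
press = rng.uniform(1.0, 5.0, size=50)     # hypothetical pressures
b0, b12 = 10.0, 0.3
y = b0 + b12 * temp_c * press              # no Pressure main effect (deg C)

def fit(t, p, y):
    """Least-squares coefficients for y = c0 + c1*t + c2*p + c12*t*p."""
    A = np.column_stack([np.ones_like(t), t, p, t * p])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

coef_c = fit(temp_c, press, y)           # about [10, 0, 0, 0.3]
coef_k = fit(temp_c + 273.15, press, y)  # same data, Temperature in kelvin
# The kelvin fit picks up a Pressure coefficient of about -81.9.
```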
In my experience (mainly with psychologists), it's generally been
better to approach regression models with interactions from the top
down rather than the bottom up, looking first at the full model and
asking whether the interactions are necessary, then considering
dropping only those main effects that are not involved in any of
the kept interactions.
A picky point: In your discussion of the results for the uncentered
data, you say "Also note that due to variance inflation the t-ratios
for X1 and X12 are marginal at the p = 0.05 level. Compare these t-
ratios to the earlier model." Did you really mean to say X12? As you
note a few lines on down, the t-ratio etc for the interaction is
invariant under centering, so you can't blame the marginality of its
significance on not centering.
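Here is a quick numerical check of the invariance point on synthetic replicated data. The setup is hypothetical: 5 replicates per corner, normal noise with SD 2, and a true surface Y = 35 - 0.0015*(X1 - 1500)*(X2 - 60). Centering is just a reparameterization. The fitted values, the residuals and the interaction coefficient are unchanged, and so is the interaction's t-ratio. The main-effect t-ratios, by contrast, change completely. The standard errors are computed through a QR factorization rather than by inverting X'X directly, because the uncentered X1*X2 column is badly scaled.

```python
import numpy as np

def ols_t_ratios(A, y):
    """OLS coefficients and t-ratios (coefficient / standard error)."""
    n, k = A.shape
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - k)
    # (A'A)^{-1} = R^{-1} R^{-T} when A = QR, so its diagonal is the
    # row-wise sum of squares of R^{-1}.
    _, r = np.linalg.qr(A)
    r_inv = np.linalg.inv(r)
    se = np.sqrt(sigma2 * np.sum(r_inv ** 2, axis=1))
    return coef, coef / se

def design(a, b):
    return np.column_stack([np.ones_like(a), a, b, a * b])

# Synthetic, replicated data (hypothetical): 5 replicates at each corner.
rng = np.random.default_rng(1)
X1 = np.repeat([1000.0, 2000.0, 1000.0, 2000.0], 5)
X2 = np.repeat([40.0, 40.0, 80.0, 80.0], 5)
y = 35.0 - 0.0015 * (X1 - 1500.0) * (X2 - 60.0) + rng.normal(0.0, 2.0, X1.size)

_, t_raw = ols_t_ratios(design(X1, X2), y)
_, t_cen = ols_t_ratios(design(X1 - X1.mean(), X2 - X2.mean()), y)
# t_raw[3] and t_cen[3] (the X1*X2 term) agree to rounding error;
# t_raw[1] and t_cen[1] (the X1 term) do not.
```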
You talk about coefficients that are "reversed from reality" and
"backward". I assume you're referring to the effects of coding, not
sampling error. When the variables are less obviously related than
house sizes and prices, how do you know whether a coefficient is
backward or not? What is "reality"?
However, the real problem is more general.
As a rule I say "if you are entertaining any kind of second-order
effects (interactions or quadratics) then center the variables.
This is not negotiable."
OMU
That rule will protect some people from the effects of their own
ignorance, provided they never use three-way or higher interactions.
But if they do go beyond two-way interactions then they're worse off
than if there were no rule, because if they were inexpert enough to
need the rule then they will almost certainly be inexpert enough to
not know its limitations and will confidently misinterpret the results
of models with higher interactions. Wouldn't it be better to teach
them analytic habits that will always work:
(1) To NOT interpret the main effect coefficient of a variable that is
involved in an interaction as a measure of the overall effect of that
variable.
(2) To NOT interpret the significance test of the main effect
coefficient of a variable that is involved in an interaction as
indicating whether that term should be kept in or dropped from the
model.
(3) To look at the average slope if they need to say something about
the overall effect of a variable.
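Habit (3) is easy to carry out. For a model with an X1*X2 term, the slope of Y with respect to X1 at any point is b1 + b12*X2, so the average slope over the data is b1 + b12*mean(X2). This sketch (numpy assumed) fits the uncentered model to the four averages in the 2x2 table. The X1 main-effect coefficient comes out near 0.09, but the average slope is 0. That matches the zero average effect computed directly from the table. The same recipe applies unchanged when three-way terms are present: average the partial derivative over the observed data.

```python
import numpy as np

# The four averaged responses from the 2x2 table.
X1 = np.array([1000.0, 2000.0, 1000.0, 2000.0])
X2 = np.array([40.0, 40.0, 80.0, 80.0])
Y = np.array([20.0, 50.0, 50.0, 20.0])

# Uncentered fit: Y = b0 + b1*X1 + b2*X2 + b12*X1*X2 (saturated, so exact).
A = np.column_stack([np.ones(4), X1, X2, X1 * X2])
b0, b1, b2, b12 = np.linalg.lstsq(A, Y, rcond=None)[0]
# b0 about -100, b1 about 0.09, b2 about 2.25, b12 about -0.0015

# Partial derivatives evaluated at each observation, then averaged.
avg_slope_x1 = np.mean(b1 + b12 * X2)  # about 0
avg_slope_x2 = np.mean(b2 + b12 * X1)  # about 0
```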