Re: power analysis: within-subjects, 2x2, random factors: How ?

From: Martin O'Hare (intersity_at_hotmail.com)
Date: 06/28/04


Date: Mon, 28 Jun 2004 15:58:50 +0100

Richard,

Thanks for your message! As regards the experiment, all the details were
included in the paper, and the presentation was judged perfectly clear by
the people who read it. The selection of tests and design followed a
previous study (my study was essentially a re-run of another study with
one more crucial factor added). For the by-subjects and by-items
analyses there is no "physical" reorganization of the data - only the model
changes. For all tests I use GLM with alpha set to .05.

As it happens, I'm happy that no reliable effect was found for one of the
two IVs. The result supports my predictions and clearly questions a
previous study, which I found to be problematic because of poorly controlled
material. So my guess is that one of the reviewers was not very happy with
all this (unfortunately, biases and "friendships" are not so rare in
academia) and decided to question the power of my test.

Yes, you are right, the key is to simplify. I have all the results from the
ANOVAs: SD, MSe, M, etc., and I can probably get some "indication" if I
combine these in a meaningful way and do the contrast test you suggested. Of
course, I still don't know how to do a power analysis with my (quite
complicated) design - as you said, repeated measures is tough to describe
power analyses for - and I find it confusing why you, as a reviewer, would
still ask for something like that instead of asking directly for
clarification if something is not clear enough, especially when you know
that a post-hoc power test is bogus.

Best,
-- Martin
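
A minimal sketch of the by-subjects contrast test mentioned above, assuming
each subject's items have already been averaged into four cell means. The
function name, the cell ordering and the numbers are hypothetical
illustrations, not data from the study. Each subject gets one contrast
score, and the spread of those scores across subjects is the error term,
just as in a paired t-test.

```python
from math import sqrt
from statistics import mean, stdev

def contrast_t(cell_means, weights):
    """One-sample t statistic for a within-subject contrast.

    cell_means: one row per subject, holding that subject's four cell
                means in the order (A1B1, A1B2, A2B1, A2B2).
    weights:    contrast weights in the same order.
    Returns (mean contrast, SD of contrast across subjects, t, df).
    """
    # One contrast score per subject.
    scores = [sum(w * m for w, m in zip(weights, row)) for row in cell_means]
    n = len(scores)
    m, sd = mean(scores), stdev(scores)
    # Between-subject variation in the contrast is the error term.
    return m, sd, m / (sd / sqrt(n)), n - 1

# Hypothetical data: 4 subjects x 4 cell means (made-up numbers).
data = [
    [510.0, 530.0, 520.0, 560.0],
    [480.0, 495.0, 500.0, 525.0],
    [600.0, 610.0, 615.0, 650.0],
    [550.0, 570.0, 545.0, 580.0],
]

main_effect_f1 = [-0.5, -0.5, 0.5, 0.5]  # Factor 1: level 2 minus level 1
interaction = [1.0, -1.0, -1.0, 1.0]     # Factor 1 x Factor 2

for label, w in (("Factor 1", main_effect_f1), ("Interaction", interaction)):
    m, sd, t, df = contrast_t(data, w)
    print(f"{label}: mean = {m:.2f}, SD = {sd:.2f}, t({df}) = {t:.2f}")
```

The mean and SD of the contrast scores are also the two quantities a power
calculation for that contrast would need.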

"Richard Ulrich" <Rich.Ulrich@comcast.net> wrote in message
news:cu8vd0t60g3kbmlvuh2jo9ul101boo9i5l@4ax.com...
> On Mon, 28 Jun 2004 00:38:35 +0100, "Martin O'Hare"
> <intersity@hotmail.com> wrote:
>
> > Dear all,
> >
> > One of the reviewers of an article I submitted to a journal asked,
> > about some results of an experiment that I reported as not reliable,
> > whether the design had enough power to detect such a difference, if
> > one existed, and asked for a power test. I'm quite confused because
> > I've read (and been convinced) that post-hoc power analysis is bogus.
> > On the other hand it seems that I have to do it (I proposed interval
> > analysis but the editorial board didn't get it/like it). I am not a
> > statistician and my knowledge of statistics is limited to the basics.
> > I'm not sure how to do a power analysis with the design I have.
>
> Okay, I haven't seen their criticism. And I haven't seen what
> you have actually provided them, in multiple pages and tables.
>
> I've got guesses.
> One: They were confused by whatever you did, and aiming you
> at 'power' was a best-guess for what might help clarify your
> procedure, because you should have been averaging scores
> instead of using 100 items. I say that because I certainly
> am confused. And I know that I have tendered "suggestions"
> as a reviewer, without wanting to impose that requirement on
> the authors -- I try to show what it was that I *thought* they
> were aiming at, or how to reach it.
>
> Two: You don't say that you average the 100 items into cells,
> or any other way. That bothers me, because it suggests to me
> that you could be doing the wrong tests. "Power" might be
> the issue, if you are testing with the irrelevant variability
> between "items" as the error term.
>
> Three: When you put scores into the framework for an item
> analysis, you have to lay out the numbers in a way that you
> can see what the 'effect size' is that you have on hand, and
> what effect size was needed for achieving statistical significance.
> When you look at your data that way, does it make *sense*?
>
> Are you merely achieving effects that are small, and everyone
> would agree? Or do they look big enough to be useful,
> but (for whatever reason) they don't test out?
>
> But it seems to me, my criticism might be hastily stripped down
> to match theirs, "asking for a power test" - even though I agree
> that "post-hoc power analysis is bogus" is a pretty fair statement.
>
> I'm suggesting that "there is something wrong with the
> presentation" and it is probably related to power; but giving
> them a formal power analysis is not the end of it. The start
> of it is what matters, where you show that the outcome is
> 'small' in everyone's terms. Or you fix the analyses by
> averaging scores to get a better criterion, or by testing
> simple contrasts against their proper errors.
>
> >
> > I had about 25 subjects (the same number as in other studies in the
> > field examining similar stuff), all of them examined 100 items, the
> > DV was continuous, and items were balanced across two 2-level IVs.
> > Here are some more details:
> >
> > Factor 1 and Factor 2 were crossed in a 2x2 within-subjects design.
> > The two within-subjects factors were Factor 1 (level 1 or level 2)
> > and Factor 2 (level 1 or level 2) and the analysis was carried out
> > by-subjects (F1) and by-items (F2). For the by-subjects analysis
> > Factor 1 and Factor 2 were treated as fixed factors whereas subjects
> > was the random factor. For the by-items analysis, items nested within
> > the compound type and familiarity conditions were the random factor.
> > All terms and possible interactions between them were included in the
> > model.
>
> Items nested? Is that crucial?
> All terms and interactions? I see an important 2x2 design,
> with an important interaction. The between-subject variation
> in a contrast is the useful error term for the within-subject effect
> - just like a paired t-test.
> Does a simple test get anything?
>
>
> > General Linear Model's regression was used for the analysis of
> > variance of means.
> >
> > What should I do? I looked high and low for software and formulae. I
> > found formulae for different designs but I find it difficult to adapt
> > them to my problem and do not want to do something wrong or based on
> > "intuition".
> >
> > Any help will be greatly appreciated!
>
> Repeated measures is tough to describe power analyses for.
>
> The key is to simplify. What is the contrast? What is the
> error term? The power analysis for a contrast of two groups
> is most intelligible as the paired t-test: Which is: the one-sample
> t-test on the difference in means. For that, you want the
> difference, and you want the standard deviation of the
> difference, across subjects; you might *estimate* that
> from the raw standard-deviation between subjects, for
> one set of 50 items (if I've reconstructed the experiment right),
> and the average correlation between sets of 50 items.
>
>
> The ANOVA analysis that I described earlier is the basis for
> the power analysis, so you want to get these means, even
> if you have justification in your head for later testing
> something about the items.
>
> Hope this helps. If I missed the design too far, try again.
>
> --
> Rich Ulrich, wpilib@pitt.edu
> http://www.pitt.edu/~wpilib/index.html
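
The paired-t power calculation described in the quoted message can be
sketched roughly as follows. This is an illustration under stated
assumptions, not the procedure anyone in the thread actually ran: both
sets of items are assumed to share the same between-subject SD, power is
computed with a normal approximation instead of the exact noncentral t,
and every input value below is made up. The function names are
hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sd_of_difference(sd_between, r):
    """SD of a paired difference when both conditions share the same
    between-subject SD and correlate r: sqrt(2 * sd^2 * (1 - r))."""
    return sd_between * sqrt(2.0 * (1.0 - r))

def paired_power_approx(diff, sd_diff, n, alpha=0.05):
    """Approximate two-sided power of a paired t-test on n subjects.

    Uses the normal approximation, which is somewhat optimistic for
    small n compared with the exact noncentral-t calculation.
    """
    z = NormalDist()
    ncp = abs(diff) / (sd_diff / sqrt(n))   # expected size of the test statistic
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)

# Hypothetical inputs: SD between subjects for one set of 50 items,
# correlation between the two sets, and a difference worth detecting.
sd_d = sd_of_difference(sd_between=80.0, r=0.9)
power = paired_power_approx(diff=20.0, sd_diff=sd_d, n=25)
print(f"SD of difference = {sd_d:.1f}, approximate power = {power:.2f}")
```

Here diff is meant to be a difference chosen in advance as worth
detecting, not the difference observed in the sample; plugging in the
observed difference gives the post-hoc power that the thread agrees is
uninformative.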


