Re: kappa & ICC questions



Hello Gay,

You are describing a common problem that has not been sufficiently
well addressed in the literature.

Let me try to address your second question:

(2) Can I reasonably use ICC on ordinal data from behavior ratings of
Poor, Fair, Good, Very Good? (Corresponding 1, 2, 3, 4 response
choices are used on one measure, but no numbers are used on the 2nd
measure.) I have read that it would be appropriate at least for the
first measure.

I believe that any time one can use weighted kappa with ordinal
ratings, one can also use the ICC (and, since the ICC is more flexible,
that might be the way for you to go here).

The reasoning is as follows: when one employs weighted kappa with
ordinal ratings, either (a) one must supply the weights, or (b) the
computer program (e.g., SAS proc freq) calculates them
automatically. In the latter case, the weights assume that the rating
categories (e.g., poor, fair, good, etc.) are evenly spaced (i.e.,
that they can be represented by consecutive integers like 1, 2, 3,
4). Thus, coding the responses as consecutive integers and
computing the ICC would entail no more assumptions than letting SAS
compute weighted kappa.
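To make case (b) concrete, here is a minimal Python sketch of weighted kappa for two raters. It is modeled on the Cicchetti-Allison (linear) and Fleiss-Cohen (quadratic) weight formulas that SAS proc freq documents; the function names are hypothetical and not from any library. When no scores are supplied it uses 1, 2, ..., k, i.e., the evenly spaced assumption described above, and the weights depend on nothing but differences between those scores.

```python
from typing import Optional, Sequence


def agreement_weights(scores: Sequence[float], kind: str = "linear") -> list:
    """Agreement weights w[i][j] derived only from the category scores.

    linear    (Cicchetti-Allison): 1 - |s_i - s_j| / (s_max - s_min)
    quadratic (Fleiss-Cohen):      1 - (s_i - s_j)**2 / (s_max - s_min)**2
    """
    span = max(scores) - min(scores)
    weights = []
    for si in scores:
        row = []
        for sj in scores:
            d = abs(si - sj) / span
            row.append(1 - (d if kind == "linear" else d * d))
        weights.append(row)
    return weights


def weighted_kappa(r1: Sequence[str], r2: Sequence[str], categories: Sequence[str],
                   scores: Optional[Sequence[float]] = None, kind: str = "linear") -> float:
    """Weighted kappa for two raters who each rated the same cases."""
    k = len(categories)
    if scores is None:
        scores = list(range(1, k + 1))  # default: evenly spaced 1, 2, ..., k
    pos = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # k x k table of joint proportions (rows: rater 1, columns: rater 2)
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        p[pos[a]][pos[b]] += 1 / n
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    w = agreement_weights(scores, kind)
    po = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))         # observed
    pe = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))  # chance
    return (po - pe) / (1 - pe)


cats = ["Poor", "Fair", "Good", "Very Good"]
rater1 = ["Poor", "Fair", "Good", "Good", "Very Good", "Fair"]
rater2 = ["Fair", "Fair", "Good", "Very Good", "Very Good", "Poor"]
print(weighted_kappa(rater1, rater2, cats))                    # linear weights
print(weighted_kappa(rater1, rater2, cats, kind="quadratic"))  # quadratic weights
```

The rating data are made up for illustration. Passing explicit scores instead of the default is the case (a) situation: the same numbers could then be used to recode the ratings for an ICC.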

In case (a), one could in theory scale the rating categories
differently, but one would usually need to stipulate these scale
values in order to compute the weights for weighted kappa. And if one
can stipulate these values, one can replace the ordered-category
ratings with them and again compute the ICC.
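As a sketch of that substitution in Python: the stipulated values in SCALE below are invented for illustration (they are not a recommendation), and the function names are hypothetical. The ICC form used, ICC(2,1) of Shrout and Fleiss (1979) (two-way random effects, absolute agreement, single rater), is my choice of variant; it also assumes a complete design in which every rater rates every case.

```python
from typing import Mapping, Sequence


def recode(ratings: Sequence[Sequence[str]], scale: Mapping[str, float]) -> list:
    """Replace each ordered-category label with its stipulated scale value."""
    return [[scale[label] for label in case] for case in ratings]


def icc_2_1(x: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    x[i][j] is the rating of case i by rater j; the design must be complete.
    """
    n, k = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n * k)
    row_means = [sum(r) / k for r in x]
    col_means = [sum(r[j] for r in x) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in x for v in r)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)             # between-cases mean square
    jms = ss_cols / (k - 1)             # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical stipulated scale: Fair sits further from Poor than from Good.
SCALE = {"Poor": 0.0, "Fair": 2.0, "Good": 3.0, "Very Good": 4.0}
ratings = [  # rows = cases, columns = two raters (made-up data)
    ["Poor", "Fair"],
    ["Fair", "Fair"],
    ["Good", "Very Good"],
    ["Very Good", "Very Good"],
    ["Good", "Good"],
]
print(icc_2_1(recode(ratings, SCALE)))
```

The same SCALE values, passed as scores to a weighted-kappa routine, would define that routine's weights, which is the sense in which both analyses rest on the same stipulation.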

So in either case, if one can use weighted kappa for ordered-category
ratings, one can use the ICC.

It isn't clear whether you used the same rating categories on your two
measures, or what the relationship between the two measures is. If
you are analyzing the measures separately, then there is no obvious
problem: if the rating levels for the second measure were Poor, Fair,
Good, and Very Good, without explicit integer labels, I believe it
would be okay to treat them as 1, 2, 3, 4 and analyze them that way.
That is not guaranteed to be optimal; perhaps (Fair - Poor) > (Good -
Fair). But if one were going to use weighted kappa, the assumption of
equally spaced categories would be made implicitly, and medical
journals seem to accept this without question. In this case I
generally argue that people intuitively design scales so that the
rating levels are at least roughly evenly spaced.
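The implicit assumption is visible in the linear (Cicchetti-Allison) weights themselves. In this short Python sketch, the uneven scores 1, 3, 4, 5 are a hypothetical spacing in which (Fair - Poor) > (Good - Fair); the function name is my own.

```python
LABELS = ["Poor", "Fair", "Good", "Very Good"]


def linear_weights(scores):
    """Cicchetti-Allison agreement weights: 1 - |s_i - s_j| / (s_max - s_min)."""
    span = max(scores) - min(scores)
    return [[1 - abs(a - b) / span for b in scores] for a in scores]


equal = linear_weights([1, 2, 3, 4])   # the implicit default: Poor=1 ... Very Good=4
uneven = linear_weights([1, 3, 4, 5])  # hypothetical: (Fair - Poor) > (Good - Fair)

for name, w in [("equal", equal), ("uneven", uneven)]:
    # partial credit for a Poor/Fair disagreement vs. a Fair/Good disagreement
    print(name, round(w[0][1], 3), round(w[1][2], 3))
# prints:
# equal 0.667 0.667
# uneven 0.5 0.75
```

Under equal spacing both kinds of one-step disagreement earn the same partial credit; under the uneven spacing they do not, which is exactly the choice that integer coding makes for both weighted kappa and the ICC.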

It is true that unobserved rating combinations can pose a problem for
calculating weighted kappa with SAS proc freq, but in theory this can
be handled with certain 'tricks':

http://ourworld.compuserve.com/homepages/jsuebersax/saskappa.htm
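Assuming the problem is the common one in which a rater never uses some category, so that a table built only from observed values is not square, the idea behind such tricks can be sketched in Python: cross-tabulate over a fixed list of all categories so unused ones appear as rows or columns of zeros. This illustrates the idea only; it is not SAS code, and the function name is hypothetical.

```python
from collections import Counter


def crosstab(r1, r2, categories):
    """Square k x k table of counts over a fixed category list.

    Categories that a rater never used still get a (zero) row or column.
    """
    counts = Counter(zip(r1, r2))  # missing pairs look up as 0
    return [[counts[(a, b)] for b in categories] for a in categories]


cats = ["Poor", "Fair", "Good", "Very Good"]
r1 = ["Poor", "Fair", "Good", "Good"]
r2 = ["Fair", "Fair", "Good", "Very Good"]  # rater 2 never uses "Poor"
table = crosstab(r1, r2, cats)
for label, row in zip(cats, table):
    print(f"{label:>9}", row)
```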

However, my suggestion would be to first see whether you can analyze
your data using the ICC. If I understand the design correctly, you
would have a 10 x N (Rater by Case) design for each measure, with many
missing Rater x Case combinations (i.e., the design is not 'fully
crossed'). You might need some help working out the correct degrees
of freedom (df) because of the missing values, but this may not be
too difficult.
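On the df question, one possibility (my substitution, not something established above) is the one-way random-effects ICC, often labeled ICC(1). Because it treats the raters of each case as interchangeable, it ignores systematic rater differences that a two-way model would separate out. In exchange, it accommodates unequal numbers of ratings per case through the standard unbalanced one-way ANOVA: df between = cases - 1, df within = ratings - cases, and an adjusted average group size n0 in place of the number of raters. A minimal Python sketch, with hypothetical function and data names:

```python
from collections import defaultdict


def icc_oneway_unbalanced(records):
    """One-way random-effects ICC(1) for an incomplete Rater x Case design.

    records: iterable of (case_id, rater_id, score). The rater_id is ignored,
    because the one-way model treats the raters of each case as interchangeable.
    Returns (icc, df_between, df_within).
    """
    by_case = defaultdict(list)
    for case_id, _rater_id, score in records:
        by_case[case_id].append(score)
    groups = list(by_case.values())
    a = len(groups)                        # number of cases
    n_total = sum(len(g) for g in groups)  # total number of ratings
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = a - 1, n_total - a
    msb = ss_between / df_between
    msw = ss_within / df_within
    # Effective ratings per case when cases have unequal numbers of ratings;
    # equals the common group size in a balanced design.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (a - 1)
    icc = (msb - msw) / (msb + (n0 - 1) * msw)
    return icc, df_between, df_within


# Made-up example: cases rated by different subsets of raters R1..R10.
records = [
    ("case1", "R1", 2), ("case1", "R4", 3),
    ("case2", "R2", 4), ("case2", "R5", 4), ("case2", "R9", 3),
    ("case3", "R3", 1), ("case3", "R10", 2),
    ("case4", "R1", 3), ("case4", "R7", 3),
]
print(icc_oneway_unbalanced(records))
```

If rater effects matter for your purposes, a two-way model fitted by mixed-model methods would be the more complete (and more involved) route.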

John Uebersax PhD