Re: Quadratic weighted Kappa and the Intraclass Correlation Coefficient



After some computational investigations, it appears that weighted
kappa with quadratic (Fleiss-Cohen) weights approaches the ICC(2, 1)
asymptotically as N, the number of subjects rated, becomes large.

The ICC(2, 1) is the Case 2 ICC estimating the reliability of a single
rater.

The Case 2 ICC assumes that the two raters compared are a random
sample from a population of raters, and estimates the reliability of
any randomly sampled pair of raters from this population.

To me this is unexpected, because the "chance agreement" term of
weighted kappa assumes that the two raters considered are the only
ones; thus one might expect that weighted kappa would correspond to
the ICC(3, 1), which does not generalize from the two raters to a
larger population of raters.

It might be something of an algebraic coincidence that weighted kappa
corresponds to the ICC(2, 1). In any case, it is a potentially useful
result, since often we wish to estimate, from two given raters,
reliability in the overall rater population.
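
For anyone who wants to repeat the comparison, here is a minimal
Python sketch (not the code used for the investigations above). It
assumes NumPy is available, that ratings are coded as integers
0 .. C-1, and that the ICCs follow the usual Shrout-Fleiss two-way
ANOVA formulas. The function names and the simulated data (a latent
trait plus rater noise, with the second rater scoring somewhat higher)
are made up purely for illustration.

```python
import numpy as np


def quadratic_weighted_kappa(x, y, n_categories):
    """Weighted kappa with quadratic (Fleiss-Cohen) weights for two raters.

    x, y: integer ratings coded 0 .. n_categories - 1.
    """
    observed = np.zeros((n_categories, n_categories))
    np.add.at(observed, (x, y), 1)      # cross-classification counts
    observed /= observed.sum()          # cell proportions
    # Chance-expected proportions from the two raters' own marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_categories, n_categories))
    disagreement = (i - j) ** 2         # quadratic disagreement weights
    return 1.0 - (disagreement * observed).sum() / (disagreement * expected).sum()


def icc_single_rater(x, y):
    """Return (ICC(2,1), ICC(3,1)) computed from two-way ANOVA mean squares."""
    data = np.column_stack([x, y]).astype(float)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((data - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    icc21 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc31 = (msr - mse) / (msr + (k - 1) * mse)
    return icc21, icc31


def simulate_ratings(n, rng, n_categories=5, bias=0.7):
    """Hypothetical data: latent trait plus rater noise; rater 2 scores higher."""
    trait = rng.normal(2.0, 1.0, size=n)
    r1 = np.rint(trait + rng.normal(0.0, 0.6, size=n))
    r2 = np.rint(trait + bias + rng.normal(0.0, 0.6, size=n))
    r1 = np.clip(r1, 0, n_categories - 1).astype(int)
    r2 = np.clip(r2, 0, n_categories - 1).astype(int)
    return r1, r2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (20, 200, 20000):
        x, y = simulate_ratings(n, rng)
        kappa = quadratic_weighted_kappa(x, y, 5)
        icc21, icc31 = icc_single_rater(x, y)
        print(f"N={n:6d}  kappa_w={kappa:.4f}  "
              f"ICC(2,1)={icc21:.4f}  ICC(3,1)={icc31:.4f}")
```

With these particular formulas, working through the algebra for two
raters gives ICC(2, 1) = N*kappa / (N - 1 + kappa), so the gap between
the two shrinks on the order of 1/N. The ICC(3, 1) column is there for
contrast: it drops the between-rater term, so it stays above weighted
kappa whenever the two raters' means differ appreciably.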

Of course, that is not always the aim. For example, if one wishes to
compare two automated diagnostic procedures, one is usually interested
only in agreement between these two procedures, not between any pair
of procedures drawn from the population from which these two are a
random sample.

Note that in SAS the Fleiss-Cohen weights are *not* the default; you
must request them with the "agree (WT=FC)" option, as in the following
example:

proc freq data = <data> ;
   tables rater1 * rater2 / norow nocol nopercent agree (WT=FC) ;
   output agree out=stats ;
   * include significance tests ;
   test kappa wtkap ;
run;

The good folks at the Ulm Medical School (Germany) have placed a
helpful ICC calculator online, found here:

http://sip.medizin.uni-ulm.de/informatik/projekte/Odds/icc.html

John Uebersax PhD