Re: Categorical Data Help



mkalyan79@xxxxxxxxx wrote in news:1134234179.853474.130190@g47g2000cwa.googlegroups.com:

> I have a question about categorical data analysis.
>
> For numeric data we have the mean and covariance. How does this
> map to categorical data? Does categorical data have a mean (or is it
> really the mode), and what about covariance?

For interval-level data, the mean is a measure of central tendency, the
variance is a measure of dispersion, and the covariance is a measure of
association. They are usually (but not always) the favored measures of
these parameters.

>
> I presume these definitions apply even to categorical data. So if there
> is a huge dataset with categorical data in it, how should we go about
> calculating the mean (mode) and the covariance for the categorical
> data?
>
> For numeric data there are standard formulas to do that.

With categorical data there are no universally-accepted measures of
central tendency, dispersion, and association; which ones get used really
depends on the field you're working in. For central tendency, the mode
is pretty much all there is. For dispersion, the most common measures
are:

1) The proportion of observations outside the modal category.
2) The probability that two randomly chosen observations are in different
categories (Simpson's Index of Diversity is a normalized form of this).
3) The entropy of the categories, a measure of the amount of information
(in the formal sense) conveyed by specifying an observation's category.
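
A minimal sketch, using only the Python standard library, of the mode and
the three dispersion measures above for a list of category labels. The
function name is hypothetical. For measure 2 it uses the with-replacement
form, 1 - sum(p^2), often called the Gini-Simpson index; Simpson's original
index draws the two observations without replacement, which differs
slightly for small samples. Entropy is reported in bits (log base 2).

```python
import math
from collections import Counter


def dispersion_measures(values):
    """Return (mode, variation_ratio, gini_simpson, entropy_bits) for category labels."""
    counts = Counter(values)
    n = sum(counts.values())
    props = [c / n for c in counts.values()]

    # Central tendency: the modal category (ties go to the first label encountered).
    mode, mode_count = counts.most_common(1)[0]

    # 1) Proportion of observations outside the modal category
    #    (sometimes called the variation ratio).
    variation_ratio = 1 - mode_count / n

    # 2) Probability that two observations drawn independently *with replacement*
    #    fall in different categories: 1 - sum(p_i^2) (Gini-Simpson form).
    gini_simpson = 1 - sum(p * p for p in props)

    # 3) Shannon entropy of the category distribution, in bits.
    entropy_bits = -sum(p * math.log2(p) for p in props)

    return mode, variation_ratio, gini_simpson, entropy_bits
```

For `["a", "a", "b", "c"]` this returns mode `"a"`, 0.5 outside the mode,
0.625 for the two-draw probability, and 1.5 bits of entropy. When several
categories tie for the mode, `Counter.most_common` reports the one
encountered first, so the reported modal category (though not the
proportion outside it) depends on input order.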

Measures of association tend to be:
1) Functions of the chi-square statistic for two-way contingency tables;
Cramer's V is the most common, though Pearson's contingency coefficient
is still sometimes used. These are symmetric, with no notion of
independent or dependent variables.
2) "Proportional reduction in error" measures like Goodman and Kruskal's
lambda and Theil's U. These basically apply the measures of dispersion
above to both the marginal and conditional distributions of one of the
variables (thus they're asymmetric) and quantify the amount of reduction
in dispersion achieved by knowing the category of the independent
variable.
3) Measures based on canonical correlation, particularly in the context
of correspondence analysis, which is primarily a graphical technique for
presenting the association between two categorical variables.
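
A minimal, standard-library-only sketch of the measures in items 1 and 2
above, assuming the two variables arrive as equal-length lists of category
labels. The function names are hypothetical, not taken from any particular
package. Cramer's V is computed from the Pearson chi-square statistic with
no small-sample bias correction. `goodman_kruskal_lambda` and `theils_u`
treat the first argument as the independent variable and the second as the
dependent one, so swapping the arguments generally changes the answer.
Correspondence analysis (item 3) is not covered.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy (bits) of a Counter of category frequencies."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c)


def _chi_square(xs, ys):
    """Pearson chi-square for the two-way table of xs (rows) by ys (columns)."""
    joint, rows, cols = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    n = len(xs)
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n  # expected count under independence
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    return chi2, n, len(rows), len(cols)


def cramers_v(xs, ys):
    """Symmetric: sqrt(chi2 / (n * (min(r, c) - 1))), ranging from 0 to 1."""
    chi2, n, r, c = _chi_square(xs, ys)
    k = min(r, c)
    if k < 2:
        raise ValueError("need at least two categories in each variable")
    return math.sqrt(chi2 / (n * (k - 1)))


def contingency_coefficient(xs, ys):
    """Pearson's C = sqrt(chi2 / (chi2 + n)); symmetric, but its maximum is below 1."""
    chi2, n, _, _ = _chi_square(xs, ys)
    return math.sqrt(chi2 / (chi2 + n))


def goodman_kruskal_lambda(xs, ys):
    """lambda(Y|X): proportional reduction in modal-guess errors for ys once xs is known."""
    joint, rows, cols = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    n = len(xs)
    best_overall = max(cols.values())  # correct guesses using Y's overall mode
    best_within = sum(max(joint[(r, c)] for c in cols) for r in rows)  # mode within each X
    errors_without_x = n - best_overall
    if errors_without_x == 0:
        raise ValueError("ys has a single category; lambda is undefined")
    return (best_within - best_overall) / errors_without_x


def theils_u(xs, ys):
    """Uncertainty coefficient U(Y|X) = (H(Y) - H(Y|X)) / H(Y)."""
    joint, rows, cols = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    h_y = _entropy(cols)
    if h_y == 0:
        raise ValueError("ys has a single category; U is undefined")
    h_y_given_x = _entropy(joint) - _entropy(rows)  # H(X, Y) - H(X)
    return (h_y - h_y_given_x) / h_y
```

For example, with `xs = ["a", "a", "a", "b"]` and `ys = ["u", "u", "v", "v"]`,
lambda for predicting `ys` from `xs` is 0.5, while predicting `xs` from `ys`
gives 0.0; Cramer's V is the same (about 0.577) in both directions.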

For more information, see either of Alan Agresti's two books on
categorical data analysis.