Re: observations in different scales
From: Richard Ulrich (Rich.Ulrich_at_comcast.net)
Date: 07/09/04
- Next message: Richard Ulrich: "Re: BASIC Q: Why not use median-based std deviation?"
- Previous message: Osher Doctorow: "Quantum Entanglement Explained by Jacobson Radical + PI"
- In reply to: Sergey Tarima: "Re: observations in different scales"
- Next in thread: John Uebersax: "Re: observations in different scales"
- Reply: John Uebersax: "Re: observations in different scales"
- Messages sorted by: [ date ] [ thread ]
Date: Thu, 08 Jul 2004 22:07:50 -0400
- My replies and comments -
On Wed, 7 Jul 2004 15:02:53 +0000 (UTC), stari@ms.uky.edu (Sergey
Tarima) wrote:
> On Tue, 06 Jul 2004 21:34:53 -0400, Richard Ulrich wrote:
>
[ snip. On reporting averages... ]
RU >
> >you can show how sensitive it is to low and
> >high assumptions about the "one or more".
ST >
> What assumption are you talking about?
> Prior on the observations "one or more" or something else?
You might want to debrief your interviewers, to learn their
opinions. (I've been impressed more than once.)
Was "one or more" a hesitation to elect between 1 and 2,
or was it coyness, an unwillingness to put 10 or 100?
Was it the same few subjects, throughout?
You might want to look at other responses by the same
subjects, in the same manner that you investigate Missing -
You need to see if the other data on hand do support
'missing randomly' or '1+ randomly' across subjects --
a critical audience will not accept that you merely
assumed this.
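The sensitivity check on a reported average, using low and high assumptions about "one or more", could be sketched roughly as follows. This is only an illustration: the function name `mean_with_assumed` and the toy frequencies are made up for the example and are not the survey data, and the assumed values 1, 2, 10 and 100 simply echo the possibilities raised above.

```python
def mean_with_assumed(counts, one_plus_freq, assumed):
    """Weighted mean when every "one or more" respondent is assigned `assumed`.

    counts        -- dict mapping a reported count to its (weighted) frequency
    one_plus_freq -- (weighted) frequency of the "one or more" answers
    assumed       -- hypothetical number of incidents given to that group
    """
    total_weight = sum(counts.values()) + one_plus_freq
    total_incidents = sum(v * w for v, w in counts.items())
    total_incidents += assumed * one_plus_freq
    return total_incidents / total_weight


# Hypothetical toy frequencies, NOT the data from this thread.
toy = {0: 70.0, 1: 10.0, 2: 10.0, 5: 5.0}
one_plus = 5.0

# Report the average under each assumption, low to high.
for assumed in (1, 2, 10, 100):
    print(assumed, round(mean_with_assumed(toy, one_plus, assumed), 3))
```

If the reported average moves a lot between the low and the high assumption, that spread is itself worth reporting next to the figure.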
RU >
> >Anything you find will be tied intrinsically to the subject
> >matter, more than you have described.
ST >
> Of course it is! We have more than 100 questions in the interviews
> but (for good or not) in our report we put basically frequencies on
> these variables. Only once I had to run logistic regression.
If this is the dependent variable, it is easily 0 vs 1+ for the
logistic. Based on the data listed below? If I were using this
one as an independent variable, I certainly would not use it
in its raw form: 73% are zero, 4% are above 10, up to 96, and 5%
are "1+". I might try 0/1-10/11+ as categories, with "1+"
experimentally placed in each of the latter categories.
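A minimal sketch of that 0/1-10/11+ recoding, with the "1+" group placed in one category or the other as an experiment. The function name `recode` and its parameters are hypothetical; the code 97 for "one or more" is taken from the data description quoted further down, and the "Don't know" / "refused" codes are assumed to have been excluded already.

```python
def recode(value, one_plus_as="1-10", one_plus_code=97):
    """Collapse a raw incident count into "0", "1-10" or "11+".

    `one_plus_as` says where the "one or more" answers are placed for this
    run; the model is fitted once with each placement and the results compared.
    """
    if one_plus_as not in ("1-10", "11+"):
        raise ValueError("one_plus_as must be '1-10' or '11+'")
    if value < 0:
        raise ValueError("counts cannot be negative")
    if value == one_plus_code:
        return one_plus_as
    if value == 0:
        return "0"
    if value <= 10:
        return "1-10"
    return "11+"


raw = [0, 0, 3, 10, 12, 96, 97]  # made-up responses for illustration
run_low = [recode(v, one_plus_as="1-10") for v in raw]
run_high = [recode(v, one_plus_as="11+") for v in raw]
print(run_low)
print(run_high)
```

If the conclusions differ between the two runs, the placement of "1+" matters and needs to be justified from the subject matter.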
> Hence, we can neglect all the differences among the interviewed
> subjects and care only about one variable (which have the observations
> in a mixture of continuous and categorical scales).
>
I don't follow that sentence.
RU >
> > This is a variety of "missing" -- and the detail is (probably)
> > not "missing at random".... You could search for that term.
ST >
> Glad you mentioned this word "missing". I am really interested in the
> topic. And in the case of "0" or "1 or more" observations I can think
> about them as they are "censored" observations
> and use Kaplan-Meier estimator. This is the approach in my mind.
From what I know of "Kaplan-Meier" and what I find with Google, KM
is for survivorship analyses. I don't think it is possible to usefully
treat occurrences of abuse as time-to-relapse.
[snip, brief exchange]
>
> Look at the data example:
>
> "How many incidents involving psychological abuse happened in the past
> 12 months?"
>
>                                Cumulative  Cumulative
> psych_num  Frequency  Percent   Frequency     Percent
> -----------------------------------------------------
>         0    774.831    73.15     774.831       73.15
>         1     51.822     4.89     826.653       78.04
>         2       58.1     5.49     884.753       83.53
>         3     18.496     1.75     903.249       85.27
[ ... 4 - 8 ]
>        10     10.445     0.99     964.629       91.07
[ ... 12- 75]
>        96      0.734     0.07     991.231       93.58
>        97     57.496     5.43    1048.727       99.01
>
> In the data the code "97" means "one or more but do not know how
> many". The responses "Dont know" or "refuse to answer" I excluded
> assuming that they were missing completely at random.
See above - do not just assume. But you might conclude that,
anyway.
[ ...]
RU >
> >Finally, the model depends on who wants to do what with the data.
I repeat: Who wants to know what, or conclude what, using these
numbers?
> It is not clear what (parametrical) model should be used with the
> data. That is why Kaplan-Meier seems to be a good solution.
It could be a failure of imagination, but I can't imagine KM
being right. You could approximate a survival metric by taking
the reciprocal of the numbers, or otherwise reversing them. I
don't know if that will be any more efficient than simply assigning
mid-ranks in the conventional manner, and using Ranks in your
analysis. That still leaves a question of where to put the 1+ group,
which is fully a fifth of your non-zero data.
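For what it is worth, the mid-rank idea and the open question of where to put the 1+ group can be tried out with a sketch like the one below. The helper names `midranks` and `place_one_plus` are hypothetical, the frequencies are a made-up toy set with integer counts (not the weighted survey data), and treating the 1+ group as tied with the lowest or the highest non-zero value is just one pair of assumptions to compare.

```python
def midranks(freq):
    """Map each distinct value to its mid-rank, given integer frequencies.

    Tied observations share the average of the ranks they occupy.
    """
    ranks = {}
    below = 0  # number of observations with a smaller value
    for value in sorted(freq):
        n = freq[value]
        ranks[value] = below + (n + 1) / 2
        below += n
    return ranks


def place_one_plus(freq, n_one_plus, as_value):
    """Return a copy of `freq` with the "1+" group tied at `as_value`."""
    merged = dict(freq)
    merged[as_value] = merged.get(as_value, 0) + n_one_plus
    return merged


# Hypothetical toy counts: value -> number of respondents.
toy = {0: 6, 1: 2, 2: 1, 12: 1}

low = midranks(place_one_plus(toy, 2, as_value=1))    # "1+" tied with 1
high = midranks(place_one_plus(toy, 2, as_value=12))  # "1+" tied with 12
print(low)
print(high)
```

Running the rank-based analysis under both placements shows how much the answer depends on that choice.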
-- Rich Ulrich, wpilib@pitt.edu http://www.pitt.edu/~wpilib/index.html