Re: Binomial dta: how to handle don't-cares?
- From: "Old Mac User" <chendrixstats@xxxxxxxxx>
- Date: 8 Feb 2007 09:09:28 -0800
Several people have given good advice including "the numbers are what
they are and should be reported as such". This is politics more than
science. In polls of this sort there are always issues about "was
this a random sample". Meaning... were those who responded a
representative sample of the entire population of those who could have
responded? (People who have an issue are more likely to respond,
etc.)
Recognizing all of this and more, two days ago I promised I would post
a "95% confidence envelope". That was supposed to happen yesterday
morning. A balky furnace and a "funny-acting" hot water heater
distracted me... especially since we are experiencing what surely is a
case of global cooling combined with snow this week. Both systems are
fixed, so here we go.
Political statement: About 30 years ago Global Cooling was "the big
concern". Now it's "Global Warming". Here's a big thank you to all
those who bought SUVs and built larger homes and in doing so saved us
from crop failures in the midwest.
So out of a population of 1366 who had an opportunity to vote, 119
said Yes, 232 said No, and 29 were neutral. With all of the
aforementioned concerns, the task here is to (with caution) use data
from this responding sample of 380 to infer something about the entire
population of 1366.
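For anyone who wants to check the arithmetic, here is a minimal sketch of the observed fractions. The counts come from the survey as reported above; the variable names are just my own choice for illustration.

```python
# Counts as reported in this thread.
counts = {"yes": 119, "no": 232, "neutral": 29}
n = sum(counts.values())  # total responders

# Observed fraction of responders in each category.
fractions = {name: count / n for name, count in counts.items()}

for name, frac in fractions.items():
    print(f"{name:8s} {frac:.3f}")
```

These are fractions of the responders only; they say nothing yet about the people who did not reply.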
What begins as a simple "binomial" Yes or No usually turns into a
trinomial Yes, No, Indifferent outcome. This is always a source of
concern and frustration. I'm going to post a "trinomial analysis" here
for better or worse.
Before I do this, there is still one more caution. This "answer" is
based on an assumption that must be documented right here. That
assumption is...
"the total number of responders (380) is 'small' relative to the total
population (1366). Or, if you prefer... the population is
'infinite'". This same flawed assumption is embedded in a "binomial
analysis" of this sort unless we are willing to deal with a
hypergeometric distribution.
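To get a rough feel for how much the "infinite population" assumption matters here, one can look at the sampling fraction and the textbook finite population correction. This is only a sketch: it assumes the 380 responders behave like a simple random sample drawn without replacement, which is exactly what is in doubt, and it is not what the trinomial software does.

```python
import math

N_POP = 1366   # surveys mailed out
n = 380        # responses received
p_no = 232 / n

# Share of the population that actually responded.
sampling_fraction = n / N_POP

# Standard finite population correction for sampling without replacement.
fpc = math.sqrt((N_POP - n) / (N_POP - 1))

# Standard error of the "No" fraction with and without the correction.
se_infinite = math.sqrt(p_no * (1 - p_no) / n)
se_finite = se_infinite * fpc

print(f"sampling fraction = {sampling_fraction:.3f}")
print(f"fpc               = {fpc:.3f}")
print(f"SE (infinite)     = {se_infinite:.4f}")
print(f"SE (finite)       = {se_finite:.4f}")
```

A correction noticeably below 1 suggests the "infinite population" intervals are wider than a hypergeometric treatment would give.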
Here are some calculations done with software designed to address the
matter of Yes/No/Indifferent (trinomial). The outcome of this
analysis is a 95% confidence envelope. Well, almost 95%. I'll explain
that later.
The next eight rows document the data and the setup for analysis.
N No Yes
380 232 119
FractNo 0.611
FractYes 0.313
DP and DQ 0.001 0.001
Target prob contour = 0.950
Actual prob contour = 0.935
Number of rows in the table = 104
The following table shows the approx. 95% confidence envelope.
To see this, set up coordinates on ordinary rectangular graph paper.
Both scales range from 0 to 1 and represent the fraction favoring and
not favoring the proposal.
The vertical axis is for "Yes" and the horizontal axis is for "No".
Now go to the vertical or "Yes" axis = 0.313 and the horizontal axis
or No = 0.567. Put a small circle there.
Then go to "Yes" = 0.313 and No = 0.643. Put a small circle there.
Do the same for each value of "Yes" (on the vertical scale) in the
table.
(Skip some of them to taste... there are more here than you need.)
Notice that for each value of Yes there are two values of No. The
column "Diff" is the width of the ellipsoid for each "Yes" value
on the vertical axis.
On completion of this graphic you have the (approx.) 95% confidence
envelope. "The prob is 0.935 that this envelope encompasses the
'true' value of the fraction of Yes and the fraction of No in the
total population." (Again, we are using a classic trinomial here when
a hypergeometric would be more accurate... it's the "sample is small
relative to the population" thing.)
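For readers without the trinomial software, a similar-looking envelope can be sketched with the large-sample normal approximation to the multinomial. To be clear, this is a swapped-in technique and not the exact calculation behind the table, so its numbers will not match the table; the function name no_range is hypothetical.

```python
import math

n = 380
p_yes, p_no = 119 / n, 232 / n

# Large-sample variances and covariance of the two observed fractions.
var_yes = p_yes * (1 - p_yes) / n
var_no = p_no * (1 - p_no) / n
cov = -p_yes * p_no / n
det = var_yes * var_no - cov ** 2

# With 2 degrees of freedom the chi-square quantile is -2*ln(1 - level).
CHI2 = -2.0 * math.log(1 - 0.95)

def no_range(yes):
    """Return (no_low, no_high) on the ellipse at this Yes value, or None if outside."""
    dy = yes - p_yes
    # Ellipse: (var_no*dy^2 - 2*cov*dx*dy + var_yes*dx^2) / det = CHI2.
    # Rearranged as a quadratic a*dx^2 + b*dx + c = 0 in dx = no - p_no.
    a = var_yes
    b = -2 * cov * dy
    c = var_no * dy ** 2 - CHI2 * det
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (p_no + (-b - root) / (2 * a), p_no + (-b + root) / (2 * a))

print(no_range(p_yes))  # widest-looking slice through the observed point
print(no_range(0.5))    # None means Yes = 0.5 lies outside the ellipse
```

Calling no_range over a grid of Yes values gives pairs of No values that can be plotted the same way as the table rows.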
The top "half" of the ellipsoid is completed at Row 39. The bottom
"half" begins at Row 40.
Notice that the cited confidence envelope is actually for 93.46%, not
95.00%.
I'll try to explain this later. But it's close enough for this
discussion.
Does the (approx.) 95% confidence envelope encompass Yes/No =
0.50/0.50? No. By inspection, examine Row 39...
39 0.351 0.543 0.603 0.060 0.00459 0.44806 <--
At the top of the ellipsoid, Yes = 0.351 and No runs from 0.543 to 0.603.
0.351 is a long way from 0.500.
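As a cross-check from the simpler binomial side of the original question, here is a sketch of an exact two-sided test of "opinion is evenly divided", run both with the neutrals excluded and with them counted as "not yes". It assumes the same random-sample, infinite-population idealization; the helper name two_sided_binom_p is made up for this example.

```python
from fractions import Fraction
from math import comb

def two_sided_binom_p(k, n):
    """Exact two-sided p-value for H0: p = 0.5 (symmetric, so double the smaller tail)."""
    k = min(k, n - k)
    tail = sum(comb(n, i) for i in range(k + 1))
    return min(1.0, float(Fraction(2 * tail, 2 ** n)))

p_excluding_neutral = two_sided_binom_p(119, 351)  # 380 - 29 = 351
p_including_neutral = two_sided_binom_p(119, 380)

print(p_excluding_neutral)
print(p_including_neutral)
```

Either way of handling the neutrals gives a very small p-value, which agrees with the envelope missing 0.50/0.50.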
This ellipsoid is very elongated (consistently narrow) because there
were so few (small fraction of) Neutral responders. If there had
been, for instance, 150 Neutral responders then the ellipsoid would be
much more circular.
This is surely a case of using high precision software to calculate a
low precision confidence envelope. I say "low precision" because of
the reasons cited by others combined with using an "exact trinomial
calculation" where a hypergeometric would be more proper.
This software was written to deal with a similar but slightly
different type of data. That is, a "random sample of" N people are
presented with two similar products (call them A and B) and asked
which of those they favor. The answers must be "favor A, favor B, or
No Preference". From this we want to calculate a confidence envelope
on the entire population of "people who would have an interest in this
type of product". The actual calculations are complex and slow even
on the fastest desktop computers. There is an element of "trial and
error" in converging on the 95% confidence envelope so I stopped it a
bit short and settled on the 93.5% confidence envelope.
This same software is also used for another trinomial situation in
which we compare products A, B, and C and the testers must pick one of
these... no "No Preferences" allowed.
The last column in the table is the accumulated probability...
totalling 0.93455. The last column is there just in case it's needed
for another purpose.
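The relation between the last two columns can be checked with a running sum. This sketch uses only the first five rows of the table in this post; small differences in the last digit come from the table being rounded to five places.

```python
from itertools import accumulate

# "Prob" column for rows 1 to 5 of the table.
prob = [0.01676, 0.01671, 0.01664, 0.01655, 0.01643]

# "CProb" is the running total of "Prob".
cprob = list(accumulate(prob))

for row, (p, cp) in enumerate(zip(prob, cprob), start=1):
    print(f"{row:3d} {p:.5f} {cp:.5f}")
```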
This is a long and somewhat complicated post. If you find obvious
typos or other things that need help, please let me know.
Be of good cheer... OMU
Row Yes No1 No2 Diff Prob CProb
1 0.313 0.567 0.643 0.076 0.01676 0.01676
2 0.314 0.567 0.642 0.075 0.01671 0.03347
3 0.315 0.566 0.641 0.075 0.01664 0.05012
4 0.316 0.565 0.640 0.075 0.01655 0.06666
5 0.317 0.564 0.640 0.076 0.01643 0.08310
6 0.318 0.563 0.639 0.076 0.01628 0.09938
7 0.319 0.563 0.638 0.075 0.01609 0.11547
8 0.320 0.562 0.637 0.075 0.01589 0.13136
9 0.321 0.561 0.636 0.075 0.01566 0.14702
10 0.322 0.560 0.635 0.075 0.01541 0.16243
11 0.323 0.560 0.634 0.074 0.01513 0.17755
12 0.324 0.559 0.633 0.074 0.01483 0.19239
13 0.325 0.558 0.632 0.074 0.01452 0.20691
14 0.326 0.558 0.631 0.073 0.01418 0.22109
15 0.327 0.557 0.630 0.073 0.01383 0.23492
16 0.328 0.556 0.629 0.073 0.01347 0.24839
17 0.329 0.556 0.628 0.072 0.01309 0.26149
18 0.330 0.555 0.627 0.072 0.01271 0.27419
19 0.331 0.554 0.626 0.072 0.01231 0.28651
20 0.332 0.554 0.625 0.071 0.01190 0.29841
21 0.333 0.553 0.623 0.070 0.01149 0.30990
22 0.334 0.552 0.622 0.070 0.01108 0.32098
23 0.335 0.552 0.621 0.069 0.01066 0.33164
24 0.336 0.551 0.620 0.069 0.01024 0.34188
25 0.337 0.551 0.619 0.068 0.00982 0.35170
26 0.338 0.550 0.618 0.068 0.00940 0.36110
27 0.339 0.549 0.617 0.068 0.00899 0.37009
28 0.340 0.549 0.616 0.067 0.00858 0.37867
29 0.341 0.548 0.615 0.067 0.00818 0.38685
30 0.342 0.548 0.614 0.066 0.00777 0.39462
31 0.343 0.547 0.613 0.066 0.00739 0.40201
32 0.344 0.547 0.612 0.065 0.00700 0.40901
33 0.345 0.546 0.610 0.064 0.00662 0.41563
34 0.346 0.546 0.609 0.063 0.00625 0.42188
35 0.347 0.545 0.608 0.063 0.00590 0.42778
36 0.348 0.545 0.607 0.062 0.00556 0.43334
37 0.349 0.544 0.606 0.062 0.00523 0.43856
38 0.350 0.544 0.605 0.061 0.00490 0.44347
39 0.351 0.543 0.603 0.060 0.00459 0.44806 <--
40 0.312 0.568 0.644 0.076 0.01677 0.46483
41 0.311 0.569 0.645 0.076 0.01675 0.48158
42 0.310 0.570 0.646 0.076 0.01670 0.49827
43 0.309 0.571 0.647 0.076 0.01662 0.51489
44 0.308 0.572 0.648 0.076 0.01651 0.53139
45 0.307 0.573 0.649 0.076 0.01637 0.54776
46 0.306 0.573 0.650 0.077 0.01621 0.56397
47 0.305 0.574 0.651 0.077 0.01601 0.57998
48 0.304 0.575 0.652 0.077 0.01579 0.59577
49 0.303 0.576 0.652 0.076 0.01554 0.61131
50 0.302 0.577 0.653 0.076 0.01527 0.62659
51 0.301 0.578 0.654 0.076 0.01498 0.64156
52 0.300 0.579 0.655 0.076 0.01466 0.65623
53 0.299 0.580 0.656 0.076 0.01433 0.67055
54 0.298 0.581 0.657 0.076 0.01398 0.68453
55 0.297 0.582 0.658 0.076 0.01361 0.69814
56 0.296 0.583 0.658 0.075 0.01321 0.71135
57 0.295 0.584 0.659 0.075 0.01282 0.72417
58 0.294 0.585 0.660 0.075 0.01241 0.73657
59 0.293 0.586 0.661 0.075 0.01199 0.74856
60 0.292 0.587 0.662 0.075 0.01156 0.76012
61 0.291 0.588 0.662 0.074 0.01112 0.77124
62 0.290 0.589 0.663 0.074 0.01068 0.78192
63 0.289 0.591 0.664 0.073 0.01023 0.79215
64 0.288 0.592 0.665 0.073 0.00979 0.80194
65 0.287 0.593 0.665 0.072 0.00935 0.81129
66 0.286 0.594 0.666 0.072 0.00891 0.82020
67 0.285 0.595 0.667 0.072 0.00847 0.82867
68 0.284 0.596 0.668 0.072 0.00805 0.83672
69 0.283 0.598 0.668 0.070 0.00761 0.84432
70 0.282 0.599 0.669 0.070 0.00719 0.85152
71 0.281 0.600 0.670 0.070 0.00679 0.85831
72 0.280 0.601 0.670 0.069 0.00639 0.86470
73 0.279 0.603 0.671 0.068 0.00600 0.87069
74 0.278 0.604 0.672 0.068 0.00562 0.87632
75 0.277 0.605 0.672 0.067 0.00526 0.88158
76 0.276 0.606 0.673 0.067 0.00491 0.88649
77 0.275 0.608 0.674 0.066 0.00457 0.89106
78 0.274 0.609 0.674 0.065 0.00425 0.89531
79 0.273 0.611 0.675 0.064 0.00393 0.89924
80 0.272 0.612 0.675 0.063 0.00364 0.90288
81 0.271 0.613 0.676 0.063 0.00336 0.90624
82 0.270 0.615 0.676 0.061 0.00309 0.90932
83 0.269 0.616 0.677 0.061 0.00284 0.91217
84 0.268 0.618 0.677 0.059 0.00259 0.91475
85 0.267 0.620 0.678 0.058 0.00236 0.91711
86 0.266 0.621 0.678 0.057 0.00215 0.91927
87 0.265 0.623 0.679 0.056 0.00196 0.92122
88 0.264 0.624 0.679 0.055 0.00177 0.92299
89 0.263 0.626 0.679 0.053 0.00160 0.92459
90 0.262 0.628 0.680 0.052 0.00144 0.92603
91 0.261 0.630 0.680 0.050 0.00128 0.92731
92 0.260 0.632 0.680 0.048 0.00114 0.92845
93 0.259 0.633 0.680 0.047 0.00102 0.92947
94 0.258 0.635 0.681 0.046 0.00091 0.93038
95 0.257 0.637 0.681 0.044 0.00080 0.93118
96 0.256 0.640 0.681 0.041 0.00069 0.93188
97 0.255 0.642 0.681 0.039 0.00060 0.93248
98 0.254 0.644 0.680 0.036 0.00051 0.93299
99 0.253 0.647 0.680 0.033 0.00043 0.93342
100 0.252 0.649 0.680 0.031 0.00037 0.93379
101 0.251 0.652 0.679 0.027 0.00029 0.93408
102 0.250 0.655 0.678 0.023 0.00023 0.93431
103 0.249 0.659 0.676 0.017 0.00016 0.93447
104 0.248 0.664 0.674 0.010 0.00008 0.93455
On Feb 8, 7:08 am, Stan Brown <the_stan_br...@xxxxxxxxxxx> wrote:
Tue, 06 Feb 2007 23:41:48 -0500 from Richard Ulrich
<Rich.Ulr...@xxxxxxxxxxx>:
On Mon, 5 Feb 2007 16:52:33 -0500, Stan Brown
<the_stan_br...@xxxxxxxxxxx> wrote:
Survey taken: 1366 mailed out (proposed sewer system)
Responses received: 380
119 "yes"
29 neutral
232 "no"
On a null hypothesis of "opinion is evenly divided" I get a tiny p-
value no matter whether I count "yes" as 119 out of 380 or 119 out of
380-29 = 351. But I wonder what is the right thing to do. In yes/no
surveys, when you have don't-care responses, how are they best
treated?
Okay, there is a preference for No, regardless of what
you do with "neutral." The presentation is more a matter
of politics and of sense, than of statistics.
Who has been campaigning how strongly, for what?
There hasn't been much of a campaign. The town board has been
debating this off and on since before I bought my house last summer.
I think developers probably want it so they can build with higher
density; much of the town is rural and the rest is low-density
suburban. We're 10-20 minutes outside of Ithaca.
- Is this a 'random' sample, or was there any chance that
one side is using the survey as a tool?
There's a chance, but AFAIK a copy was sent to every homeowner in the
proposed district. Reading my copy, I couldn't tell whether the town
supervisor, who prepared it, was hoping for yeses or noes.
It was a one-page summary, including state grant figures and cost
figures, with the survey at the bottom to be mailed in. (No stamp was
provided, making the response rate even more amazing.)
Was this question the whole content of the survey?
Yes,
"My opinion:
"___ I support the proposal
"___ I am neutral toward the proposal
"___ I oppose the proposal."
Definitely, state the full results - all three categories.
Right. I guess I should have made it clear, I have no connection with
this other than as a homeowner who stands to see a $100-a-month rise
in my tax bill *and* the privilege of making expensive connections
and then paying additional user fees.
The ethics of survey-reporting says that you must be explicit
about the context, the content, the questions, etc.
I agree with you that of course the full numbers should be presented;
my question was about proper drawing of conclusions.
It seems obvious that opinion is quite strongly "no", but I'm looking
at how to frame a proper hypothesis test and p-value.
Alternatively, maybe I should make it a 95% confidence interval. Do I
calculate a binomial CI from a sample size of 380-29, excluding the
neutrals as though they had not responded? Or does the three-way
nature of the question mean that I can't analyze it in a binomial
manner and have to do some sort of Chi-squared?
--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/