Re: interpreting poll margin of error?
From: Eugene Gallagher (Nospam54_at_aol.com)
Date: 10/29/04
- In reply to: Bruce Weaver: "Re: interpreting poll margin of error?"
Date: Fri, 29 Oct 2004 11:52:38 -0400
Bruce Weaver wrote:
...
> Rich Ulrich has given you a good response. Just thought I'd also
> mention that Eugene Gallagher posted an item called "On a recent poll"
> in sci.stat.edu not too long ago that may be of interest to you. A
> Google groups search on that term should lead you to it.
>
> You might find the American Statistical Association's Survey Research
> Methods Section of interest too. See the "What is a survey" link on
> this page, for example:
>
> http://www.amstat.org/sections/SRMS/index.html
>
>
I often find the significance of poll results misreported. For example,
this week Gallup described its latest poll of likely voters as showing
a ‘statistically significant’ 5-point lead for Bush.
Gallup’s 10/26/04 press release has the race at 51% to 46% for Bush, with
a sample size of 1195. The margin of error for a single proportion in a
poll of 1195 is 2.8%. But the margin of error for the difference between
the two proportions is roughly twice as large: the 5-point lead is
5 ± 5.6%, an interval that includes zero. If the race were actually tied,
the two-sided probability of observing a 5% difference by chance is 8.1%,
certainly higher than the 5% level that Gallup uses to assess statistical
significance in the rest of its polling.
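The three figures above can be cross-checked with the usual normal approximation. The Matlab sketch below is not the author's Monte Carlo method; it assumes a simple random sample with multinomial responses and uses only base-Matlab `erfinv` and `erfc` (no toolboxes). The single-proportion margin uses the worst case p = 0.5, while the margin for the difference accounts for the negative covariance between the two proportions from the same poll.

```matlab
% Normal-approximation check of the Gallup numbers (assumption: simple
% random sample, multinomial responses; not the Monte Carlo method of pollsig.m)
N  = 1195;          % sample size from the press release
p1 = 0.51;          % Bush
p2 = 0.46;          % Kerry
z  = sqrt(2)*erfinv(0.95);              % two-sided 95% critical value, about 1.96

% Margin of error for a single proportion, worst case p = 0.5
ME = z/(2*sqrt(N));

% Margin of error for the difference p1 - p2; the two estimates are
% negatively correlated, so Var(p1hat - p2hat) = (p1 + p2 - (p1 - p2)^2)/N
seDiff = sqrt((p1 + p2 - (p1 - p2)^2)/N);
MEdiff = z*seDiff;

% Two-sided p value under Ho: p1 = p2 = p0, where Var = 2*p0/N
p0   = (p1 + p2)/2;
se0  = sqrt(2*p0/N);
zobs = abs(p1 - p2)/se0;
pval = erfc(zobs/sqrt(2));               % equals 2*(1 - Phi(zobs))

fprintf('ME single = %3.1f%%, ME diff = %3.1f%%, p = %5.3f\n', ...
        100*ME, 100*MEdiff, pval);
```

This prints `ME single = 2.8%, ME diff = 5.6%, p = 0.079`. The normal approximation's 7.9% sits a little below the 8.1% from the Monte Carlo run, largely because it ignores the discreteness of the vote counts.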
I use a Matlab m-file that I wrote, pollsig.m, to calculate the
significance of differences in proportions for polls. The margin of
error above was calculated with about 1 million Monte Carlo trials
(although 50,000 would have been more than sufficient). I’ve included the
m-file below. As reported previously in this thread, for pairs of
proportions close to 50%, the margin of error for the difference is just
slightly less than twice the individual margin of error.
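The "slightly less than twice" rule follows from the variance of a difference of two proportions estimated from the same poll. A short derivation, assuming a simple random sample of size N and z ≈ 1.96 as the two-sided 95% critical value:

```latex
\operatorname{Var}(\hat p_1-\hat p_2)
  = \frac{p_1(1-p_1)+p_2(1-p_2)+2p_1p_2}{N}
  = \frac{p_1+p_2-(p_1-p_2)^2}{N}

ME = \frac{z}{2\sqrt{N}}, \qquad
ME_{\mathrm{diff}} = z\sqrt{\frac{p_1+p_2-(p_1-p_2)^2}{N}}

\frac{ME_{\mathrm{diff}}}{2\,ME} = \sqrt{p_1+p_2-(p_1-p_2)^2} \le 1
```

For Gallup's 51% and 46% the ratio is sqrt(0.97 - 0.0025) = sqrt(0.9675), about 0.984; it reaches exactly 1 only when both proportions equal 50%.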
The misleading Gallup press release:
http://www.gallup.com/poll/content/?ci=13792
Bush Retains Lead Among Likely Voters, Now 51% to 46%
Contest essentially tied among registered voters
by David W. Moore
GALLUP NEWS SERVICE
PRINCETON, NJ -- President George W. Bush holds a slight, but
statistically significant, lead over Sen. John Kerry, according to the
latest CNN/USA Today/Gallup survey. Among likely voters, Bush receives
51% support, while Kerry receives 46%. Among the larger group of
registered voters, Bush's lead of two points (49% to 47%) is well within
the poll's margin of error.
*******
My Matlab m-file, pollsig.m, was called with:
[ME,D,Dprob,halfCI]=pollsig(1195,[0.51 0.46 0.03],4e4,1)
function [ME,D,Dprob,halfCI]=pollsig(N,V,Trials,details)
% How significant are differences in poll results?
% format [ME,D,Dprob,halfCI]=pollsig(N,V,Trials,details);
% Input:
% Required:
% N = Number of individuals polled
% V= Column or row vector with proportion of responses; need not sum
% to 1, but sum(V) must not exceed 1.
% In this implementation only the two most common items will be tested.
% Optional:
% Trials=Number of Monte Carlo trials used to judge significance
% if Trials not specified, 1e4 trials will be used
% if Trials=0, then the covariance formula will be used to judge
% significance
% details=0, suppress all output within the m-file.
% Output:
% ME=Margin of Error for the poll
% D=absolute difference in proportions of the 2 most common items.
% Dprob=two-sided p value for test of equality of proportions of the 2
% most common items
% Dprob will have a minimum value of 1/Trials
% halfCI=half the 95% CI for difference in proportions of the two most
% common items.
% Written for ECOS601 by E. Gallagher (UMASS/Boston)
% based on KerryDeanpoll.m, see that m.file for details.
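if nargin<4
details=1;
end
if nargin<3
Trials=1e4;
end
% MC=0 flags the covariance-formula option (Trials=0); only the Monte
% Carlo branch is implemented below, so Trials must be positive.
MC=(Trials~=0);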
% Calculate Margin of Error, p. 330, Larsen & Marx:
ME=erfinv(0.95)*sqrt(2)/(2*sqrt(N));
% Can be called with ME=zprob(0.05)/(2*sqrt(N));
if details
fprintf('The margin of error for a poll of size %d is %3.1f%%.\n',N,ME*100);
end
% Monte Carlo simulation
if details;
fprintf('\nMonte Carlo simulation based on %d trials:\n',Trials);
end
V=V(:); % Change V to a column vector
V=flipud(sort(V));V=V(1:2); % This m-file will only calculate significance for the top two categories.
tallys=zeros(Trials,2); % One row per trial, 2 columns: counts for the top two items
tallyHo=zeros(Trials,2); % Same counts, simulated under Ho: p1=p2
ExpP=mean(V); % Common proportion of the top two items under Ho
for i=1:Trials
poll=rand(N,1); % Creates a vector of N uniformly distributed random numbers on the interval (0,1)
tallys(i,1)=sum(poll<=V(1));
tallys(i,2)=sum( (poll>V(1)) & (poll <= (V(1)+ V(2))) );
tallyHo(i,1)=sum(poll<=ExpP);
tallyHo(i,2)=sum( (poll>ExpP) & (poll <= 2*ExpP));
end
%
% Alternative vectorized tally for one trial (commented out, not used):
% V=flipud(sort(V(:)));cv=[0;cumsum(V)];r=rand(1,N);s=sum(repmat(r,length(cv),1)>=repmat(cv,1,length(r)))
% b=full(sparse(s,1,ones(1,N)));tallys(i,:)=b(find(b))'
DifferenceHo = (tallyHo(:,1) - tallyHo(:,2))/N; % Calculate the differences for all Trials under Ho: p1=p2
D=abs(V(1)-V(2));
Dprob=max([1 sum(abs(DifferenceHo)>=D)])/Trials;
if details && Dprob<0.001 % change the format so that it is in full form only for low p values:
fprintf(...
'Under the null hypothesis of equal proportions and %d trials,',Trials)
fprintf('\nthe 2-sided probability of observing a %5.3f%% difference by chance is %d\n',...
D*100,Dprob);
elseif details
fprintf(...
'Under the null hypothesis of equal proportions and %d trials,',Trials)
fprintf('\nthe 2-sided probability of observing a %4.2f%% difference by chance is %5.3f\n',...
D*100,Dprob);
end
Diff = (tallys(:,1) - tallys(:,2))/N;
% 95% CI via Monte Carlo simulation
sortedDiff=sort(Diff);
lMC95CIpi=max(1,floor(0.025*Trials)); % find the index for the lower 95% cutoff value (at least 1)
uMC95CIpi=ceil(0.975*Trials); % find the index for the upper 95% cutoff value
medpi=round(0.5*Trials); % find the median, should be close to or identical to the expected value.
% Save the three outputs in the row vector DLowExpUp
DLowExpUp=[sortedDiff(lMC95CIpi) sortedDiff(medpi) sortedDiff(uMC95CIpi)];
halfCI=(DLowExpUp(3)-DLowExpUp(1))/2;
if details
fprintf('\nLower 95%% confidence limit, median, and upper 95%% confidence limit based on %d trials:\n',Trials)
fprintf('Lower 95%% CI \tMedian \tUpper 95%% CI \n \t%4.2f%% \t\t%4.2f%% \t%4.2f%% \n',...
DLowExpUp(1)*100,DLowExpUp(2)*100,DLowExpUp(3)*100)
fprintf('\nDifference +/- half 95%% CI: %4.1f%% +/- %4.1f%%\n',D*100,halfCI*100)
end
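The trial loop in pollsig.m draws one simulated poll per pass. As a hedged sketch that is not part of the original m-file, the same null-hypothesis test can be run on blocks of simulated polls, treating each column of a uniform random matrix as one poll. The names `chunk` and `hits` are hypothetical, and the block size of 1000 is an arbitrary choice to limit memory use. With 20,000 trials the p value should land near the 8% quoted above, within Monte Carlo error.

```matlab
% Hypothetical sketch, not part of pollsig.m: Monte Carlo test of
% Ho: p1 = p2, simulating a block of polls per loop pass.
N      = 1195;          % poll size from the Gallup release
V      = [0.51 0.46];   % observed proportions for the top two candidates
Trials = 2e4;           % number of simulated polls
chunk  = 1000;          % polls per block (arbitrary; limits memory use)

p0   = mean(V);         % common proportion for both candidates under Ho
D    = abs(V(1) - V(2));% observed difference
hits = 0;               % simulated polls with a difference at least as large as D
for k = 1:(Trials/chunk)
    u  = rand(N, chunk);               % each column is one simulated poll
    n1 = sum(u <= p0, 1);              % votes for candidate 1 in each poll
    n2 = sum(u > p0 & u <= 2*p0, 1);   % votes for candidate 2 in each poll
    hits = hits + sum(abs(n1 - n2)/N >= D);
end
Dprob = max(1, hits)/Trials;           % same 1/Trials floor as pollsig.m
fprintf('Two-sided Monte Carlo p value: %5.3f\n', Dprob);
```

The partitioning of (0,1) into [0, p0] and (p0, 2*p0] mirrors the `tallyHo` lines of pollsig.m; only the looping is reorganized.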