Re: interpreting poll margin of error?

From: Eugene Gallagher (Nospam54_at_aol.com)
Date: 10/29/04


Date: Fri, 29 Oct 2004 11:52:38 -0400

Bruce Weaver wrote:

...
> Rich Ulrich has given you a good response. Just thought I'd also
> mention that Eugene Gallagher posted an item called "On a recent poll"
> in sci.stat.edu not too long ago that may be of interest to you. A
> Google groups search on that term should lead you to it.
>
> You might find the American Statistical Association's Survey Research
> Methods Section of interest too. See the "What is a survey" link on
> this page, for example:
>
> http://www.amstat.org/sections/SRMS/index.html
>
>
I often find the significance of poll results misreported. For example,
Gallup this week described their latest poll of likely voters indicating
a ‘statistically significant’ 5-point lead for Bush.
    Gallup’s 10/26/04 Press release has the race at 51-46% Bush with
their sample size of 1195. The margin of error for a poll of size 1195
for a single proportion is 2.8%. But the margin of error for the 5-point
difference is 5.6 points (5% ± 5.6%), so the interval includes zero. The
probability of observing a difference this large by chance is 8.1%,
certainly higher than the 5% that Gallup uses to assess statistical
significance in the rest of their polling.
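
These figures can be checked approximately in closed form. The following is a sketch under the normal approximation, treating the poll as a multinomial sample. The symbols \hat{p}_1, \hat{p}_2 and z_{0.975} are notation introduced here for illustration.

```latex
\begin{align*}
\mathrm{ME} &= \frac{z_{0.975}}{2\sqrt{N}} = \frac{1.96}{2\sqrt{1195}} \approx 0.028 \\
\operatorname{Var}(\hat{p}_1 - \hat{p}_2) &= \frac{p_1 + p_2 - (p_1 - p_2)^2}{N} \\
\text{half-width of 95\% CI} &= 1.96\sqrt{\frac{0.97 - 0.05^2}{1195}} \approx 0.056 \\
z &= \frac{0.05}{\sqrt{0.97/1195}} \approx 1.75, \qquad p \approx 0.08 \quad (H_0\colon p_1 = p_2)
\end{align*}
```

The normal approximation gives a two-sided p value of about 0.08, in line with the 8.1% obtained by simulation.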
   I use a Matlab m.file that I wrote called pollsig.m to calculate the
significance of differences in proportions for polls. The margin of
error above was calculated with about 1 million Monte Carlo trials (but
50,000 would have been more than sufficient). I’ve included it below. As
reported previously in this thread, for pairs of proportions close to
50%, the margin of error for the difference is just slightly less than
twice the individual margin of error.
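
The "slightly less than twice" relationship can be checked without simulation. The lines below are a minimal sketch assuming the multinomial variance of a difference in proportions and a normal approximation. They are not part of pollsig.m, and every pair after the first row is a hypothetical value chosen for illustration.

```matlab
% Closed-form check under the normal approximation; not part of pollsig.m.
N = 1195;                       % sample size from the Gallup release
z = sqrt(2)*erfinv(0.95);       % two-sided 95% normal quantile, about 1.96
ME = z/(2*sqrt(N));             % single-proportion margin of error at p = 0.5

% Top-two proportions: the first row is the Gallup likely-voter result,
% the remaining rows are hypothetical values chosen for illustration.
pairs = [0.51 0.46; 0.50 0.47; 0.48 0.48; 0.45 0.40];
ratio = zeros(size(pairs,1),1);

for k = 1:size(pairs,1)
    p1 = pairs(k,1);
    p2 = pairs(k,2);
    % Multinomial variance of the difference: [p1 + p2 - (p1 - p2)^2]/N
    halfCI = z*sqrt((p1 + p2 - (p1 - p2)^2)/N);
    ratio(k) = halfCI/ME;
    fprintf('p1=%4.2f p2=%4.2f half 95%% CI=%4.2f%% ratio to ME=%5.3f\n', ...
        p1, p2, halfCI*100, ratio(k));
end
```

Under this formula the ratio equals 2*sqrt(p1 + p2 - (p1 - p2)^2), which cannot exceed 2 because p1 + p2 <= 1. It falls further below 2 as the share of undecided or third-party responses grows.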

The misleading Gallup press release:

http://www.gallup.com/poll/content/?ci=13792
Bush Retains Lead Among Likely Voters, Now 51% to 46%
Contest essentially tied among registered voters
by David W. Moore
GALLUP NEWS SERVICE
PRINCETON, NJ -- President George W. Bush holds a slight, but
statistically significant, lead over Sen. John Kerry, according to the
latest CNN/USA Today/Gallup survey. Among likely voters, Bush receives
51% support, while Kerry receives 46%. Among the larger group of
registered voters, Bush's lead of two points (49% to 47%) is well within
the poll's margin of error.

*******
My Matlab m.file, pollsig.m, called with
[ME,D,Dprob,halfCI]=pollsig(1195,[0.51 0.46 0.03],4e4,1)

function [ME,D,Dprob,halfCI]=pollsig(N,V,Trials,details)
% How significant are differences in poll results?
% format [ME,D,Dprob,halfCI]=pollsig(N,V,Trials,details);
% Input:
% Required:
% N = Number of individuals polled
% V = Column or row vector with proportion of responses; need not sum
%     to 1, but sum(V)<=1.
%     In this implementation only the two most common items will be
%     tested.
% Optional:
% Trials = Number of Monte Carlo trials used to judge significance
%     if Trials not specified, 1e4 trials will be used
%     if Trials=0, then the covariance formula will be used to judge
%     significance
% details=0, suppress all output within the m.file.
% Output:
% ME=Margin of Error for the poll
% D=absolute difference in proportions of the two most common items.
% Dprob=two-sided p value for test of equality of proportions of the 2
%     most common items
%     Dprob will have a minimum value of 1/Trials
% halfCI=half the 95% CI for difference in proportions of the two most
%     common items.

% Written for ECOS601 by E. Gallagher (UMASS/Boston)
% based on KerryDeanpoll.m, see that m.file for details.

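% Fill in defaults for the optional arguments.
if nargin<4
     details=1;
end
if nargin<3
     Trials=1e4;
end
MC=(Trials>0); % MC=0 requests the covariance formula instead of Monte Carlo trials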
% Calculate Margin of Error, p. 330, Larsen & Marx:
ME=erfinv(0.95)*sqrt(2)/(2*sqrt(N));
% Can be called with ME=zprob(0.05)/(2*sqrt(N));
if details
     fprintf('The margin of error for a poll of size %d is %3.1f%%.\n',N,ME*100);
end

% Monte Carlo simulation
if details;
     fprintf('\nMonte Carlo simulation based on %d trials:\n',Trials);
end
V=V(:); % Change V to a column vector
V=flipud(sort(V));V=V(1:2); % This m.file will only calculate significance for top two categories.
tallys=zeros(Trials,2); % A matrix with rows=Trials & 2 columns; tallies the counts used for the differences
tallyHo=zeros(Trials,2); % This will store the results for testing Ho: p1=p2;
ExpP=mean(V);
for i=1:Trials
     poll=rand(N,1); % Creates a vector of uniformly distributed random numbers on the interval (0,1)
     tallys(i,1)=sum(poll<=V(1));
     tallys(i,2)=sum( (poll>V(1)) & (poll <= (V(1)+ V(2))) );
     tallyHo(i,1)=sum(poll<=ExpP);
     tallyHo(i,2)=sum( (poll>ExpP) & (poll <= 2*ExpP));
end

% V=flipud(sort(V(:)));cv=[0;cumsum(V)];r=rand(1,N);s=sum(repmat(r,length(cv),1)>=repmat(cv,1,length(r)))
% b=full(sparse(s,1,ones(1,N)));tallys(i,:)=b(find(b))'
DifferenceHo = (tallyHo(:,1) - tallyHo(:,2))/N; % Calculate the differences for all Trials under Ho: p1=p2
D=abs(V(1)-V(2));
Dprob=max([1 sum(abs(DifferenceHo)>=D)])/Trials;
if details && Dprob<0.001 % change the format so that it is in full form only for low p values:
     fprintf(...
     'Under the null hypothesis of equal proportions and %d trials,',Trials)
     fprintf('\nthe 2-sided probability of observing a %5.3f%% difference by chance is %d\n',...
         D*100,Dprob);
elseif details
     fprintf(...
     'Under the null hypothesis of equal proportions and %d trials,',Trials)
     fprintf('\nthe 2-sided probability of observing a %4.2f%% difference by chance is %5.3f\n',...
         D*100,Dprob);
end
Diff = (tallys(:,1) - tallys(:,2))/N;
% 95% CI via Monte Carlo simulation
sortedDiff=sort(Diff);
lMC95CIpi=max(1,floor(0.025*Trials)); % index of the lower 95% cutoff value (at least 1, so small Trials do not index row 0)
uMC95CIpi=ceil(0.975*Trials); % index of the upper 95% cutoff value
medpi=round(0.5*Trials); % index of the median, which should be close to or identical to the expected value
% Save the three outputs in the row vector DLowExpUp
DLowExpUp=[sortedDiff(lMC95CIpi) sortedDiff(medpi) sortedDiff(uMC95CIpi)];
halfCI=(DLowExpUp(3)-DLowExpUp(1))/2;
if details
     fprintf('\nLower 95%% confidence limit, median, and upper 95%% confidence limit based on %d trials:\n',Trials)
     fprintf('Lower 95%% CI \tMedian \tUpper 95%% CI \n \t%4.2f%% \t\t%4.2f%% \t%4.2f%% \n',...
        DLowExpUp(1)*100,DLowExpUp(2)*100,DLowExpUp(3)*100)
     fprintf('\nDifference +/- half 95%% CI: %4.1f%% +/- %4.1f%%\n',D*100,halfCI*100)
end
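
The help text says that Trials=0 selects a covariance formula, but the listing above contains only the Monte Carlo path. The lines below are a minimal sketch of what such a closed-form branch might compute, assuming the multinomial covariance of two proportions and a normal approximation. This sketch is not part of pollsig.m as posted, and the names seHo, seD and z95 are hypothetical.

```matlab
% Sketch only: normal-approximation stand-in for the Monte Carlo loop.
N = 1195;                         % number of individuals polled
V = [0.51 0.46 0.03];             % reported proportions
V = flipud(sort(V(:)));           % sort descending, as pollsig.m does
p1 = V(1);
p2 = V(2);
D = abs(p1 - p2);                 % observed difference in proportions
z95 = sqrt(2)*erfinv(0.95);       % two-sided 95% normal quantile

% Under Ho: p1=p2 the multinomial variance of the difference is (p1+p2)/N.
seHo = sqrt((p1 + p2)/N);
Dprob = erfc(D/(seHo*sqrt(2)));   % two-sided normal p value

% For the confidence interval, use the variance at the observed proportions.
seD = sqrt((p1 + p2 - D^2)/N);
halfCI = z95*seD;

fprintf('2-sided p value: %5.3f\n', Dprob);
fprintf('Difference +/- half 95%% CI: %4.1f%% +/- %4.1f%%\n', D*100, halfCI*100);
```

Unlike the simulated p value, this one is not bounded below by 1/Trials, so it may be more convenient when the difference is many standard errors from zero.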


