Re: P-value from chi-square value: source code

"Reef Fish" <Large_Nassau_Grouper@xxxxxxxxx> wrote in message news:<1113254955.951865.94110@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>...
> Ian Smith wrote:
> > "Reef Fish" <Large_Nassau_Grouper@xxxxxxxxx> wrote in message
> news:<1113074906.743917.66500@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>...
> > > DZ wrote:
> > > > P.S. The p-value is 1-chi_cdf(ChiSquare, df)
> > >
> > > If so, why do you need any ChiSquare software, let alone coding it?
> > > Why wouldn't any standard normal software suffice?
> > > >
> > > > DZ wrote:
> > > > > Dhruv <bhaveshrb@xxxxxxxxx> wrote:
> > > > >> Ian Smith wrote:
> > > > >>> "Dhruv" <bhaveshrb@xxxxxxxxx> wrote
> > > > >>> > Could anyone send me a link to the source code in C++ for
> > > > >>> > calculating the P-value from a chi-square value.
> > > > >>> You are looking for code to evaluate the upper tail
> > > > >>> probability of a chi-squared distribution.
> > > > >>>
> > > > >>
> > > > >> I actually tried the code and am having some problems with it.
> > > > >> I am sort of a newbie to programming and though I have managed
> > > > >> to write some good code, I am unable to figure out the problem
> > > > >> at present. What I need basically is a function which takes in
> > > > >> a Chi-square value and returns the P-value. Please help me on
> > > > >> this.
> > > > >
> > > > > You need to compile the library and use the function "cdfchi" ,
> > > > > passing parameters as addresses, for example:
> > > > >
> > > > > double chi_cdf (double ChiSquare, double df) {
> > > > >     double q, p, bound;
> > > > >     int which = 1, status;
> > > > >     cdfchi(&which, &p, &q, &ChiSquare, &df, &status, &bound);
> > > > >     return p;
> > > > > }
> > >
> > > And Ian Smith gave two links and said,
> > >
> > > IS> both have C code which will do it and there is nothing better
> > > IS> out there that I know of.
> > >
> > > Nothing better? Use either or both of those links and give us
> > > the p-values (1-tail will do) for observed 1 df Chisquare
> > > values of 1, 10, and 100, to at least 8 significant figures,
> > > and then I'll tell you how many better ones there are. :-)
> > >
> > > -- Bob.
> >
> My point was merely my counterpoint to your "there is nothing better
> out there",
> even though you did have the qualifying disclaimer "that I know of".

Obviously I didn't qualify it enough for your liking. How about "I do
not know of any C or C++ source code using double precision arithmetic
which is better". From my point of view it is true and hopefully from
yours it is no longer provocative!

> > For Gnumeric, the figures (rounded to 15 figures) you requested were
> > 3.17310507862914E-01, 1.56540225800255E-03 & 1.52397060483210E-23
> > I believe the 3rd one should be 1.52397060483211E-23
> For this pedantic numeric exercise, the p-value of 100 (1 d.f.) is
> 1.523970604832105213139466865031986167270080665559139211560717 E-23
> because I asked for only 60 significant figures of the 100 max. It
> took my 14-year-old laptop less than 1 sec to get this (and the
> software ET (Electronic Tables) is one of the reasons I kept the
> laptop. :-)
> 1.52397060483210E-23
> 1.52397060483211E-23
> You're correct that the 11 should be the rounded value.
> As for your software, it gave a result incorrect in the last two
> significant figures:
> 1.5239706048321013 E-23
> 1.523970604832105213139466865031986167270080665559139211560717 E-23
> which suggests that those were your double-precision noise.

That's right, it's just the way Javascript displays numbers by
default. Also, it works only with 64-bit arithmetic, so you get no
advantage from any extra precision the floating-point processor may
offer. On the other hand, it means you do not get an exaggerated
sense of how good the accuracy is.

I don't think Javascript is meant to be a good language to calculate
in. In fact, I would have thought it was a bad one because the code is
slow and development in an environment with minimal support is
tedious. It was chosen so that it can be run without needing any
particular compiler but can easily be translated into the language of
the user's choice should anyone decide the code is worth using.

The relative error is 2.6e-15. Given the code essentially works with
logs and exponentiates to get the answer, the minimum possible error
bound (i.e. assuming that the call of Math.exp is the only source of
error) which could be claimed is 6e-15. By the way, not that I wish to
compare algorithms, but really you should be looking at how many
figures have been lost, not how many remain. An algorithm with 1
figure remaining from 7 is a lot better than an algorithm with 2
figures remaining from 16. I'm confident your 60 figures are correct
because I would expect ET to get 97 or even 98 figures correct for
that calculation with 100-figure arithmetic. There's no big deal in
doing 100-figure arithmetic; the task is just the same: minimise
the accuracy lost by using good algorithms.

> > However, I do not think an algorithm for a chi-squared distribution
> > for 1 degree of freedom (i.e. a simple transformation of the normal)
> > is of much interest.
> That was one point I made on DZ's post, before challenging you on the
> "nothing better out there" conjecture.
> For double-precision arithmetic, those software you named are probably
> as good as any of dozens of others using double precision with EXACT
> formulas for calculating the tail probabilities. Many of them are
> interpretive languages, such as S, S+, and so on, so that there is
> no NEED to do any CODING, especially using primitive codes like C,
> C+, C++, etc.

I don't think interpretive languages such as S and S+ would be of much
use to Dhruv, as I think he's looking to produce some C++ application,
but it gives me the opportunity to mention that R (GNU S) has recently
changed its code for the cdf of the beta distribution to base it on
TOMS708 (as per DCDFLIB) and its code for the cdf of the gamma
distribution to use that of Gnumeric.

> > In my opinion, most statistical software available is pretty poor.
> Here you have to be careful about what you mean by "most" and
> "pretty poor". There is no need to knock any software or tout your
> own -- all you have to do is to say what PRECISION it'll guarantee
> to deliver and let others decide whether it's good, good enough, or
> poor.

I should not have made that remark to help justify the choices I
recommended, and I take it back. To begin with, I would guess most
statistical software appears in analysis packages, and I know little
about them.

Part of the problem stems from sources such as A&S, NR and the like.
Although they list properties of functions, they tend not to worry
about the practical aspects of their use. The following is an example
from NR ignoring the practical aspects of convergence of algorithms
for the gamma distribution.

"Thus (6.2.5) and (6.2.7) together allow evaluation of the function
for all positive a and x." The paragraph is followed by code to
evaluate the "incomplete gamma function P(a, x)" and the "incomplete
gamma function Q(a, x) = 1 - P(a, x)".

If you use fixed-precision arithmetic, the code which follows does not
produce accurate answers for all positive a and x, no matter what
fixed-precision you use! For any given a and x, fixed-precision
arithmetic can deliver a suitably accurate answer but the required
precision for the arithmetic will depend on the values of a and x.

I'm not going to apportion blame, but there are programmers out there
who seem to take such sources at their word and consequently create
programs which don't behave as they expect. The old code in R2.0.1 is
an example of just that, a problem soon to be of the past as R2.1.0
is due out shortly.

Anyone looking for source code for the gamma distribution should watch
out for this problem. Sometimes, as with the R2.0.1 code, a normal
approximation is used for large values. As long as that approximation
is accurate enough, and you don't want to use the code for small shape
parameters (as would be the case for Dhruv, who just wants code for
the chi-squared distribution), the code will be fine.

> > It would be nice when people recommend software if they included some
> > details of why. Admittedly, my explanation was perhaps too brief, but
> > at least I do know the software. I have looked at several software
> > packages in a variety of languages and I stick by what I said before.
> > There is nothing better out there that I know of.
> See my preceding comments. In my opinion, the software ET beats your
> software (in precision, flexibility, and versatility) as a modern jet
> beats a turtle in a 100 yard dash. :-)
> For special functions, it has beta, incomplete beta, and incomplete
> gamma. It has all the trigonometric functions.
> For continuous densities, it has the usual, in addition to Weibull,
> noncentral chi-square, double-noncentral t, double-noncentral F,
> and others.
> Last but not least, it's NOT MY software, and I have absolutely
> nothing to do with it. :-) I was kindly given the copy by the author
> of the software when I was studying the accuracy of various exact
> and approximate numerical algorithms for probability functions.
> I don't know if it's still commercially available.

ET may well have all the merits you describe, but it does have some
obvious drawbacks. Even if it is written in C or C++, we have no
reference to the source. Without the source code I can't think of a
convenient way in which Dhruv could call it from his C++ program.

There is a place for programs such as ET, mainly for generating test
cases for assessing the performance of software such as DCDFLIB and
Gnumeric. No matter how enthusiastic you are about ET, I find it hard
to believe you would seriously recommend including a high-precision
implementation of the chi-squared distribution in an ordinary
application.

> -- Bob
> >
> > If you wish to get a better idea of the accuracy of these methods try
> > the Chi-squared calculator at
> > (my own software is
> > not in C or I would have recommended it) or download Gnumeric and
> > give it a go!
> >
> > Ian Smith

Anyway, a last word to Dhruv: I trust you have sorted out your
problems, since we haven't heard any more about them!

Ian Smith