Re: Who uses clapack?
From: Jentje Goslinga (goslinga_at_telus.net)
Date: 12/11/04
- Next message: glen herrmannsfeldt: "Re: Who uses clapack?"
- Previous message: beliavsky_at_aol.com: "Re: Who uses clapack?"
- In reply to: Ron Shepard: "Re: Who uses clapack?"
- Next in thread: glen herrmannsfeldt: "Re: Who uses clapack?"
- Messages sorted by: [ date ] [ thread ]
Date: Sat, 11 Dec 2004 22:13:11 GMT
Ron Shepard wrote:
> In article <41BA43C3.3040502@telus.net>,
> Jentje Goslinga <goslinga@telus.net> wrote:
>
>
> First, the dnrm2() function is not required to store the internal
> result of the accumulation in 64-bit form. It is all internal to
> the dnrm2() function with no external storage required for the
> intermediate square root. Your dnrm2() function may or may not do
> the sqrt() in extended precision, but it is not forced to truncate
> by the design of the BLAS.
>
> Even the reference fortran version of the dnrm2() function does not
> do a simple dot product to compute the vector norm, it accumulates
> into several scalar values based on the magnitude of the individual
> elements. The goal of this approach is to reduce the effect of
> roundoff error of the additions.
>
>
>>and another one between (3) and (4)
>
>
> Second, even if truncation to 64-bits occurs for the scale factor,
> the reciprocal is still accurate to 64-bit floating point precision.
> If the reciprocal is accurate to 64-bit precision, then the final
> scale operation will produce a full vector of values that are
> correct to full 64-bit precision.
>
>
>>assuming standard C program stack based parameter passing.
>>The rounding may not make much difference in this case but
>>it is not the same.
>
>
> Can you give a simple example to show the difference?
<snip>
> You can convince me by showing a simple example to prove your point.
Probably not, for the sake of completeness I ran an example
1000 times where I test typical ASM versions of dnrm2/dscal
versus a combination subroutine:
do {
register long double norm;
uvrand(n,u); /* [-1/2,1/2] */
uvcp(n,v,u); /* copy: v = u */
/* Using vector unify */
uvpack->uvunify(n,u);
/* Using usual BLAS */
norm = uvpack->uvnorm(n,v);
norm = 1/norm;
uvpack->uvscl(n,v,norm);
error[0] = ULP(uvnorm(n,u)-1);
error[1] = ULP(uvnorm(n,v)-1);
errsum[0] += abs(error[0]);
errsum[1] += abs(error[1]);
...
} while (++niter != 1000);
In 1000 random vectors [-1/2,1/2] there is precisely 224
times an error of precisely one ULP with the BLAS method,
but never any error with a single call.
The error is indeed minimal in this case but it can be
zero too.
> $.02 -Ron Shepard
There are some other instances where the BLAS may be taylored
somewhat better to the problem.
$0.01
Jentje
- Next message: glen herrmannsfeldt: "Re: Who uses clapack?"
- Previous message: beliavsky_at_aol.com: "Re: Who uses clapack?"
- In reply to: Ron Shepard: "Re: Who uses clapack?"
- Next in thread: glen herrmannsfeldt: "Re: Who uses clapack?"
- Messages sorted by: [ date ] [ thread ]