Re: OT Dual core CPUs versus faster single core CPUs?
- From: Martin Brown <|||newspam|||@nezumi.demon.co.uk>
- Date: Wed, 07 May 2008 11:51:55 +0100
John Larkin wrote:
On Tue, 06 May 2008 19:19:29 -0700, JosephKK <quiettechblue@xxxxxxxxx>
wrote:
On Mon, 05 May 2008 10:16:49 -0700, John Larkin
<jjlarkin@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On Sat, 03 May 2008 12:39:17 -0700, Jeff Liebermann <jeffl@xxxxxxxxxx>
wrote:
On Sat, 03 May 2008 06:50:49 -0700, John Larkin
<jjlarkin@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
What else are you going to do with 1024 CPU's on a chip? Well, how about...
John
Error detection. Have three CPU's do essentially the same
calculations. If they agree, continue. If they disagree, take the
result from the two that agree. The overhead is minimal as the
processes are all concurrent. However, the power dissipation might be
up to 3 times higher than with a single CPU.
That only works if the three versions of the software were written independently to satisfy the same specifications and preferably using different tools. It is done only in absolutely mission critical life or death software. I think parts of the space shuttle launch sequence use this approach (and sometimes the launch is cancelled because the systems disagree at a checkpoint).
Otherwise all you are ensuring is that the same software run three times gives the same answer (which might or might not be true depending on the FP rounding rules). And you have to be very careful that the additional complexity does not itself add a new mode of failure and unreliability. A failure in the supervisor that compares the answers for instance.
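To make the voting idea concrete, here is a minimal sketch of a 2-out-of-3 comparator in Python. The function name `vote`, the status strings and the `rel_tol` default are made up for illustration and are not taken from any real flight or safety system. It compares results with a tolerance, because the same sum evaluated in a different order can differ in the last bit, and it reports the no-majority case instead of guessing. The voter itself remains a single point of failure, as noted above.

```python
import math


def vote(a, b, c, rel_tol=1e-9):
    """Hypothetical 2-out-of-3 voter for three floating point results.

    Returns (value, status), where status is "unanimous", "majority"
    or "no_majority". rel_tol is an assumed tolerance, not a standard.
    """
    def close(x, y):
        return math.isclose(x, y, rel_tol=rel_tol)

    ab, ac, bc = close(a, b), close(a, c), close(b, c)
    if ab and ac and bc:
        return a, "unanimous"
    if ab or ac:
        # a agrees with at least one other result
        return a, "majority"
    if bc:
        # a is the odd one out
        return b, "majority"
    # All three disagree: the supervisor has to halt or escalate
    return None, "no_majority"


# The same sum in a different order is not bit-identical,
# so an exact == comparison between units would raise a false alarm.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)            # exact comparison of the two orderings
print(vote(left, right, 0.6))   # tolerant 2-out-of-3 vote
```

Picking the tolerance is itself a design decision: too tight and rounding noise trips the voter, too loose and a genuinely wrong result gets outvoted into the answer.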
A much cheaper way to improve software reliability is to port it to another machine or even a different compiler. We just about always found something of interest every time this was done even for code that was extremely robust and had been run on everything from a Cray down to a Z80 (the latter was done to win a bet). Static testing of software is possible but comparatively few shops do it seriously.
The CPU's don't have bugs; the software does.
John
CPUs typically have a few bugs each but they are seldom of major consequence. The last one I can recall that was serious egg on face was the Intel F00F bug. So before you gloat too much about hardware's seeming infallibility I suggest you read the abstract at:
http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/4211748/4211749/04211889.pdf?tp=&isnumber=&arnumber=4211889
CPUs are much more intensively simulated and individual key component blocks like multipliers and dividers are much more amenable to formal methods proof of correctness and reuse than generic freeform business software. Even so there is a test vs performance trade off for release.
ISTR when Cyrix commissioned their formal specification of the x87 to produce a cleanroom clone that was pin for pin compatible they found a couple of dozen minor defects in the original Intel x87 chip.
I am truly amazed that in the current generation of P4 chips the register colouring and speculative execution don't cause more problems.
So you have forgotten about the Pentium FDIV bug already?
In a sense that was as much an algorithmic/firmware error as a hardware bug (and it was exceedingly rare that it triggered). I had one machine with the fault - provided to me by a customer to ensure that our fixed software worked OK even on the defective CPUs.
For comparison the XL2007 pre SP1 cannot annotate log graphs correctly above 10^8 (two ticks labelled 10000000) and infamously displays 65535-eps as 100000 for certain unlucky values of eps<<<1. And it is so slow at drawing large graphs without SP1 as to be a joke.
But CPU bugs are rare and documented, and there are workarounds. In a
typical PC, in the OS and a reasonable set of apps, there will be
thousands of bugs, maybe tens of thousands, most of which are
undocumented and never fixed. Next rev will keep many of the old bugs
and add thousands more. There's just no comparison between crashes
caused by software vs hardware: I bet the ratio is ballpark 1e5:1.
I'd be more inclined to bet 1000:1 maybe 10000:1 at the outside. I have seen the odd bug and/or undocumented feature in most CPUs I have worked on - most of them unimportant, a couple show stopping, and some of them are even useful. One or two were a major security risk.
Existing hardware design methodologies work very well;
billion-transistor chips are reliable. Current software methodologies
are clearly broken: a million line program typically has thousands of
bugs.
It is worth pointing out here that all modern chips are designed using software. The big difference is that committing to large scale bulk chip fabrication is *so* horrendously expensive that it doesn't happen until the thing simulates perfectly and tests out OK in prototype hardware against aggressive whitebox testers.
Software by comparison is dirt cheap to duplicate and first to market advantage is huge. The result is regrettably a "ship it and be damned" management culture. You can always issue chargeable hotfixes or service packs.
BTW A million line manually written program with average industry practice should typically have around 500 bugs in it. Best practice is one or two orders of magnitude better if you are prepared to pay the price (and wait longer).
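As a back-of-envelope check on those figures, the arithmetic can be written out as below. The 500 bugs per million lines is the number quoted above; the function names are only for illustration.

```python
def bugs_per_kloc(bugs, lines):
    """Defect density in bugs per thousand lines of code."""
    return bugs / (lines / 1000)


def improved(bugs, orders_of_magnitude):
    """Bug count after improving by the given number of orders of magnitude."""
    return bugs / 10 ** orders_of_magnitude


LINES = 1_000_000
AVERAGE_BUGS = 500  # average industry practice, figure from the post

print(bugs_per_kloc(AVERAGE_BUGS, LINES))  # density at average practice
print(improved(AVERAGE_BUGS, 1))           # best practice, one order better
print(improved(AVERAGE_BUGS, 2))           # best practice, two orders better
```

So average practice works out at half a bug per thousand lines, and best practice at somewhere between 5 and 50 bugs in the whole million lines.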
Regards,
Martin Brown