Re: IBM tops supercomputing table

From: Uncle Al (UncleAl0_at_hate.spam.net)
Date: 11/10/04


Date: Wed, 10 Nov 2004 08:27:36 -0800

Sam Wormley wrote:
>
> IBM tops supercomputing table (Nov 9)
> http://physicsweb.org/article/news/8/11/4
> A supercomputer being built by IBM for the US Department of Energy (DOE)
> has topped a list of the world's fastest computers. The Blue Gene/L
> beta-system can perform 70,720 billion calculations per second, which
> has earned it first place in the latest TOP500 list that was released at
> the SC2004 conference in Pittsburgh earlier today. In second place is
> the Columbia supercomputer at NASA's Ames Research Center, which is
> capable of 51,870 billion calculations per second, followed by the Earth
> Simulator in Japan.

What is more interesting is their guts. Original Blue Gene is a
custom job and it cost a fortune. AMD clustered a pile of Opteron
blades to hit a teraflop at modest cost. Virginia Tech clustered
Apple G5 towers and now Xserve G5 modules to really hit the gong at modest expense.

NASA Ames' "Columbia" is a political exercise. It is built of more
than 10,000 Itaniums that Intel couldn't otherwise give away. Cooling
the beast must be a nightmare. It is not clear that "Columbia" can
effectively run any memory bandwidth-intensive application even with
Infiniband. It's a big pile of isolated CPUs suited to mesh
simulations. It can run multiple serial processes in parallel but
probably not true parallel processes. Intel screws the pooch for mobo
data flow.

http://www.nature.com/news/2004/041108/full/041108-3.html
http://amesnews.arc.nasa.gov/releases/2004/04_103AR.html
http://www.nas.nasa.gov/News/Releases/2004/11_08_04_fastest.html

Cray Computer made its mark with expensive custom GaAs chips densely
configured and intensely cooled. It was eclipsed and then left in the
dust by cheap clusters of off-the-shelf silicon CPUs. Beowulf.
Supercomputer applications generally require hardware configured to
their tasks.  Even so, Opteron-8xx and G5 chips will *substantially*
outperform the competition - especially if power consumption over time
is factored in.

In our CHI computation, an obsolete Opteron-244 at 1.8 GHz has 40%
more throughput than a blazing hot 3.4 GHz "Nocona" Xeon, both running
the same Linux OS.  The only apps in which Intel CPUs are clearly
superior are multimedia conversions, e.g., raw video to MPEG or DVD
compression.  The standard software is specifically written and
optimized for Intel.  Open source Linux clones written without bias
are much faster on AMD CPUs.  CPU clock frequency by itself tells you
nothing.  In parallel computation on a 16-CPU cluster of Opteron-848s,
our app lost 1% of isolated CPU performance per CPU.  1% cluster
overhead is incredible.  Four Pentium4s executing in parallel on a
Windows mobo might equal 2.5 single Pentiums.
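
For anybody who wants to check the arithmetic, here is a rough Python
sketch.  It is an illustration only, not the CHI code.  The function
names are made up for this post, and the "1%/CPU" figure can be read
two ways, so both are computed: a flat 1% loss on every CPU, or a loss
that grows by 1% for each CPU in the cluster.  Which reading matches
the actual measurement is an assumption either way.

```python
def cluster_throughput(n_cpus, loss, cumulative=False):
    """Throughput in units of one isolated CPU.

    cumulative=False: every CPU runs at (1 - loss) of isolated speed.
    cumulative=True:  every CPU loses `loss` for each CPU in the cluster.
    Both readings are assumptions about what "1%/CPU" means.
    """
    per_cpu = 1.0 - loss * (n_cpus if cumulative else 1)
    return n_cpus * per_cpu


def efficiency(throughput, n_cpus):
    """Fraction of ideal linear scaling actually delivered."""
    return throughput / n_cpus


def per_clock_ratio(throughput_ratio, clock_a_ghz, clock_b_ghz):
    """Work per clock cycle of chip A relative to chip B."""
    return throughput_ratio * clock_b_ghz / clock_a_ghz


if __name__ == "__main__":
    flat = cluster_throughput(16, 0.01)
    worst = cluster_throughput(16, 0.01, cumulative=True)
    print(f"16 Opterons, flat 1% loss:       {flat:.2f} CPU-equivalents")
    print(f"16 Opterons, cumulative 1% loss: {worst:.2f} CPU-equivalents")
    print(f"4 Pentium4s at 2.5 equivalents:  {efficiency(2.5, 4):.1%} efficient")
    # Opteron-244 at 1.8 GHz with 1.4x the throughput of a 3.4 GHz Xeon
    print(f"Opteron work per clock vs Xeon:  {per_clock_ratio(1.4, 1.8, 3.4):.2f}x")
```

Even on the pessimistic reading the 16-way Opteron cluster stays above
80% efficiency, against roughly 62% for the four Pentium4s.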

The largest clustered hardware on the planet, Google, makes do with
swap meet hardware.  It simply has an awful lot of it.  Finally, given
the same hardware and the same CHI source code, the build compiled and
run under Linux is 30% faster than the one under Windows.  Given that
the datastorm never makes it off the CPU, that tells you a whole bunch
about the OS.

-- 
Uncle Al
http://www.mazepath.com/uncleal/
 (Toxic URL! Unsafe for children and most mammals)
http://www.mazepath.com/uncleal/qz.pdf

