Re: a dozen cpu's on a chip



John Larkin wrote:
On Thu, 15 May 2008 10:24:05 -0700 (PDT), panteltje@xxxxxxxxx wrote:

On 15 mei, 15:50, John Larkin
<jjlar...@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On Thu, 15 May 2008 09:14:43 +0100, John Devereux
Nanometer transistors are fast and free.

Actually they are not; those 80-core chips will be difficult to make
(yield),

If you want 250 cores, build 300 and use the 250 that work. So a chip
can have 50 defects and you can still sell it. Or build one giant CPU
on the same silicon and toss it if it has a single defect.
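
A rough sketch of the yield arithmetic behind the 300-for-250 argument. It assumes every core fails independently with the same probability, which real defect distributions do not strictly follow, and the 90% per-core figure in the usage note is a made-up number for illustration only.

```python
from math import comb

def yield_with_spares(total_cores: int, needed_cores: int, p_core_good: float) -> float:
    """Probability that at least `needed_cores` of `total_cores` work.

    Assumption: each core is good independently with probability
    `p_core_good` (a simple binomial model, not a real fab yield model).
    """
    return sum(
        comb(total_cores, k)
        * p_core_good ** k
        * (1.0 - p_core_good) ** (total_cores - k)
        for k in range(needed_cores, total_cores + 1)
    )

def yield_monolithic(total_cores: int, p_core_good: float) -> float:
    """Probability that the same area has no defect at all (every block good)."""
    return p_core_good ** total_cores

if __name__ == "__main__":
    p = 0.90  # hypothetical per-core yield, chosen only for illustration
    print("300 built, 250 needed:", yield_with_spares(300, 250, p))
    print("all 300 must work:   ", yield_monolithic(300, p))
```

Under those assumptions the chip with spares is sellable almost every time, while the die that needs all 300 blocks to be good almost never is. That says nothing about whether software can use the cores, which is a separate question.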

Unless and until there is software to efficiently exploit large processor clusters for general-purpose use, it doesn't matter.
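
One common way to put a number on that is Amdahl's law, which is not cited in this thread but is the standard model for the point. A minimal sketch, where `parallel_fraction` is a hypothetical estimate of how much of the run time can be spread across cores:

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: speedup = 1 / (serial + parallel / n_cores).

    `parallel_fraction` is the share of run time that can be parallelised
    (an assumed input; measuring it for real software is the hard part).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)

if __name__ == "__main__":
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, amdahl_speedup(fraction, 250))
```

If half the work is inherently serial, 250 cores give less than a 2x speedup in this model, however cheap the transistors are.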

gigabit world. The majority of guys here are adamant that
things will never change, a pretty radical position for engineers to
take.
Mm, you keep putting that in everybody's mouth, but when I asked how
you would spread a monolithic, resource-sucking application over 'n'
CPUs,
you remained silent.

I already suggested that a few of the CPUs could be floating-point
monster number crunchers, and most could be dumber, slower integer
machines. A TCP/IP stack doesn't need much floating-point power.

Neither does the core kernel for an operating system. Your model serves only to waste silicon real estate and electrical power to no good end.

And that is one issue.
The other one you conveniently forget is that, if each core has its
own memory,
what about the overhead of moving data, synchronization, etc.?

They'd surround a shared cache. They wouldn't bother the common cache
when they execute out of local cache, or when they use the small local
stack and variable RAMs. That makes the shared cache much more
efficient, since it is not being invalidated by a lot of unnecessary
traffic.

Want to bet?

The fastest way to bring your "uncrashable" independent CPUs with shared common memory model to its knees would be to set a few small tasks running flat out in several cores, allocating and deallocating memory at random and hitting it with reads and writes at worst-case strides for the cache. The OS would still run but its performance would be dire.
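
To illustrate what a worst-case stride does, here is a toy model of a direct-mapped cache. The geometry (512 lines of 64 bytes) and the function name are assumptions made for the sketch; real shared caches are set-associative, where the same collapse appears once the number of conflicting lines exceeds the associativity.

```python
def hit_rate(stride: int, n_accesses: int, passes: int = 4,
             line_size: int = 64, n_lines: int = 512) -> float:
    """Fraction of accesses that hit in a toy direct-mapped cache.

    The access pattern is addresses 0, stride, 2*stride, ... repeated
    `passes` times. Cache geometry is hypothetical, not any real CPU's.
    """
    tags = [None] * n_lines          # tag currently held by each cache line
    hits = 0
    total = 0
    for _ in range(passes):
        for i in range(n_accesses):
            block = (i * stride) // line_size   # which memory block is touched
            index = block % n_lines             # the one line it may occupy
            tag = block // n_lines              # identifies the block in that line
            if tags[index] == tag:
                hits += 1
            else:
                tags[index] = tag               # miss: evict whatever was there
            total += 1
    return hits / total

if __name__ == "__main__":
    cache_bytes = 64 * 512
    print("stride 8 bytes:     ", hit_rate(8, 1024))
    print("stride = cache size:", hit_rate(cache_bytes, 16))
```

With a stride equal to the cache size, every address maps to the same line and evicts its predecessor, so the model never hits even though only 16 distinct blocks are in use. A few tasks doing that against a shared cache would keep throwing out everyone else's data.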


One thing I've always thought that CPUs should have is hardware task
switching, a register that declares which task or thread the core is
running. That would instantly remap everything... the registers, the
memory mapping, everything. That would make context switching have
zero overhead, and allow full hardware protection. But nowadays, one

There are CPUs coming along (in production?) with hardware support for threads and context switching. And that does make good sense. Some extra hardware support for memory allocation and garbage collection might also be handy but is not mainstream.

might just as well have multiple cores. That would be faster, and
avoids some cache efficiency and pipeline issues.

But it would create all sorts of other I/O bandwidth bottlenecks that you conveniently gloss over in your hazy, rose-tinted view.

It is interesting that 100% of the responses to my posts have been
destructive, and none additive. I sure hope you guys don't actually
work that way.

That is because your idea would not work as you intend and you are completely deaf to any criticism.

The research work at Intel is on speculative multi-threading and other methods to allow multicore hardware to deliver real-world performance increases in the future. A short review is online at:

http://www.intel.com/technology/magazine/research/speculative-threading-1205.htm

And this is a very long way from your naive CPU-per-thread world view.

Regards,
Martin Brown