Re: a dozen cpu's on a chip



On May 10, 10:50 am, Phil Hobbs <pcdhSpamMeSensel...@xxxxxxxxxxxx>
wrote:
[.....]

One thing that I think has become clear is that huge interconnect
bandwidth is the key to broadening the range of problems that run well
on highly parallel machines. Maintaining the illusion of shared memory
at the OS level requires cache coherency across the whole machine (or a
reasonable facsimile). This leads to an interconnect bandwidth trend
that goes as the square or the cube of Moore's law, and that is starting
to dominate the power budget of large machines. The cost of maintaining
that trend will become prohibitive, unless we come up with some really
different approaches from the ones we've been using.

I think in many cases the illusion of a single large shared memory is
not required to implement the operations we actually want. If we can
remove the need for it, we don't need the huge bandwidth on the
interconnect.

If we assume a Harvard-like processor where the code space is never
written to, the code space of each CPU can be independent during most
of the run time. This means that no transaction by a slave CPU can
ever require that the code space be brought back into sync.

The stack space and local variables of the routines running on the
slave CPUs are private to the task. Only the defined inputs and
outputs of the task really need to be shared with others. If we
assume that memory is reasonably low cost, we can have a block of
memory passed between processes to carry the information from task to
task. This makes resyncing the caches a lot easier. The "dirty" flags
in the source CPU's cache can be used to indicate what needs to be
copied over.
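
As a rough illustration of the idea above, here is a minimal Python sketch
of a handoff that copies only the dirty lines into the block passed between
tasks. The names (Cache, hand_off, block) and the one-flag-per-line model
are my own assumptions for illustration, not a description of any real
cache hardware.

```python
# Hypothetical model: a cache is a list of lines plus one dirty flag per line.
class Cache:
    def __init__(self, n_lines):
        self.lines = [0] * n_lines
        self.dirty = [False] * n_lines

    def write(self, index, value):
        self.lines[index] = value
        self.dirty[index] = True  # mark the line as modified


def hand_off(source, block):
    """Copy only the dirty lines of `source` into the passed `block`."""
    copied = []
    for i, is_dirty in enumerate(source.dirty):
        if is_dirty:
            block[i] = source.lines[i]
            source.dirty[i] = False  # line now matches the block
            copied.append(i)
    return copied


producer = Cache(8)
producer.write(2, 42)
producer.write(5, 7)

block = [0] * 8
copied = hand_off(producer, block)
print(copied)  # [2, 5]
```

Only two of the eight lines cross the interconnect here, which is the
point: the traffic scales with what the task actually wrote, not with the
size of the memory.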

If the source CPU is forbidden from overwriting the output data, then
the transfer logic is reasonably simple. The source CPU's dirty
flags just copy into the receiving CPU's "out of date" flags.
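
A sketch of that flag transfer, again in Python and again with made-up
names (SourceCPU, ReceivingCPU, sealed, fetches): the handoff moves only
the flags, the source is then barred from writing, and the receiver pulls
a line across only when it first reads one marked out of date. Fetching
lazily on read is my assumption; the data could just as well be pushed
eagerly.

```python
class SourceCPU:
    def __init__(self, n_lines):
        self.lines = [0] * n_lines
        self.dirty = [False] * n_lines
        self.sealed = False  # set once the output has been handed off

    def write(self, index, value):
        if self.sealed:
            raise RuntimeError("output data is read-only after handoff")
        self.lines[index] = value
        self.dirty[index] = True


class ReceivingCPU:
    def __init__(self, n_lines):
        self.lines = [0] * n_lines
        self.out_of_date = [False] * n_lines
        self.fetches = 0  # counts lines actually pulled from the source

    def read(self, index, source):
        if self.out_of_date[index]:
            self.lines[index] = source.lines[index]
            self.out_of_date[index] = False
            self.fetches += 1
        return self.lines[index]


def hand_off(source, receiver):
    # The whole transfer is a copy of one flag vector.
    receiver.out_of_date = list(source.dirty)
    source.sealed = True


src = SourceCPU(4)
src.write(1, 10)
src.write(3, 30)

rx = ReceivingCPU(4)
hand_off(src, rx)

print(rx.read(3, src), rx.read(0, src), rx.fetches)  # 30 0 1
```

Because the source can no longer write, the receiver never has to ask
whether a line changed again after the handoff, so no back-and-forth
coherency traffic is needed.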

This sort of processor would mean that programmers would draw data
flow diagrams and not flow charts. It would also push towards
thinking as if you were coding in APL or Octave and not in Fortran or
C. Lots of things would be done to whole arrays.
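
To make the data flow style concrete, here is a small Python sketch: each
node is a function over whole arrays, and a node runs as soon as all its
inputs exist. The graph layout, the run() helper and the node names are
hypothetical; on the kind of machine described above, each ready node
could in principle be given to a different CPU, though this sketch just
runs them in sequence.

```python
def scale(xs, k):
    return [k * x for x in xs]


def add(xs, ys):
    return [x + y for x, y in zip(xs, ys)]


def total(xs):
    return sum(xs)


# node name -> (function, names of the inputs it consumes)
graph = {
    "scaled": (lambda a: scale(a, 2), ["a"]),
    "summed": (add, ["scaled", "b"]),
    "result": (total, ["summed"]),
}


def run(graph, inputs):
    values = dict(inputs)
    pending = dict(graph)
    while pending:
        # A node is ready once every one of its inputs has a value.
        ready = [name for name, (_, deps) in pending.items()
                 if all(d in values for d in deps)]
        if not ready:
            raise ValueError("cycle or missing input in graph")
        for name in ready:
            fn, deps = pending.pop(name)
            values[name] = fn(*(values[d] for d in deps))
    return values


out = run(graph, {"a": [1, 2, 3], "b": [10, 20, 30]})
print(out["result"])  # 72
```

Note there is no loop index or control flow in the task definitions
themselves; the only ordering comes from which arrays each node needs.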
