Re: a dozen cpu's on a chip
- From: Robert Baer <robertbaer@xxxxxxxxxxxx>
- Date: Mon, 12 May 2008 02:01:59 -0700
Phil Hobbs wrote:
MooseFET wrote:
Basically the same problem way back in time-sharing daze: one CPU trying to handle N customers. When N got too large (usually greater than 12 then), there was not sufficient bus bandwidth, nor sufficient time to handle any one of them in a decent time, so everyone got bogged down.
On May 8, 8:27 am, John Larkin
<jjlar...@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
On Thu, 8 May 2008 07:42:04 -0700 (PDT), MooseFET <kensm...@xxxxxxxxx>
wrote:
On May 7, 7:48 pm, John Larkin
<jjlar...@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
http://www.eetimes.com/news/latest/showArticle.jhtml;jsessionid=CESEX...
I bet we'll see 256 one of these days.
When you get to large numbers of CPUs it seems to make sense to stop
making them identical. For servers this would be doubly so. Many of
the CPUs won't need to do floating point operations.
Right. Maybe a few cpu's would have serious floating point power, or a
few separate fp engines could be assigned to cpu's as needed. Lots of
cpu's, doing stuff like file i/o or serial stuff, could be less
powerful. I suppose we'll always need special graphics hardware, but
just a few of those per chip.
It could go even further. You could have a situation where the "boss"
integer only CPU does this:
Dear Mr Floating processor #1: Please go perform the code at the
following address.
Hey Byte slinger processor #7: Go make this memory move.
Hey I/O processor #3: go do this work.
.... etc ....
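A rough sketch of that "boss" arrangement, for illustration only. The unit kinds ("float", "mover", "io"), the class name Boss, and the round-robin choice of unit are all assumptions made here, not anything from the post; a real chip would do this in hardware or microcode, not Python.

```python
from collections import defaultdict, deque


class Boss:
    """Hypothetical integer-only 'boss' that hands jobs to specialized units."""

    def __init__(self, units):
        # units maps a kind (e.g. "float", "io") to a list of unit ids.
        self.units = units
        # One work queue per specialized unit.
        self.queues = {
            (kind, uid): deque() for kind, ids in units.items() for uid in ids
        }
        # Per-kind counter used for round-robin selection (an assumption).
        self._next = defaultdict(int)

    def dispatch(self, kind, job):
        """Queue a job on the next unit of the requested kind; return its id."""
        ids = self.units[kind]
        uid = ids[self._next[kind] % len(ids)]
        self._next[kind] += 1
        self.queues[(kind, uid)].append(job)
        return uid


boss = Boss({"float": [1, 2], "mover": [7], "io": [3]})
boss.dispatch("float", "run code at address A")  # goes to floating unit 1
boss.dispatch("mover", "copy block B to C")      # goes to byte slinger 7
boss.dispatch("io", "write sector D")            # goes to I/O processor 3
```

The boss never touches the data itself; it only decides which queue a job lands in, which is why it can stay a simple integer-only core.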
Early in the era of the 8086 there was an 8089 which was called a DMA
processor even though it really was programmed I/O in its own
instruction set. It could do I/O way faster than the 8086. If a CPU
is intended to be part of a server, it could have parts like that in
it for doing the things needed for fast disk operations.
It also would make sense to do things like memory moves in the "Memory
Mismanagement Unit" since the values don't need to be modified on the
way through.
This will make it a lot harder to say how many CPUs are in a chip. If
there is only as much hardware as 200 full CPUs but 500 threads can be
running at the same time, do you call it 200 or 500 CPUs?
Next step is to get rid of task swapping and threads altogether. One
CPU is the OS, and one cpu gets assigned per process.
Actually I can see multiple CPUs being the OS. When you have a lot of
tasks to manage, it takes some processing power just to manage them.
Future machines with thousands of CPUs may need tens of managers.
John
Those specialized communications processors have been used in large systems for ages, and they're getting more important with time, as you suggest.
IBM has made 256-way SMPs for years, at varying levels of integration. SMPs cost much more than loosely-coupled machines, but there are good commercial reasons to use them. Keeping the illusion of symmetric shared memory really simplifies the programming model--a hugely important issue that non-programmers usually have no idea about. (If anyone figures out an efficient way to parallelize queries in large databases using loosely-coupled processors, I could be out of work. It isn't something I worry about much.)
Ever since Danny Hillis & Co. back in the 80s, people have been pushing one sort or another of massively parallel machine. They've been perfectly right all along, too--for certain classes of problems, massively parallel is the way to go. The problem has been that not too many problems of economic importance have fallen into that 'certain class'--which is why Hillis's Thinking Machines Corporation and many others have come and gone. Nobody knew how to do business apps efficiently on those machines then, and nobody knows now either, as far as I can tell.
One thing that I think has become clear is that huge interconnect bandwidth is the key to broadening the range of problems that run well on highly parallel machines. Maintaining the illusion of shared memory at the OS level requires cache coherency across the whole machine (or a reasonable facsimile). This leads to an interconnect bandwidth trend that goes as the square or the cube of Moore's law, and that is starting to dominate the power budget of large machines. The cost of maintaining that trend will become prohibitive, unless we come up with some really different approaches from the ones we've been using.
Cheers,
Phil Hobbs
Put N CPUs on a memory bus and guess what? Same problem.
So, I say, support the insanity and make N as large as some idiot wants and sell the super-duper slicer-dicer, and pocket the money before the fleeced buyers get wise.
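The shared-bus point can be put in rough numbers. This is only a back-of-envelope sketch under loud assumptions: every CPU demands the same bandwidth, the bus is split evenly, and caches are ignored entirely. The function names and the figures (a bus of 12 units, 1 unit of demand per CPU) are made up for illustration.

```python
def per_cpu_bandwidth(bus_bandwidth, n_cpus):
    """Even share of one memory bus among n_cpus (assumes no caches)."""
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    return bus_bandwidth / n_cpus


def max_cpus(bus_bandwidth, demand_per_cpu):
    """Largest N for which every CPU still gets its full demand."""
    return int(bus_bandwidth // demand_per_cpu)


# Assumed figures: the bus carries 12 units, each CPU wants 1 unit.
limit = max_cpus(12.0, 1.0)             # 12 CPUs saturate the bus
share = per_cpu_bandwidth(12.0, 24)     # at N = 24 each CPU gets 0.5
```

Past the saturation point every added CPU just shrinks everyone's share, which is the time-sharing bog-down all over again.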