Re: How to develop a random number generation device



John Larkin wrote:
On Wed, 19 Sep 2007 23:06:17 +0200, David Brown
<david.brown@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

John Larkin wrote:
On Tue, 18 Sep 2007 18:15:07 +0200, David Brown
<david.brown@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

John Larkin wrote:
On Mon, 17 Sep 2007 23:04:03 +0200, David Brown
<david.brown@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

<snip>

This is not about performance; hardly anybody needs gigaflops. It's
all about reliability.
Until you can come up with some sort of justification, however vague, as to why you think one cpu per process is more reliable than context switches, this whole discussion is useless.
You define yourself by the ideas you refuse to consider. So I suppose
you'll still be running Windows 20 years from now.

I run windows (on desktops) and Linux (on a desktop, a laptop, and a bunch of servers, and on a fairly high-reliability automation system I am working on), and I'd use something else if I needed an OS in my embedded systems. If something better came along, I'd use that - whatever is the right tool for the job.

The relevant saying is "keep an open mind, but not so open that your brains fall out". I'm happy to accept that doing things in hardware is often more reliable than doing things in software (I work with small embedded systems - I know when reliability is important, and I know about achieving it in practical systems). But what I am not willing to accept is claims that you alone understand the way to make all computers reliable,

I have made no such claims.



You have repeatedly said that current OS's (software OS's running on one or a few cores) is inherently unreliable, while your idea of a massively multi-core cpu running a task per core would be totally reliable. As far as I can see, you are the only person who believes this. If I've misunderstood (either about your claims, or if you can show that others share the idea), please correct me.


using a hardware design that is obviously (to me, anyway)
impractical,

Can't help what's obvious to you


and you offer no justification beyond repeating claims that
"hardware is always more reliable than software",

Isn't it?


No it isn't. At best, you can compare apples and oranges and note that a ram chip is more reliable than windows, despite the former having more transistors than the later has lines of source code.

We agree that typical hardware design processes are more geared to producing reliable and well-tested designs than common software design processes. But that does not translate into a generalisation that a given task can be performed more reliably in hardware than software.


and therefore you can
practically guarantee that the future of computing will be dominated by single task per core processors.

I can't guarantee it. My ideas are necessarily simplistic, and would

Perhaps "guarantee" was a bit strong - but you stated confidently that your 1024-core one-core-per-task devices were "gonna happen".

get more complex in a real system. Like, for example, my multicore
chip would probably have a core+GPU or three optimized for graphics,
and maybe some crypto or compression/decompression gadgets. There's no
point sacrificing performance to intellectual purity.


This is beginning to sound a lot more like a practical system - devices exist today with several specialised cores, particularly in the embedded market. Arguably graphics cards fall into this category, as do high-end network cards with offload engines. But that's a far cry from your cpu-per-thread idea, and it is done for performance reasons - *not* reliability.

But the trend towards multiple cores, running multiple threads each,
is a steamroller. So far, it's been along the Microsoft "big OS"
model, but whan we get to scores of processors running hundreds of
threads, wouldn't a different OS design start to make sense? The IBM
Cell is certainly another direction.


Forget windows - it's a bad example of an OS, and a it's an extreme example of unreliable software. There is no "Microsoft big OS" model - they just have a bad implementation of a normal monolithic kernel OS.

There are uses for computers based on running large numbers of threads in parallel - the Sun Niagara processors can handle 64 threads in hardware (running on 8 cores). But these do not use a core (or even a virtual core) per thread - the cores have context switches as threads and processes come and go, or sleep and resume. Clearly you will get better *performance* when you can minimise context switching - but no one would plan for a system where context switching did not happen. There is nothing to suggest that the system could be made more reliable by avoiding context switches, except in the sense of reliably being able to complete tasks at the required speed - it's a performance issue.

I believe I have been open minded - I've tried to point out the problems with your ideas, and why I think it is impractical to design such chips,

Sorry, I missed that part. Why is it, or more significantly, why *will
it* be impractical to design a chip that will contain, or act like it
contains, a couple hundred CPU cores, all interfaced to a central
cache?


Perhaps I didn't explain it well, or perhaps you didn't read these posts - it's hard to follow everything on s.e.d.

The problem with so many cores accessing a shared cache is that you have huge contention for the cache resources. RAM cells get bigger, slower and more complex the more ports they have - it's rare to get more than dual-ported RAM blocks. So if you have 1000 cores all trying to access the same cache, you're going to have huge latencies. You also need complex multiplexing hierarchies for your cross-switches - as each cpu needs to access the cache, you basically require a 1000:1 multiplexer. Assuming your cache has multiple banks and access to some IO or other buses, you'd need something like a 1000:10 cross-switch. That would be really horrible to implement - you'd need to find a compromise between vast switching circuits and multiple levels introducing delays and bottlenecks.

Here's a brief view of the Niagara II - your device would face similar challenges, but greatly multiplied:
http://www.theinquirer.net/?article=42256

If each core has an L1 cache to relieve some of the pressure (without it, the system would crawl), you then have a very nasty problem of tracking cache coherency. Current cache coherency strategies do not scale well - they are a big problem on multicore systems.

With existing multiprocessor systems, it is the cache and memory interconnection systems that are the big problem. If you look at high-end motherboards with 8 or 16 sockets, the cross-bar switches that keep memory coherent and provide fast access for all the cores cost more than the processors themselves. Building it all in one device does not make it significantly easier (although it saves on some buffers).

There are alternative ways to connect up large numbers of cores - a NUMA arrangement with cores passing memory requests between each other would almost certainly be easier. But you would have very significant latencies and bottlenecks, a very large number of inter-core buses, and you'd still have trouble with the L1 cache coherence.

With a new OS, and certain significant restraints on the software, you could perhaps avoid many of the L1 cache coherence problems. In particular, being even more restrictive about memory segments would allow you to assume that L1 data is private, and thus always coherent. For example, if all memory came from either a read-only source for code, or was private to the task using it, then you'd have coherency. You'd need a system for read and write locks for memory areas, with a central controller responsible for dishing out these locks and broadcasting cache invalidations when these changed, but it might work.

However, you've lost out on a range of requirements here. First off, your cores are now far from simple, and the glue logic is immense. Thus you have lost all hope of making the device cheap and reliable. Secondly, you've still got significant latencies for all memory access, slowing down the throughput of any given core, crippling your maximum thread speed. The bottlenecks don't matter so much in the grand view of the device - the total bandwidth to the cpus should still be more than if it were a normal multi-core device. Thirdly, you've lost compatibility with all existing software - it won't run most programs, as they rely on being able to have shared data access.


and why they would be impractical for general purpose computing even if they were made.

Why? Because Windows, and other "big" OS's like Linux, don't support
it?


Yes, that's about it. To be more precise, it will be impractical for general purpose computing because it won't run common general purpose programs. Even with the required major changes to the software and compilation tools, and without the cache restrictions mentioned earlier, it would run common programs painfully slowly.

I've repeatedly asked for justification for your claims, and received none of relevance. I am more than willing to discuss these ideas more if you can justify them - but until then, I'll continue to view massively multi-core chips as useful for some specialised tasks but inappropriate for general purpose (and desktop in particular) computing.

It's generally accepted tha a microkernal-based OS will be more
reliable than a macrokernal system, because of its simplicity, but the
microkernal needs too many context switches to be efficient.


A microkernel *may* be more reliable because of its modular design - each part is relatively simple and communicates through limited, controlled ports. That's far from saying it always *will* be more reliable. Much of the theoretical reliability gains of a microkernel do not actually help in practice. For example, the ability of low-level services to be restarted if they crash is useless when the service in question is essential to the system. Thus there are no reliability benefits from putting your memory management, task management, virtual file system, or interrupt system outside the true kernel - if one of these services dies, you're buggered whether it kills the kernel or not. A similar situation is found in Linux - because X is separate from the kernel, it can die and restart independently of the OS itself. But to the desktop user, their system has died - they don't know or care if the OS itself survived.

Most of the benefits of a microkernel can actually be achieved in a monolithic kernel - you keep your services carefully modularised, developed and tested as separate units with clear and clean interfaces. It's a good development paradigm - it does not matter in practice if the key services are directly linked with the kernel or not, since they are all essential to the working of the OS. About the only way a microkernel improves reliability is by enforcing this model - you are not able to cheat.

What *does* make sense is keeping as many device drivers as possible out of the kernel itself. Non-essential services should not be in the kernel.


http://en.wikipedia.org/wiki/Microkernel

So, let's get rid of the context switches by running each process in
its own real or virtual (ie, multithreaded) CPU. Then nobody can crash
the kernal. A little hardware protection for DMA operations makes even
device drivers safe.


You underestimate the power of software bugs - you'll *always* be able to crash the kernel!

The context switches in this case are completely irrelevant to reliability. The issue with microkernels and context switches is purely a matter of performance - they cost a lot of time, especially since they involve jumps to a different processor mode or protection ring. If you want to produce a cpu that minimises the cost of a context switch through hardware acceleration, then it would definitely be a good idea and would benefit microkernel OS's in particular. But it's a performance improvement, not a reliability improvement. Other hardware for accelerating key OS concepts such as locks or IPC would help too.



I seem to remember previous discussions reaching similar conclusions - you had a pretty way-out theory, leading to an interesting discussion but ending with me giving up in frustration, and you calling me closed-minded.

Deja vu, I guess.

These sorts of ideas are good for making people think, but scientific minds are naturally sceptical until given solid evidence and justification.

"Scientific minds" are often remarkably ready to attack new ideas,
rather than playing with, or contributing to them. I take a lot of
business away from people like that.

And I'm no dreamer: I build stuff that works, and people buy it.


So do I - but we both make and sell practical solutions which are a step beyond our competitors. We would not try and sell something that seems a revolutionary new idea at first sight, but terribly impractical to implement and lacking the very benefits we first thought.

There's nothing wrong with dreaming, quite the opposite. But you have to be able to see when it is nothing but a dream.

mvh.,

David


John


.



Relevant Pages

  • Re: How to develop a random number generation device
    ... Until you can come up with some sort of justification, however vague, as to why you think one cpu per process is more reliable than context switches, this whole discussion is useless. ... I'm happy to accept that doing things in hardware is often more reliable than doing things in software (I work with small embedded systems - I know when reliability is important, and I know about achieving it in practical systems). ... You have repeatedly said that current OS's (software OS's running on one or a few cores) is inherently unreliable, while your idea of a massively multi-core cpu running a task per core would be totally reliable. ... I don't imagine we'll see so very many more cores on a device, because the interconnections and cache scaling get too difficult, and the whole device is too limited in memory bandwidth. ...
    (sci.electronics.design)
  • Re: How to develop a random number generation device
    ... embedded systems - I know when reliability is important, ... Looks like the number of cores per chip is at least ... by avoiding context switches, except in the sense of reliably being able ... The problem with so many cores accessing a shared cache is that you have ...
    (sci.electronics.design)
  • Re: How to develop a random number generation device
    ... chip, and something new will be required to manage them. ... I think that the number of virtual cores will grow faster than the ... One CPU would be the manager, ... embedded systems - I know when reliability is important, ...
    (sci.electronics.design)
  • Re: How to develop a random number generation device
    ... chip, and something new will be required to manage them. ... I think that the number of virtual cores will grow faster than the ... One CPU would be the manager, ... I'm happy to accept that doing things in hardware is often more reliable than doing things in software (I work with small embedded systems - I know when reliability is important, and I know about achieving it in practical systems). ...
    (sci.electronics.design)
  • Re: iBook hard drive replacement - redux
    ... >> hard drive replacement. ... Last I heard, brand, speed and capacity have no real bearing ... > reliability, personal choice currently is with seagate but it will cost ... Is it worth paying more for an 8MB cache? ...
    (uk.comp.sys.mac)