Re: Adjusting PC Hyperthreading for Spice Simulation



On Mon, 02 Feb 2009 20:48:30 -0800, JosephKK wrote:

If so, the worst case would be 11 memory clocks, with 10 CPU clocks per
memory clock and 3 instructions per CPU cycle: 11 x 10 x 3 = 330
instruction slots (110 CPU cycles).
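The arithmetic in the quoted line can be spelled out as a tiny helper (a hypothetical name, not from the thread). Like the quote, it assumes the core would otherwise issue 3 instructions every cycle, which is a best-case figure rather than a typical one.

```python
def miss_cost(memory_clocks, core_bus_ratio, issue_width):
    """Cost of waiting `memory_clocks` bus clocks, in CPU cycles and issue slots.

    Hypothetical helper: assumes the core would issue `issue_width`
    instructions on every cycle it is not stalled.
    """
    cpu_cycles = memory_clocks * core_bus_ratio
    instruction_slots = cpu_cycles * issue_width
    return cpu_cycles, instruction_slots


# Figures from the quoted post: 11 memory clocks, 10:1 ratio, 3-wide issue.
print(miss_cost(11, 10, 3))  # (110, 330)
```

With the 10.5 multiplier of a 1.4 GHz part, the same call gives 115.5 cycles and 346.5 slots.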

Where in Finnegans fictional fantasies did you get this weird arithmetic?
Where did the 3 instructions come from?

It's called "superscalar"; I thought that you understood this concept.
PentiumPro and later can execute multiple instructions concurrently,
commencing and completing up to 3 instructions per clock cycle.
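A toy model makes the "up to 3 per clock" claim concrete. This is a sketch under loud assumptions: the function name is hypothetical, every instruction has a 1-cycle latency, and any 3 ready instructions may issue together with no decode or port limits. It is not a model of the PPro or any other real core; it only shows that issue width helps independent instructions and does nothing for a dependency chain.

```python
def cycles_to_execute(deps, width=3):
    """Toy superscalar model: count cycles to run a block of instructions.

    deps[i] lists the indices of *earlier* instructions whose results
    instruction i needs. Assumptions (not any real core): 1-cycle latency
    for everything, and any `width` ready instructions may issue per cycle.
    """
    done_at = {}                      # instruction index -> cycle it completed
    pending = list(range(len(deps)))
    cycle = 0
    while pending:
        cycle += 1
        issued = []
        for i in pending:
            if len(issued) == width:
                break
            # Ready only if every producer completed in an earlier cycle
            # (done_at is only updated after this cycle's selection).
            if all(d in done_at for d in deps[i]):
                issued.append(i)
        if not issued:
            raise ValueError("deps must only reference earlier instructions")
        for i in issued:
            done_at[i] = cycle
        pending = [i for i in pending if i not in issued]
    return cycle


independent = [[] for _ in range(6)]   # six unrelated instructions
chain = [[]] + [[i] for i in range(5)] # each needs the previous result
print(cycles_to_execute(independent))  # 2
print(cycles_to_execute(chain))        # 6
```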

I understand it very well; PPro does not have the capability. It only
began to appear in P4s. Moreover, many of its supposed attributes are
largely mimicked by pipelining, which PPro, P2 and above do have.
True superscalar requires multiple execution units, which only P4 and
later have (other architectures had them earlier). SPARC is the only
single-chip architecture that got past 3 execution units that I know
of (IBM 3090 did 4). Intel figured that full cores were actually
easier to implement; it seems they were right.

Okay, so we're arguing over terminology again. Intel considers pipelining
to be what the original Pentium had, calling PPro and later superscalar.
Even without multiple ALUs, PPro and later can execute multiple load/store
operations concurrently, alongside one integer and one FP operation.

Where do you get 10 CPU clocks per memory clock?

1100MHz CPU with 100MHz FSB, 1400MHz CPU with 133MHz FSB. Odd that you
didn't take issue with it a few messages back.

By the time I bothered to check, the issue had shifted; by the time
1100 MHz to 1400 MHz CPU cores had appeared, we had DDR 333 RAM.

Provide references for your data points, as I have.

I did:

http://processorfinder.intel.com/details.aspx?sSpec=SL5XL

CPU Speed: 1.40 GHz
Bus Speed: 133 MHz
Bus/Core Ratio: 10.5

http://processorfinder.intel.com/details.aspx?sSpec=SL4BR

CPU Speed: 1 GHz
Bus Speed: 100 MHz
Bus/Core Ratio: 10
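The listed multipliers can be checked by multiplying them back out. A minimal sketch (hypothetical names); the one thing it adds to the listed figures is that the nominal 133 MHz bus actually runs at 133 1/3 MHz, which is why 10.5 x 133 comes out at 1400 rather than 1396.5.

```python
from fractions import Fraction

# Bus clock (MHz) and multiplier as listed for the two sSpecs above.
# The nominal "133 MHz" bus is really 133 1/3 MHz, i.e. 400/3 MHz.
SPECS = {
    "SL5XL": (Fraction(400, 3), Fraction(21, 2)),  # listed as 1.40 GHz, 10.5
    "SL4BR": (Fraction(100), Fraction(10)),        # listed as 1 GHz, 10
}


def core_clock_mhz(bus_mhz, ratio):
    """Core clock = bus clock x multiplier (exact, using Fractions)."""
    return bus_mhz * ratio


for sspec, (bus, ratio) in SPECS.items():
    print(sspec, float(core_clock_mhz(bus, ratio)), "MHz")
# SL5XL 1400.0 MHz
# SL4BR 1000.0 MHz
```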

DDR 333 may have existed at this point, but so did CPUs with 100/133 MHz FSBs.

While it is possible to write pathological code in assembler, higher-level
languages will generally prevent it. It may be possible to brute-force
"C" in this way, but the result will be readily recognizable as pathological.

That's not even remotely true. Any code which performs simple calculations
on large amounts of data is inherently memory bound (i.e. there is
always an outstanding transfer).
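A back-of-envelope estimate shows why a simple loop over a large array is memory bound on such a machine. This sketch (hypothetical function name) assumes every byte comes from DRAM at the peak rate of the 64-bit front-side bus, with no cache reuse, and that the core sustains 3 useful operations per cycle; both are optimistic peak figures.

```python
def boundedness(bytes_per_iter, ops_per_iter, core_hz, issue_width,
                bus_hz, bus_bytes):
    """Crude compute-vs-memory estimate for a streaming loop.

    Assumes all data streams from DRAM over the FSB at peak rate (no cache
    reuse) and the core sustains `issue_width` useful ops per cycle.
    Returns ("memory" or "compute", memory_time / compute_time).
    """
    compute_s = ops_per_iter / (core_hz * issue_width)
    memory_s = bytes_per_iter / (bus_hz * bus_bytes)
    kind = "memory" if memory_s > compute_s else "compute"
    return kind, memory_s / compute_s


# Summing an array of doubles: 8 bytes and 1 add per element, on a
# 1.4 GHz 3-wide core with an 8-byte-wide bus at 133 1/3 MHz.
print(boundedness(8, 1, 1.4e9, 3, 400e6 / 3, 8))  # roughly ('memory', 31.5)
```

For one add per 8-byte double, the memory side comes out about 31 times slower than the arithmetic, so the core spends most of its time waiting on outstanding transfers.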

The most obvious case of code with poor cache coherence is OO code where
an abstract base class has many subclasses.

For a concrete example, a 3D game engine will typically have abstract
"brush" and "actor" classes, the first representing immutable
terrain (walls, floors), the second representing dynamic entities
(enemies, weapons, ordnance, other mutable objects, ...).

Updating the game state involves iterating over a set of actors, but the
code executed for each one depends upon the final class (updating a zombie
is quite different from updating a bullet). You can realistically end up
calling over 100 distinct update methods for a single frame.

Rendering is similar: there are fewer distinct methods (although the
number keeps increasing with the use of specialised shaders, procedural
textures, etc.), but more data (you have to render both terrain and
actors, whereas terrain doesn't need updating).
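The brush/actor structure described above might be sketched as follows. The class names and update rules are hypothetical, and Python's method dispatch stands in for C++ virtual calls, so the cache effects themselves are not visible here. What it shows is that the code executed per object depends on its concrete class, and that rendering walks both terrain and actors while updating touches actors only.

```python
from abc import ABC, abstractmethod


class Entity(ABC):
    @abstractmethod
    def render(self, frame): ...


class Brush(Entity):
    """Immutable terrain: rendered every frame, never updated."""
    def __init__(self, name):
        self.name = name

    def render(self, frame):
        frame.append(("brush", self.name))


class Actor(Entity):
    """Dynamic entity: each subclass supplies its own update()."""
    @abstractmethod
    def update(self, dt): ...


class Zombie(Actor):
    def __init__(self, x):
        self.x = x

    def update(self, dt):
        self.x += 0.5 * dt            # shambles slowly

    def render(self, frame):
        frame.append(("zombie", self.x))


class Bullet(Actor):
    def __init__(self, x, v):
        self.x, self.v = x, v

    def update(self, dt):
        self.x += self.v * dt         # straight-line flight

    def render(self, frame):
        frame.append(("bullet", self.x))


def run_frame(brushes, actors, dt):
    """Update actors, then render everything.

    Returns the set of distinct update methods called and the render list.
    """
    update_methods = set()
    for a in actors:
        a.update(dt)                  # code run depends on the concrete class
        update_methods.add(type(a).update)
    frame = []
    for e in list(brushes) + list(actors):  # terrain and actors alike
        e.render(frame)
    return update_methods, frame


brushes = [Brush("floor"), Brush("wall")]
actors = [Zombie(0.0), Bullet(0.0, 100.0), Zombie(2.0)]
methods, frame = run_frame(brushes, actors, 0.1)
print(len(methods), len(frame))  # 2 5
```

With a hundred actor classes instead of two, consecutive iterations of the update loop can each jump to a different method body.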

Sorry, I don't care much about inefficiently designed/implemented
games, nor am I a gamer. That type are hoist by their own petard.

Oh; so it's the programmers' fault for not writing megabytes of code in
hand-tuned assembler?

Real-world code doesn't look anything like the Fortran or Pascal examples
you may have learned in college, or the kind of code you would write for a
microcontroller.

Do you have any other type of benchmark data?

So you can dismiss that too (along with anything else which contradicts
your claims) as a "pathological" case.

But there's no point in citing specific examples. Just download any
substantial software package for which source code is available
(especially anything written in C++).

If you're programming x86 (i.e. PCs/servers), software where 99% of the
CPU cycles are spent in a few KiB of code is the exception rather than the
rule.
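One way to check the "few KiB" question on a real program is to take per-function cycle counts from a profiler and add up code size from the hottest function down. A sketch under stated assumptions: the helper name is hypothetical, it works greedily by cycles (so it reports the code in the hottest functions, not a provably minimal set), and the two profiles below are made-up toy numbers, not measurements.

```python
def code_for_share(profile, share=0.99):
    """Bytes of code in the hottest functions needed to cover `share` of cycles.

    profile: iterable of (cycles, code_bytes) pairs, one per function.
    Greedy: functions are taken in order of decreasing cycle count.
    """
    funcs = sorted(profile, key=lambda f: f[0], reverse=True)
    total = sum(c for c, _ in funcs)
    covered = code = 0
    for cycles, size in funcs:
        if covered >= share * total:
            break
        covered += cycles
        code += size
    return code


# Toy profiles (invented for illustration):
hot_loop = [(995, 2048), (5, 500_000)]  # one tight loop dominates
spread = [(10, 4096)] * 100             # cycles spread over many functions
print(code_for_share(hot_loop))  # 2048
print(code_for_share(spread))    # hundreds of KiB
```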
