Re: Adjusting PC Hyperthreading for Spice Simulation
- From: JosephKK <quiettechblue@xxxxxxxxx>
- Date: Sat, 31 Jan 2009 19:30:18 -0800
On Sat, 31 Jan 2009 11:11:45 +0000, Nobody <nobody@xxxxxxxxxxx> wrote:
On Wed, 28 Jan 2009 18:48:09 -0800, JosephKK wrote:
Taking 1999 as a useful base year, let's look at processors:
So in 1999 Intel (IA-32) CPU speed was about 500 MHz and memory speed
was about 133 MHz. That is not 300:1, it is not even 10:1 and it took
multiple clocks to execute most instructions (early Pentium, not even
P2).
Pentium 3.
Pentium Pro only went up to 200MHz, Pentium 2 up to 450MHz. Pentium 3 came
out in 1999, at 450/500MHz.
At the other end of the scale, my P3/800 used PC-133, and there were P3s
up to 1100MHz with a 100MHz FSB and 1400MHz with a 133MHz FSB. That's the
kind of system where 300 clocks is feasible for a code cache miss.
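As a rough sketch of the arithmetic behind the figures quoted above, the following converts a 300-clock CPU stall into memory-bus clocks for the CPU/bus pairings mentioned in this thread. The function name `bus_clocks` is made up for illustration, and the calculation assumes nothing beyond the ratio of the two clock rates; it says nothing about actual SDRAM timing.

```python
def bus_clocks(cpu_clocks, cpu_mhz, bus_mhz):
    """Convert a stall measured in CPU clocks into memory-bus clocks."""
    return cpu_clocks * bus_mhz / cpu_mhz

# CPU/bus pairings taken from the thread: (CPU MHz, bus MHz)
pairings = [(500, 133), (800, 133), (1100, 100), (1400, 133)]

for cpu_mhz, bus_mhz in pairings:
    ratio = cpu_mhz / bus_mhz
    stall = bus_clocks(300, cpu_mhz, bus_mhz)
    print(f"{cpu_mhz} MHz CPU, {bus_mhz} MHz bus: "
          f"clock ratio {ratio:.1f}:1, 300 CPU clocks = {stall:.1f} bus clocks")
```

On the 1400 MHz / 133 MHz pairing, 300 CPU clocks works out to 28.5 bus clocks, which is the scale at which the two positions in this thread differ.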
By the way, why did you snip away my references?
This will set up some time line referents to work with:
http://www.dewassoc.com/performance/memory/how_to_id_pc133.htm
Taking 1999 as a useful base year, let's look at processors:
http://www.pdfdownload.org/pdf2html/pdf2html.php?url=http%3A%2F%2Fwww.connellybarnes.com%2Fdocuments%2Fcpu_speed.pdf&images=yes
Just the same, even with your unsupported values:
Let's see: even a clock speed difference of about 10 to 1 cannot
translate into a time cost of over 100 to 1.
After that, DDR appeared, and memory finally started to catch up with the
CPU. But prior to that, you had
No, you were acting dumb about the missing instruction. I explained
the only case in which it could occur and you misunderstood. Even with
branches, instruction fetch is fundamentally sequential: you cannot
fetch instruction x+1 until you have fetched instruction x.
You can displace instruction x from the cache without displacing
instruction x+1.
Possible in some cache schemes, not in most.
Possible in any cache scheme. If instruction x+1 is a branch target, it
can be both more recently used and more used recently than instruction x.
You're assuming either very few branches (or indirect jumps), or very
accurate prediction. That may be true if you're writing Fortran to
evaluate an algebraic formula, but it's not true in general.
From the profiler: about 3% to 10% branches, with about 75% successful
prediction.
75% isn't "very accurate"; and for getting two consecutive branches
correct, that's only 56%. For code with a lot of conditionals, you can't
assume that prefetching is going to ensure that the right instructions are
cached.
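The 56% figure above is just 0.75 squared. A minimal sketch of that arithmetic, assuming each prediction is independent of the others (a simplification that real predictors do not strictly satisfy); the function names are hypothetical, and the 3% to 10% branch fractions are the profiler figures quoted earlier:

```python
def all_correct(accuracy, n_branches):
    """Chance that n independent branch predictions are all correct."""
    return accuracy ** n_branches

def mispredicts_per_instruction(branch_fraction, accuracy):
    """Expected mispredicted branches per executed instruction."""
    return branch_fraction * (1.0 - accuracy)

# How quickly the chance of a fully correct run of predictions decays.
for n in (1, 2, 3, 4):
    print(f"{n} branches in a row all predicted: {all_correct(0.75, n):.1%}")

# Share of all instructions that are mispredicted branches.
for frac in (0.03, 0.10):
    rate = mispredicts_per_instruction(frac, 0.75)
    print(f"{frac:.0%} branches: {rate:.2%} of instructions mispredict")
```

The first loop speaks to the point about consecutive conditionals; the second speaks to the counterpoint that branches are a small share of the instruction stream.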
Changes very little. Back to back branches are uncommon to rare.
That depends upon the type of code you're writing. Obviously, branches
which are "exactly" back-to-back are rare, but test, branch, test, branch
isn't that uncommon; an extreme case is code which embodies a domain of
knowledge, classifying its input then applying the corresponding rules
(IOW, something akin to a Lisp "cond" statement, except that you would
normally try to use a hierarchical decision tree rather than performing
the tests sequentially).
No, I'm explaining what modern interpreted languages are really like. If
you're going to write in BASIC, you may as well compile it. The main
reason for using interpreted languages is the flexibility provided by
dynamic dispatch.
Have you ever pounded your way through the lowest level code of an
interpreter? They emulate a nonexistent virtual machine. The issues
related to dynamic dispatch in compiled code do not obtain in
interpreted code. The process is that different.
At the lowest level, it is examining each instruction and invoking the
corresponding primitive. But this isn't like emulating a real CPU where
you have maybe a couple of dozen common instructions; the Python core
is around 2500 primitives in 750KB of code. When the VM implementation is
hopping all over that much code, you aren't going to have it all in the
cache.
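For readers who have not looked inside an interpreter, the "examine each instruction and invoke the corresponding primitive" loop can be sketched as below. This is a toy stack machine with three made-up opcodes, not CPython's actual implementation; the opcode names and the `run` function are illustrative assumptions only.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a toy stack machine."""
    stack = []

    # One handler ("primitive") per opcode. A real VM has far more of these,
    # and each one is a separate stretch of machine code.
    handlers = {
        "PUSH": lambda arg: stack.append(arg),
        "ADD": lambda arg: stack.append(stack.pop() + stack.pop()),
        "MUL": lambda arg: stack.append(stack.pop() * stack.pop()),
    }

    for opcode, arg in program:
        # Dispatch: look up the handler and jump to it. Which code runs next
        # depends on the data (the program), not on the interpreter's own
        # control flow.
        handlers[opcode](arg)

    return stack.pop()

# (2 + 3) * 4
print(run([("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None)]))
```

Every iteration of the loop jumps to a different handler, so the instruction-cache footprint is governed by how many distinct handlers the running program touches.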
Code locality is a function of most used primitives as well, thus the
less common primitives cause most of the cache misses and the
commonest primitives are almost always in cache.
That's fine if you have a handful of common primitives and the rest are
rare, but glancing over Python's primitives, I'd say that fully half of
them are common. The kind of code which would only use a handful of
primitives is the kind of code which you would write in C.
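The disagreement above comes down to how skewed the primitive usage is. A crude simulation, under loudly stated assumptions: each primitive is treated as one cache entry, the cache is fully associative with LRU replacement, and it holds 256 entries. The 256 slots and the 95% "hot share" are invented for illustration; only the 2500 primitives and the "handful" versus "fully half" scenarios come from the thread.

```python
import random
from collections import OrderedDict

def miss_rate(n_primitives, n_hot, hot_share, cache_slots, n_calls, seed=0):
    """Fraction of primitive invocations that miss a simple LRU cache."""
    rng = random.Random(seed)
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for _ in range(n_calls):
        # Pick a "hot" primitive with probability hot_share, else a rare one.
        if rng.random() < hot_share:
            prim = rng.randrange(n_hot)
        else:
            prim = rng.randrange(n_hot, n_primitives)
        if prim in cache:
            cache.move_to_end(prim)        # mark as most recently used
        else:
            misses += 1
            cache[prim] = True
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict least recently used
    return misses / n_calls

handful = miss_rate(2500, 10, 0.95, 256, 50_000)
half = miss_rate(2500, 1250, 0.95, 256, 50_000)
print(f"10 common primitives:   miss rate {handful:.1%}")
print(f"1250 common primitives: miss rate {half:.1%}")
```

With a handful of hot primitives the hot set fits in the cache and misses come almost entirely from the rare ones; once the hot set is larger than the cache, most invocations miss. Which scenario is closer to real Python workloads is exactly the point in dispute.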