Re: Adjusting PC Hyperthreading for Spice Simulation
- From: JosephKK <quiettechblue@xxxxxxxxx>
- Date: Sun, 25 Jan 2009 19:36:06 -0800
On Sun, 25 Jan 2009 13:29:39 -0600, krw <krw@xxxxxxxxxxxxxxxxx> wrote:
On Sun, 25 Jan 2009 10:59:31 -0800, JosephKK <quiettechblue@xxxxxxxxx>
wrote:
On Sun, 25 Jan 2009 00:06:57 +0000, Nobody <nobody@xxxxxxxxxxx> wrote:
On Sat, 24 Jan 2009 19:39:32 +0000, Nobody wrote:
In other words... you get 1 billion operations per second (or whatever).
Hyperthreaded CPUs just give the appearance of two CPUs so that if a
particular thread is waiting on, e.g., a memory read from DRAM (this
can take hundreds of cycles)
Memory access taking hundreds of cycles? Hell, not even a dozen.
It depends how fast your RAM is. At one point (I guess around 5 years
ago), 350 CPU cycles for a code cache miss was not atypical, but RAM
speed has been consistently increasing faster than CPU speed for the
last few years.
To remove a possible source of confusion: cycle "costs" take into account
the fact that each core can execute multiple instructions concurrently
(superscalar architecture). So a cost of e.g. "100 cycles" refers to a
delay in which a sequence of instructions totalling 100 cycles could be
executed, not 100 times the CPU clock period.
So you have heard of pipeline bubbling. The pipelines are not that
deep, about 7 stages max due to complexity increases.
Depends on the processor. The G5 and P4 were significantly deeper
than that (more like 20 stages). The entire pipe is flushed on a
mispredicted branch or context switch. If the target isn't in the
cache it has to be reloaded from main memory.
Not so on mispredicted branches. Moreover, speculative execution of
both sides almost eliminates the issue. Also, the pipes may have had
that much total depth, but less than 3% of instructions (and much less
than 1% of execution) need all of the stages, mostly things like pusha
and popa, which move multiple registers onto and off of the stack.
Current and recent processors (about 5 years for x86, more for SPARC
and others) support speculative execution and out of order execution
to reduce this problem.
It doesn't reduce the problem, rather makes it occur less often (when
the planets line up). The "100 cycles" is still there. Memory with a
100ns access and a 1GHz CPU kinda makes access 100x clock.
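To put numbers on that ratio, here is a minimal sketch of the arithmetic: the number of clock periods that pass during one memory access is just the access time multiplied by the clock rate. The function name and the 3 GHz clock in the second call are assumptions for illustration only; the 100 ns / 1 GHz and 15 ns figures are the ones quoted in this thread.

```python
def stall_cycles(access_time_ns: float, clock_ghz: float) -> float:
    """Clock periods that elapse while one memory access completes."""
    # A clock running at f GHz ticks f times per nanosecond,
    # so cycles = nanoseconds * GHz.
    return access_time_ns * clock_ghz


# The case from the post: 100 ns memory, 1 GHz CPU.
print(stall_cycles(100, 1.0))

# Assumed example: 15 ns access on a hypothetical 3 GHz CPU.
print(stall_cycles(15, 3.0))
```

This only counts raw clock periods. It says nothing about how much of that delay the core can hide with other work.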
Wow, the last time I saw RAM with 100 ns access times was back in the
386 days. Even then you could get 70 ns and 60 ns premium parts.
Current stuff is like 12 ns to 15 ns access and 60 ns to 85 ns cycle
times, with multiple consecutive addresses available at 5 ns intervals.
The only place you get killed is on cache writeback block outs, which
do have lags of 100 ns or more before reading the new data (but that
does not apply to instruction caches).
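As a rough sketch of what those burst figures imply, the time to pull in a run of consecutive addresses can be modeled as one initial access plus a fixed interval for each further transfer. The function name and the count of 8 transfers per burst are assumptions for illustration; only the 12 ns to 15 ns access and 5 ns interval come from the figures above. Controller overhead, precharge, and refresh are ignored.

```python
def burst_fill_ns(first_access_ns: float, interval_ns: float, transfers: int) -> float:
    """Time for a burst: one initial access, then one interval per extra transfer."""
    if transfers < 1:
        raise ValueError("a burst needs at least one transfer")
    return first_access_ns + (transfers - 1) * interval_ns


# Assumed burst of 8 consecutive addresses, using the quoted timings.
print(burst_fill_ns(12, 5, 8))  # best-case access time from the post
print(burst_fill_ns(15, 5, 8))  # worst-case access time from the post
```

Under these assumptions a whole burst lands in roughly 50 ns, which is why the first access dominates less than the raw access time suggests.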