Re: Adjusting PC Hyperthreading for Spice Simulation
- From: JosephKK <quiettechblue@xxxxxxxxx>
- Date: Tue, 27 Jan 2009 18:43:52 -0800
On Mon, 26 Jan 2009 20:19:25 -0800, JosephKK <quiettechblue@xxxxxxxxx>
wrote:
On Mon, 26 Jan 2009 08:10:53 -0600, krw <krw@xxxxxxxxxxxxx> wrote:
In article <o7bqn49n08t3cor6nj2gtc29g311e4j0qe@xxxxxxx>,
quiettechblue@xxxxxxxxx says...
On Sun, 25 Jan 2009 13:29:39 -0600, krw <krw@xxxxxxxxxxxxxxxxx> wrote:
On Sun, 25 Jan 2009 10:59:31 -0800, JosephKK <quiettechblue@xxxxxxxxx>
wrote:
On Sun, 25 Jan 2009 00:06:57 +0000, Nobody <nobody@xxxxxxxxxxx> wrote:
On Sat, 24 Jan 2009 19:39:32 +0000, Nobody wrote:
In other words... you get 1 billion operations per second (or whatever).
Hyperthreaded CPUs just give the appearance of two CPUs so that if a
particular thread is waiting on, e.g., a memory read from DRAM (this
can take hundreds of cycles)
Memory access taking hundreds of cycles? Hell, not even a dozen.
It depends how fast your RAM is. At one point (I guess around 5 years
ago), 350 CPU cycles for a code cache miss was not atypical, but RAM
speed has been consistently increasing faster than CPU speed for the
last few years.
To remove a possible source of confusion: cycle "costs" take into account
the fact that each core can execute multiple instructions concurrently
(superscalar architecture). So a cost of e.g. "100 cycles" refers to a
delay in which a sequence of instructions totalling 100 cycles could be
executed, not 100 times the CPU clock period.
So you have heard of pipeline bubbling. The pipelines are not that
deep, about 7 stages max due to complexity increases.
Depends on the processor. The G5 and P4 were significantly deeper
than that (more like 20 stages). The entire pipe is flushed on a
mispredicted branch or context switch. If the target isn't in the
cache it has to be reloaded from main memory.
Not so on mispredicted branches. Moreover, speculative execution of
both sides almost eliminates the issue. Also, the total depth may have
been that much, but less than 3% of instructions (and much less than
1% of execution) need all of the stages, mostly things like pusha and
popa, which move multiple registers onto and off of the stack.
If the branch target misses the cache and a new DRAM page has to be
opened, yes it does. Branches don't do PUSHA/POPA. Memory access
is still 100x CPU clock.
I am amazed at how badly you misread this.
Here is a typical manufacturer's website discussing DDR2 memory.
Oops, forgot to add the link:
www.samsung.com/global/business/semiconductor/products/dram/downloads/ddr2_device_operation_timing_diagram_may_07.pdf
The very link dates it to May 2007.
Moreover check this, even though it is Wikipedia:
http://en.wikipedia.org/wiki/DDR2_SDRAM
Take a look at those cycle times and peak transfer rates. Sustained
rates are maybe 20% of peak but still quite a bit of data moving.
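For a rough sense of scale, the peak figures on that page follow from the transfer rate times the bus width. A minimal sketch, assuming the usual 64-bit (8-byte) module data bus; the 20% sustained fraction is only the estimate given above, not a measured value, and the function names are made up for illustration.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s: int, bus_bytes: int = 8) -> int:
    """Peak bandwidth in MB/s.

    mega_transfers_per_s is the MT/s rating (800 for DDR2-800).
    bus_bytes is the data bus width in bytes (assumed 8, i.e. 64 bits).
    """
    return mega_transfers_per_s * bus_bytes


def sustained_bandwidth_mb_s(peak_mb_s: float, fraction: float = 0.2) -> float:
    """Sustained bandwidth as a fraction of peak (0.2 is the thread's guess)."""
    return peak_mb_s * fraction


for rating in (400, 533, 667, 800):
    peak = peak_bandwidth_mb_s(rating)
    print(f"DDR2-{rating}: peak {peak} MB/s, "
          f"~{sustained_bandwidth_mb_s(peak):.0f} MB/s at 20% of peak")
```

Even at a fifth of peak, DDR2-800 would still be moving on the order of a gigabyte per second under these assumptions.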
Please note the availability of rather comprehensive timing diagrams for
speeds up to DDR2-800. If you understand this manufacturer's
literature correctly, the total read latency is on the order of 20 ns
(worst case). For the processor to be 100 x faster, its cycle time would
have to be less than 200 ps, which would be equivalent to about a 5
GHz clock. Current parts are about 1.5 to 3 GHz clocks. Best
possible speed ratio in the CPU's favor: about 60 to 1.
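The arithmetic is easy to check: a latency in ns multiplied by a clock in GHz gives the latency in clock periods. A sketch that takes the 20 ns worst-case read latency above as an assumption (it is the poster's reading of the datasheet, not something verified here); the function names are illustrative only.

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Number of CPU clock periods that fit into a memory latency."""
    # 1 GHz is one cycle per nanosecond, so ns * GHz = cycles.
    return latency_ns * clock_ghz


def clock_ghz_for_ratio(latency_ns: float, ratio: float) -> float:
    """Clock frequency at which latency_ns spans `ratio` clock periods."""
    return ratio / latency_ns


ASSUMED_LATENCY_NS = 20.0  # worst-case figure taken from the post

for ghz in (1.5, 3.0):
    cycles = latency_in_cycles(ASSUMED_LATENCY_NS, ghz)
    print(f"{ghz} GHz CPU: {cycles:.0f} clock periods per access")

needed = clock_ghz_for_ratio(ASSUMED_LATENCY_NS, 100.0)
print(f"A 100:1 ratio at 20 ns needs a {needed:.1f} GHz clock")
```

The same function covers the other side of the argument: `latency_in_cycles(100.0, 1.0)` gives 100 for a 100 ns access on a 1 GHz part.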
Current and recent processors (about 5 years for x86, more for SPARC
and others) support speculative execution and out of order execution
to reduce this problem.
It doesn't reduce the problem, rather makes it occur less often (when
the planets line up). The "100 cycles" is still there. Memory with a
100ns access and a 1GHz CPU kinda makes access 100x clock.
Wow, the last time I saw RAM with 100 ns access times was back in the
386 days. Even then you could get 70 ns and 60 ns premium parts.
Current stuff is like 12 ns to 15 ns access and 60 ns to 85 ns cycle
times with multiple consecutive addresses available at 5 ns intervals.
Try measuring apples to apples (access a closed page). You'll find
that shiny new memory isn't all that much faster than that of
twenty years ago. Current processors aren't 1GHz, either. The
ratio is still ~100:1.
What do you mean by a closed page?
And current memory is not 60 ns any more either; average effective
access when used correctly is more like 6 ns.
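One way a figure near 6 ns could arise is by averaging over a burst of consecutive addresses. A rough sketch, assuming the 15 ns first access and 5 ns per-address interval mentioned earlier in the thread, plus a burst of 8 transfers (an assumption for illustration, e.g. one 64-byte line over an 8-byte bus). This is only one reading of "effective access", not necessarily the one intended.

```python
def burst_read_ns(first_access_ns: float, interval_ns: float, transfers: int) -> float:
    """Total burst time: one full access, then one interval per extra transfer."""
    if transfers < 1:
        raise ValueError("transfers must be at least 1")
    return first_access_ns + (transfers - 1) * interval_ns


def average_per_transfer_ns(first_access_ns: float, interval_ns: float, transfers: int) -> float:
    """Burst time divided evenly over the transfers in the burst."""
    return burst_read_ns(first_access_ns, interval_ns, transfers) / transfers


# Assumed numbers: 15 ns first access, 5 ns intervals, 8 transfers per burst.
total = burst_read_ns(15.0, 5.0, 8)
average = average_per_transfer_ns(15.0, 5.0, 8)
single = average_per_transfer_ns(15.0, 5.0, 1)
print(f"burst of 8: {total} ns total, {average} ns per transfer")
print(f"lone access: {single} ns")
```

A lone access with no neighbours still pays the full first-access time, which is the apples-to-apples case being argued about.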
The only place you get killed is on cache writeback block outs, that
does have 100 ns plus lags before reading the new data (but that does
not apply to instruction caches).
Huh? I'm missing your point here. Cache castouts aren't in the
performance path.
The issue is dirty cache page write back (data segments) in order to
load a new page.