Re: Share Your Experience with 3DNow, SSE, SSE2 etc.



aruzinsky wrote:
It is my experience that RAM access and not computation speed is
usually the bottleneck and typically prefetches are needed before a
speed improvement over 10% is seen over x87 code. I use inline asm
code in Visual C++. I have an Athlon 64 with 64 KB L1 and 512 KB L2
caches. If I optimize my prefetches by trial and error for my
computer under Windows XP, how close to optimum will my prefetches be
on other PCs?

Just as close to optimal, provided the other PCs have the same CPU
model, memory module types and bank allocation. On other CPUs,
these prefetches may improve or degrade performance.

My personal experience with image processing (P4, K8, Core2)
is that performance gains from prefetching are non-existent
or not worth the effort, and not consistent across different systems.
That said, this was code with quite predictable memory access
patterns.
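
To make the trial-and-error part concrete, here is a rough sketch of a
software prefetch in a sequential pixel loop. The function name
sum_pixels_prefetch, the PREFETCH macro and the 64-byte block size are
my own assumptions for illustration, not code from either poster. The
distance parameter is the knob one would tune per machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang builtin, with the SSE intrinsic as a fallback
   for other compilers such as Visual C++. */
#if defined(__GNUC__)
#define PREFETCH(p) __builtin_prefetch((p), 0, 0)
#else
#include <xmmintrin.h>
#define PREFETCH(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#endif

/* Hypothetical helper: sums n 8-bit pixels in blocks of 64 bytes
   (an assumed cache line size) and issues one prefetch per block,
   `distance` bytes ahead. distance == 0 disables prefetching. */
uint64_t sum_pixels_prefetch(const uint8_t *px, size_t n, size_t distance)
{
    uint64_t sum = 0;
    assert(px != NULL || n == 0);
    for (size_t i = 0; i < n; i += 64) {
        /* Only prefetch addresses that lie inside the buffer. */
        if (distance != 0 && i + distance < n)
            PREFETCH(px + i + distance);
        size_t end = (i + 64 < n) ? i + 64 : n;
        for (size_t j = i; j < end; ++j)
            sum += px[j];
    }
    return sum;
}
```

The result is the same for every distance; only the timing can change.
With an access pattern this regular, the hardware prefetcher may
already be doing the job, which would match the lack of any gain
described above.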

IMO there are other areas that offer more performance improvement:
better algorithms, reordering operations to improve memory
access locality, and vectorization (SSE).
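
As an illustration of the reordering point, here is a minimal sketch
that computes per-column sums of a row-major 8-bit image in two ways.
The function names and the image layout are assumptions made up for
this example. The first version walks down each column with a stride
of one full row; the second walks memory sequentially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Strided traversal: consecutive reads are w bytes apart, so for wide
   images nearly every read lands on a different cache line. */
void column_sums_strided(const uint8_t *img, size_t w, size_t h,
                         uint32_t *out)
{
    assert(img != NULL && out != NULL);
    for (size_t x = 0; x < w; ++x) {
        uint32_t s = 0;
        for (size_t y = 0; y < h; ++y)
            s += img[y * w + x];
        out[x] = s;
    }
}

/* Reordered traversal: same result, but the image is read row by row
   in address order. */
void column_sums_sequential(const uint8_t *img, size_t w, size_t h,
                            uint32_t *out)
{
    assert(img != NULL && out != NULL);
    for (size_t x = 0; x < w; ++x)
        out[x] = 0;
    for (size_t y = 0; y < h; ++y) {
        const uint8_t *row = img + y * w;
        for (size_t x = 0; x < w; ++x)
            out[x] += row[x];
    }
}
```

The inner loop of the second version also has the simple unit-stride
form that a compiler may be able to vectorize, or that can be
rewritten by hand with SSE2 intrinsics. How much either step gains is
something to measure on the target machine.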


Hendrik vdH