Share Your Experience with 3DNow, SSE, SSE2 etc.



In my experience, RAM access rather than computation speed is usually
the bottleneck, and prefetches are typically needed before SIMD code
shows more than a 10% speed improvement over x87 code. I use inline
asm code in Visual C++. I have an Athlon 64 with 64 KB L1 and 512 KB
L2 caches. If I tune my prefetches by trial and error on my own
machine under Windows XP, how close to optimal will they be on other
PCs?
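
To make the question concrete, here is a minimal sketch of the kind of
prefetched loop in question. It assumes the SSE intrinsics from
xmmintrin.h rather than inline asm, and the names sum_sse_prefetch and
PrefetchBytes are made up for illustration. PrefetchBytes is the
distance that would be tuned by trial and error; nothing here suggests
a good value for any particular CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE: _mm_prefetch, _mm_loadu_ps, _mm_add_ps

// Hypothetical example: sum a float array four lanes at a time while
// prefetching PrefetchBytes ahead of the current read position.
template <std::size_t PrefetchBytes>
float sum_sse_prefetch(const float* data, std::size_t n)
{
    assert(n == 0 || data != nullptr);

    // Prefetch distance expressed in floats instead of bytes.
    const std::size_t ahead = PrefetchBytes / sizeof(float);

    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Only prefetch addresses that are still inside the array.
        if (i + ahead < n) {
            _mm_prefetch(reinterpret_cast<const char*>(data + i + ahead),
                         _MM_HINT_T0);
        }
        // Unaligned load, so the caller need not align the buffer.
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));
    }

    // Horizontal sum of the four lanes.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    // Scalar tail for the last n % 4 elements.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

A call such as sum_sse_prefetch<256>(buf, count) fixes the distance at
compile time, so comparing distances means timing several
instantiations (64, 128, 256, ...) on each target machine.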

What about exception masking?

Does anyone know of a source for SSE3 inline macros for Visual C++,
similar to the 3DNow macros AMD provides in AMD3DX.h?