Re: Share Your Experience with 3DNow, SSE, SSE2 etc.
- From: Hendrik van der Heijden <hvdh@xxxxxx>
- Date: Thu, 07 Aug 2008 19:37:47 +0200
aruzinsky wrote:
$L2:
movaps xmm0, [edx]
movaps xmm1, [edx+16]
movaps xmm2, [edx+32]
movaps xmm3, [edx+48]
PREFETCHNTA [edx+ecx]
movntpd [eax], xmm0
movntpd [eax+16], xmm1
movntpd [eax+32], xmm2
movntpd [eax+48], xmm3
add edx, 64
add eax, 64
dec esi
jnz $L2
Prefetching here doesn't gain much, as the RAM access pattern
is easily predictable by the hardware prefetcher.
I tried several things on my Core2.
For arrays larger than the caches:
- nontemporal writes give 33% speedup
- prefetching makes no difference
- unroll by 4 vs not unrolled makes no difference
For arrays that fit in L2 (copying the same block 100 times):
- nontemporal writes reduce performance to 27% of the regular-store speed
- prefetching makes no difference
- unroll by 4 vs not unrolled makes no difference
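
For anyone who wants to repeat the comparison from C instead of assembly, here is a rough sketch of the two store variants measured above, written with SSE2 intrinsics. It is not the code I benchmarked: the function names (copy_cached, copy_streaming, copy_auto) and the cache_bytes threshold parameter are made up for illustration, and the plain memcpy fallback is only there so the file still builds on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Precondition shared by both copies: 16-byte aligned pointers and a
   length that is a multiple of 16 (same requirement as movaps/movntpd). */
static int is_aligned16(const void *dst, const void *src, size_t n)
{
    return ((uintptr_t)dst % 16) == 0 && ((uintptr_t)src % 16) == 0 && (n % 16) == 0;
}

/* Copy with regular stores: the destination lines go through the caches. */
static void copy_cached(void *dst, const void *src, size_t n)
{
    assert(is_aligned16(dst, src, n));
#if HAVE_SSE2
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < n / 16; ++i)
        _mm_store_si128(d + i, _mm_load_si128(s + i));
#else
    memcpy(dst, src, n); /* fallback, no SSE2 available */
#endif
}

/* Copy with nontemporal (streaming) stores: the destination bypasses
   the caches, which is what the movntpd loop above does. */
static void copy_streaming(void *dst, const void *src, size_t n)
{
    assert(is_aligned16(dst, src, n));
#if HAVE_SSE2
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < n / 16; ++i)
        _mm_stream_si128(d + i, _mm_load_si128(s + i));
    _mm_sfence(); /* make the streaming stores globally visible */
#else
    memcpy(dst, src, n); /* fallback, no SSE2 available */
#endif
}

/* Hypothetical dispatcher: stream only when the block is larger than
   cache_bytes. The caller picks the threshold; it is an assumption here,
   not a measured value. */
static void copy_auto(void *dst, const void *src, size_t n, size_t cache_bytes)
{
    if (n > cache_bytes)
        copy_streaming(dst, src, n);
    else
        copy_cached(dst, src, n);
}
```

Calling copy_auto(dst, src, n, cache_bytes) with cache_bytes set to roughly the L2 size would follow the numbers above: streaming stores for blocks that don't fit in the caches, regular stores for blocks that do. No prefetch or manual unrolling is included, since neither made a difference in my tests.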
Hendrik vdH