Re: Numerics: Visual C++ vs. g++
- From: Evgenii Rudnyi <usenet@xxxxxxxxx>
- Date: Thu, 22 May 2008 11:53:40 -0700 (PDT)
On May 22, 6:08 pm, user923005 <dcor...@xxxxxxxxx> wrote:
On May 22, 2:16 am, Evgenii Rudnyi <use...@xxxxxxxxx> wrote:...
The C++ version is performing vector allocations. It is not the same
as your other versions which put the data on the stack.
This is true, but the memory allocation happens only once. Allocating
three arrays should not take much time, so the difference should be
very small.
P.S.
If you make them static arrays, you won't need such an awful stack.
Since the size is not dynamic, static arrays make sense here (unless
you want to compare other sizes in which case you should use malloc()
for C).
You are right. Static arrays would be simpler. But I guess this should
not affect the performance anyway.
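To make the allocation point concrete, here is a minimal sketch. It is not the benchmark code from this thread, whose source is not shown here. It runs the same triple loop once on heap storage from std::vector and once on static arrays. The names `matmul`, `fill`, `run_vector`, `run_static` and the small size `N` are assumptions for illustration only. The vectors are allocated once, before the O(N^3) loops, so the allocation is a fixed overhead next to the multiplication itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical size for illustration; the thread uses 1000.
const std::size_t N = 64;

// Static storage: no large stack frame and no heap allocation at run time.
static double sa[N * N], sb[N * N], sc[N * N];

// Plain row-major triple loop, C = A * B, written on raw pointers so the
// same kernel serves both storage strategies.
void matmul(const double* a, const double* b, double* c, std::size_t n)
{
    assert(a != c && b != c); // the output must not alias an input
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Fill A and B with small deterministic integer values.
void fill(double* a, double* b, std::size_t n)
{
    for (std::size_t i = 0; i < n * n; ++i) {
        a[i] = static_cast<double>(i % 7);
        b[i] = static_cast<double>(i % 5);
    }
}

// Heap version: three allocations, done once before the loops.
std::vector<double> run_vector()
{
    std::vector<double> a(N * N), b(N * N), c(N * N);
    fill(&a[0], &b[0], N);
    matmul(&a[0], &b[0], &c[0], N);
    return c;
}

// Static version: same kernel, no allocation at all.
const double* run_static()
{
    fill(sa, sb, N);
    matmul(sa, sb, sc, N);
    return sc;
}
```

Both versions execute the identical inner loop, so any timing difference between them would come from the storage (allocation, alignment, aliasing information available to the compiler), not from the arithmetic.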
My G++ performance is not like yours. Here are my timings on a 2.2 GHz
AMD machine running Windows 2003 (32-bit OS):
Your makefile, but CXXFLAG = -s -O3 -DUSECLOCK:
C:\math\matmul>direct-cc.exe 1000 1000 1000
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 9.609 s
Microsoft Visual C++ with flags:
/Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE"
/D "USECLOCK" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MT /Zp16 /GS-
/arch:SSE /Fo"Release\\" /Fd"Release\vc80.pdb" /W4 /nologo /c /Wp64
/Zi /TP /errorReport:prompt
C:\math\matmul>direct-noprof.exe 1000 1000 1000
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 7.391 s
Thanks a lot. I will try these flags. I thought that -O2 included
everything, but that seems not to be the case.
As above, with profile guided optimization:
C:\math\matmul>direct-profile.exe 1000 1000 1000
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 6.891 s
These use your makefile without changes, but I used gfortran and not
g77:
C:\math\matmul>direct1-c.exe
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 5.125000 s
C:\math\matmul>direct1-f.exe
time for C( 1000 , 1000 ) = A( 1000 ,
1000 ) B( 1000 , 1000 ) is 12.640625 s
C:\math\matmul>direct2-c.exe
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 5.172000 s
C:\math\matmul>direct2-f.exe
time for C( 1000 , 1000 ) = A( 1000 ,
1000 ) B( 1000 , 1000 ) is 5.1406250 s
Here I re-ran the fortran tests with g95:
g95 -s -O3 direct1.f -o direct1-f.exe
direct1-f.exe
time for C( 1000 , 1000 ) = A( 1000 , 1000 ) B( 1000 , 1000 ) is
13.140625 s
g95 -s -O3 direct2.f -o direct2-f.exe
direct2-f.exe
time for C( 1000 , 1000 ) = A( 1000 , 1000 ) B( 1000 , 1000 ) is
5.109375 s
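To put the direct-multiplication timings above on a common scale, one can convert them to a floating-point rate. The sketch below assumes the classical triple loop, which performs about 2*n^3 floating-point operations (n^3 multiplications and n^3 additions) for an n x n product. The function name `gflops` is made up for this example, and the operation count does not apply to algorithms such as Strassen that do fewer multiplications.

```cpp
#include <cassert>

// Approximate rate in GFLOP/s for a classical n x n matrix product
// that took `seconds` of wall or CPU time.
// Assumes 2*n^3 floating-point operations (classical algorithm only).
double gflops(double n, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * n * n * n / seconds / 1e9;
}
```

For example, `gflops(1000, 7.391)` gives roughly 0.27 GFLOP/s for the Visual C++ run, and `gflops(1000, 5.125)` gives roughly 0.39 GFLOP/s for the direct1-c run.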
It takes 2 seconds on that same machine to do a 1000x1000 C++ matrix
multiply using Strassen multiplication.
Thank you for the suggestion. I guess that if you call DGEMM from
ATLAS or another optimized BLAS on your computer, it should take less
than one second.
My main goal here was just to see how the compiler optimizes the loops.
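As a rough illustration of what the compiler has to work with, here are two common ways to write the same product for row-major storage. This is a sketch, not the direct1/direct2 sources, which are not reproduced here; the names `Matrix`, `matmul_ijk` and `matmul_ikj` are made up. In the `ijk` order the inner loop reads B with stride n, while in the `ikj` order every inner-loop access is contiguous, which is usually easier to vectorize and friendlier to the cache. How large the difference is depends on the compiler, the flags and the hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Matrix; // row-major, n*n elements

// Inner loop over k: B is read with stride n.
void matmul_ijk(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Inner loop over j: a[i][k] is loop-invariant, and both B and C are
// accessed contiguously.
void matmul_ikj(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    std::fill(c.begin(), c.end(), 0.0); // C is accumulated in place
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

Both functions compute the same C = A * B; timing them at n = 1000 with the flag sets discussed in this thread is one way to see how much of the loop restructuring each compiler does on its own.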