Re: Numerics: Visual C++ vs. g++



On May 22, 6:08 pm, user923005 <dcor...@xxxxxxxxx> wrote:
On May 22, 2:16 am, Evgenii Rudnyi <use...@xxxxxxxxx> wrote:
...
The C++ version is performing vector allocations.  It is not the same
as your other versions which put the data on the stack.

This is true, but the memory allocation happens only once. Allocating
three arrays should not take much time, so the difference should be
very small.
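To put a rough number on that argument, here is a minimal sketch, not taken from the benchmark itself. The helper names allocation_seconds, multiply_adds and doubles_allocated are made up for illustration, and n is assumed to be at least 1.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original benchmark): wall-clock seconds
// needed to allocate and zero-fill three n-by-n matrices of doubles.
// Assumes n >= 1.
double allocation_seconds(std::size_t n) {
    const auto start = std::chrono::steady_clock::now();
    std::vector<double> a(n * n), b(n * n), c(n * n);
    // Read one element from each vector through a volatile so the compiler
    // is less inclined to optimize the allocations away.
    volatile double sink = a[0] + b[0] + c[0];
    (void)sink;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Number of multiply-add steps in the naive triple loop: n^3.
unsigned long long multiply_adds(unsigned long long n) { return n * n * n; }

// Number of doubles allocated for A, B and C together: 3 * n^2.
unsigned long long doubles_allocated(unsigned long long n) { return 3 * n * n; }
```

For n = 1000 this is 3 million doubles (about 24 MB) allocated once, against 10^9 multiply-adds in the loop, so the allocation would be expected to be a small part of the total time.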

P.S.
If you make them static arrays, you won't need such an awful stack.
Since the size is not dynamic, static arrays make sense here (unless
you want to compare other sizes in which case you should use malloc()
for C).

You are right. Static arrays would be simpler. But I guess this should
not affect the performance anyway.
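The static-array variant could look roughly like the sketch below. It is an assumption about the layout, not the code from the benchmark: the names N, A, B, C and multiply_static are hypothetical, and the function takes the block size n as a parameter only so that it can be tried on small inputs.

```cpp
#include <cstddef>

// Assumed fixed size, matching the 1000x1000 runs in this thread.
constexpr std::size_t N = 1000;

// Static storage duration: about 8 MB per array, reserved in the program
// image rather than on the stack, so no enlarged stack is needed.
static double A[N][N];
static double B[N][N];
static double C[N][N];

// Multiplies the leading n-by-n blocks: C = A * B. Requires n <= N.
void multiply_static(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += A[i][k] * B[k][j];
            }
            C[i][j] = sum;
        }
    }
}
```

Calling multiply_static(N) would run the full 1000x1000 product. The loop itself is the same as for stack or heap arrays, which is why the storage class should not change the timing much.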

My G++ performance is not like yours.  Here are my timings on 2.2 GHz
AMD running Windows 2003 (32 bit OS):

Your makefile, but CXXFLAG = -s -O3 -DUSECLOCK:
C:\math\matmul>direct-cc.exe      1000 1000 1000
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 9.609 s

Microsoft Visual C++ with flags:
/Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE"
/D "USECLOCK" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MT /Zp16 /GS-
/arch:SSE /Fo"Release\\" /Fd"Release\vc80.pdb" /W4 /nologo /c /Wp64
/Zi /TP /errorReport:prompt
C:\math\matmul>direct-noprof.exe  1000 1000 1000
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 7.391 s

Thanks a lot. I will try these flags. I thought that -O2 already
included everything, but that seems not to be the case.

As above, with profile guided optimization:
C:\math\matmul>direct-profile.exe 1000 1000 1000
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 6.891 s

These use your makefile without changes, but I used gfortran and not
g77:
C:\math\matmul>direct1-c.exe
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 5.125000 s

C:\math\matmul>direct1-f.exe
 time for C(        1000 ,        1000 ) = A(        1000 ,        1000 ) B(        1000 ,        1000 ) is   12.640625      s

C:\math\matmul>direct2-c.exe
time for C(1000,1000) = A(1000,1000) B(1000,1000) is 5.172000 s

C:\math\matmul>direct2-f.exe
 time for C(        1000 ,        1000 ) = A(        1000 ,        1000 ) B(        1000 ,        1000 ) is   5.1406250      s

Here I re-ran the fortran tests with g95:
g95 -s -O3 direct1.f -o direct1-f.exe
direct1-f.exe
 time for C( 1000 , 1000 ) = A( 1000 , 1000 ) B( 1000 , 1000 ) is 13.140625  s
g95 -s -O3 direct2.f -o direct2-f.exe
direct2-f.exe
 time for C( 1000 , 1000 ) = A( 1000 , 1000 ) B( 1000 , 1000 ) is 5.109375  s

It takes 2 seconds on that same machine to do a 1000x1000 C++ matrix
multiply using Strassen multiplication.

Thank you for the suggestion. I guess that if you call DGEMM from
ATLAS or another optimized BLAS on your computer, it should take less
than one second.

My main goal here was just to see how the compiler optimizes the loops.
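As an illustration of what "optimizing the loops" can mean here, the sketch below writes the same product with two loop orders. This is an assumption for illustration only, not the direct1/direct2 sources from the benchmark; the names Matrix, multiply_ijk and multiply_ikj are hypothetical, and the matrices are stored row-major in a flat vector.

```cpp
#include <cstddef>
#include <vector>

// Row-major n-by-n matrix stored in a flat vector (illustrative choice).
using Matrix = std::vector<double>;

// Textbook i-j-k order: the inner loop reads b with stride n.
Matrix multiply_ijk(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// i-k-j order: the inner loop walks both b and c with unit stride.
Matrix multiply_ikj(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;
}
```

Both functions compute the same result. Whether a compiler reorders the first form into something like the second on its own, or leaves the strided access in place, is one of the things a benchmark like this can expose.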