Re: Larkin, Power BASIC cannot be THAT good:



Jasen Betts wrote:
On 2009-05-15, John Devereux <john@xxxxxxxxxxxxxx> wrote:
Jan Panteltje <pNaonStpealmtje@xxxxxxxxx> writes:

Forget my remark about -O4, runs the same with your C code as -O3, about 7 seconds on my eeePC.
I suspect this is all limited by the system memory bandwidth.

It looks like it, with -O4 optimisiation on dual-threaded and single-threaded
versions both run in 1.9 seconds here.

Yes. It is bandwidth limited on the external ram because of the way it hits sequential locations once per loop. Using DDR2 ram execution is roughly 2s and older DDR ram is about 3s in execution. The absolutely dire code generated by debug mode with all optimisation disabled is only 4s. So it isn't really a test of PowerBasics code generation at all.

The real test of an optimising compiler is how well it can exploit cache aware alogrithms with multiple nested loops. A cache aware version of this vector add and accumulate (without using SIMD instructions) is about 15% faster than the simple loop on a good optimising compiler (actually that is measured on MSC I haven't checked its code generation for optimisation - too tedious). I suspect gcc might be better.

The optimisation totally disabled version compiled by MSC is:

for (i = 0; i < ARRAY_SIZE; i++) {
004010BE mov dword ptr [i],0
004010C5 jmp main+0D0h (4010D0h)
004010C7 mov ecx,dword ptr [i]
004010CA add ecx,1
004010CD mov dword ptr [i],ecx
004010D0 cmp dword ptr [i],3D09000h
004010D7 jge main+0F7h (4010F7h)
s[i] += a[i];
004010D9 mov edx,dword ptr [i]
004010DC mov eax,dword ptr [a]
004010DF movsx ecx,word ptr [eax+edx*2]
004010E3 mov edx,dword ptr [i]
004010E6 mov eax,dword ptr [s]
004010E9 add ecx,dword ptr [eax+edx*4]
004010EC mov edx,dword ptr [i]
004010EF mov eax,dword ptr [s]
004010F2 mov dword ptr [eax+edx*4],ecx
}
004010F5 jmp main+0C7h (4010C7h)

And takes 4s on P4 3GHz with DDR ram.
Optimiser turns it into something like

mov esi, a
mov edi, s
mov ecx, ARRAY_SIZE
xor edx, edx
forloop:
movsx eax, word ptr[esi+2*edx]
add dword ptr[edi+4*edx], eax
inc edx
loop forloop

And this extremely tight optimised loop code still takes 3s with DDR ram and just under 2s with DDR2 ram.

MSC has changed the way it does this. My oldest compiler generates a pointer implementation for array access whereas newer compilers exploit the additional scaled indexing features of later CPUs. The oldest compiler generates very slightly faster code for the loop (at least it does after a manual reordering of the generated instructions).

forloop:
movsx eax,word ptr[esi]
add esi, 2
add dword ptr[edi], eax
add edi, 4
loop forloop

Code snippets above subject to typos.
Be interested to see what code gcc -O4 generates.

Regards,
Martin Brown
.



Relevant Pages

  • Re: Letter to US Sen. Byron Dorgan re unpaid overtime
    ... it's a for loop in the C sense. ... > sloppy thinking that results from confusing a programming language ... >> I do not believe that you are capable of writing a conforming C compiler. ... Does Microsoft's C compiler perform this optimisation? ...
    (comp.programming)
  • Re: GNUH8 mixed C and assembly
    ... Using a volatile variable forces the compiler to include the loop and seems also to make the loop timing invariant with optimisation level. ...
    (comp.arch.embedded)
  • Re: A note on personal corruption as a result of using C
    ... the compiler to optimize. ... Please post a pointer to an article where somebody was "assaulted" for using an optimisable invariant in a for loop. ... Hardly "assault". ... I would expect it to be optimised any time I select optimisation by the compiler and it's optimisable. ...
    (comp.programming)
  • Re: Microsoft agrees with "hoisting loop invariants" by programmers!!! News @ 11
    ... These loop invariants ... >> Removing or hoisting loop invariants generally improves speed without ... > that they're going to feed to an optimizing compiler anyway. ... especially true of compilers with poor optimisation. ...
    (comp.programming)
  • Re: Letter to US Sen. Byron Dorgan re unpaid overtime
    ... >> both less efficient and less safe than the Fortran and Basic standard. ... >> The C for loop is actually trying to do what a do loop does. ... sloppy thinking that results from confusing a programming language ... > I do not believe that you are capable of writing a conforming C compiler. ...
    (comp.programming)

Quantcast