Re: Processor question
- From: Martin Brown <|||newspam|||@nezumi.demon.co.uk>
- Date: Fri, 23 May 2008 09:39:42 +0100
Tim Williams wrote:
"Martin Brown" <|||newspam|||@nezumi.demon.co.uk> wrote in message
news:8d715$4836739d$11204@xxxxxxxxxxxxxxxxxxxx
You have chosen a poor implementation of a language that does not
compile to native code.
Sure it does. Would you like an MS-DOS executable that runs alone? I have
many. It makes 8086 code, since the compiler is copyright 1985...
I'm sure the compiler is awfully naive, though, just putting pieces together.
And therein lies the problem. Count exactly how many instructions it has to execute to get around the loop once. That will give you a rough idea.
RDTSC will give you a better measurement of timing.
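RDTSC reads the processor's time-stamp counter, so you difference two readings taken around the loop. Python cannot issue that instruction, so the sketch below swaps in `time.perf_counter_ns()` as a portable high-resolution clock. The function name `ns_per_iteration` is hypothetical. In Python the interpreter overhead dominates, so this shows the measurement method, not the cost of a native empty loop.

```python
import time


def ns_per_iteration(n: int, repeats: int = 5) -> float:
    """Time an empty counted loop and return the best-case nanoseconds per pass.

    Stand-in for differencing two RDTSC readings: perf_counter_ns() is a
    portable high-resolution clock, not the CPU time-stamp counter.
    """
    best = None
    for _ in range(repeats):
        start = time.perf_counter_ns()
        i = 0
        while i < n:  # the "empty" loop under test
            i += 1
        elapsed = time.perf_counter_ns() - start
        # Keep the fastest run to reduce noise from interrupts and scheduling.
        if best is None or elapsed < best:
            best = elapsed
    return best / n


if __name__ == "__main__":
    print(f"{ns_per_iteration(1_000_000):.2f} ns per iteration")
```

Taking the minimum of several runs is a common way to filter out interrupts; multiplying nanoseconds by the clock rate in GHz gives a rough cycles-per-pass figure, assuming a fixed clock.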
Its slowness is down to the very long-winded way
the code is executing. The cache is probably almost completely
irrelevant here. You are not striding through large chunks of memory to
execute an empty for loop.
Well, to be completely specific, I looked into it, and it seems to run a
general loop, holding the long (32 bit) integer in some memory location,
and making a far call (pushing values onto the stack) to compare the
variable to the constant. Now if far calls don't cost much, I would expect
this to run maybe 20 times slower than the most optimized loop I can
conceive of, but we're talking several orders of magnitude here.
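A 16-bit compiler has to compare a 32-bit `long` against the constant using 16-bit word operations. The Python sketch below models one plausible way a runtime helper could do that; `compare_long` and `split32` are hypothetical names, and the 1985 compiler's actual helper may differ. Even this minimal version needs several word compares and branches before counting the far call and stack traffic.

```python
def split32(value: int):
    """Split a 32-bit two's-complement value into (high, low) 16-bit words."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def compare_long(a: int, b: int) -> int:
    """Signed 32-bit compare using only 16-bit word operations.

    Returns -1, 0 or 1. The high words carry the sign, so they are compared
    as signed 16-bit values; the low words are compared as unsigned.
    """
    a_hi, a_lo = split32(a)
    b_hi, b_lo = split32(b)
    # Reinterpret the high words as signed 16-bit integers.
    if a_hi >= 0x8000:
        a_hi -= 0x10000
    if b_hi >= 0x8000:
        b_hi -= 0x10000
    if a_hi != b_hi:
        return -1 if a_hi < b_hi else 1
    # High words equal: the unsigned low words decide.
    if a_lo != b_lo:
        return -1 if a_lo < b_lo else 1
    return 0
```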
The most optimised loop I can think of is a single LOOP instruction with the 32 bit register ECX holding the loop variable. Some optimising compilers will generate that on a good day.
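LOOP decrements CX (or ECX in 32-bit code) and branches back while the result is non-zero, so the test happens after the body has run. The Python sketch below models that behaviour to show the iteration count, including the edge case of a zero starting count; `loop_iterations` is a hypothetical name and this is a model, not emitted code.

```python
def loop_iterations(count: int, width: int = 16) -> int:
    """Model how many times x86 LOOP runs its body for a given counter.

    width=16 models CX, width=32 models ECX. The counter is decremented
    after each pass and the loop exits once it reaches zero.
    """
    mask = (1 << width) - 1
    reg = count & mask
    passes = 0
    while True:
        passes += 1             # loop body executes
        reg = (reg - 1) & mask  # LOOP: decrement the counter register
        if reg == 0:            # ...and fall through once it reaches zero
            return passes
```

A starting count of 0 wraps around and gives 65536 passes with CX (2**32 with ECX), which is why hand-written LOOP code is often preceded by a JCXZ/JECXZ check.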
Old 16-bit x86 code has to compute 32 bit operations as two 16 bit native code ops, so it will be slower. That is a big overhead.
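On an 8086, a 32-bit increment such as `i = i + 1` needs two instructions: ADD on the low words, then ADC (add with carry) on the high words. The Python sketch below models that pairing to show where the extra work comes from; `add32_via_16` is a hypothetical name, and it illustrates the idiom rather than the output of any particular compiler.

```python
def add32_via_16(a: int, b: int):
    """Add two 32-bit values using only 16-bit adds plus a carry.

    Mirrors the 8086 idiom ADD (low words) followed by ADC (high words).
    Returns (sum, carry_out), with the sum wrapped to 32 bits.
    """
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    # ADD: low halves, remembering the carry out of bit 15.
    low = (a & 0xFFFF) + (b & 0xFFFF)
    carry = low >> 16
    low &= 0xFFFF
    # ADC: high halves plus the carry from the low add.
    high = (a >> 16) + (b >> 16) + carry
    carry_out = high >> 16
    high &= 0xFFFF
    return (high << 16) | low, carry_out
```

For example, `add32_via_16(0x0000FFFF, 1)` carries out of the low word into the high word and returns `(0x00010000, 0)`.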
You are looking in entirely the wrong place. If you want to make your
program faster, find a different native code compiler, preferably one
with a decent optimiser.
Indeed. But if you recall, optimization wasn't my question; it was
much more general, which is why I asked here.
Cache structure really matters when you are handling bulk data that is large compared to the cache size(s) of the processor.
A modern CPU will cache lines of typically 16 bytes on instruction fetch, which means that small loops fit into the instruction cache on their first execution and stay there for the duration of the loop.
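To see why a small loop stays resident, count how many cache lines its code spans. The Python sketch below does that arithmetic; `cache_lines_touched` is a hypothetical helper, the 16-byte default follows the figure above, and you can pass another `line_size` (such as 64) to model a different line length.

```python
def cache_lines_touched(start: int, length: int, line_size: int = 16) -> int:
    """Count the cache lines spanned by `length` bytes starting at `start`.

    A loop whose code fits in a handful of lines is fetched once and then
    served from the instruction cache on every later pass.
    """
    if length <= 0:
        return 0
    first = start // line_size
    last = (start + length - 1) // line_size
    return last - first + 1
```

A 10-byte loop at 0x100 fits in one 16-byte line, but the same loop starting at 0x10C straddles a boundary and spans two, which is still trivially small compared with any instruction cache.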
Regards,
Martin Brown