Re: PowerBasic rocks!
- From: James Waldby <no@xxxxx>
- Date: Fri, 15 May 2009 12:29:21 -0500
On Fri, 15 May 2009 05:23:53 -0700, panteltje wrote:
> Even with your values I still have problems with this :
[snip program]
> When I run that on the eeePC with 512MB RAM, I get these times:
> eeepc-unknown:/root> ./test2
> memory needed=384 MB
> mem=0xa8a54008
> b=0xa1041008
> Time used is 13920337 us (13.9203 s).
> Ready
On my system*, an example output of ./time-snipped-prog is:
memory needed=384 MB
mem=0x2aaaaaae8010
b=0x2aaab9f0d010
Time used is 2891413 us (2.8914 s).
Ready
-------------------------------------------
*Some extracts from free and per-processor /proc/cpuinfo on my system:
             total       used       free
Mem:       3936312    3537024     399288
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
cpu MHz    : 1000.000
bogomips   : 2042.00
-----------------------------
The program shown below is shorter and faster than the snipped
program (shorter because of formatting, no error checks, and no
bother with incrementing pointers, which can get in the way of
compiler optimization). E.g., in repeated runs, 1.7760 s was the
least time for the snipped program, while <1.6 seconds was typical
for the program below, whose output looks like the following:
Setup time = 0.418803 seconds = 0.041880 seconds/pass = 0.654 nanoseconds/item
i4[11M,49M]: 11000000 49000000
Runtime 1 = 1.591445 seconds = 0.159144 seconds/pass = 2.487 nanoseconds/item
i4[11M,49M]: 11010010 49010010
Runtime 2 = 1.555033 seconds = 0.155503 seconds/pass = 2.430 nanoseconds/item
i4[11M,49M]: 11020020 49020020
In the output, Runtime 1 is for a simple loop with an increment of 1,
while Runtime 2 is for the loop unrolled with an increment of 2. Note
that unrolling by a factor of 4 or 8 rather than 2 gave similar
results, 5% to 25% faster than no unrolling. The "i4[...]" lines
show elements 11,000,000 and 49,000,000 of i4. Here's the program:
-----------------------------
/* jiw 15 May 2009
Re: timing of adding an array of int16's to an array of int32's
Compile via: gcc time-addloops.c -O3 -Wall -o time-addloops
Copyright 2009 James Waldby. Offered without warranty
under GPL terms as at http://www.gnu.org/licenses/gpl.html
*/
#include <stdint.h>   /* int16_t, int32_t */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#define NP 10
#define NPP 64000000
//================================================
double ttime(double base) {
  struct timeval tod;
  gettimeofday(&tod, NULL);
  return tod.tv_sec + tod.tv_usec/1e6 - base;
}
//================================================
void ttell(double t, int32_t *i4, char *item) {
  printf ("%10s = %8.6f seconds = %8.6f seconds/pass = %0.3f nanoseconds/item\n",
          item, t, t/NP, 1e9*(t/NP)/NPP);
  printf ("i4[11M,49M]: %5d %5d\n", i4[11000000], i4[49000000]);
}
//================================================
int main() {
  double t0=ttime(0);
  int i, j;
  int16_t *i2 = malloc(NPP*sizeof(int16_t));
  int32_t *i4 = malloc(NPP*sizeof(int32_t));
  for (j=0; j<NPP; ++j) {
    i4[j] = j;
    i2[j] = 1001;
  }
  ttell (ttime(t0), i4, "Setup time");

  t0=ttime(0);
  for (i=0; i<NP; ++i)
    for (j=0; j<NPP; ++j)
      i4[j] += i2[j];
  ttell (ttime(t0), i4, "Runtime 1");

  t0=ttime(0);
  for (i=0; i<NP; ++i)
    for (j=0; j<NPP; j+=2) {
      i4[j+0] += i2[j+0];  i4[j+1] += i2[j+1]; /*
      i4[j+2] += i2[j+2];  i4[j+3] += i2[j+3];
      i4[j+4] += i2[j+4];  i4[j+5] += i2[j+5];
      i4[j+6] += i2[j+6];  i4[j+7] += i2[j+7]; */
  }
  ttell (ttime(t0), i4, "Runtime 2");
  return 0;
}
-----------------------------
--
jiw