Re: how to achieve fast image 90 deg rotation in C

From: Severian (severian_at_chlamydia-is-not-a-flower.com)
Date: 06/16/04


Date: Wed, 16 Jun 2004 13:25:42 -0400

On Wed, 16 Jun 2004 09:27:20 -0700, "Richard Zhu" <yzhu@algis.ca>
wrote:
>"Severian" <severian@chlamydia-is-not-a-flower.com> wrote in message
>news:6t2sc0hm8m2a9smih74dqaeb9l8shv94dr@4ax.com...
>> On Mon, 14 Jun 2004 11:11:31 -0700, "Richard Zhu" <yzhu@algis.ca>
>> wrote:
>>
>> >I have the following code to implement a 90-degree rotation of an image in
>> >VC, but it does not reach the required speed. What can I do to speed it
>> >up? The input is a width = 1024, height = 768 color image grabbed from the
>> >camera. There is no hardware rotation, so I need real-time rotation in
>> >software. On my 2.4 GHz Intel machine I need less than 10 ms, but the
>> >following code takes about 60 ms. Is it still possible to get there?
>> >
>> >I have checked: the slowdown is caused by the reads from chR1[y0*o_w+x0], ...
>> >
>> >
>> >int x1,y1,x0,y0,i;
>> >int n_w,n_h,o_w,o_h;
>> >int offset;
>> >n_w=img_height;
>> >n_h=img_width;
>> >o_w=img_width;
>> >o_h=img_height;
>> >
>> >for(y1=0;y1<n_h;y1++)
>> >{
>> >    offset=y1*n_w;
>> >    for(x1=0;x1<n_w;x1++)
>> >    {
>> >        x0=y1;
>> >        y0=o_h-1-x1;
>> >        chR_rot[offset+x1]=chR1[y0*o_w+x0];
>> >        chG_rot[offset+x1]=chG1[y0*o_w+x0];
>> >        chB_rot[offset+x1]=chB1[y0*o_w+x0];
>> >    }
>> >}
>> >*new_width=n_w;
>> >*new_height=n_h;
>>
>> I am not a cache expert, but it may be quicker to read rows and write
>> columns. You're reading columns, so you will have a lot more cache
>> misses.
>>
>> My program does a 1024x768 24-bit rotation in around 60-70ms on a P3
>> 800, so that should put you in the ballpark on your faster machine.
>>
>> If you have other image-oriented stuff to do, it might be worthwhile
>> to investigate the Intel performance libraries, which will do things
>> like this with highly-optimized MMX, SSE and SSE2 code.
>>
>> IPL requires 10-20ms to rotate a 1024x768 image on my P3 800.
>>
>> --
>> Sev
>
>I have tested it, and that's great! It really did improve, as you said:
>
>for(y0=0;y0<o_h;y0++)
>{
>    offset=y0*o_w;
>    for(x0=0;x0<o_w;x0++)
>    {
>        x1=o_h-1-y0;
>        y1=x0;
>        chR_side[y1*n_w+x1]=chR1[offset+x0];
>        chG_side[y1*n_w+x1]=chG1[offset+x0];
>        chB_side[y1*n_w+x1]=chB1[offset+x0];
>    }
>}
>
>Using offset for the reads, the time is now reduced to 40 ms, down from
>60-70 ms.
>
>I think I still need to try IPL.

You're welcome! Depending on your compiler and cache, you may get more
improvement if you write separate loops for each channel. Also, be
sure you're compiling with optimization.
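
A minimal sketch of the separate-loops idea, assuming each channel is a
row-major array of unsigned char (the element type is not shown in the
posted code). The names rotate90_channel and rotate90_rgb are hypothetical;
the index mapping is the same as in the quoted loop above, with rows read
sequentially and columns written. Whether this actually beats the combined
loop depends on the compiler and cache, so time both.

```c
#include <stddef.h>

/* Rotate one 8-bit channel by 90 degrees (clockwise).
   src is src_w x src_h, row-major; dst must hold src_w * src_h bytes
   and is laid out as src_h wide by src_w tall. */
static void rotate90_channel(const unsigned char *src, unsigned char *dst,
                             int src_w, int src_h)
{
    int x0, y0;

    for (y0 = 0; y0 < src_h; y0++) {
        /* Read one source row sequentially. */
        const unsigned char *row = src + (size_t)y0 * (size_t)src_w;
        /* This source row becomes destination column x1. */
        size_t x1 = (size_t)(src_h - 1 - y0);

        for (x0 = 0; x0 < src_w; x0++)
            dst[(size_t)x0 * (size_t)src_h + x1] = row[x0];
    }
}

/* One pass per channel instead of touching all three in one loop. */
static void rotate90_rgb(const unsigned char *r, const unsigned char *g,
                         const unsigned char *b,
                         unsigned char *r_rot, unsigned char *g_rot,
                         unsigned char *b_rot,
                         int src_w, int src_h,
                         int *new_width, int *new_height)
{
    rotate90_channel(r, r_rot, src_w, src_h);
    rotate90_channel(g, g_rot, src_w, src_h);
    rotate90_channel(b, b_rot, src_w, src_h);
    *new_width = src_h;
    *new_height = src_w;
}
```

With one channel per pass, the inner loop streams through a single source
array and a single destination array, which may be easier on the cache than
six arrays at once.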

--
Sev

