Re: how to achieve fast image 90 deg rotation in C
From: Martin Brown (|||newspam|||_at_nezumi.demon.co.uk)
Date: 06/15/04
- Next message: Costantino Grana: "Re: Grayscale image range of pixel values?"
- Previous message: Bernie: "Re: mpeg format"
- In reply to: Severian: "Re: how to achieve fast image 90 deg rotation in C"
- Next in thread: Richard Zhu: "Re: how to achieve fast image 90 deg rotation in C"
- Messages sorted by: [ date ] [ thread ]
Date: Tue, 15 Jun 2004 11:32:42 +0100
In message <6t2sc0hm8m2a9smih74dqaeb9l8shv94dr@4ax.com>, Severian
<severian@chlamydia-is-not-a-flower.com> writes
>On Mon, 14 Jun 2004 11:11:31 -0700, "Richard Zhu" <yzhu@algis.ca>
>wrote:
>
>>I have the following code to implement a 90 deg rot over an Image in VC,
>>the speed can't reach what is required,
>> for(y1=0;y1<n_h;y1++)
>> {
>> offset=y1*n_w;
>> for(x1=0;x1<n_w;x1++)
>> {
>> x0=y1;
>> y0=o_h-1-x1;
>> chR_rot[offset+x1]=chR1[y0*o_w+x0];
>> chG_rot[offset+x1]=chG1[y0*o_w+x0];
>> chB_rot[offset+x1]=chB1[y0*o_w+x0];
>> }
>> }
>
>I am not a cache expert, but it may be quicker to read rows and write
>columns. You're reading columns, so you will have a lot more cache
>misses.
It is worth trying both ways. They may well be different.
But it probably requires something a bit more devious than that.
1024x768 RGB transposed naively will ensure there are huge numbers of
cache misses. And unless the L2 cache is big enough you will end up with
everything being choked by making frequent direct access to main memory.
Assuming the CPU has 2 x 8k and 32 byte per line L1 cache. The transpose
may fit entirely into L1 cache if you try using slabs that are 32 rows
deep and 256 wide (or maybe 128 wide). If you can monitor cache misses
then it should be possible to tune it precisely. Some CPUs and OS's will
tell you about their cache configuration details on request.
You probably want as little code contained inside the inner loop as
possible, modern predictive branching makes loop unrolling unlikely to
help. It also matters whether or not the RGB buffers are interleaved.
If they aren't interleaved then using entirely separate loops for each
of R,G,B arrays might fit the whole image into L2 cache if you are
lucky.
>
>My program does a 1024x768 24-bit rotation in around 60-70ms on a P3
>800, so that should put you in the ballpark on your faster machine.
>
>If you have other image-oriented stuff to do, it might be worthwhile
>to investigate the Intel performance libraries, which will do things
>like this with highly-optimized MMX, SSE and SSE2 code.
>
>IPL requires 10-20ms to rotate a 1024x768 image in on my P3 800.
There is your target to beat. Or you could use IPL instead.
Regards,
-- Martin Brown
- Next message: Costantino Grana: "Re: Grayscale image range of pixel values?"
- Previous message: Bernie: "Re: mpeg format"
- In reply to: Severian: "Re: how to achieve fast image 90 deg rotation in C"
- Next in thread: Richard Zhu: "Re: how to achieve fast image 90 deg rotation in C"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|