Re: 'Magic' kernel for image zoom resampling?
- From: "John Costella" <jpcostella@xxxxxxxxxxx>
- Date: 7 Jun 2006 05:14:41 -0700
Sorry, it was recommended that I cross-post here, but that means that
replies in the original group do not appear here.
Here is some clarifying information on the algorithm:
----------------------
"For upsampling, only divide by 16,
because every upsampled pixel gets a contribution from four downsampled
pixels' kernels)."
Huh? Both kernels should be normalized. Also, for enlargement, I expect
that you are interlacing this kernel with identity so all of the original
pixels are retained.
Both are normalized, but I was probably too obscure in my description.
And no, I am not interlacing this kernel with identity for enlargement.
I'd better explain it more clearly. Start with downsampling. Label the
pixels of the original image (x,y). For a box filter, the top-left
downsampled pixel averages (0,0), (1,0), (0,1) and (1,1), and the
result is notionally centered at (0.5,0.5). I am saying that, instead
of this, compute it as
(-1,-1) + 3(0,-1) + 3(1,-1) + (2,-1)
+ 3(-1, 0) + 9(0, 0) + 9(1, 0) + 3(2, 0)
+ 3(-1, 1) + 9(0, 1) + 9(1, 1) + 3(2, 1)
+ (-1, 2) + 3(0, 2) + 3(1, 2) + (2, 2)
and divide the result by 64. OK, ignore the fact that we have gone one
pixel outside the image; that's an edge effect that we can deal with.
You can see the original 2 x 2 'tile' is in the middle of this kernel.
Now, for the next downsampled value to the right, shift everything
right by two pixels. The box filter would just average out (2,0),
(3,0), (2,1) and (3,1), and the result is notionally centered at
(2.5,0.5). I'm saying to compute
(1,-1) + 3(2,-1) + 3(3,-1) + (4,-1)
+ 3(1, 0) + 9(2, 0) + 9(3, 0) + 3(4, 0)
+ 3(1, 1) + 9(2, 1) + 9(3, 1) + 3(4, 1)
+ (1, 2) + 3(2, 2) + 3(3, 2) + (4, 2)
and again divide the result by 64. Again you can see the original 2 x 2
tile in the middle of the calculation. And so on.
For upsampling, it's simplest to imagine that we're reversing the above
process. Label the downsampled pixels [x,y], where x and y are the
'notional' centered positions listed above, i.e., we calculated
[0.5,0.5] and [2.5][0.5] above. For each such downsampled pixel, place
a copy of the kernel, multplied by the weight of the pixel, and divide
by 16. If you work it through, you'll see that each upsampled pixel
receives a contribution of 9 times the nearest downsampled pixel (the
one whose corner it's 'in'), 3 times the next two nearest ones (other
adjacent corners of the 2 x 2 tile), and 1 times the opposite corner.
That's a total of 16, so you divide by 16. For example,
(0,0) = { 9[0.5,0.5] + 3[0.5,-1.5] + 3[-1.5,0.5] + [-1.5,-1.5] } / 16
Again, we have edge effects in that we've gone outside the image, but
we can deal with that.
Your 1D separated kernel, 0.125*[1 3 3 1], seems like an approximate
Gaussian and this has been covered in literature, but maybe not for your
particular stand. dev..
Well, yeah, [1 3 3 1] is an approximation to just about anything if you
try hard enough. :)
You can make the two terms (0.5,3) and (1.5,1) fit a Gaussian by making
its standard deviation 1 / sqrt( ln( 3 ) ) = 0.954.... The next two
terms would then be 1/3 and 1/81, so what you're telling me is that
it's a 'truncated Gaussian'? I suppose so, but that doesn't really tell
me why it works so well in two dimensions. (More on this below.)
I looked at the site you recommended in your next reply. I know all
about the information theoretical aspects of 1-dimensional resampling,
and have played with those things in many different forms for many
years (and continue to, in other areas). The problems come with a
two-dimensional grid, as far as I can tell.
None of the Gaussians listed on that site have coefficients that match
the 'magic' kernel. That's not to say that this isn't a fruitful way to
look at it, but at this stage I'm not quite sure.
For interest, I've uploaded a simple graph of the 800% kernel to
http://www.assassinationscience.com/johncostella/kernel800percent.jpg
Looking at the numbers, the kernel is still the product of two 1-d 'x'
and 'y' kernels. The 1-d kernel is actually three parabolas:
[0] 1 3 6 10 15 21 28 [36] 42 46 48 48 46 42 [36] 28 21 15 10 6 3 1 [0]
where the joins between the parabolas are marked by the [36] terms.
That it is three parabolas can be proved from second differences:
[0] 1 2 3 4 5 6 7 [8] 6 4 2 0 -2 -4 -6 [-8] -7 -6 -5 -4 -3 -2 -1 [0]
1 1 1 1 1 1 1 1 | -2 -2 -2 -2 -2 -2 -2 -2 | 1 1 1 1 1 1 1 1
The 400% kernel follows the same pattern: it is the product of
[0] 1 3 6 [10] 12 12 [10] 6 3 1 [0]
the differences of which give
[0] 1 2 3 [4] 2 0 -2 [-4] -3 -2 -1 [0]
1 1 1 1 | -2 -2 -2 -2 | 1 1 1 1
I guess then the 200% one can be written
[0] 1 [3] [3] 1 [0]
which on differencing gives
[0] 1 [2] 0 [-2] -1 [0]
1 1 | -2 -2 | 1 1
And I suppose that's the algorithm for generating the 2^k kernel:
A. Form a list of k 1's followed by k -2's followed by k 1's.
B. Starting at 0, form the cumulative sum of the first list.
C. Starting at 0, form the cumulative sum of the second list.
D. Use the 3rd list as the 1-d kernel.
E. Form the 2-d kernel as the product of the two 1-d 'x' and 'y'
kernels.
So for 1600% magnification, the steps would be:
A. 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
-2 -2 -2 -2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
B. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 14 12 10 8 6 4 2 0 -2 -4 -6
-8 -10 -12 -14 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
C. 1 3 6 10 15 21 28 36 45 55 66 78 91 105 120 136 150 162 172 180 186
190 192 192 190 186 180 172 162 150 136 120 105 91 78 66 55 45 36 28 21
15 10 6 3 1
and so on.
This makes for a pretty efficient implementation. The kernel dimensions
are 3(2^k)-2 by 3(2^k)-2 (i.e. 4 by 4 for 2^1, 10 by 10 for 2^2, 22 by
22 for 2^3, 46 by 46 for 2^4, etc.). The 1-d kernel has 3(2^(k-1))-1
unique elements (i.e. 2 for 2^1, 5 for 2^2, 11 for 2^3, 23 for 2^4),
one of which is always 1, so you only need 3(2^(k-1))-2 integer
multiplication lookup tables (e.g. for 2^k=16, you only need 22 lookup
tables), because at any position multiplying by the kernel coefficient
is the same as multiplying by the 'x' 1-d kernel element and then the
'y' 1-d kernel element. The above algorithm gives you them all. The
normalization is 2^(6k).
Regarding size reductions, the most commonly used COMMERCIAL method is to
use a discrete kernel derived from an expanded version of the same
continuous kernel, e.g., bicubic, used for enlargement. If the size
factor is 1/N, then the continuous kernel is expanded by N. Of course,
the compact support of the discrete reduction kernel is expanded. Your
use of the same kernels for both operations violates this.
Yeah, I know I'm not doing the normal thing, but it seems to work. :)
John
.
- Follow-Ups:
- Re: Re: 'Magic' kernel for image zoom resampling?
- From: aruzinsky
- Re: Re: 'Magic' kernel for image zoom resampling?
- References:
- 'Magic' kernel for image zoom resampling?
- From: jpcostella
- 'Magic' kernel for image zoom resampling?
- Prev by Date: 'Magic' kernel for image zoom resampling?
- Next by Date: Re: Re: 'Magic' kernel for image zoom resampling?
- Previous by thread: 'Magic' kernel for image zoom resampling?
- Next by thread: Re: Re: 'Magic' kernel for image zoom resampling?
- Index(es):
Relevant Pages
|