In this post I will explore the question of how to calculate the discrete gradient of an image and the discrete divergence of a vector field.

Let {u_0\in{\mathbb R}^{N\times M}} be a discrete image, i.e. {u_0(i,j)} denotes the gray value at the {i,j}-th pixel. The famous total variation denoising amounts to minimizing the functional

\displaystyle \Phi(u) = \tfrac12 \int (u-u_0)^2 + \lambda\int|\nabla u|

where the integral shall be understood as summation, {\nabla} stands for the gradient, i.e. the vector of the partial derivatives, and {|\cdot|} stands for the Euclidean absolute value.
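
In MATLAB terms, with one of the gradient operations grad constructed below (returning an {N\times M\times 2} array of partial derivatives), the discretized functional can be written down directly. A sketch (Phi is just a placeholder name and lambda a given regularization parameter):

    Phi = @(u) 0.5*sum((u(:)-u0(:)).^2) ...
               + lambda*sum(sum(sqrt(sum(grad(u).^2,3))));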

When using primal-dual methods, it is of crucial importance that the operators used for the gradient and the divergence are numerically adjoint to each other (up to the minus sign). That is, the numerical operations are adjoint in the following sense: If grad is the operation for the gradient and div is the operation for the divergence, then it should hold for any variables {u} and {v} of suitable size, with gradu = grad(u) and divv = div(v), that sum(gradu(:).*v(:)) and -sum(u(:).*divv(:)) are equal up to numerical precision. Due to the boundary treatment, the built-in MATLAB functions gradient and divergence do not fulfill this requirement.
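
For concreteness, such a check could look as follows (a minimal sketch; grad and div are function handles as constructed below, mapping an {N\times M} image to an {N\times M\times 2} field and back, and N and M are some fixed image dimensions):

    u = randn(N,M);           % random test image
    V = randn(N,M,2);         % random test vector field
    gradu = grad(u);
    divV  = div(V);
    % should be zero up to numerical precision
    err = abs(sum(gradu(:).*V(:)) + sum(u(:).*divV(:)))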

The most common discretization of the gradient uses discrete forward differences and a constant padding at the boundary (which means that Neumann boundary values are applied). In formulas, this reads as

\displaystyle (D_xu)_{i,j} = \begin{cases} u_{i+1,j} - u_{i,j} & i<N\\ 0 & i=N \end{cases} \qquad (D_yu)_{i,j} = \begin{cases} u_{i,j+1} - u_{i,j} & j<M\\ 0 & j=M \end{cases}
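
Working out {\sum_{i,j}(D_xu)_{i,j}v_{i,j} = \sum_{i,j}u_{i,j}(D_x^*v)_{i,j}} by shifting the summation index (a quick sketch; the formula for {D_y^*} is analogous with the roles of {i} and {j} swapped) gives the adjoint

\displaystyle (D_x^*v)_{i,j} = \begin{cases} -v_{1,j} & i=1\\ v_{i-1,j}-v_{i,j} & 1<i<N\\ v_{N-1,j} & i=N. \end{cases}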

The respective adjoints are thus, up to the sign, backward differences with zero boundary treatment (check it!), and the divergence, being minus the adjoint of the gradient, is built from backward differences. Apparently, there are many different ways to implement these routines (and the respective adjoints) in MATLAB. Here are four of them:

  1. For loops: Run through all pixels in two for-loops and assign the difference to the output. Of course, one should preallocate the output prior to the loop. But you probably know the first two rules of MATLAB coding? If not, here they are: 1. Avoid for-loops. 2. Seriously, avoid for-loops. I put the routines dxp, dyp, dxm and dym into separate functions (a sketch of such helpers follows after this list) and created anonymous functions to call the gradient and the divergence as
    grad = @(u) cat(3,dxp(u),dyp(u));
    div = @(V) dxm(V(:,:,1)) + dym(V(:,:,2));
    
  2. Shift and subtract: MATLAB is great at working with vectors and matrices, and to avoid the for-loop one can also implement the forward difference in the {x}-direction by shifting the matrix and subtracting the original one, i.e. [u(2:end,:); u(end,:)] - u (and similarly for the other differences). Again, I wrote separate functions (see the sketch after this list) and used anonymous functions as above.
  3. Small sparse matrices from the left and from the right: MATLAB is also pretty good with sparse matrices. Since the derivatives in {x}-direction only involve the subtraction of two elements in the same column, one can realize this by multiplying an image from the left with a sparse diagonal matrix with just two non-zero diagonals. Similarly, the derivative in {y}-direction can be realized by multiplying from the right with a suitable matrix. More precisely, this approach is realized by
    Dy = spdiags([-ones(M,1) ones(M,1)],[0 1],M,M);
    Dy(M,:) = 0;
    Dx = spdiags([-ones(N,1) ones(N,1)],[0 1],N,N);
    Dx(N,:) = 0;
    Dxu = Dx*u;
    Dyu = u*Dy';
    

    (check it). Note that the adjoint of the {x}-derivative is simply the operation Dx'*u and the adjoint of the {y}-derivative is u*Dy; the divergence is then minus the sum of these adjoints applied to the two components of the vector field. Together, the calculation of the gradient and the divergence can be done by the anonymous functions

    grad = @(u) cat(3,Dx*u,u*Dy');
    div = @(V) -Dx'*V(:,:,1) - V(:,:,2)*Dy;
    
  4. Large sparse matrices: One could also think about the following: Vectorize the image by U = u(:) (which amounts to stacking the columns on top of each other) and then assemble large {NM\times NM} sparse matrices, each with just two non-zero diagonals, that perform the forward (and the other) differences. More precisely, this can be done (with Dx and Dy from above) by

    DX = kron(speye(M),Dx);
    DY = kron(Dy,speye(N));
    DxU = DX*U;
    DyU = DY*U;
    

    Here it is clear that the respective adjoints are just multiplications with the transposed matrices, and again the divergence carries a minus sign. The anonymous functions are

    grad = @(U) [DX*U DY*U];
    div = @(V) -DX'*V(:,1) - DY'*V(:,2);
    

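Items 1 and 2 above refer to helper functions dxp, dyp (forward differences) and dxm, dym (backward differences) that are not spelled out there. A minimal shift-and-subtract sketch of what they could look like is the following (the function names come from the anonymous functions above; the bodies are one possible realization of the boundary treatment described earlier):

    function d = dxp(u)
    % forward difference along the first index, zero in the last row
    d = [u(2:end,:); u(end,:)] - u;
    end

    function d = dyp(u)
    % forward difference along the second index, zero in the last column
    d = [u(:,2:end) u(:,end)] - u;
    end

    function d = dxm(v)
    % backward difference along the first index with zero boundary treatment,
    % i.e. minus the adjoint of dxp
    v(end,:) = 0;
    d = v - [zeros(1,size(v,2)); v(1:end-1,:)];
    end

    function d = dym(v)
    % backward difference along the second index with zero boundary treatment,
    % i.e. minus the adjoint of dyp
    v(:,end) = 0;
    d = v - [zeros(size(v,1),1) v(:,1:end-1)];
    end

With these, the grad and div from the first two approaches pass the adjointness check from above.
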
The different approaches have different pros and cons. Well, the for-loop only has cons: it is presumably slow and uses a lot of indexing, which easily leads to bugs. The shift-and-subtract method should go into separate functions to be easy to use – but this is not necessarily a drawback. For the multiplication with the large matrices, one has to vectorize the image first, and every time one wants to look at the result one needs to do reshape(U,N,M). But let’s focus on speed: I implemented all methods, let them run on square images of different sizes 50 times each (after a warmup), and measured the execution times with tic and toc. The assembly of the matrices did not enter the timing. Moreover, no parallel computation was used – just a single core. Finally, memory was not an issue, since the largest arrays (the gradients, of size {2500\times 2500\times 2}) only need roughly 100MB of memory.
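
The timing itself was a loop of roughly the following kind (a sketch of the setup just described, not the exact script; nruns stands for the 50 repetitions and grad for one of the anonymous functions above):

    nruns = 50;
    u = randn(N,N);        % square test image of size N x N
    grad(u);               % warmup call
    tic;
    for k = 1:nruns
        V = grad(u);
    end
    t = toc/nruns;         % average time for one gradient evaluation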

Here is the table with the average times for one calculation of the gradient (in seconds):

{N} For-loop Shift-and-subtract Small sparse (left/right) Large sparse (vectorized)
100 0.0004 0.0002 0.0001 0.0003
200 0.0018 0.0010 0.0005 0.0015
300 0.0057 0.0011 0.0014 0.0020
400 0.0096 0.0031 0.0022 0.0035
500 0.0178 0.0035 0.0030 0.0054
750 0.0449 0.0114 0.0097 0.0123
1000 0.0737 0.0189 0.0128 0.0212
1500 0.2055 0.0576 0.0379 0.0601
2000 0.3942 0.0915 0.0671 0.1136
2500 0.6719 0.1571 0.1068 0.1788

and here is the corresponding table for the divergence:

{N} For-loop Shift-and-subtract Small sparse (left/right) Large sparse (vectorized)
100 0.0004 0.0003 0.0002 0.0002
200 0.0018 0.0015 0.0005 0.0008
300 0.0048 0.0016 0.0015 0.0012
400 0.0090 0.0036 0.0020 0.0022
500 0.0158 0.0057 0.0027 0.0035
750 0.0409 0.0132 0.0073 0.0069
1000 0.0708 0.0238 0.0130 0.0125
1500 0.2008 0.0654 0.0344 0.0370
2000 0.3886 0.1285 0.0622 0.0671
2500 0.6627 0.2512 0.1084 0.1361

As expected, the for-loop is clearly slower, and, also as expected, all methods scale basically quadratically (doubling {N} multiplies the running time by roughly four), since the work per pixel is constant. A little surprisingly, the multiplication with small sparse matrices from the left and from the right is fastest, and it is also consistently a little faster than the multiplication with the large sparse matrices on the vectorized image. I don’t know why the results differ that much between the gradient and the divergence. Maybe this is related to my use of anonymous functions or to the allocation of memory?
