In this post I will explore a bit the question of how to calculate the discrete gradient and the discrete divergence of an image and a vector field, respectively.
Let $u\in\mathbb{R}^{N\times M}$ be a discrete image, i.e. $u_{i,j}$ denotes the gray value at the $(i,j)$-th pixel. The famous total variation denoising amounts to minimizing the functional
$$\tfrac{1}{2}\int (u-u^0)^2 + \lambda\int|\nabla u|,$$
where $u^0$ is the given noisy image, $\lambda>0$ is a regularization parameter, the integrals shall be understood as summation over all pixels, $\nabla u$ stands for the gradient, i.e. the vector of the partial derivatives, and $|\cdot|$ stands for the euclidean absolute value.
When using primal-dual methods, it is of crucial importance that the operators used for the gradient and the divergence are numerically adjoint (up to the minus sign). That is, the numerical operations are adjoint in the following sense: If grad is the operation for the gradient and div is the operation for the divergence, then it should hold for any variables u and v of suitable size that, with
gradu = grad(u), divv = div(v),
the values sum(gradu(:).*v(:)) and -sum(u(:).*divv(:)) are equal up to numerical precision. Due to the boundary treatment, the internal MATLAB operations gradient and divergence do not fulfill this requirement.
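As a quick illustration of this requirement, here is a minimal sketch of such an adjointness test; the image size and the random test data are arbitrary, and grad and div stand for any adjoint pair from the implementations discussed below:
% adjointness test: <grad u, v> should equal -<u, div v> up to rounding
N = 123; M = 77;                     % some example image size
u = randn(N,M);                      % random image
v = randn(N,M,2);                    % random vector field
gradu = grad(u);
divv  = div(v);
err = abs(sum(gradu(:).*v(:)) + sum(u(:).*divv(:)));
fprintf('adjointness error: %e\n', err)   % should be of the order of machine precision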
The most common discretization of the gradient uses discrete forward differences and a constant padding at the boundary (which means that Neumann boundary values are applied). In formula, this reads as
$$(D_x u)_{i,j} = \begin{cases} u_{i+1,j} - u_{i,j} & i < N\\ 0 & i = N\end{cases},\qquad (D_y u)_{i,j} = \begin{cases} u_{i,j+1} - u_{i,j} & j < M\\ 0 & j = M.\end{cases}$$
The respective adjoints are backward differences with zero boundary treatment (check it!). Obviously, there are many different ways to implement these routines (and the respective adjoints) in MATLAB. Here are four of them:
For loops: Run through all pixels in two for-loops and assign the differences to the output. Of course, one should preallocate the output prior to the loop. But you probably know the first two rules of MATLAB coding? If not, here they are: 1. Avoid for-loops. 2. Avoid for-loops, seriously. I put the routines into extra functions and created anonymous functions to call the gradient and the divergence as
grad = @(u) cat(3,dxp(u),dyp(u)); div = @(V) dxm(V(:,:,1)) + dym(V(:,:,2));
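The bodies of dxp, dyp, dxm and dym are not spelled out here; a minimal for-loop sketch of dxp (forward difference along the first index with Neumann boundary) and dxm (backward difference with zero boundary, the negative adjoint of dxp) could look as follows, with dyp and dym working analogously along the second index:
function v = dxp(u)                 % forward difference in x-direction, Neumann boundary
  [N,M] = size(u);
  v = zeros(N,M);                   % preallocate; last row stays zero (constant padding)
  for i = 1:N-1
    for j = 1:M
      v(i,j) = u(i+1,j) - u(i,j);
    end
  end
end
function u = dxm(v)                 % backward difference with zero boundary (negative adjoint of dxp)
  [N,M] = size(v);
  u = zeros(N,M);
  for i = 1:N
    for j = 1:M
      if i == 1
        u(i,j) = v(i,j);            % v(0,j) is treated as zero
      elseif i == N
        u(i,j) = -v(i-1,j);         % v(N,j) is treated as zero
      else
        u(i,j) = v(i,j) - v(i-1,j);
      end
    end
  end
end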
Shift and subtract: MATLAB is great in using vectors and matrices. And to avoid the for-loop one could also implement the forward difference in the y-direction by shifting the matrix and subtracting the original one, i.e.
[u(:,2:end) u(:,end)] - u
(and similarly for the other differences). Again, I wrote extra functions and used anonymous functions as above.
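For completeness, here is one possible way to write all four difference operations in this shift-and-subtract style (these concrete one-liners are a sketch, not taken verbatim from the post):
dxp = @(u) [u(2:end,:); u(end,:)] - u;                 % forward difference, x-direction, Neumann boundary
dyp = @(u) [u(:,2:end) u(:,end)] - u;                  % forward difference, y-direction, Neumann boundary
dxm = @(v) [v(1:end-1,:); zeros(1,size(v,2))] - [zeros(1,size(v,2)); v(1:end-1,:)];   % backward difference, zero boundary
dym = @(v) [v(:,1:end-1) zeros(size(v,1),1)] - [zeros(size(v,1),1) v(:,1:end-1)];     % backward difference, zero boundary
grad = @(u) cat(3,dxp(u),dyp(u));
div  = @(V) dxm(V(:,:,1)) + dym(V(:,:,2));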
Small sparse matrices from the left and from the right: MATLAB is also pretty good with sparse matrices. Since the derivative in the x-direction only involves the subtraction of two elements in the same column, one can realize it by multiplying the image from the left with a sparse matrix that has just two non-zero diagonals. Similarly, the derivative in the y-direction can be realized by multiplying from the right with a suitable matrix. More precisely, this approach is realized by
Dy = spdiags([-ones(M,1) ones(M,1)],[0 1],M,M); Dy(M,:) = 0; Dx = spdiags([-ones(N,1) ones(N,1)],[0 1],N,N); Dx(N,:) = 0; Dxu = Dx*u; Dyu = u*Dy';
(check it). Note that the adjoint of the x-derivative is simply the operation
Dx'*u
and the adjoint of the y-derivative is
u*Dy.
Together (taking the minus sign for the divergence into account), the calculation of the gradient and the divergence was done by the anonymous functions
grad = @(u) cat(3,Dx*u,u*Dy'); div = @(V) -(Dx'*V(:,:,1) + V(:,:,2)*Dy);
Large sparse matrices: One could think about the following: Vectorize the image by
U = u(:)
(which amounts to stacking the columns on top of each other). Then assemble large sparse matrices which have just two non-zero diagonals to do the forward (and the other) differences. More precisely, this can be done by
DX = kron(speye(M),Dx); DY = kron(Dy,speye(N)); DxU = DX*U; DyU = DY*U;
Here, it is clear that the respective adjoints are just multiplications with the transposed matrices (again including the minus sign for the divergence). The anonymous functions are
grad = @(u) [DX*u DY*u]; div = @(V) -(DX'*V(:,1) + DY'*V(:,2));
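Since this variant works on vectorized images, a short usage sketch (for an image u of size N-by-M) could be:
U = u(:);                % stack the columns of the image
V = grad(U);             % an (N*M)-by-2 array with both derivative components
W = div(V);              % a vector of length N*M
w = reshape(W,N,M);      % reshape whenever one wants to look at a 2D result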
The different approaches have different pros and cons. Well, the for-loop only has cons: it is presumably slow and uses a lot of indexing which easily leads to bugs. The shift-and-subtract method should go into an extra function to make it easy to use – but this is not necessarily a drawback. For the multiplication with the large matrices, one has to vectorize the image first, and every time one wants to look at the result one needs to do
reshape(U,N,M).
But let's focus on speed: I implemented all methods, let them run on square images of different sizes 50 times (after a warmup) and measured the execution times with tic and toc. The assembly of the matrices did not enter the timing. Moreover, no parallel computation was used – just a single core. Finally, memory was not an issue since the largest matrices (of size $NM\times NM$) only need roughly 100MB of memory.
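The timing script itself is not reproduced here; a minimal sketch of such a benchmark (with assumed image sizes and the shift-and-subtract gradient as an example) could look like this:
% benchmark sketch: average time per gradient call on square images
dxp  = @(u) [u(2:end,:); u(end,:)] - u;
dyp  = @(u) [u(:,2:end) u(:,end)] - u;
grad = @(u) cat(3,dxp(u),dyp(u));
sizes = [256 512 1024 2048];          % assumed image sizes
reps  = 50;
for s = sizes
  u = randn(s,s);
  grad(u);                            % warmup run, not timed
  tic;
  for k = 1:reps
    gradu = grad(u); %#ok<NASGU>
  end
  fprintf('size %4d: %.4f s per gradient call\n', s, toc/reps);
end
For the matrix-based variants, the matrices Dx, Dy (or DX, DY) would be assembled once per image size outside the timed loop.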
Here is the table with the average times for one calculation of the gradient (in seconds):
and here is the one for the divergences:
As expected, the for-loop is clearly slower, and, also as expected, all methods scale basically quadratically in the side length (doubling the side length multiplies the running time by four) since the work per pixel is constant. A little surprisingly, the multiplication from the left and from the right is fastest and also consistently a little faster than the multiplication from the left with the large sparse matrix. I don't know why the results are that different for the gradient and for the divergence. Maybe this is related to my use of anonymous functions or the allocation of memory?