June 2013

In this post I will explore a bit the question on how to calculate the discrete gradient and the discrete divergence of an image and a vector field, respectively.

Let {u_0\in{\mathbb R}^{N\times M}} be a discrete image, i.e. {u_0(i,j)} denotes the gray value at the {i,j}-th pixel. The famous total variation denoising amounts to minimizing the functional

\displaystyle \Phi(u) = \tfrac12 \int (u-u_0)^2 + \lambda\int|\nabla u|

where the integral shall be understood as summation, {\nabla} stand for the gradient, i.e. the vector of the partial derivatives, and {|\cdot|} stands for the euclidean absolute value.

When using primal-dual methods, it is of crucial importance, that the used operators for the gradient and the divergence are numerically adjoint (up to the minus sign). That is, the numerical operations are adjoint in the following sense: If grad is the operation for the gradient and div is the operation for the divergence, then it should hold for any variables {u} and {v} of suitable size and with gradu = grad(u), divv = div(v) that sum(gradu(:).*v(:)) and -sum(u(:).*divv(:)) are equal up to numerical precision. Due to the boundary treatment, the internal MATLAB operations gradient and divergence do not fulfill this requirement.

The most common discretization of the gradient uses discrete forward differences and a constant padding at the boundary (which means that Neumann boundary values are applied). In formula, this reads as

\displaystyle (D_xu)_{i,j} = \begin{cases} u_{i+1,j} - u_{i,j} & i<N\\ 0 & i=N \end{cases} \qquad (D_yu)_{i,j} = \begin{cases} u_{i,j+1} - u_{i,j} & j<M\\ 0 & j=M \end{cases}

The respective adjoints are backward differences with zero boundary treatment (check it!). Apparently, there are many different ways to implement this routines (and the respective adjoints) in MATLAB. Here are four of them:

  1. For loops: Run through all pixels in two for-loops and assign the difference to the output. Of course, one should preallocate the output prior to the loop. But you may probably know the first two rules of MATLAB coding? If not, here they are: 1. Avoid for-loops. 2. Avoid for-loops, seriously. I put the routines into extra functions and created anonymous function to call the gradient and the divergence as
    grad = @(u) cat(3,dxp(u),dyp(u));
    div = @(V) dxm(V(:,:,1)) + dym(V(:,:,2));
  2. Shift and subtract: MATLAB is great in using vectors and matrices. And to avoid the for-loop one could also implement the forward difference in {x}-direction by shifting the matrix and subtract the original one, i.e. [u(:,2:end) u(:,end)] - u (and similarly for the other differences). Again, I wrote extra functions and used anonymous function as above.
  3. Small sparse matrices from the left and from the right: MATLAB is also pretty good with sparse matrices. Since the derivatives in {x}-direction only involve the subtraction of two elements in the same column, one can realize this by multiplying an image from the left with a sparse diagonal matrix with just two non-zero diagonals. Similarly, the derivative in {y}-direction can be realized by multiplying from the right with a suitable matrix. More precisely, this approach is realized by
    Dy = spdiags([-ones(M,1) ones(M,1)],[0 1],M,M);
    Dy(M,:) = 0;
    Dx = spdiags([-ones(N,1) ones(N,1)],[0 1],N,N);
    Dy(N,:) = 0;
    Dxu = Dx*u;
    Dyu = u*Dy';

    (check it). Note that the adjoint of {x}-derivative if simple the operation Dx'*u and the operation of the {y}-derivative is u*Dy. Together, the calculation of the gradient and the divergence was done by the anonymous functions

    grad = @(u) cat(3,u*Dx',Dy*u);
    div = @(V) V(:,:,1)*Dx + Dy'*V(:,:,2);
  4. Large sparse matrices: One could think about the following: Vectorize the image by U = u(:) (which amounts to stacking the columns above each other). Then assemble a large {NM\times NM} sparse matrix which has just two non-zero diagonals to do the forward (and other) differences. More precisely this can be done by
    (with Dx and Dy from above)

    DX = kron(Dx,speye(M));
    DY = kron(speye(N),Dy);
    DxU = DX*U;
    DyU = DY*U;

    Here, it is clear that the respective adjoint are just the multiplication with the transposed matrices. Here, the anonymous functions are

    grad = @(u) [DX*u DY*u];
    div = @(V) DX'*V(:,1) + DY'*V(:,2);

The different approaches have different pros and cons. Well, the for-loop only has cons: It is presumably slow and uses a lot indexing which easily leads to bugs. The shift-and-subtract method should go into an extra function to make it easy to use – but this is not necessarily a drawback. For the multiplication with the large matrix, one has to vectorize the image first and every time one wants to look at the result and need to do reshape(U,N,M). But let’s focus on speed: I implemented all methods, and let the run on square images of different sizes for 50 times (after a warmup) and measures the execution times with tic and toc. The assembly of the matrices did not enter the timing. Moreover, no parallel computation was used – just a single core. Finally, memory was not an issue since the larges matrices (of size {2500\times 2500\times 2}) only need roughly 100MB of memory.

Here is the table with the average times for one calculation of the gradient (in seconds):

{N} For-loop Shift-subtract left-right left
100 0.0004 0.0002 0.0001 0.0003
200 0.0018 0.0010 0.0005 0.0015
300 0.0057 0.0011 0.0014 0.0020
400 0.0096 0.0031 0.0022 0.0035
500 0.0178 0.0035 0.0030 0.0054
750 0.0449 0.0114 0.0097 0.0123
1000 0.0737 0.0189 0.0128 0.0212
1500 0.2055 0.0576 0.0379 0.0601
2000 0.3942 0.0915 0.0671 0.1136
2500 0.6719 0.1571 0.1068 0.1788

and here is the one for the divergences:

{N} For-loop Shift-subtract left-right left
100 0.0004 0.0003 0.0002 0.0002
200 0.0018 0.0015 0.0005 0.0008
300 0.0048 0.0016 0.0015 0.0012
400 0.0090 0.0036 0.0020 0.0022
500 0.0158 0.0057 0.0027 0.0035
750 0.0409 0.0132 0.0073 0.0069
1000 0.0708 0.0238 0.0130 0.0125
1500 0.2008 0.0654 0.0344 0.0370
2000 0.3886 0.1285 0.0622 0.0671
2500 0.6627 0.2512 0.1084 0.1361

As expected, the for-loop is clearly slower and also as expected, all methods basically scale quadratically (doubling {N} amounts to multiplying the running time by four) since the work per pixel is constant. A little surprisingly, the multiplication from the left and from the right is fastest and also consistently a little faster then multiplication from the left with the larger sparse matrix. I don’t know, why the results are that different for the gradient and for the divergence. Maybe this is related to my use on anonymous functions or the allocation of memory?

The second day of SSVM started with an invited lecture of Tony Lindeberg, who has written one very influential and very early book about scale space theory. His talk was both a tour through scale space and a recap of the recent developments in the field. Especially he show how the time aspect could be incorporated into scale space analysis by a close inspection of how receptive fiels are working. There were more talks but I only took notes from the talk of Jan Lellmann who talked about the problem of generating an elevation map from a few given level lines. One application of this could be to observe coastlines at different tides and then trying the reconstruct the full height map at the coast. One specific feature here is that the surface one looks for may have ridges which stem from kinks in the level lines and these ridges are important features of the surface. He argued that a pure convex regularization will not work and proposed to use more input namely a vector field which is derived from the contour lines such that the vector somehow “follows the ridges”, i.e. it connects to level lines in a correct way.

Finally another observation I had today: Well, this is not a trend, but a notion which I heard for the first time here but which sounds very natural is the informal classification of data terms in variational models as “weak” or “strong”. For example, a denoising data term {\|u-u^0\|^2_{L^2(\Omega)}} is a strong data term because it gives tight information on the whole set {\Omega}. On the other hand, an inpainting data term {u|_{\Omega\setminus\Omega'} = u^0|_{\Omega\setminus\Omega'}} is a weak data term because it basically tell nothing within the region {\Omega'}.

For afternoon the whole conference has been on tour to three amazing places:

  • the Riegersburg, which is not only an impressive castle but also features interesting exhibitions about old arm and witches,
  • the Zotter chocolate factory where they make amazing chocolate in mind-boggling varieties,
  • and to Schloss Kronberg for the conference dinner (although it was pretty tough to start eating the dinner after visiting Zotter…).

I am at the SSVM 13 and the first day is over. The conference is in a very cozy place in Austria, namely the Schloss Seggau which is located on a small hill near the small town Leibnitz in Styria (which is in Austria but totally unaffected by the floods). The conference is also not very large but there is a good crowd of interesting people here and the program reads interesting.

Schloss Seggau

The time schedule is tight and I don’t know if I can make it to post something everyday. Especially, I will not be able to blog about every talk.

From the first day I have the impression that two things have appeared frequently at different places:

  • Differential geometry: Tools from differential geometry like the notion of geodesics or Riemannian metrics have not only been used by Martin Rumpf to model different types of spaces of shapes but also to interpolate between textures.
  • Primal dual methods: Variational models should be convex these days and if they are not you should make them convex somehow. Then you can use primal dual methods. Somehow the magic of primal dual methods is that they are incredibly flexible. Also they work robustly and reasonably fast.

Probably, there are more trends to see in the next days.

This time I have something which is very specific for the German academic system and it seems to me that writing in German is more appropriate here.

Die Berufung von Professoren ist ein aufwändiges und langwieriges Unterfangen. Es gibt viele Regeln zu beachten und viele Meinungen und Daten fließen ein. Insgesamt soll “der/die beste Kandidat/in” die Stelle bekommen. Die Entscheidung basiert am Ende auf dem Votum der Kommission die sich ihre Meinung auf Grund der Eckdaten der Bewerbung und des Lebenslaufes, des persönlichen Eindrucks bei der Vorstellung und auf dem Votum externer Gutachter bildet. Das alles kostet viel Zeit und Arbeit. Außerdem kann das ganze Verfahren schnell unter den Verdacht kommen, dass persönliche Beziehungen eine zu große Rolle spielen könnten und dass am Ende nicht immer “der/die Beste” genommen wird (obwohl es umfangreiche Regeln zum Ausschluss von Befangenheit gibt). Man stelle sich vor, es gäbe eine zentrale Stelle, die mit viel Aufwand und geschultem Personal Daten über Nachwuchswissenschaftler sammelt:

  • Wer hat was publiziert? Wer wird wie oft zitiert?
  • Wer trägt auf welchen Konferenzen und Workshops vor? Und wer wird von wem eingeladen?
  • Was denken die führenden Köpfe, wer die großen Nachwuchstalente sind? Was denken die Nachwuchskräfte über sich gegenseitig?
  • Wer hat wieviel Drittmittel eingeworben? Wer hat welche Kooperationen initiiert und welche Patente angemeldet?

All das lässt sich durch eine zentrale und unabhängige Stelle viel effizienter und auch viel objektiver erheben. Eine Berufungskommission könnte sich dann einfach und schnell eine fundierte Meinung bilden. Eine einfache Datenbankanfrage an die zentrale Stelle würde sofort eine Auskunft liefern, wer von den Kandidaten am besten da steht und die Auswahl aus den oft über fünfzig Bewerbungen wäre kaum noch Arbeit…

Würde das Verfahren funktionieren? Würde irgendjemand das Verfahren akzeptieren? Die Antwort auf diese rhetorischen Fragen fällt eindeutig aus: Eine solche zentrale Stelle wäre völliger Schwachsinn und die Informationen, die sie liefern könnte wären nutzlos. Einerseits sind die Kriterien sicherlich nicht gut gewählt: Publikationen in der “wissenschaftlichen Vanity-Press” zählen gleich viel wie ein Artikel in den Annals? Was ist mit interdisziplinären Arbeiten, die in leicht “fachfremnden” Zeitschriften erscheinen? Einladungen auf Konferenzen lassen sich durch Gegeneinladung erschleichen. Eine gute Reputation bekomme ich nur, wenn ich mich Gut-Freund mit den wichtigen Leuten mache – gute “Netzwerker” haben da natürlich einfacher, als “stille Wasser”. Man könnte natürlich versuchen, bei den Kriterien nachzubessern: Wir beziehen die Reputation einer Zeitschrift mit ein! Wir checken “Gegeneinladungen” und rechnen sie wieder heraus! Wir beziehen auch Zeitschriften in “angrenzenden Feldern” mit ein! Man merkt schnell, dass auch das Nachbessern zum Scheitern verurteilt ist: Wer legt denn die Reputation einer Zeitschrift fest? Gegeneinladungen sind nicht immer ein Zeichen von Gemauschel, sondern oft ganz natürlich und Ausdruck einer Kooperation. Welche Felder grenzen denn an und wo wird dann die Grenze gezogen?

Außerdem ist eine solche zentrale Stelle auch fundamental in Frage zu stellen: Ist es überhaupt machbar, Kandidaten neutral und objektiv zu Bewerten? Und was soll “objektiv” bedeuten, wenn doch jeder Lebenslauf verschieden und die Anforderungen an jede Stelle und an jeder Uni verschieden sind? Eine anerkannte, allgemeingültige Methode, um Kandidaten gegeneinander abzuwägen gibt es nicht. Eine standardisierte, zahlenbasierte Bewertung von Personen ist absurd. Eine derart komplexe Thematik lässt sich nicht durch ein automatisches Verfahren und einfach in Zahlen abbilden. Vom ethischen Aspekt ganz zu schweigen…

Schon mal was vom CHE Uni-Ranking gehört?