October 2016

Today I gave a talk at STOR colloquium at the University of North Carolina (UNC). I spoke about the Master’s thesis of Lars Mescheder in which he developed probabilistic models for image denoising. Among other things, he proposed a Gaussian scale mixture model as an image prior and developed several methods to infer information from the respective posterior. One curios result was that the Perona-Malik diffusivity (and variants thereof) popped up in basically every algorithm he derived. It’s not only that the general idea to have an edge dependent diffusion coefficient in a PDE or, formally equivalent, a concave penalty for the magnitude of the gradient in a variational formulation, but it also turned out that the very diffusion coefficient {\frac1{1+|\nabla u|^{2}}} appeared exactly in this form.

Besides the talk I got the chance to chat with many interesting people. I’d like to mention a chat about a background story on Perona-Malik which I find quite interesting:

Steven Pizer, Kenan Professor for computer science at UNC and active in a lot of fields for several decades, and in particular a driving for in the early days of digital image analysis, told that the very idea of the non-linear, gradient-magnitude dependent diffusion coefficient actually dates back to works of neuroscientists Stephen Grossberg and that he moderated a discussion between Stephan Grossberg, Pietro Perona and Jitendra Malik in which the agreed that the method should be attributed as “Grossberg-Perona-Malik” method. I could not find any more hints for this story anywhere else, but given the mathematical, psychological and biological background of Grossberg and a look at Grossbergs list of publications in the 1980’s and earlier make this story appear more than plausible.

It may also well be, that this story is just another instance of the effect, that good ideas usually pop up at more than one place…

Yesterday I uploaded the paper “Linear convergence of the Randomized Sparse Kaczmarz Method” by Frank Schöpfer and myself to the arXiv.

Recall that the Kaczmarz method for linear systems

\displaystyle \begin{array}{rcl} Ax&=&b \end{array}


\displaystyle \begin{array}{rcl} x^{k+1} &=& x^{k} - \frac{\langle a_{i},x_{k}\rangle-b_{i}}{\|a_{i}\|^{2}}a_{i} \end{array}

where {a_{i}} is the {i}-th row of {A}, {b_{i}} is the {i}th entry of {b} and the index {i} is chosen according to some rule. We could, e.g. choose the rows in a cyclic manner, i.e. starting with the first row, proceeding to the next row and once we came to the last row we would start over from the first again. It is known (and probably proved by Kaczmarz himself) that the method converges to a solution of {Ax=b} whenever the system has a solution. Moreover, it easy to see that we converge to the minimum norm solution in case of underdetermined systems when the method is initialized with zero. This is due to the fact that the whole iteration takes place in the range space of {A^{T}}.

In this and this paper we proposed a simple modification of the Kaczmarz method, that makes it converge to sparse solutions. The modification is simply

\displaystyle \begin{array}{rcl} z^{k+1} & = & z^{k} - \frac{\langle a_{i},x_{k}\rangle-b_{i}}{\|a_{i}\|^{2}}a_{i}\\ x^{k+1}& = & S_{\lambda}(z^{k+1}) \end{array}

where {S_{\lambda}(x) = \max(|x|-\lambda,0)\text{sign}(x)} is the soft thresholding function. In this paper we proved that this variant converges, when initialized with zero and for a consistent system, to the solution of

\displaystyle \begin{array}{rcl} \min_{x}\lambda\|x\|_{1} + \tfrac12\|x\|_{2}^{2},\quad\text{s.t.}\quad Ax=b. \end{array}

For not too small values of {\lambda} this is indeed a sparse solution of {Ax=b} and Frank also proved that there is a threshold such that for {\lambda} larger than this threshold the solution is also the minimum {\ell^{1}} solution.

In general, convergence rates for the Kaczmarz method (and its sparse variant) are hard to prove. To convince oneself of this fact note that the convergence speed can drastically change if the rows of the system are reordered. The situation changes if one uses a randomization of the sparse Kaczmarz method where, in each iteration, a row is chose at random. Strohmer and Vershynin proved that this randomization leads to linear convergence rate. In the above mentioned paper we were able to prove the same result for the randomized sparse Kaczmarz method. While this sounds like an obvious generalization, the methods we use are totally different. While the linear convergence of the randomized Kaczmarz method can be proven in a few lines(well, probably one page) using only very basic tools, we need, among other things, quite intricate error bounds for Bregman projections w.r.t. piecewise linear-quadratic convex functions.

In fact, the linear convergence can be observed in practice and we illustrate the usefulness of the randomization and also the “sparsification” on some examples in the paper. For example the following figure shows the decay of the residual for the the randomized Kaczmarz method (black), the randomized sparse Kaczmarz method (red) and the randomized sparse Kaczmarz method with exact steps (green) which I did not explain here.


More details are in the paper…