Yesterday I uploaded the paper “Linear convergence of the Randomized Sparse Kaczmarz Method” by Frank Schöpfer and myself to the arXiv.

Recall that the Kaczmarz method for linear systems

\displaystyle \begin{array}{rcl} Ax&=&b \end{array}


\displaystyle \begin{array}{rcl} x^{k+1} &=& x^{k} - \frac{\langle a_{i},x_{k}\rangle-b_{i}}{\|a_{i}\|^{2}}a_{i} \end{array}

where {a_{i}} is the {i}-th row of {A}, {b_{i}} is the {i}th entry of {b} and the index {i} is chosen according to some rule. We could, e.g. choose the rows in a cyclic manner, i.e. starting with the first row, proceeding to the next row and once we came to the last row we would start over from the first again. It is known (and probably proved by Kaczmarz himself) that the method converges to a solution of {Ax=b} whenever the system has a solution. Moreover, it easy to see that we converge to the minimum norm solution in case of underdetermined systems when the method is initialized with zero. This is due to the fact that the whole iteration takes place in the range space of {A^{T}}.

In this and this paper we proposed a simple modification of the Kaczmarz method, that makes it converge to sparse solutions. The modification is simply

\displaystyle \begin{array}{rcl} z^{k+1} & = & z^{k} - \frac{\langle a_{i},x_{k}\rangle-b_{i}}{\|a_{i}\|^{2}}a_{i}\\ x^{k+1}& = & S_{\lambda}(z^{k+1}) \end{array}

where {S_{\lambda}(x) = \max(|x|-\lambda,0)\text{sign}(x)} is the soft thresholding function. In this paper we proved that this variant converges, when initialized with zero and for a consistent system, to the solution of

\displaystyle \begin{array}{rcl} \min_{x}\lambda\|x\|_{1} + \tfrac12\|x\|_{2}^{2},\quad\text{s.t.}\quad Ax=b. \end{array}

For not too small values of {\lambda} this is indeed a sparse solution of {Ax=b} and Frank also proved that there is a threshold such that for {\lambda} larger than this threshold the solution is also the minimum {\ell^{1}} solution.

In general, convergence rates for the Kaczmarz method (and its sparse variant) are hard to prove. To convince oneself of this fact note that the convergence speed can drastically change if the rows of the system are reordered. The situation changes if one uses a randomization of the sparse Kaczmarz method where, in each iteration, a row is chose at random. Strohmer and Vershynin proved that this randomization leads to linear convergence rate. In the above mentioned paper we were able to prove the same result for the randomized sparse Kaczmarz method. While this sounds like an obvious generalization, the methods we use are totally different. While the linear convergence of the randomized Kaczmarz method can be proven in a few lines(well, probably one page) using only very basic tools, we need, among other things, quite intricate error bounds for Bregman projections w.r.t. piecewise linear-quadratic convex functions.

In fact, the linear convergence can be observed in practice and we illustrate the usefulness of the randomization and also the “sparsification” on some examples in the paper. For example the following figure shows the decay of the residual for the the randomized Kaczmarz method (black), the randomized sparse Kaczmarz method (red) and the randomized sparse Kaczmarz method with exact steps (green) which I did not explain here.


More details are in the paper…