Today I report on two things I came across here at ISMP:

  • The first is a talk by Russell Luke on Constraint qualifications for nonconvex feasibility problems. Luke treated the NP-hard problem of sparsest solutions of linear systems. In fact he did not tackle this problem but the problem to find an {s}-sparse solution of an {m\times n} system of equations. He formulated this as a feasibility-problem (well, Heinz Bauschke was a collaborator) as follows: With the usual malpractice let us denote by {\|x\|_0} the number of non-zero entries of {x\in{\mathbb R}^n}. Then the problem of finding an {s}-sparse solution to {Ax=b} is:

    \displaystyle  \text{Find}\ x\ \text{in}\ \{\|x\|_0\leq s\}\cap\{Ax=b\}.

    In other words: find a feasible point, i.e. a point which lies in the intersection of the two sets. Well, most often feasibility problems involve convex sets but here, the first one given by this “{0}-norm” is definitely not convex. One of the simplest algorithms for the convex feasibility problem is to alternatingly project onto both sets. This algorithm dates back to von Neumann and has been analyzed in great detail. To make this method work for non-convex sets one only needs to know how to project onto both sets. For the case of the equality constraint {Ax=b} one can use numerical linear algebra to obtain the projection. The non-convex constraint on the number of non-zero entries is in fact even easier: For {x\in{\mathbb R}^n} the projection onto {\{\|x\|_0\leq s\}} consists of just keeping the {s} largest entries of {x} while setting the others to zero (known as the “best {s}-term approximation”). However, the theory breaks down in the case of non-convex sets. Russell treated problem in several papers (have a look at his publication page) and in the talk he focused on the problem of constraint qualification, i.e. what kind of regularity has to be imposed on the intersection of the two sets. He could shows that (local) linear convergence of the algorithm (which is observed numerically) can indeed be justified theoretically. One point which is still open is the phenomenon that the method seems to be convergent regardless of the initialization and that (even more surprisingly) that the limit point seems to be independent of the starting point (and also seems to be robust with respect to overestimating the sparsity {s}). I wondered if his results are robust with respect to inexact projections. For larger problems the projection onto the equality constraint {Ax=b} are computationally expensive. For example it would be interesting to see what happens if one approximates the projection with a truncated CG-iteration as Andreas, Marc and I did in our paper on subgradient methods for Basis Pursuit.

  • Joel Tropp reported on his paper Sharp recovery bounds for convex deconvolution, with applications together with Michael McCoy. However, in his title he used demixing instead of deconvolution (which, I think, is more appropriate and leads to less confusion). With “demixing” they mean the following: Suppose you have two signals {x_0} and {y_0} of which you observe only the superposition of {x_0} and a unitarily transformed {y_0}, i.e. for a unitary matrix {U} you observe

    \displaystyle  z_0 = x_0 + Uy_0.

    Of course, without further assumptions there is no way to recover {x_0} and {y_0} from the knowledge of {z_0} and {U}. As one motivation he used the assumption that both {x_0} and {y_0} are sparse. After the big bang of compressed sensing nobody wonders that one turns to convex optimization with {\ell^1}-norms in the following manner:

    \displaystyle   \min_{x,y} \|x\|_1 + \lambda\|y\|_1 \ \text{such that}\ x + Uy = z_0. \ \ \ \ \ (1)

    This looks a lot like sparse approximation: Eliminating {x} one obtains the unconstraint problem \begin{equation*} \min_y \|z_0-Uy\|_1 + \lambda \|y\|_1. \end{equation*}

    Phrased differently, this problem aims at finding an approximate sparse solution of {Uy=z_0} such that the residual (could also say “noise”) {z_0-Uy=x} is also sparse. This differs from the common Basis Pursuit Denoising (BPDN) by the structure function for the residual (which is the squared {2}-norm). This is due to the fact that in BPDN one usually assumes Gaussian noise which naturally lead to the squared {2}-norm. Well, one man’s noise is the other man’s signal, as we see here. Tropp and McCoy obtained very sharp thresholds on the sparsity of {x_0} and {y_0} which allow for exact recovery of both of them by solving (1). One thing which makes their analysis simpler is the following reformulation: The treated the related problem \begin{equation*} \min_{x,y} \|x\|_1 \text{such that} \|y\|_1\leq\alpha, x+Uy=z_0 \end{equation*} (which I would call this the Ivanov version of the Tikhonov-problem (1)). This allows for precise exploitation of prior knowledge by assuming that the number {\alpha_0 = \|y_0\|_1} is known.

    First I wondered if this reformulation was responsible for their unusual sharp results (sharper the results for exact recovery by BDPN), but I think it’s not. I think this is due to the fact that they have this strong assumption on the “residual”, namely that it is sparse. This can be formulated with the help of {1}-norm (which is “non-smooth”) in contrast to the smooth {2}-norm which is what one gets as prior for Gaussian noise. Moreover, McCoy and Tropp generalized their result to the case in which the structure of {x_0} and {y_0} is formulated by two functionals {f} and {g}, respectively. Assuming a kind of non-smoothness of {f} and {g} the obtain the same kind of results and especially matrix decomposition problems are covered.