### August 2011

On Thursday and Friday, a minisymposium on “Compressed Sensing and Sparse Approximation Algorithms” takes place at ILAS 11.

Holger Rauhut talked about Compressive Sensing and Structured Random Matrices and his abstract was

Compressive sensing is a recent paradigm in signal processing and sampling theory that predicts that sparse signals can be recovered from a small number of linear and non- adaptive measurements using convex optimization or greedy algorithms. Quite remark- ably, all good constructions of the so called measurement matrix known so far are based on randomness. While Gaussian random matrices provide optimal recovery guarantees, such unstructured matrices are of limited use in applications. Indeed, structure often allows to have fast matrix vector multiplies. This is crucial in order to speed up recovery algorithms and to deal with large scale problems. The talk discusses models of structured random matrices that are useful in certain applications, and presents corresponding recovery guarantees. An important type of structured random matrix arises in connection with sampling sparse expansions in terms of bounded orthogonal systems (such as the Fourier system). The second type of structured random matrices to be discussed are partial random circulant matrices, that is, from convolution. In particular, we present recent results with J. Romberg and J. Tropp on the restricted isometry property of such matrices.

Mark A. Iwen talked about Compressed Sensing for Manifold Data and his abstract was

Preliminary work involving the recovery of manifold data using compressive measurements will be discussed, along with related sampling bounds for the approximation of manifold data via a recent multiscale piecewise linear approximation method known as “Geometric Wavelets” [Allard, Chen, Maggioni, Multiscale Geometric Methods for Data Sets II: Geometric Wavelets, Submitted, 2011].

John Wright talked about Local Correctness of Dictionary Learning Algorithms and his abstract was

The idea that many important classes of signals can be well-represented by linear com- binations of a small set of atoms selected from a given dictionary has had dramatic impact on the theory and practice of signal processing. For practical problems in which an appropriate sparsifying dictionary is not known ahead of time, a very popular and successful heuristic is to search for a dictionary that minimizes an appropriate sparsity surrogate over a given set of sample data. In this talk, we describe steps towards un- derstanding when this problem can be solved by efficient algorithms. We show that under mild hypotheses, the dictionary learning problem is locally well-posed: the desired solution is a local minimum of the 1 norm. Namely, if ${A \in \mathbf{R}^{m\times n}}$ is an incoherent (and possibly overcomplete) dictionary, and the coefficients ${X \in \mathbf{R}^{n\times p}}$ follow a random sparse model, then with high probability ${(A, X)}$ is a local minimum of the 1 norm over the manifold of factorizations ${(A', X')}$ satisfying ${A' X' = Y}$ , provided the number of samples ${p = \Omega(n^3 k)}$. For overcomplete A, this is the first result showing that the dictionary learning problem is locally solvable.

The last speaker of the first session was Jakob Lemvig, talking on Sparse Dual Frames and his abstract was

Frames are generalizations of bases which lead to redundant signal expansions, and they play an important role in many applications, e.g., in the theory of nonuniform sampling, wireless communications, and Sigma-Delta quantization. Sparsity of frame vectors (in some fixed orthonormal basis) is a new paradigm in frame theory that among other things allows for simple representation of the frame vectors and fast decomposi- tion and reconstruction procedures. Recently, a general construction method for sparse tight frames was proposed in [Casazza, Heinecke, Krahmer, Kutyniok, Optimally sparse frames, preprint]. In this talk, however, we will initiate the study of sparse dual frames of a given (not necessarily tight nor sparse) frame. We present theoretical and experimental results showing that sparse duals can have many desirable properties as well as providing fast reconstruction.

Today we have to following speakers:

Gitta Kutyniok, Separation of Data by Sparse Approximations

Modern data is customarily of multimodal nature, and analysis tasks typically require separation into the single components. Although a highly ill-posed problem, the morphological difference of these components often allows a precise separation. A novel methodology consists in applying 1 minimization to a composed dictionary consisting of tight frames each sparsifying one of the components. In this talk, we will first discuss a very general approach to derive estimates on the accuracy of separation using cluster coherence and clustered/geometric sparsity. Then we will use these results to analyze performance of this methodology for separation of image components.

Götz E. Pfander, From the Bourgain Tzafriri Restricted Invertibility Theorem to restricted isometries

The Bourgain-Tzafriri Restricted Invertibility Theorem states conditions under which a Riesz bases (for its span) can be extracted from a given system of norm one vectors in finite dimensional spaces. The challenge is to choose a large subset while making sure that the resulting lower Riesz bound remains large. An upper Riesz bound of the selected Riesz bases is inherited from the frame bound of the original system of vectors. In this talk, we shall present an algorithm that allows us to control both, the lower and the upper, Riesz bounds.

Sadegh Jokar, Compressed Sensing and Sparse Solution of PDEs

Compressed sensing says that a high dimensional but sparse signal can be reconstructed from only a small number of (random) measurements. We consider a related but different problem. We are interested in the numerical solution of partial differential equations ${Lu = f}$ with a differential operator L using some ideas from compressed sensing. Considering a classical Galerkin or Petrov-Galerkin finite ele- ment approach, one seeks a solution u in some function space ${U}$ (which is spanned by ${\{\varphi_1 , \dots,\varphi_n\}}$), represented as ${u = \sum_{i=1}^n u_i \varphi_i}$. Here we would like to find the sparsest representation of the solution ${u}$ in the space ${U = span\{u_1 , \dots , u_n \}}$ using discretized finite elements via ${\ell^1}$-minimization. Specially in adaptive methods, the usual approach to achieve this goal is to use local a posteriori error estimation to determine where a refinement is needed, but we use 1 -minimization and linear programming to perform the adaptive refinement in the finite element method in such a way that the solution is sparsely represented by a linear combination of basis functions. We also perform the analysis of solutions of underdetermined linear systems that have Kronecker product structure and show some theoretical results in this direction.

In fact, Sadegh also talked about his work on Compressed Sensing with matrices which are Kronecker product where he investigated how several interesting constants behave under the formation of Kronecker products. He (I guess) invented the term “sparkity” for the smallness of the spark of a matrix, and then the result sound funny: The sparkity of a Kronecker product is equal to the minimum of the sparkity of the products.

Felix Krahmer, New and Improved Johnson-Lindenstrauss Embeddings via the Restricted Isometry Property

The Johnson-Lindenstrauss (JL) Lemma states that any set of p points in high dimensional Euclidean space can be embedded into ${O(\delta^{-2} \log(p))}$ dimensions, without distorting the distance between any two points by more than a factor between ${1-\delta}$ and ${1 + \delta}$ . We establish a new connection between the JL Lemma and the Restricted Isometry Property (RIP), a well-known concept in the theory of sparse recovery often used for showing the success of ${\ell^1}$-minimization. Consider an ${m \times N}$ matrix satisfying the ${(k, \delta_k )}$-RIP and an arbitrary set ${E}$ of ${O(e^k }$points in ${R^N}$ . We show that with high probability, such a matrix with randomized column signs maps ${E}$ into ${R^m}$ without distorting the distance between any two points by more than a factor of ${1 \pm 4\delta k}$ . Consequently, matrices satisfying the Restricted Isometry of optimal order provide optimal Johnson-Lindenstrauss embeddings up to a logarithmic factor in ${N}$. Moreover, our results yield the best known bounds on the necessary embedding dimension m for a wide class of structured random matrices. Our results also have a direct application in the area of compressed sensing for redundant dictionaries.

After a very refreshing and enjoying summer vacation I am back to work and back to blogging. This week there is the ILAS 11 (where ILAS stands for International Linear Algebra Society) here at TU Braunschweig; and since the talks take place in the building just next to where I am, I enjoyed some of them. Two talks have been especially interesting to me.

The first one was Tikhonov Regularization for Large Scale Inverse Problems by Melina Freitag (maybe the first link is not working yet but Melina said that she is going to upload her slides under that address). She talked about the ways the weather forecast is done these days in the UK and in Europe and especially on the process of Data Assimilation where one uses the previous weather forecasts and newly arrived measurements to produce a better estimate of the state. As a matter of fact, two popular methods use in this field (3dVar and 4dVar) are equivalent to classical Tikhonov regularization. In her section on ${L^1}$-penalties (which I usually call ${\ell^1}$-penalties…) she actually introduced a kind of discrete ${TV}$-penalty as a replacement for the usual quadratic (${\ell^2}$) penalty. Her motivation was as usual: Tikhonov regularization smoothes too much and weather fronts are not smooth. She did not have results of this kind of ${TV}$ regularization as a replacement in 4dVar with real weather data but with smaller toy examples (with non-linear advection equations) since the resulting optimization problem is LARGE. However, her results look promising. I am not sure if she did, but one was tempted to arrive at the conclusion that “4dVar with ${TV}$ penalty gives a better resolution of weather fronts”. It happened that during her talk there was thunderstorm with heavy rain in front of the windows which has not been predicted by the forecast (according to which, the thunderstorm should happen the next day). Now: Would a ${TV}$ penalty be able to predict this thunderstorm for the right time? I am not sure. While ${TV}$ penalties do enforce edges, the precise place of the edge is still not too sure. My feeling is, that the accuracy of the position is better, the less the curvature of the edge is, but in general this highly depends on the ill-posed problem at hand.

The second talk was Recent Progress in the Solution of Discrete Ill-Posed Problems by Michiel Hochstenbach. The talk was pretty amusing; I especially liked the slogan for discrete ill-posed problems:

How to wisely divide by zero.

Also he introduced the three forms of variational regularization which I usually call Tikhonov, Ivanov and Morozov regularization (on slide 34) and introduced the Pareto front (under the name it usually has in discrete ill-posed problems: L-curve).

Another appealing slogan was:

Discrete ill-posed problems are essentially underdetermined.

(Since we do not want to solve the respective equation exactly, we usually have a lot of approximate solution of which we have to choose the one we like the most. Of course this is the same for continuous ill-posed problems.) As a consequence: One should use as much prior knowledge as possible.

In the rest of his talk he talked about improvements of the classical Tikhonov regularization (which is a linear method) by other linear methods with respect to both reconstruction quality and computational burden. He motivated his SRSVD (subspace restricted SVD) by the observation that the simple truncated SVD is often failing due to the fact that the first singular vectors do not contain useful information of the solution. He proposed to use a selected set of (orthonormal) vectors and use a restricted SVD in the following. However, does this use the knowledge that the solution shall be a linear combination of these selected vectors? And wouldn’t it be beneficial to use a larger set of (non-orthonormal) vectors, put into a (possible overcomplete) dictionary and use a sparse reconstruction method such a ${\ell^1}$ regularization which automatically selects the relevant vectors? He also proposed the so called “linear combination approach” which basically takes several outputs of various methods and search within the linear combinations of this outputs for a better solution. To do so he proposed to use another Ivanov-type regularization (slide 79). I still did not get why he uses the largest available norm as a constraint here… However there should be an answer somewhere is his papers.

Edit: Michiel sent me the following answer:

I used the largest available norm, since the norms of many solution approaches are often smaller than, or approximately equal to the true norm. In the paper Discrete ill-posed least-squares problems with a solution norm constraint we reach the conclusion that
“For the approach based on the solution norm constraint, it seems important that $\|x\|$ not be underestimated.”

Edit: The above mentioned paper Discrete ill-posed least-squares problems with a solution norm constraint is to be published in “Linear Algebra and its Applications”. It can be found via its doi.