Today I write about sparse recovery of “multidimensional” signals. With “multidimensional” I mean something like this: A one-dimensional signal is a vector ${x\in{\mathbb R}^n}$ while a two-dimensional signal is a matrix ${x\in{\mathbb R}^{n_1\times n_2}}$. Similarly, ${d}$-dimensional signal is ${x\in{\mathbb R}^{n_1\times\cdots\times n_d}}$. Of course, images as two-dimensional signals come time mind. Moreover, movies are three-dimensional, a hyperspectral 2D image (which has a whole spectrum attached to any pixel) is also three-dimensional, and time-dependent volume-data is four-dimensional.

Multidimensional data is often a challenge due the large amount of data. While the size of the signals is usually not the problem it is more the size of the measurement matrices. In the context of compressed sensing or sparse recovery the signal is measured with a linear operator, i.e. one applies a number ${m}$ of linear functionals to the signal. In the ${d}$-dimensional case this can be encoded as a matrix ${A\in {\mathbb R}^{m\times \prod_1^d n_i}}$ and this is where the trouble with the data comes in: If you have megapixel image (which is still quite small) the matrix has a million of columns and if you have a dense matrix, storage becomes an issue.

One approach (which is indeed quite old) to tackle this problem is, to consider special measurement matrices: If the signal has a sparse structure in every slice, i.e. every vector is of the form ${x(i_1,\dots,i_{k-1},:,i_{k+1},\dots,i_d)}$ where we fix all but the ${k}$-th component, then the Kronecker product of measurement matrices for each dimension is the right thing.

The Kronecker product of two matrices ${A\in{\mathbb R}^{m\times n}}$ and ${B\in{\mathbb R}^{k\times j}}$ is the ${mk\times nj}$ matrix

$\displaystyle A\otimes B = \begin{bmatrix} a_{11}B & \dots & a_{1n}B\\ \vdots & & \vdots\\ a_{n1}B & \dots & a_{nn}B \end{bmatrix}.$

This has a lot to do with the tensor product and you should read the Wikipedia entry. Moreover, it is numerically advantageous not to build the Kronecker product of dense matrices if you only want to apply it to a given signal. To see this, we introduce the vectorization operator ${\text{vec}:{\mathbb R}^{m\times n}\rightarrow{\mathbb R}^{nm}}$ which takes a matrix ${X}$ and stacks its columns into a tall column vector. For matrices ${A}$ and ${B}$ (of fitting sizes) it holds that

$\displaystyle (B^T\otimes A)\text{vec}(X) = \text{vec}(AXB).$

So, multiplying ${X}$ from the left and from the right gives the application of the Kronecker product.

The use of Kronecker products in numerical linear algebra is fairly old (for example they are helpful for multidimensional finite difference schemes where you can build Kronecker products of sparse difference operators in respective dimensions). Recently, they have been discovered for compressed sensing and sparse recovery in these two papers: Sparse solutions to underdetermined Kronecker product systems by Sadegh Jokar and Volker Mehrmann and the more recent Kronecker Compressed Sensing by Marco Duarte and Rich Baraniuk.

From these papers you can extract some interestingly simple and nice theorems:

Theorem 1 For matrices ${A_1,\dots A_d}$ with restricted isometry constant ${\delta_K(A_1),\dots,\delta_K(A_d)}$ of order ${K}$ it holds that the restricted isometry constant of the Kronecker product fulfills

$\displaystyle \max_i \delta_K(A_i) \leq \delta_K(A_1\otimes\cdots\otimes A_d) \leq \prod_1^d (1+\delta_K(A_i))-1.$

Basically, the RIP constant of a Kronecker product is not better than the worst one but still not too large.

Theorem 2 For matrices ${A_1,\dots, A_d}$ with columns normalized to one, it hold that the spark of their Kronecker product fulfills

$\displaystyle \text{spark}(A\otimes \dots\otimes A_d) = = \min_i\text{spark}(A_i).$

Theorem 3 For matrices ${A_1,\dots, A_d}$ with columns normalized to one, it hold that the mutual coherence of their Kronecker product fulfills

$\displaystyle \mu(A_1\otimes\dots\otimes A_d) = \max_i \mu(A_i).$