If you want a sparse solution to a linear system of equations and have heard of compressed sensing or sparse reconstruction, then you probably know what to do: get one of the many solvers for Basis Pursuit and be happy.

Basis Pursuit was designed as a convex approximation of the generally intractable problem of finding the sparsest solution (that is, the solution with the smallest number of non-zero entries). By abuse of notation, we define for {x\in\mathbb{R}^n}

\displaystyle \|x\|_0 = \#\{i\ : x_i\neq 0\}.

(Because of {\|x\|_0 = \lim_{p\rightarrow 0}\|x\|_p^p} some people prefer the, probably more correct but also more confusing, notation {\|x\|_0^0}…).
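To see this limit in action, here is a tiny numerical check (a Python/numpy sketch, not part of the derivation): for {x = (3,0,-2)} the sums {\sum_i |x_i|^p} approach {\|x\|_0 = 2} as {p\rightarrow 0}.

```python
import numpy as np

x = np.array([3.0, 0.0, -2.0])   # two non-zero entries, so ||x||_0 = 2

for p in [1.0, 0.5, 0.1, 1e-6]:
    # sum_i |x_i|^p tends to the number of non-zero entries as p -> 0
    print(p, np.sum(np.abs(x) ** p))
```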

Then, the sparsest solution of {Ax=b} is the solution of

\displaystyle \min_x \|x\|_0,\quad \text{s.t.}\ Ax=b

and Basis Pursuit replaces {\|x\|_0} with “the closest convex proxy”, i.e.

\displaystyle \min_x \|x\|_1,\quad\text{s.t.}\ Ax=b.
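Since the constraint is linear and the one-norm is piecewise linear, Basis Pursuit can be solved as a linear program via the standard splitting {x = u-v} with {u,v\geq 0}. A minimal sketch using scipy's linprog (the function name basis_pursuit is mine, not from any library):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 s.t. Ax = b via the LP split x = u - v, u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])         # A(u - v) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
    u, v = res.x[:n], res.x[n:]
    return u - v

# tiny sanity check: for A = [2 1], b = [2] the l1-minimal solution is (1, 0)
A = np.array([[2.0, 1.0]])
b = np.array([2.0])
print(basis_pursuit(A, b))
```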

The good thing about Basis Pursuit is that it really gives the sparsest solution under appropriate conditions, as is widely known nowadays. Here I’d like to present two simple examples in which the Basis Pursuit solution is

  • not even close to the sparsest solution (in norm).
  • not sparse at all.

1. A small bad matrix

We can build a bad matrix for Basis Pursuit, even in the case {2\times 3}: For a small {\epsilon>0} define

\displaystyle A = \begin{bmatrix} \epsilon & 1 & 0\\ \epsilon & 0 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 1\\1 \end{bmatrix}.

Of course, the sparsest solution is

\displaystyle x_0 = \begin{bmatrix} 1/\epsilon\\ 0\\ 0\end{bmatrix}

while the solution of Basis Pursuit is

\displaystyle x_1 = \begin{bmatrix} 0\\1\\1 \end{bmatrix}.

To summarize: for {\epsilon<1/2}

\displaystyle \|x_0\|_0 = 1 < 2 = \|x_1\|_0,\quad \|x_0\|_1 = 1/\epsilon > 2 = \|x_1\|_1.

(There is also a least squares solution that has three non-zero entries and a one-norm slightly larger than 2.)
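These claims are easy to verify numerically; here is a sketch (using scipy and the standard LP splitting {x=u-v}; the value {\epsilon=0.1} is an arbitrary choice below {1/2}):

```python
import numpy as np
from scipy.optimize import linprog

eps = 0.1
A = np.array([[eps, 1.0, 0.0],
              [eps, 0.0, 1.0]])
b = np.array([1.0, 1.0])

# min ||x||_1  s.t.  Ax = b, written as an LP in (u, v) with x = u - v
n = A.shape[1]
res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
x1 = res.x[:n] - res.x[n:]
print(x1)                       # the Basis Pursuit solution [0, 1, 1]

x0 = np.array([1 / eps, 0.0, 0.0])
print(np.allclose(A @ x0, b))   # the sparsest solution is feasible as well
```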

Granted, this matrix is stupid. In particular, its first column has a very small norm compared to the others. Ok, let’s construct a matrix with normalized columns.

2. A small bad matrix with normalized columns

Fix an integer {n} and a small {\epsilon>0}. We define an {n\times(n+2)} matrix

\displaystyle \begin{bmatrix} 1+\epsilon/2 & -1+\epsilon/2 & 1 & 0 & \dots & \dots &0\\ -1+\epsilon/2 & 1+\epsilon/2 & 0 & 1 & \ddots & & 0\\ \epsilon/2 & \epsilon/2 & \vdots & \ddots& \ddots & \ddots& \vdots\\ \vdots & \vdots & \vdots & & \ddots & \ddots& 0\\ \epsilon/2 & \epsilon/2 & 0 & \dots& \dots& 0 & 1 \end{bmatrix}.

Ok, the first two columns do not have norm 1 yet, so we normalize them by multiplying with the right constant

\displaystyle c = \frac{1}{\sqrt{2 + \tfrac{n\epsilon^2}{4}}}

(which is close to {1/\sqrt{2}}) to get

\displaystyle A = \begin{bmatrix} c(1+\epsilon/2) & c(-1+\epsilon/2) & 1 & 0 & \dots & \dots &0\\ c(-1+\epsilon/2) & c(1+\epsilon/2) & 0 & 1 & \ddots & & 0\\ c\epsilon/2 & c\epsilon/2 & \vdots & \ddots& \ddots & \ddots& \vdots\\ \vdots & \vdots & \vdots & & \ddots & \ddots& 0\\ c\epsilon/2 & c\epsilon/2 & 0 & \dots& \dots& 0 & 1 \end{bmatrix}.

Now we take the right hand side

\displaystyle b = \begin{bmatrix} 1\\\vdots\\1 \end{bmatrix}

and have a look at the solutions of {Ax=b}.

First, there is the least squares solution {x_{\text{ls}} = A^\dagger b}. All of its entries are non-zero: the last {n} entries are slightly smaller than {1} and the first two lie strictly between {0} and {1}. Hence {\|x_{\text{ls}}\|_1 \approx n} (in fact, slightly larger).
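This is again easy to check numerically; a small numpy sketch (with the hypothetical choices {n=6}, {\epsilon=0.1}):

```python
import numpy as np

n, eps = 6, 0.1
c = 1 / np.sqrt(2 + n * eps**2 / 4)

# build the matrix A from the text
A = np.zeros((n, n + 2))
A[:2, :2] = (np.array([[1.0, -1.0], [-1.0, 1.0]]) + eps / 2) * c
A[2:, :2] = eps / 2 * c
A[:, 2:] = np.eye(n)
b = np.ones(n)

x_ls = np.linalg.pinv(A) @ b     # the minimum-norm solution of Ax = b
print(x_ls)                      # all entries non-zero
print(np.linalg.norm(x_ls, 1))   # slightly larger than n
```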

Second, there is a very sparse solution

\displaystyle x_0 = \frac{1}{\epsilon c} \begin{bmatrix} 1\\ 1\\ 0\\ \vdots\\ 0 \end{bmatrix}.

This has two non-zero entries and a pretty large one-norm: {\|x_0\|_1 = 2/(\epsilon c)}.

Third, there is a solution with small one-norm:

\displaystyle x_1 = \begin{bmatrix} 0\\ 0\\ 1\\ \vdots\\ 1 \end{bmatrix}.

We have {n} non-zero entries and {\|x_1\|_1 = n}. You can check that this {x_1} is also the unique Basis Pursuit solution (e.g. by observing that {A^T[1,\dots,1]^T} is an element of {\partial\|x_1\|_1} and that the first two entries of {A^T[1,\dots,1]^T} are strictly positive and strictly smaller than 1 – put differently, the vector {[1,\dots,1]^T} is a dual certificate for {x_1}).
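The dual certificate is straightforward to check numerically (a numpy sketch with the hypothetical choices {n=6}, {\epsilon=0.1}): the last {n} entries of {A^T[1,\dots,1]^T} equal {1} on the support of {x_1}, and the first two both equal {cn\epsilon/2\in(0,1)}.

```python
import numpy as np

n, eps = 6, 0.1
c = 1 / np.sqrt(2 + n * eps**2 / 4)

# build the matrix A from the text
A = np.zeros((n, n + 2))
A[:2, :2] = (np.array([[1.0, -1.0], [-1.0, 1.0]]) + eps / 2) * c
A[2:, :2] = eps / 2 * c
A[:, 2:] = np.eye(n)

x1 = np.concatenate([[0.0, 0.0], np.ones(n)])
print(np.allclose(A @ x1, np.ones(n)))  # x1 is feasible

g = A.T @ np.ones(n)  # candidate element of the subdifferential of ||.||_1 at x1
print(g[:2])          # both equal c*n*eps/2, strictly between 0 and 1
print(g[2:])          # all exactly 1
```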

To summarize, for {\epsilon < \sqrt{\frac{8}{n^2-n}}} it holds that

\displaystyle \|x_0\|_0 = 2 < n = \|x_1\|_0,\quad \|x_0\|_1 = 2/(c\epsilon) > n = \|x_1\|_1.

The geometric idea behind this matrix is as follows: We take {n} simple normalized columns (the identity part in {A}) which sum up to the right hand side {b}. Then we take two normalized vectors which are almost orthogonal to {b} but have {b} in their span (but one needs huge factors here to obtain {b}).

Well, this matrix looks very artificial, and indeed it was constructed for one special purpose: to show that minimal {\ell^1}-norm solutions are not always sparse (even when a sparse solution exists). It is something of a hobby of mine to construct instances for sparse reconstruction with extreme properties, and I am thinking about a kind of “gallery” of such instances (probably extending the “gallery” command in Matlab).
By the way: if you want to play around with this matrix, here is the Matlab code:

n = 100;
epsilon = 0.9*sqrt(8/(n^2-n)); % stay strictly below the bound from the summary
c = 1/sqrt(2+n*epsilon^2/4);   % normalization for the first two columns
A = zeros(n,n+2);
A(1:2,1:2) = ([1 -1;-1,1]+epsilon/2)*c;
A(3:n,1:2) = epsilon/2*c;
A(1:n,3:n+2) = eye(n);
b = ones(n,1);