If you want to have a sparse solution to a linear system of equations and have heard of compressed sensing or sparse reconstruction, then you probably know what to do: get one of the many solvers for Basis Pursuit and be happy.

Basis Pursuit was designed as a convex approximation of the generally intractable problem of finding the sparsest solution (that is, the solution with the smallest number of non-zero entries). By abuse of notation, we define for $x\in\mathbb{R}^n$

$$\|x\|_0 = \#\{i \,:\, x_i \neq 0\}.$$

(Because some people prefer the, probably more correct but also more confusing, notation $\|x\|_0^0$.)

Then, the sparsest solution of $Ax=b$ is the solution of

$$\min_x \|x\|_0 \quad\text{s.t.}\quad Ax=b,$$

and Basis Pursuit replaces $\|\cdot\|_0$ with "the closest convex proxy", i.e. it solves

$$\min_x \|x\|_1 \quad\text{s.t.}\quad Ax=b.$$
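In practice the Basis Pursuit problem can be handed to any linear programming solver: splitting $x = u - v$ with $u, v \ge 0$ makes the one-norm linear. Here is a minimal sketch in Python (assuming NumPy and SciPy are available; the tiny test matrix is my own illustration, not from the examples below):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 s.t. Ax = b by splitting x = u - v with u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                      # objective: sum(u) + sum(v) = ||x||_1
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

# Among the solutions of Ax = b below, (0, 0, 1) has one-norm 1,
# beating e.g. (1, 1, 0) with one-norm 2 -- so BP should return it.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
x = basis_pursuit(A, b)
```

The split doubles the number of variables but keeps everything linear, which is exactly how the original Basis Pursuit paper proposed to solve it.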

The good thing about Basis Pursuit is that it really gives the sparsest solution *under appropriate conditions*, as is widely known nowadays. Here I'd like to present two simple examples in which the Basis Pursuit solution is

- not even close to the sparsest solution (in norm).
- not sparse at all.

**1. A small bad matrix**

We can build a bad matrix for Basis Pursuit even in the case $A\in\mathbb{R}^{2\times 3}$: For a small $\epsilon>0$ define

$$A = \begin{pmatrix} \epsilon & 1 & 0 \\ \epsilon & 0 & 1 \end{pmatrix},\qquad b = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Of course, the sparsest solution is

$$x_0 = \begin{pmatrix} 1/\epsilon \\ 0 \\ 0 \end{pmatrix},$$

while the solution of Basis Pursuit (for $0<\epsilon<\tfrac{1}{2}$) is

$$x_1 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.$$

To summarize: For $0<\epsilon<\tfrac{1}{2}$ we have

$$\|x_0\|_0 = 1 < 2 = \|x_1\|_0 \qquad\text{but}\qquad \|x_1\|_1 = 2 < \tfrac{1}{\epsilon} = \|x_0\|_1.$$

(There is also a least squares solution that has three non-zero entries and a one-norm slightly larger than 2.)
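This threshold behaviour is easy to check numerically. A small sketch in Python (NumPy/SciPy; Basis Pursuit solved as a linear program, and the two test values of $\epsilon$ on either side of $\tfrac12$ are my choice):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    # min ||x||_1 s.t. Ax = b via the standard split x = u - v, u, v >= 0
    n = A.shape[1]
    res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

def bad_matrix(eps):
    # the small bad matrix: first column has norm eps*sqrt(2)
    A = np.array([[eps, 1.0, 0.0],
                  [eps, 0.0, 1.0]])
    return A, np.array([1.0, 1.0])

x_below = basis_pursuit(*bad_matrix(0.25))  # eps < 1/2: BP returns (0, 1, 1)
x_above = basis_pursuit(*bad_matrix(0.75))  # eps > 1/2: BP finds (1/eps, 0, 0)
```

For $\epsilon<\tfrac12$ the 1-sparse solution has one-norm $1/\epsilon>2$ and loses; for $\epsilon>\tfrac12$ it wins.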

Granted, this matrix is stupid. In particular, its first column has a very small norm compared to the others. Ok, let's construct a matrix with normalized columns.

**2. A small bad matrix with normalized columns**

Fix an integer $n$ and a small $\epsilon>0$. We define an $n\times(n+2)$-matrix: the last $n$ columns form the identity $I_n$, and the first two columns are

$$\tilde a_1 = \begin{pmatrix} 1+\frac{\epsilon}{2} \\ -1+\frac{\epsilon}{2} \\ \frac{\epsilon}{2} \\ \vdots \\ \frac{\epsilon}{2} \end{pmatrix},\qquad \tilde a_2 = \begin{pmatrix} -1+\frac{\epsilon}{2} \\ 1+\frac{\epsilon}{2} \\ \frac{\epsilon}{2} \\ \vdots \\ \frac{\epsilon}{2} \end{pmatrix}.$$

Ok, the first two columns do not have norm 1 yet, so we normalize them by multiplying with the right constant

$$c = \frac{1}{\sqrt{2+n\epsilon^2/4}}$$

(which is close to $\tfrac{1}{\sqrt{2}}$) to get

$$A = \begin{pmatrix} c\,\tilde a_1 & c\,\tilde a_2 & I_n \end{pmatrix}.$$

Now we take the right hand side

$$b = (1,1,\dots,1)^T$$

and see what solutions to $Ax=b$ there are.

First, there is the least squares solution $x_{\mathrm{ls}} = A^\dagger b$. This has only non-zero entries: the last $n$ entries are slightly smaller than $1$ and the first two are between $0$ and $1$; hence, $\|x_{\mathrm{ls}}\|_1 \approx n$ (in fact, slightly larger).

Second, there is a very sparse solution

$$x_s = \frac{1}{\epsilon c}\,(1,1,0,\dots,0)^T.$$

This has two non-zero entries and a pretty large one-norm: $\|x_s\|_1 = \frac{2}{\epsilon c}$.

Third, there is a solution with small one-norm:

$$x_1 = (0,0,1,\dots,1)^T.$$

We have $\|x_1\|_0 = n$ non-zero entries and $\|x_1\|_1 = n$. You can check that this is also the unique Basis Pursuit solution (e.g. by observing that $A^Tb$ is an element of $\partial\|x_1\|_1$ and that the first two entries in $A^Tb$ are strictly smaller than 1 and positive – put differently, the vector $b$ is a dual certificate for $x_1$).

To summarize, for $0<\epsilon<\sqrt{8/(n^2-n)}$ it holds that

$$\|x_s\|_0 = 2 \ll n = \|x_1\|_0 \qquad\text{while}\qquad \|x_1\|_1 = n < \frac{2}{\epsilon c} = \|x_s\|_1.$$
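The dual certificate is easy to verify numerically. Here is a small sketch in Python/NumPy (the factor $0.9$, which places $\epsilon$ just below the critical value $\sqrt{8/(n^2-n)}$, is my choice):

```python
import numpy as np

n = 100
eps = 0.9 * np.sqrt(8.0 / (n**2 - n))     # strictly below the critical value
c = 1.0 / np.sqrt(2.0 + n * eps**2 / 4.0) # normalization constant
A = np.zeros((n, n + 2))
A[:2, :2] = (np.array([[1.0, -1.0], [-1.0, 1.0]]) + eps / 2) * c
A[2:, :2] = eps / 2 * c                   # two special normalized columns
A[:, 2:] = np.eye(n)                      # identity part
b = np.ones(n)

x1 = np.concatenate([[0.0, 0.0], np.ones(n)])  # the dense solution
cert = A.T @ b                                 # candidate subgradient of ||.||_1 at x1
```

The last $n$ entries of `cert` equal 1 (the signs of $x_1$ on its support) and the first two are strictly between 0 and 1, so $b$ certifies $x_1$ as the unique Basis Pursuit solution.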

The geometric idea behind this matrix is as follows: We take $n$ simple normalized columns (the identity part in $A$) which sum up to the right hand side $b$. Then we take two normalized vectors which are almost orthogonal to $b$ but have $b$ in their span (but one needs huge factors here to obtain $b$).

Well, this matrix looks very artificial and indeed it's constructed for one special purpose: to show that minimal $\ell^1$-norm solutions are not always sparse (even when a sparse solution exists). Constructing instances for sparse reconstruction with extreme properties is some kind of hobby of mine, and I am thinking about a kind of "gallery" of these instances (probably extending the "gallery" command in Matlab).

By the way: if you want to play around with this matrix, here is the code (note that $\epsilon$ has to stay below the critical value $\sqrt{8/(n^2-n)}$ – otherwise the two-sparse solution becomes the Basis Pursuit solution):

```matlab
n = 100;
epsilon = 0.9*sqrt(8/(n^2-n));   % keep epsilon below the critical value
c = 1/sqrt(2+n*epsilon^2/4);     % normalization constant
A = zeros(n,n+2);
A(1:2,1:2) = ([1 -1;-1 1]+epsilon/2)*c;  % two special normalized columns
A(3:n,1:2) = epsilon/2*c;
A(1:n,3:n+2) = eye(n);           % identity part
b = ones(n,1);
```
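Basis Pursuit on this construction can also be checked end-to-end by solving the linear programming formulation (splitting $x$ into positive and negative parts). A sketch in Python with NumPy/SciPy; the smaller $n$ and the factor $0.9$ below the critical value are my choices:

```python
import numpy as np
from scipy.optimize import linprog

n = 20                                    # small instance keeps the LP tiny
eps = 0.9 * np.sqrt(8.0 / (n**2 - n))     # just below sqrt(8/(n^2-n))
c = 1.0 / np.sqrt(2.0 + n * eps**2 / 4.0)
A = np.zeros((n, n + 2))
A[:2, :2] = (np.array([[1.0, -1.0], [-1.0, 1.0]]) + eps / 2) * c
A[2:, :2] = eps / 2 * c
A[:, 2:] = np.eye(n)
b = np.ones(n)

# Basis Pursuit as an LP: x = u - v with u, v >= 0
N = n + 2
res = linprog(np.ones(2 * N), A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=[(0, None)] * (2 * N))
x_bp = res.x[:N] - res.x[N:]

# the 2-sparse solution for comparison
x_sparse = np.zeros(N)
x_sparse[:2] = 1.0 / (eps * c)
```

The LP returns the dense vector with $n$ non-zero entries, whose one-norm $n$ beats the one-norm $2/(\epsilon c)$ of the 2-sparse solution.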

November 24, 2011 at 2:27 pm

Dirk,

the first x_0 is missing two zeros.

November 24, 2011 at 2:38 pm

Thanks! Corrected – some bug in the latex2wp converter eliminated these two zeros…

November 24, 2011 at 7:10 pm

Hi Dirk,

I am a Ph.D student and I am using basis pursuit to get sparse result of seismic reflectivity. Sometimes the lateral continuity of reflectivity profile is not good. Do you know some methods which could improve the lateral continuity?

Thanks very much!

November 24, 2011 at 9:53 pm

It’s difficult to answer with this little information… The answer would very much depend on the operator involved and the basis you use. However, sparsity and continuity do not go well together in general. It sounds like a kind of group sparsity could help…

November 24, 2011 at 10:49 pm

Dear Dirk,

There is one important result from compressed sensing that the number of measurements (M) must be > K log(N/K); maybe there is another restriction for M/N, while your small matrix problem violates all theory. One other thing: can you find a transform basis to map your matrices into the sparsest one, instead of the unit transformation?

Mehmet

November 25, 2011 at 3:37 pm

For sure the given matrix violates all sufficient conditions for exact recovery!

November 29, 2011 at 11:01 am

Hi Dirk,

Nice entry. I have posted a follow up here: http://media.aau.dk/null_space_pursuits/2011/11/a-simple-example-where-bp-fails-and-omp-succeeds.html

Basically we see that OMP can succeed where BP fails, but there is a limit.

November 29, 2011 at 1:57 pm

That is interesting. I thought I knew that BP is in general superior to OMP due to the results (e.g.) by Tropp. However, this holds only in a “uniform way”, i.e. if one wants to have exact recovery of all vectors with a certain support. This does, of course, not say too much for specific vectors…

December 7, 2011 at 9:16 pm

Hi,

The result of Tropp (ERC criterion) ensures that both BP and OMP succeed (see for instance the “Greed is good” paper).

There are simple examples of the converse situation in the original BP paper of Chen and Donoho (e.g. Gabor dictionary when two atoms are too close).

Gabriel

November 29, 2011 at 5:40 pm

Hi Dirk, my post has evolved to show OMP recovers the second correct atom as well.

I think many others see BP as the “better” algorithm. However, that is simply not true. It depends entirely on the problem. BP is indeed robust to changes in the underlying distribution of the signal; however, in choosing an algorithm for solving an inverse problem, if one expects the sparse elements to be distributed Normally, then greedy methods will outperform BP to a significant degree when it comes to model recovery … and, as far as I can tell from my experiments, this holds in a uniform way.

See my paper for a variety of surprises: http://arxiv.org/abs/1103.6246

December 7, 2011 at 9:35 pm

Gabriel, the ERC of Tropp is a condition which ensures that all vectors with that support can be recovered exactly. However, there may be vectors with a support which violates the ERC but still are recoverable. There is also a condition which ensures that all vectors with a given sign-pattern are recoverable. You need that the range of A^T intersects the relative interior of the multivalued sign of that pattern and that the columns of A corresponding to the support are independent. That is, in general, much weaker than ERC and the Null Space Property.

December 9, 2011 at 8:56 pm

Hi Dirk,

Yes you are correct.

I would add that:

1/ ERC(I) gives robustness to an arbitrary bounded noise for any signal support on I. This is a strong conclusion, maybe too strong for most problems (since only few vectors will actually pass the ERC in realistic super-resolution scenarios).

2/ JJ Fuchs in his original paper in fact derives a criterion F(s) that depends on the sign s (and not on the support) and gives robustness to a small noise (i.e. smaller than some constant that depends on the sign). Taking the worst case F(s) for s supported in I gives ERC(I). To the best of my knowledge, this is only true for L1 (not for OMP).

3/ Null-space properties (and probably also your criterion) do not give any robustness in general. Even if the noise is small and the regularization parameter selected optimally, the solution might be supported outside the true support and thus might be unstable. This is why I think such “topological” criteria are not very useful in practice.

— best

G

December 9, 2011 at 9:11 pm

Oops, I believe I went too fast over your sentence “A^T intersects the relative interior of the multivalued sign”.

I believe this is more or less equivalent to having F(s)<1 (F(s) is a bound on the L1 sub-differential components that do not saturate), so this will give a robustness constant similar to the constant 1/(1-F(s)) of J.J. Fuchs for small noise robustness.

— best

G

March 5, 2012 at 5:17 am

Thanks so much for these examples. This has cleared up some issues for me.

What is most interesting about the first example is that it shows how much tighter the Donoho–Tanner phase transition is compared to the RIP-based analysis of Candes et al. Candes’ famous result gives a sufficient condition for the l1 based on the RIP like this: If the RIP constant is 0.72.

However we know that l-1 works for eps > 0.5. Now if you follow Donoho and Tanner’s analysis and count the number of vertices of the convex set:

{x in R^3: ||x||_1 0.5 for 100% recovery! Furthermore for eps <0.5 you can even tell what proportion of 1-sparse vectors are recoverable (2/3).

Once again, thank you for the example.

Best,

Pat

March 5, 2012 at 5:21 am

My comment did not render correctly.

*What I meant to say was. If you count the number of vertices of the set:

x in R^3: ||x||_1 is less than 1

You get eps less than 0.5 for 100% recovery! Furthermore for eps less than 0.5 you can even tell what proportion of 1-sparse vectors are recoverable (2/3).