In this post I gladly announce that three problems that bothered me have been solved: The computational complexity of certifying RIP and NSP and the number of steps the homotopy method needs to obtain a solution of the Basis Pursuit problem.

1. Complexity of RIP and NSP

On this issue we have two papers:

The first paper has the more general results and hence, we start with the second one: The main result of the second paper is this:

Theorem 1 Let a matrix {A}, a positive integer {K} and some {0<\delta<1} be given. It is hard for NP under randomized polynomial-time reductions to check if {A} satisfies the {(K,\delta)} restricted isometry property.

That does not yet say that it’s NP-hard to check if {\delta} is an RIP constant for {K}-sparse vectors but it’s close. I think that Dustin Mixon has explained this issue better on his blog than I could do here.

In the first paper (which is, by the way, on outcome of the SPEAR-project in which I am involved…) the main result is indeed the conjectured NP-hardness of calculating RIP constants:

Theorem 2 For a given matrix {A} and a positive integer {K}, it is NP-hard to compute the restricted isometry constant.

Moreover, this is just a corollary to the main theorem of that paper which reads as

Theorem 3 For a given matrix {A} and a positive integer {K}, the problem to decide whether {A} satisfies the restricted isometry property of order {K} for some constant {\delta<1} is coNP-complete.

They also provide a slightly strengthened version of Theorem~1:

Theorem 4 Let a matrix {A}, a positive integer {K} and some {0<\delta<1} be given. It is coNP-complete to check if {A} satisfies the {(K,\delta)} restricted isometry property.

Moreover, the paper by Pfetsch and Tillmann also proves something about the null space property (NSP):

Definition 5 A matrix {A} satisfies the null space property of order {K} if there is a constant {\alpha>0} such that for all elements {x} in the null space of {A} it holds that the sum of the {K} largest absolute values of {x} is smaller that {\alpha} times the 1-norm of {x}. The smallest such constant {\alpha} is called the null space constant of order {K}.

Their main result is as follows:

Theorem 6 F or a given matrix {A} and a positive integer {K}, the problem to decide whether {A} satisfies the null space property order {K} for some constant {\alpha<1} is coNP-complete. Consequently, it is NP-hard to compute the null space constant of {A}.

2. Complexity of the homotopy method for Basis Pursuit

The second issue is about the basis pursuit problem

\displaystyle  \min_x \|x\|_1\quad\text{s.t.}\ Ax=b.

which can be approximated by the “denoising variant”

\displaystyle  \min_x \lambda\|x\|_1 + \tfrac12\|Ax-b\|_2^2.

What is pretty interesting about the denoising variant is, that the solution {x(\lambda)} (if it is unique throughout) depends on {\lambda} in a piecewise linear way and converges to the solution of basis pursuit for {\lambda\rightarrow 0}. This leads to an algorithm for the solution of basis pursuit: Start with {\lambda=\|A^Tb\|_\infty} (for which the unique solution is {x(\lambda)=0}), calculate the direction of the “solution path”, follow it until you reach a “break point”, calculate the next direction and so on until {\lambda} hits zero. This is for example implemented for MATLAB in L1Homotopy (the SPAMS package also seems to have this implemented, however, I haven’t used it yet). In practice, this approach (usually called homotopy method) is pretty fast and moreover, only detects a few break points. However, an obvious upper bound on the number of break point is exponential in the number of entries in {x}. Hence, it seemed that one was faced with a situation similar to the simplex method for linear programming: The algorithms performs great an average but the worst case complexity is bad. That this is really true for linear programming is known since some time by the Klee-Minty example, an example for which the simplex method takes an exponential number of steps. What I asked myself for some time: Is there a Klee-Minty example for the homotopy method?

Now the answer is there: Yes, there is!

The denoising variant of basis pursuit is also known as LASSO regularization in the statistics literature and this explains the title of the paper which comes up with the example:

Julien and Bin investigate the number of linear segments in the regularization path and first observe that this is upper bounded by {(3^p+1)/2} is {p} is the number of entries in {x} (i.e. the number of variables of the problem). Then they try to construct an instance that matches this upper bound. They succeed in a clever way: For a given instance {(A,b)} with a path with {k} linear segments they try to construct an instance which has one more variable such that the number of linear segments in increased by a factor. Their result goes like this:

Theorem 7 Let {A\in{\mathbb R}^{n\times p}} have full rank and let {b\in{\mathbb R}^n} be in the range of {A}. Assume that the homotopy path has {k} linear segments and denote by {\lambda_1} the regularization parameter which corresponds to the smallest kink in the path. Now choose {b_{n+1}\neq 0} and {\alpha} such that

\displaystyle   0<\alpha < \frac{\lambda_1}{2\|b\|_2^2 + b_{n+1}^2} \ \ \ \ \ (1)

and define {\tilde b\in{\mathbb R}^{n+1}} and {\tilde A\in{\mathbb R}{(n+1)\times (p+1)}} by

\displaystyle  \tilde b = \begin{bmatrix} b\\ b_{n+1} \end{bmatrix}, \quad \tilde A = \begin{bmatrix} A & 2\alpha b\\ 0 & \alpha b_{n+1} \end{bmatrix}.

Then the homotopy path for the basis pursuit problem with matrix {\tilde A} and right hand side {\tilde b} has {3k-1} linear segments.

With this theorem at hand, it is straightforward to recursively build a “Mairal-Yu”-example which matches the upper bound for the number of linear segments. The idea is to start with a {1\times 1} example and let it grow by one row and one column according to Theorem~7. We start with the simplest {1\times 1} example, namely {A = [1]} and {b=[1]}. To move to the next bigger example you can choose the next entry {b_{n+1}} and we always choose {1} for convenience. Moreover, you need the next {\alpha} and you need to know the smallest kink in the path. I calculated the paths and kinks with L1Packv2 by Ignace Loris because it is written in Mathematica and can use exact arithmetics with rational numbers (and you will see, that accuracy will be an issue even for small instances) and seemed bullet proof for me. Let’s see where this idea brings us:

Example 1 (Mairal-Yu example)

  • Stage 1: We start with {n=p=1}, {b=[1]} and {A=[1]}. The homotopy path has one kink at {\lambda_1=1} (with corresponding solution {[0]}) and hence, two linear segments. Now let’s go to the next larger instance:
  • Stage 2: We can choose the entry {b_2} as we like and choose it equals to 1, i.e. our new {b} is

    \displaystyle  b = \begin{bmatrix} 1\\1 \end{bmatrix}.

    Now we have to choose {\alpha} according to (1), i.e

    \displaystyle  0 < \alpha < \frac{\lambda_1}{2\|b\|_2^2 + b_{n+1}^2} = \frac{1}{2+1} = \frac{1}{3}

    and we can choose, e.g., {\alpha = 1/4} which gives our new matrix

    \displaystyle  A = \begin{bmatrix} 1 & \frac12\\ 0 & \frac14 \end{bmatrix}.

    The calculation of the new regularization path shows that it has exactly the announced number of 5 segments and the parameter of the smallest kink is {\lambda_1 = \frac{1}{13}}.

  • Stage 2: Again we choose {b_{n+1} = 1} giving

    \displaystyle  b = \begin{bmatrix} 1\\1\\1 \end{bmatrix}

    For the choice of {\alpha} we need that

    \displaystyle  0<\alpha < \frac{1}{13(4+1)} = \frac{1}{75}

    and we may choose

    \displaystyle  \alpha = \frac1{80}.

    which gives the next matrix

    \displaystyle  A = \begin{bmatrix} 1 & \frac12 & \tfrac{1}{40}\\ 0 & \frac14 & \tfrac{1}{40}\\ 0 & 0 & \tfrac{1}{80} \end{bmatrix}.

    We calculate the regularization path, observe that it has the predicted 14 segments and that the parameter of the smallest kink is {\lambda_1 = \frac{1}{193}}.

  • Stage 3: Again we choose {b_{n+1} = 1} giving

    \displaystyle  b = \begin{bmatrix} 1\\1\\1\\1 \end{bmatrix}

    For the choice of {\alpha} we need that

    \displaystyle  0<\alpha < \frac{1}{193(6+1)} = \frac{1}{1351}

    and we see that things are getting awkward here…

Proceeding in this way we always increase the number of linear segments {k_n} for the {n\times n}-case from {k_n} to {k_{n+1} = 3k_n-1} in each step and one checks easily that this leads to {k_n = (3^n+1)/2} which is the worst case! If you are interested in the regularization path: I produced picture for the first three dimensions (well, I could not draw a 4d {\ell^1}-ball) and here they are:

1d Mairal-Yu example


2d Mairal-Yu example


3d Mairal-Yu example

It is not really easy to perceive the whole paths from the pictures because the magnitude of the entries vary strongly. I’ve drawn the path in red, each kink marked with a small circle. Moreover, I have drawn the according {\ell^1}-balls of the respective radii to provide more geometric information.

The paper by Mairal and Yu has more results of the paths if one looks for approximate solutions of the linear system but I will not go into detail about them here.

At least two questions come to mind:

  • The Mairal-Yu example is {n\times n}. What is the worst case complexity for the true rectangular case? In other words: What is the complexity for {p\times n} in terms of {p} and {n}?
  • The example and the construction leads to matrices that does not have normed columns and moreover, they are far from being equal in norm. But matrices with normed columns seem to be more “well behaved”. Does the worst case complexity lowers if the consider matrices with unit-norm columns? Probably one can construct a unit-norm example by proper choice of {b}