How many samples are needed to reconstruct a sparse signal?

Well, there are many, many results around, some of which you probably know (at least if you are following this blog or this one). Today I write about a neat result on the reconstruction of nonnegative sparse signals from a semi-continuous perspective, which I found quite some time ago.

1. From discrete sparse reconstruction/compressed sensing to semi-continuous

The basic sparse reconstruction problem asks the following: Say we have a vector {x\in{\mathbb R}^m} which has only {s<m} non-zero entries and a fat matrix {A\in{\mathbb R}^{n\times m}} (i.e. {n<m}), and consider that we are given measurements {b=Ax}. Of course, the system {Ax=b} is underdetermined. However, we may add a little more prior knowledge on the solution and ask: Is it possible to reconstruct {x} from {b} if we know that the vector {x} is sparse? If yes: How? Under what conditions on {m}, {s}, {n} and {A}? This question recently created the expanding universe of compressed sensing (and this universe is expanding so fast that for sure there has to be some dark energy in it). As a matter of fact, a powerful method to obtain sparse solutions of underdetermined systems is {\ell^1}-minimization, a.k.a. Basis Pursuit, on which I blogged recently: Solve

\displaystyle  \min_x \|x\|_1\ \text{s.t.}\ Ax=b

and the important ingredient here is the {\ell^1}-norm of the vector in the objective function.
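To make this concrete, here is a minimal sketch in Python (using numpy and scipy; the helper name basis_pursuit and the demo dimensions are mine) of how Basis Pursuit can be solved as a linear program by splitting {x} into its positive and negative parts:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """min ||x||_1 s.t. Ax = b, as an LP via the split x = xp - xm with xp, xm >= 0."""
    _, m = A.shape
    c = np.ones(2 * m)                 # sum(xp) + sum(xm) equals ||x||_1 at the optimum
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
    return res.x[:m] - res.x[m:]

# tiny demo: recover a 2-sparse vector from 15 random Gaussian measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((15, 40))
x_true = np.zeros(40)
x_true[[3, 17]] = [1.5, -2.0]
print(np.allclose(basis_pursuit(A, A @ x_true), x_true, atol=1e-6))
```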

In this post I’ll formulate semi-continuous sparse reconstruction. We move from an {m}-vector {x} to a finite signed measure {\mu} on a closed interval (which we assume to be {I=[-1,1]} for simplicity). We may embed the {m}-vectors into the space of finite signed measures by choosing {m} points {t_i}, {i=1,\dots, m}, from the interval {I} and building {\mu = \sum_{i=1}^m x_i \delta_{t_i}} with the point masses (or Dirac measures) {\delta_{t_i}}. To be a bit more precise, we speak about the space {\mathfrak{M}} of Radon measures on {I}, which are defined on the Borel {\sigma}-algebra of {I} and are finite. Radon measures are not very scary objects, and an intuitive way to think of them is via Riesz representation: Every Radon measure arises as a continuous linear functional on a space of continuous functions, namely the space {C_0(I)}, which is the closure of the continuous functions with compact support in {{]{-1,1}[}} with respect to the supremum norm. Hence, Radon measures act on these functions as {\int_I fd\mu}. It is also natural to speak of the support {\text{supp}(\mu)} of a Radon measure {\mu}, and it holds for any continuous function {f} that

\displaystyle  \int_I f d\mu = \int_{\text{supp}(\mu)}f d\mu.

An important tool for Radon measures is the Hahn-Jordan decomposition, which decomposes {\mu} into a positive part {\mu^+} and a negative part {\mu^-}, i.e. {\mu^+} and {\mu^-} are non-negative and {\mu = \mu^+-\mu^-}. Finally, the variation of a measure, which is

\displaystyle  \|\mu\| = \mu^+(I) + \mu^-(I)

provides a norm on the space of Radon measures.

Example 1 For the measure {\mu = \sum_{i=1}^m x_i \delta_{t_i}} one readily calculates that

\displaystyle  \mu^+ = \sum_i \max(0,x_i)\delta_{t_i},\quad \mu^- = \sum_i \max(0,-x_i)\delta_{t_i}

and hence

\displaystyle  \|\mu\| = \sum_i |x_i| = \|x\|_1.

In this sense, the space of Radon measures provides a generalization of {\ell^1}.
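As a quick numeric illustration of Example 1 (a sketch; storing a discrete measure as a pair of position and weight arrays is my own convention):

```python
import numpy as np

# the discrete measure mu = sum_i x_i * delta_{t_i}, stored as positions and weights
t = np.array([-0.5, 0.1, 0.7])
x = np.array([2.0, -1.0, 0.5])

mu_plus, mu_minus = np.maximum(x, 0), np.maximum(-x, 0)   # Hahn-Jordan parts
variation = mu_plus.sum() + mu_minus.sum()                # ||mu|| = mu^+(I) + mu^-(I)
assert variation == np.abs(x).sum()                       # ... which equals ||x||_1
```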

We may sample a Radon measure {\mu} with {n+1} linear functionals and these can be encoded by {n+1} continuous functions {u_0,\dots,u_n} as

\displaystyle  b_k = \int_I u_k d\mu.

This sampling gives a bounded linear operator {K:\mathfrak{M}\rightarrow {\mathbb R}^{n+1}}. The generalization of Basis Pursuit is then given by

\displaystyle  \min_{\mu\in\mathfrak{M}} \|\mu\|\ \text{s.t.}\ K\mu = b.
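For point masses the operator {K} is easy to write down explicitly; here is a minimal sketch (the function name is mine), using the monomials {u_k(t)=t^k} as a concrete example since they will play the main role below:

```python
import numpy as np

def sample_measure(t, x, n):
    """b_k = int u_k dmu = sum_i x_i * t_i**k for mu = sum_i x_i delta_{t_i},
    with the monomials u_k(t) = t**k, k = 0, ..., n."""
    U = np.vander(t, N=n + 1, increasing=True).T   # row k holds u_k at the points t_i
    return U @ x

b = sample_measure(np.array([-0.5, 0.1, 0.7]), np.array([2.0, -1.0, 0.5]), n=4)
```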

This problem was introduced and called “Support Pursuit” in the preprint Exact Reconstruction using Support Pursuit by Yohann de Castro and Fabrice Gamboa.

More on the motivation and the use of Radon measures for sparsity can be found in Inverse problems in spaces of measures by Kristian Bredies and Hanna Pikkarainen.

2. Exact reconstruction of sparse nonnegative Radon measures

Before I talk about the results, we may count the degrees of freedom a sparse Radon measure has: If {\mu = \sum_{i=1}^s x_i \delta_{t_i}} for some {s}, then {\mu} is determined by the {s} weights {x_i} and the {s} positions {t_i}. Hence, we expect that at least {2s} linear measurements should be necessary to reconstruct {\mu}. Surprisingly, this is almost enough if we know that the measure is nonnegative! We only need one more measurement, that is, {2s+1}, and moreover we can take fairly simple measurements, namely the monomials: {u_i(t) = t^i}, {i=0,\dots, n} (with the convention that {u_0(t)\equiv 1}). This is shown in the following theorem by de Castro and Gamboa.

Theorem 1 Let {\mu = \sum_{i=1}^s x_i\delta_{t_i}} with {x_i\geq 0}, {n=2s} and let {u_i}, {i=0,\dots,n}, be the monomials as above. Define {b_i = \int_I u_i(t)d\mu}. Then {\mu} is the unique solution of the support pursuit problem, that is of

\displaystyle  \min \|\nu\|\ \text{s.t.}\ K\nu = b.\qquad \textup{(SP)}

Proof: The following polynomial will be of importance: For a constant {c>0} define

\displaystyle  P(t) = 1 - c \prod_{i=1}^s (t-t_i)^2.

The following properties of {P} will be used (a quick numeric check follows the list):

  1. {P(t_i) = 1} for {i=1,\dots,s}
  2. {P} has degree {n=2s} and hence, is a linear combination of the {u_i}, {i=0,\dots,n}, i.e. {P = \sum_{k=0}^n a_k u_k}.
  3. For {c} small enough it holds for all {t\in I} with {t\neq t_i} that {|P(t)|<1}.
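Here is the numeric check of these properties (a sketch; the support points are arbitrary and {c} is chosen small enough for these particular points):

```python
import numpy as np

# P(t) = 1 - c * prod_i (t - t_i)^2 for s = 3 support points
t_supp = np.array([-0.6, 0.0, 0.4])
c = 0.1   # small enough here: max over [-1,1] of prod_i (t - t_i)^2 is about 0.92

def P(t):
    return 1 - c * np.prod([(t - ti) ** 2 for ti in t_supp], axis=0)

print(P(t_supp))                              # property 1: [1. 1. 1.]
tt = np.linspace(-1, 1, 2001)
off = np.all(np.abs(tt[:, None] - t_supp) > 1e-9, axis=1)
print(np.all(np.abs(P(tt[off])) < 1))         # property 3: True
```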

Now let {\sigma} be a solution of (SP). Since {\mu} is feasible for (SP), we have {\|\sigma\|\leq\|\mu\|}; hence we have to show that {\|\mu\|\leq \|\sigma\|}. Due to property 2 we know that

\displaystyle  \int_I u_k d\sigma = (K\sigma)_k = b_k = \int_I u_k d\mu.

Due to property 1 and non-negativity of {\mu} we conclude that

\displaystyle  \begin{array}{rcl}  \|\mu\| & = & \sum_{i=1}^s x_i = \int_I P d\mu\\ & = & \int_I \sum_{k=0}^n a_k u_k d\mu\\ & = & \sum_{k=0}^n a_k \int_I u_k d\mu\\ & = & \sum_{k=0}^n a_k \int_I u_k d\sigma\\ & = & \int_I P d\sigma. \end{array}

Moreover, by Lebesgue’s decomposition we can decompose {\sigma} with respect to {\mu} such that

\displaystyle  \sigma = \underbrace{\sum_{i=1}^s y_i\delta_{t_i}}_{=\sigma_1} + \sigma_2

and {\sigma_2} is singular with respect to {\mu}. We get

\displaystyle  \begin{array}{rcl}  \int_I P d\sigma = \sum_{i=1}^s y_i + \int P d\sigma_2 \leq \|\sigma_1\| + \|\sigma_2\|=\|\sigma\| \end{array}

and we conclude that {\|\sigma\| = \|\mu\|} and especially {\int_I P d\sigma_2 = \|\sigma_2\|}. This shows that {\mu} is a solution to (SP). It remains to show uniqueness. We show the following: If there is a {\nu\in\mathfrak{M}} with support in {I\setminus\{t_1,\dots,t_s\}} such that {\int_I Pd\nu = \|\nu\|}, then {\nu=0}. To see this, we build, for any {r>0}, the sets

\displaystyle  \Omega_r = [-1,1]\setminus \bigcup_{i=1}^s\ ]t_i-r,t_i+r[

and assume that there exists {r>0} such that {\|\nu|_{\Omega_r}\|\neq 0} ({\nu|_{\Omega_r}} denoting the restriction of {\nu} to {\Omega_r}). However, it holds by property 3 of {P} that

\displaystyle  \int_{\Omega_r} P d\nu < \|\nu|_{\Omega_r}\|

and consequently

\displaystyle  \begin{array}{rcl}  \|\nu\| &=& \int Pd\nu = \int_{\Omega_r} Pd\nu + \int_{\Omega_r^C} P d\nu\\ &<& \|\nu|_{\Omega_r}\| + \|\nu|_{\Omega_r^C}\| = \|\nu\| \end{array}

which is a contradiction. Hence, {\nu|_{\Omega_r}=0} for all {r} and this implies {\nu=0}. Since {\sigma_2} has its support in {I\setminus\{t_1,\dots,t_s\}} we conclude that {\sigma_2=0}. Hence, the support of {\sigma} is contained in {\{t_1,\dots,t_s\}}, i.e. {\sigma = \sum_{i=1}^s y_i\delta_{t_i}}. Since {K\sigma = b = K\mu} we get {K(\sigma-\mu) = 0}. This can be written as a Vandermonde system

\displaystyle  \begin{pmatrix} u_0(t_1)& \dots &u_0(t_s)\\ \vdots & & \vdots\\ u_n(t_1)& \dots & u_n(t_s) \end{pmatrix} \begin{pmatrix} y_1 - x_1\\ \vdots\\ y_s - x_s \end{pmatrix} = 0

which only has the zero solution since the {t_i} are distinct (the top {s\times s} block is an invertible Vandermonde matrix), giving {y_i=x_i}. \Box
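To see Theorem 1 in action, here is a small numeric sketch. Note that it cheats slightly by discretizing: the candidate positions are restricted to a fine grid that contains the true {t_i} (this is in fact the finite-dimensional setting of Theorem 3 below), so support pursuit becomes a linear program:

```python
import numpy as np
from scipy.optimize import linprog

s = 3
n = 2 * s                                      # 2s + 1 = 7 monomial measurements
grid = np.linspace(-1, 1, 401)                 # candidate positions
t_true = grid[[80, 200, 280]]                  # true positions -0.6, 0.0, 0.4 on the grid
x_true = np.array([1.0, 2.0, 0.5])             # nonnegative weights

U = np.vander(grid, N=n + 1, increasing=True).T              # (n+1) x 401
b = np.vander(t_true, N=n + 1, increasing=True).T @ x_true   # the measurements

# min ||y||_1 s.t. U y = b, via the split y = yp - ym (no sign constraint on y)
res = linprog(np.ones(2 * grid.size), A_eq=np.hstack([U, -U]), b_eq=b,
              bounds=(0, None))
y = res.x[:grid.size] - res.x[grid.size:]
support = np.flatnonzero(np.abs(y) > 1e-6)
print(grid[support], y[support])               # recovers t_true and x_true
```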

3. Generalization to other measurements

The measurement by monomials may sound a bit unusual. However, de Castro and Gamboa show more. What really matters here is that the monomials form a so-called Chebyshev system (or Tchebyscheff-system or T-system – by the way, have you ever tried to google for a T-system?). This is explained, for example, in the book “Tchebycheff Systems: With Applications in Analysis and Statistics” by Karlin and Studden. A T-system on {I} is simply a set of {n+1} functions {\{u_0,\dots, u_n\}} such that any non-trivial linear combination of these functions has at most {n} zeros. These systems are named after Tchebyscheff since they share many of the helpful properties of the Tchebyscheff polynomials.

What is helpful in our context is the following theorem of Krein:

Theorem 2 (Krein) If {\{u_0,\dots,u_n\}} is a T-system on {I}, {k\leq n/2} and {t_1,\dots,t_k} are in the interior of {I}, then there exists a linear combination {\sum_{j=0}^n a_j u_j} which is non-negative and vanishes exactly at the points {t_1,\dots,t_k}.

Now suppose that we replace the monomials in Theorem~1 by a T-system. You recognize that Krein’s theorem allows one to construct a “generalized polynomial” which fulfills the same requirements as the polynomial {P} in the proof of Theorem~1, as soon as the constant function 1 lies in the span of the T-system, and indeed the result of Theorem~1 remains valid in that case.
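For the monomial T-system, the combination promised by Krein’s theorem can be written down directly: it is (up to scaling) {\prod_{i}(t-t_i)^2}. A small sketch that expands it into coefficients {a_k} and checks the two properties:

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

# Q(t) = prod_i (t - t_i)^2: nonnegative, vanishes exactly at the t_i, and of
# degree 2k <= n, hence a linear combination of the monomials u_0, ..., u_n
t_pts = np.array([-0.3, 0.5])
a = npoly.polyfromroots(np.repeat(t_pts, 2))   # coefficients a_k in the monomial basis

tt = np.linspace(-1, 1, 1001)
print(npoly.polyval(tt, a).min() >= -1e-12)    # nonnegative on I (up to rounding)
print(npoly.polyval(t_pts, a))                 # approximately [0, 0]
```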

4. Exact reconstruction of {s}-sparse nonnegative vectors from {2s+1} measurements

From the above one can deduce a reconstruction result for {s}-sparse vectors and I quote Theorem 2.4 from Exact Reconstruction using Support Pursuit:

Theorem 3 Let {n}, {m}, {s} be integers such that {s\leq \min(n/2,m)} and let {\{1,u_1,\dots,u_n\}} be a complete T-system on {I} (that is, {\{1,u_1,\dots,u_r\}} is a T-system on {I} for all {r\leq n}). Then it holds: For any distinct reals {t_1,\dots,t_m} and {A} defined as

\displaystyle  A=\begin{pmatrix} 1 & \dots & 1\\ u_1(t_1)& \dots &u_1(t_m)\\ \vdots & & \vdots\\ u_n(t_1)& \dots & u_n(t_m) \end{pmatrix}

Basis Pursuit recovers all nonnegative {s}-sparse vectors {x\in{\mathbb R}^m}.
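A numeric sketch of Theorem 3 (the choice of the Chebyshev polynomials {T_0,\dots,T_n} as a complete T-system, the point placement and the random sparsity pattern are mine):

```python
import numpy as np
from numpy.polynomial.chebyshev import chebvander
from scipy.optimize import linprog

m, s = 50, 4
n = 2 * s                                      # n + 1 = 2s + 1 measurements
t = np.linspace(-1, 1, m)                      # any m distinct points in I
A = chebvander(t, n).T                         # (n+1) x m, rows are T_0, ..., T_n at the t_j

rng = np.random.default_rng(1)
x_true = np.zeros(m)
x_true[rng.choice(m, size=s, replace=False)] = rng.uniform(0.5, 2.0, size=s)

# Basis Pursuit as an LP via the split x = xp - xm
res = linprog(np.ones(2 * m), A_eq=np.hstack([A, -A]), b_eq=A @ x_true,
              bounds=(0, None))
x_hat = res.x[:m] - res.x[m:]
print(np.allclose(x_hat, x_true, atol=1e-6))   # exact recovery, as Theorem 3 predicts
```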

5. Concluding remarks

Note that Theorem~3 gives a deterministic construction of a measurement matrix.

Also note that nonnegativity is crucial in what we did here. It allowed us (in the monomial case) to work with squares and to obtain the polynomial {P} in the proof of Theorem~1 (which is also called a “dual certificate” in this context). This raises the question of how this method can be adapted to general sparse signals. One needs (in the monomial case) a polynomial which is bounded by 1 in absolute value but matches the signs of the measure on its support. While this can be done (I think) for polynomials, it seems difficult to obtain a generalization of Krein’s theorem to this case…