November 2011

This term I am teaching “Funktionentheorie”, i.e. functions of a complex variable. It turned out that it would be useful to have the rule of partial integration and a substitution rule for line integrals in this context. Well, these rules hold true and are not difficult to prove, but for some reason they are not treated in the textbooks we used. I state them here for convenience.

Proposition 1 (Partial integration for line integrals of holomorphic functions) Let {f} and {g} be holomorphic functions on a domain {G\subset {\mathbb C}} and let {\gamma:[a,b]\rightarrow G} be continuous and piecewise differentiable. Then it holds that

\displaystyle \begin{array}{rcl} \int_\gamma f(z)g'(z) dz & =& -\int_\gamma f'(z) g(z)dz\\ & & \quad+ \Big[ f(\gamma(b))g(\gamma(b)) - f(\gamma(a))g(\gamma(a))\Big]. \end{array}

Proof: It holds that

\displaystyle (f\,g)'(z) = f'(z)g(z) + f(z) g'(z).

Integration this along {\gamma} and using that {f\, g} is obviously an antiderivative of {(f\,g)'} we obtain

\displaystyle \Big[ f(\gamma(b))g(\gamma(b)) - f(\gamma(a))g(\gamma(a))\Big] = \int_\gamma f'(z)g(z)dz + \int_\gamma f(z)g'(z)dz.


Proposition 2 (Substitution rule for line integrals of functions of a complex variable) Let {\gamma:[a,b]\rightarrow {\mathbb C}} be continuous and piecewise differentiable, {f} be continuous on a neighborhood of {g([a,b])} and let {\phi:{\mathbb C}\rightarrow{\mathbb C}} be bi-holomorphic. Then it holds that

\displaystyle \int_\gamma f(z) dz = \int_{\phi^{-1}\circ \gamma} f(\phi(\zeta))\phi'(\zeta) d\zeta.

Proof: We simply calculate by substituting the parameterization and using the rule for the derivative of the inverse function:

\displaystyle \begin{array}{rcl} \int_{\phi^{-1}\circ \gamma} f(\phi(\zeta))\phi'(\zeta)d\zeta & = & \int_a^b f(\phi(\phi^{-1}(\gamma(t))))\phi'(\phi^{-1}(\gamma(t))) (\phi^{-1}\circ\gamma)'(t)dt\\ & = & \int_a^b f(\gamma(t)) \phi'(\phi^{-1}(\gamma(t))) \frac{\gamma'(t)}{\phi'(\phi^{-1}(\gamma(t)))} dt\\ & = & \int_a^b f(\gamma(t))\gamma'(t)dt\\ & = & \int_\gamma f(z) dz. \end{array}


Remark: It is enough that the transformation {\phi} is biholomorphic from a neighborhood {U} of  {\gamma([a,b])} onto its image {\phi(U)}.

I stumbled upon the notion of “time varying channels” in signal processing and after reading a bit I realized that these are really interesting objects and appear in many different parts of mathematics. In this post I collect of few of their realizations and relations.

In signal processing a “time varying channel” is a mapping which maps a signal {x:{\mathbb R}^d\rightarrow {\mathbb C}} to an output {y:{\mathbb R}^d\rightarrow{\mathbb C}} via

\displaystyle y(t) = \int_{{\mathbb R}^d}\int_{{\mathbb R}^d} s(\tau,\nu)x(t-\tau)\mathrm{e}^{\mathrm{i}\nu t}d\nu d\tau.

Before we explain the name “time varying channel” we fix the notation for the Fourier transform I am going to use in this post:

Definition 1 For {f:{\mathbb R}^d\rightarrow{\mathbb C}} we define the Fourier transform by

\displaystyle \mathcal{F}(f)(\omega) = \hat f(\omega) = (2\pi)^{-d/2}\int f(t)\mathrm{e}^{-\mathrm{i}\omega t}dt

and denote the inverse Fourier transform by {\mathcal{F}^{-1}f} or {\check f}. For a function {f:{\mathbb R}^d\times{\mathbb R}^d\rightarrow{\mathbb C}} we denote with {\mathcal{F}_1} and {\mathcal{F}_2} the Fourier transform with respect to the first and second {{\mathbb R}^d}-component, respectively.

Remark 1 In all what follows we do formal calculations with integral not caring about integrability. All calculation are justified in the case of Schwartz-functions and often hold in a much broader context of tempered distributions (this for example happens if the integrals represent Fourier transforms of functions).

The name “time varying channel” can be explained as follows: Consider a pure frequency as input: {x(t) = \mathrm{e}^{-\mathrm{i}\omega t}}. A usual linear channel gives as output a damped signal and the damping depends on the frequency {\omega}: {y(t) = h(\omega) \mathrm{e}^{-\mathrm{i}\omega t}}. If we send the pure frequency in our time varying channel we get

\displaystyle \begin{array}{rcl} y(t) & = &\int\int s(\tau,\nu) \mathrm{e}^{-\mathrm{i}\omega(t-\tau)}e^{\mathrm{i}\nu t}d\nu dt\\ & =& \int\int s(\tau,\nu) \mathrm{e}^{\mathrm{i}(\omega\tau +\nu t)}d\nu dt\, \mathrm{e}^{-\mathrm{i}\omega t}\\ & =& (2\pi)^d \hat s(-\omega,-t) \mathrm{e}^{-\mathrm{i}\omega t}. \end{array}

Hence, the time varying channel also damps the pure frequencies but with a time dependent factor.

Let’s start quite far away from signal processing:

1. Pseudo-differential operators

A general linear differential operator of order {N} on functions on {{\mathbb R}^d} is defined with multiindex notation as

\displaystyle Af(t) = \sum_{|\alpha|\leq N} \sigma_\alpha(t) D^\alpha f(t)

with coefficient functions {\sigma_\alpha}. Using Fourier inversion we get

\displaystyle D^\alpha f(t) - (2\pi)^{-d/2} \int \hat f(\omega) (\mathrm{i} \omega)\mathrm{e}^{\mathrm{i}\omega t}d\omega

and hence

\displaystyle \begin{array}{rcl} Af(t) & =& \int \Big((2\pi)^{-d/2} \sum_{|\alpha|\leq N} \sigma_\alpha(t)(\mathrm{i} \omega)^\alpha) \hat f(\omega)\mathrm{e}^{\mathrm{i}\omega t}d\omega\\ &=& \int \sigma(\omega,t) \hat f(\omega)\mathrm{e}^{\mathrm{i}\omega t}d\omega \\ & = & K_\sigma f(t). \end{array}

For a general function (usually obeying some restrictions) {\sigma:{\mathbb R}^d\times{\mathbb R}^d\rightarrow{\mathbb C}} we call the corresponding {K_\sigma} a pseudo-differential operator with symbol {\sigma}.

2. As integral operators

By integral operator I mean something like

\displaystyle F_k f(t) = \int k(t,\tau)f(\tau)d\tau.

We plug the definition of the Fourier transform into {K_\sigma} and obtain

\displaystyle \begin{array}{rcl} K_\sigma f(t) &=& \int \sigma(\omega,t) (2\pi)^{-d/2} \int f(\tau)\mathrm{e}^{-\mathrm{i} \omega \tau}d\tau \mathrm{e}^{\mathrm{i} \omega t}d\omega\\ & = & \int \underbrace{(2\pi)^{-d/2} \int \sigma(\omega,t)\mathrm{e}^{-\mathrm{i}\omega(\tau-t)}d\omega}_{=k(t,\tau)} f(\tau)d\tau\\ &=& \int k(t,\tau)f(\tau)d\tau. \end{array}

Using the Fourier transform we can express the relation between {\sigma} and {k} as

\displaystyle k(t,\tau) = (2\pi)^{-d/2}\int \sigma(\omega,\tau)\mathrm{e}^{-\mathrm{i}\omega(\tau-t)}d\omega = \mathcal{F}_1(\sigma(\cdot,t))(\tau-t). \ \ \ \ \ (1)

3. As “time varying convolution”

The convolution of two functions {f} and {g} is defined as

\displaystyle (f*g)(t) = \int f(\tau) g(t-\tau)d\tau

and we write “the convolution with {g}” as an operator {C_g f = f * g}.


\displaystyle h_t(\tau) = (2\pi)^{-d/2}\int \sigma(\omega,t)\mathrm{e}^{\mathrm{i}\omega \tau}d\omega

we deduce from (1)

\displaystyle K_\sigma f(t) = \int f(\tau) h_t(t-\tau)d\tau= (f * h_t)(t) = C_{h_t}f(t).

4. As superposition of time-frequency shifts

Using that iterated Fourier transforms with respect to components give the Fourier transform, i.e. {\mathcal{F}_2\mathcal{F}_1 = \mathcal{F}}, we obtain

\displaystyle \mathcal{F}_1\sigma = \mathcal{F}_2^{-1}\mathcal{F}_2\mathcal{F}_1\sigma = \mathcal{F}_2^{-1}\hat\sigma.

From (1) we get

\displaystyle \begin{array}{rcl} k(t,\tau) &=& \mathcal{F}_2^{-1}\hat\sigma(\tau-t,t)\\ &=& (2\pi)^{-d/2} \int\hat\sigma(\tau-t,\nu)\mathrm{e}^{\mathrm{i} t\nu}d\nu. \end{array}

Before plugging this into {K_\sigma} we define time shifts {T_u} and frequency shifts (or modulations) {M_\nu} as

\displaystyle T_u f(t) = f(t-u),\quad M_\nu f(t) = \mathrm{e}^{\mathrm{i} t\nu}.

With this we get

\displaystyle \begin{array}{rcl} K_\sigma f(t) &=& (2\pi)^{-d/2}\int\int \hat\sigma(\tau-t,\nu)\mathrm{e}^{\mathrm{i} t\nu}f(\tau)d\nu d\tau\\ &=& (2\pi)^{-d/2}\int\int \hat\sigma(u,\nu)\mathrm{e}^{\mathrm{i} t\nu}f(t+u)d\nu du\\ &=& (2\pi)^{-d/2}\int\int\hat\sigma(u,\nu) M_\nu T_{-u} f(t) d\nu du\\ &=& \int\int w(u,\nu) M_\nu T_{-u} f(t)d\nu du = S_wf(t). \end{array}

Hence, {K_\sigma} is also a weighted superposition of {M_\nu T_{-u} f} (time-frequency shifts) with weight {(2\pi)^{-d/2} \hat\sigma(u,\nu)}.

5. Back to time varying channels

Simple substitution brings us back the situation of a time varying channel

\displaystyle \begin{array}{rcl} K_\sigma f(t) &=& (2\pi)^{-d/2} \int\int \hat\sigma(u,\nu)\mathrm{e}^{\mathrm{i} t\nu}f(t+u)d\nu du\\ &=& \int \int s(\tau,\nu) f(t-\tau) \mathrm{e}^{\mathrm{i} t\nu} d\nu d\tau\\ &=& H_s f(t) \end{array}

with {s(\tau,\nu) = (2\pi)^{-d/2} \hat\sigma(-\tau,\nu)}.

6. As superposition of product-convolution operators

Finally, I’d like to illustrate that this kind of operators can be seen as superposition of product convolution operators.

Introducing product operators {P_g f(t) = g(t)f(t)} we define a product-convolution operator as a convolution with a function {h} followed by a multiplication with another function {g}: {P_g C_h f}.

To express {K_\sigma} with product-convolution operators we choose an orthonormal basis which consists of tensor-products of functions {(\phi_n\otimes\psi_k)} and develop {\sigma} into this basis as

\displaystyle \sigma(\omega,t) = \sum_{n,k} a_{n,k} \phi_n(\omega)\psi_k(t).


\displaystyle \begin{array}{rcl} K_\sigma f(t) &=& \sum_{n,k} a_{n,k}\psi_k(t) \int \phi_n(\omega) \hat f(\omega) \mathrm{e}^{\mathrm{i} \omega t}d\omega\\ & = & \sum_{n,k} a_{n,k}\psi_k(t) (2\pi)^{d/2}\mathcal{F}^{-1}(\phi_n\, \hat f)(t)\\ &=& \sum_{n,k} a_{n,k}\psi_k(t) (\check \phi_n * f)(t) \end{array}

and we obtain

\displaystyle K_\sigma f(t) = \sum_{n,k} a_{n,k}P_{\psi_k} C_{\check\phi_n} f (t).

Remark 2 The integral operators of the form {F_k} are indeed general objects as can be seen from the Schwartz kernel theorem. Every reasonable operator mapping Schwartz functions linearly onto the tempered distributions is itself a generalized integral operator.

I tried to capture the various relation between the first five representations time varying channels in a diagram (where I went out of motivation before filling in all fields…):

If you want to have a sparse solution to a linear system of equation and have heard of compressed sensing or sparse reconstruction than you probably know what to do: Get one of the many solvers for Basis Pursuit and be happy.

Basis Pursuit was designed as a convex approximation of the generally intractable problem of finding the sparsest solution (that is, the solution with the smallest number of non-zero entries). By abuse of notation, we define for {x\in\mathbb{R}^n}

\displaystyle \|x\|_0 = \#\{i\ : x_i\neq 0\}.

(Because of {\|x\|_0 = \lim_{p\rightarrow 0}\|x\|_p^p} some people prefer the, probably more correct but also more confusing, notation {\|x\|_0^0}…).

Then, the sparsest solution of {Ax=b} is the solution of

\displaystyle \min_x \|x\|_0,\quad \text{s.t.}\ Ax=b

and Basis Pursuit replaces {\|x\|_0} with “the closest convex proxy”, i.e.

\displaystyle \min_x \|x\|_1,\quad\text{s.t.}\ Ax=b.

The good thing about Basis Pursuit suit is, that is really gives the sparsest solution under appropriate conditions as is widely known nowadays. Here I’d like to present two simple examples in which the Basis Pursuit solution is

  • not even close to the sparsest solution (by norm).
  • not sparse at all.

1. A small bad matrix

We can build a bad matrix for Basis Pursuit, even in the case {2\times 3}: For a small {\epsilon>0} define

\displaystyle A = \begin{bmatrix} \epsilon & 1 & 0\\ \epsilon & 0 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 1\\1 \end{bmatrix}.

Of course, the sparsest solution is

\displaystyle x_0 = \begin{bmatrix} 1/\epsilon\\ 0\\ 0\end{bmatrix}

while the solution of Basis Pursuit is

\displaystyle x_1 = \begin{bmatrix} 0\\1\\1 \end{bmatrix}.

The summarize: For {\epsilon<1/2}

\displaystyle \|x_0\|_0 = 1 < 2 = \|x_1\|_0,\quad \|x_0\|_1 = 1/\epsilon > 2 = \|x_1\|_1.

(There is also a least squares solution that has three non-zero entries and a one-norm slightly larger than 2.)

Granted, this matrix is stupid. Especially, its first column has a very small norm compared to the others. Ok, let’s construct a matrix with normalized columns.

2. A small bad matrix with normalized columns

Fix an integer {n} and a small {\epsilon>0}. We define a {n\times(n+2)}-matrix

\displaystyle \begin{bmatrix} 1+\epsilon/2 & -1+\epsilon/2 & 1 & 0 & \dots & \dots &0\\ -1+\epsilon/2 & 1+\epsilon/2 & 0 & 1 & \ddots & & 0\\ \epsilon/2 & \epsilon/2 & \vdots & \ddots& \ddots & \ddots& \vdots\\ \vdots & \vdots & \vdots & & \ddots & \ddots& 0\\ \epsilon/2 & \epsilon/2 & 0 & \dots& \dots& 0 & 1 \end{bmatrix}.

Ok, the first two columns do not have norm 1 yet, so we normalize them by multiplying with the right constant

\displaystyle c = \frac{1}{\sqrt{2 + \tfrac{n\epsilon^2}{4}}}

(which is close to {1/\sqrt{2}}) to get

\displaystyle A = \begin{bmatrix} c(1+\epsilon/2) & c(-1+\epsilon/2) & 1 & 0 & \dots & \dots &0\\ c(-1+\epsilon/2) & c(1+\epsilon/2) & 0 & 1 & \ddots & & 0\\ c\epsilon/2 & c\epsilon/2 & \vdots & \ddots& \ddots & \ddots& \vdots\\ \vdots & \vdots & \vdots & & \ddots & \ddots& 0\\ c\epsilon/2 & c\epsilon/2 & 0 & \dots& \dots& 0 & 1 \end{bmatrix}.

Now we take the right hand side

\displaystyle b = \begin{bmatrix} 1\\\vdots\\1 \end{bmatrix}

and see what solutions to {Ax=b} are there.

First, there is the least squares solution {x_{\text{ls}} = A^\dagger b}. This has only non-zero entries, the last {n} entries are slightly smaller than {1} and the first two are between {0} and {1}, hence, {\|x_{\text{ls}}\|_1 \approx n} (in fact, slightly larger).

Second, there is a very sparse solution

\displaystyle x_0 = \frac{1}{\epsilon c} \begin{bmatrix} 1\\ 1\\ 0\\ \vdots\\ 0 \end{bmatrix}.

This has two non-zero entries and a pretty large one-norm: {\|x_0\|_1 = 2/(\epsilon c)}.

Third there is a solution with small one-norm:

\displaystyle x_1 = \begin{bmatrix} 0\\ 0\\ 1\\ \vdots\\ 1 \end{bmatrix}.

We have {n} non-zero entries and {\|x_1\|_1 = n}. You can check that this {x_1} is also the unique Basis Pursuit solution (e.g. by observing that {A^T[1,\dots,1]^T} is an element of {\partial\|x_1\|_1} and that the first two entries in {A^T[1,\dots,1]^T} are strictly smaller than 1 and positive – put differently, the vector {[1,\dots,1]^T} is dual certificate for {x_1}).

To summarize, for {\epsilon < \sqrt{\frac{8}{n^2-n}}} it holds that

\displaystyle \|x_0\|_0 = 2 < n = \|x_1\|_0,\quad \|x_0\|_1 = 2/(c\epsilon) > n = \|x_1\|_1.

The geometric idea behind this matrix is as follows: We take {n} simple normalized columns (the identity part in {A}) which sum up to the right hand side {b}. Then we take two normalized vectors which are almost orthogonal to {b} but have {b} in their span (but one needs huge factors here to obtain {b}).

Well, this matrix looks very artificial and indeed it’s constructed for one special purpose: To show that minimal {\ell^1}-norm solution are not always sparse (even when a sparse solution exists). It’s some kind of a hobby for me to construct instances for sparse reconstruction with extreme properties and I am thinking about a kind of “gallery” of these instances (probably extending the “gallery” command in Matlab).
By the way: if you want to play around with this matrix, here is the code

n = 100;
epsilon = sqrt(8/(n^2-n))+0.1;
c = 1/sqrt(2+n*epsilon^2/4);
A = zeros(n,n+2);
A(1:2,1:2) = ([1 -1;-1,1]+epsilon/2)*c;
A(3:n,1:2) = epsilon/2*c;
A(1:n,3:n+2) = eye(n);
b = ones(n,1);

On 0xDE I learned that Google has a new serivce, called Google scholar citations. You can make a profile and collect all your publications that Google scholar can find. It also calculates several of these statistics about your citations like the h-index.

Using this one realizes a few things quickly:

  • Google scholar is not even close to be a reliable data base for mathematical publications. I find talks, poster, preprints multiple times,…
  • Google scholar is not even close to have a proper database of the “citation network”. Some preprint series (in which I have some preprints) include a list of all paper in that series at the end of each paper. Google scholar takes this as the list of reference and hence, produces wrong results here.
  • Its hard to tell different people with the same name apart.

It seems that one purpose of Google scholar citation is to improve on all three parts by letting the people doing the work (what is not a bad idea). Note that similar services exist for several disciplines. For mathematics there is MathSciNet which is maintained by the Amarican Mathematical Society. They really put effort in their database with respect to all three point above. Moreover, there is the Zentralblatt.

While it is really nice to have a search engine for science which also finds preprints and not only published material, this service makes it even easier to produce stupid things like “impact statistics”, “productivity indices”. On the positive side it is really helpful to see what paper do cite another one (this makes it easier to find the source of some idea or original proofs). On the downside, this service may lead to the impression that research output (if there is such thing) can be measured by numbers in any way. I do not think that this is true. In the end: What does it say that one paper has been cited 50 times and another one five times? Think about it. It does not mean that the first paper has been read more, had spread more interest, had more impact or is even better in any way. It plainly says that the first paper has been cited ten times more. That’s it. There are papers which introduced great ideas in an awkward way and then a follow-up explained the same idea better. Hence, the second one gets cited although the first is equally important… There are weird paper which get cited a lot because everyone does. There are paper which many people did not read but cite because the have heard or read that this paper is the reference for something.

By the way: My profile is here.

If mathematicians study an object they sometimes study the properties of the object. E.g. you can study the object “2” (that is the natural number 2) by saying that it denotes a certain cardinality, name the cardinality of two things. Well, that does not give you anything new. However, mathematicians often study objects by studying what either the object can do to other things or what other things can do the objects. In our (quite silly) example of the number two you see that you can use 2 to add it to other numbers (or, conversely, you can add other numbers to 2). One may feel that this does not add any insight but for objects which are more complicated (e.g. like sets of other objects or sets of these) this view can be particularly useful: If you consider a normed vector space {X}, you can ask either for properties of the space itself (e.g. completeness) or you ask how one can map the objects of the space to other sets. Using the underlying ingredients to give more structure on the mappings under consideration this gives new constructions: In the case of a normed vector space you have the underlying field {K}. This leads to mappings from {X} to {K}. You also have a linear structure on {X}. This leads to linear mappings from {X} to {K}. Finally, there is a norm on {X} and if there is an absolute value on {K} this leads you to bounded linear mappings from {X} to {K}. Any guess what: These objects form the so-called dual space.

Dual spaces of (say) Banach spaces are telling you new things about the space itself and it seems to me that if you want to understand the nature of a Banach space you need to know the dual space (or a space of which your space is the dual, i.e. a pre-dual space).

This post shall not be about duality in general but on dual spaces of a particular type of spaces, namely of spaces which consist of continuous functions. While these spaces usually form nice Banach spaces, they are not as simple as Lebesgue spaces or Sobolev spaces in that there are no Hilbert spaces among them and even no reflexive spaces.

1. What spaces of continuous functions?

For sets {\Omega} and {V} one can study continuous functions {f:\Omega\rightarrow V} as soon as both {\Omega} and {V} are topological spaces. For these functions to form a vector space we also want that {V} is a vector space over a field {K}. To get a normed space of continuous functions we endow {V} with a norm {\|\cdot\|} and try to define a norm of {f:\Omega \rightarrow V} as

\displaystyle \|f\|_\infty = \sup_{x\in\Omega}\|f(x)\|.

But wait: The supremum may not exist. We better add the assumption that {f} is bounded. Now this gives us a normed space and we denote:

\displaystyle C_b(\Omega,V) = \{f:\Omega\rightarrow V\ |\ f\ \text{continuous and bounded}\}.

Together with the norm {\|\cdot\|_\infty} this gives a Banach space as soon as {V} is a Banach space. In the case that {\Omega} is compact, we can omit the condition of boundedness and obtain

\displaystyle C(\Omega,V) = \{f:\Omega\rightarrow V\ |\ f\ \text{continuous}\}

and again get a Banach space as soon as {V} is a Banach space.

Instead of assuming compactness of the whole space {\Omega} we could shift this property to all functions in this space. This leads to

\displaystyle C_c(\Omega,V) = \{f:\Omega\rightarrow V\ |\ f\ \text{continuous with compact support}\}.

This does not lead to a Banach space in the case that {\Omega} is not compact. Indeed, there is a last possibility by considering the closure of {C_c(\Omega,V)} with respect to {\|\cdot\|_\infty} leading to the space

\displaystyle C_0(\Omega,V) = \overline{C_c(\Omega,V)}^{\|\cdot\|_\infty}.

Informally one says that this are the continuous function “that vanish on the boundary of {\Omega}”.

All these spaces admit dual spaces and all these spaces are described by some kind of “Riesz representation theorem”. These dual spaces consists of some kind of measures and the dual pairing is then defined as integrating the continuous functions with respect to these measures. Here I want to clarify (at least for myself) what these dual spaces are and how they are related.

The case in which {V=\mathbb{K}} (i.e. the real or complex numbers) is a little easier and we omit the argument {V} in the spaces in this case: {C_b(\Omega) = C_b(\Omega,\mathbb{K})}, {C(\Omega) = C(\Omega,\mathbb{K})},…

2. The dual of {C_b(\Omega)} for normal {\Omega}

We deal with real or complex valued functions on a set {\Omega} and assume that {\Omega} in a normal topological space (also called {T_4}-space), that is the points in {\Omega} are all closed and two disjoint closed sets can be separated by open neighborhoods. Of course a metric space {\Omega} is a normal topological space.

2.1. Regular bounded finitely additive measures

To see what the dual space of {C_b(\Omega)} could be, we have to think of integrating a bounded continuous functions against a measure {\mu}. Here the topological structure of {\Omega} comes into play because it naturally leads to a {\sigma}-algebra on {\Omega}, namely the one which is generated by the closed sets in {\Omega} (or equivalently be the open sets) which is also called the Borel algebra {B(\Omega)} on {\Omega}. On a {\sigma}-algebra we can define a set function (which shall be a measure) by mapping the elements in {B(\Omega)} to real (or complex) numbers. Such a set function {\mu} is called bounded if {|\mu(E)|<\infty} for all {E\in B(\Omega)} and finitely additive if for any finite number of disjoint set {E_1,\dots,E_n\in B(\Omega)} it holds that

\displaystyle \mu(\cup E_i) = \sum \mu(E_i). \ \ \ \ \ (1)


The set of all bounded and finitely additive set functions (nowadays often called bounded finitely additive measures) is denoted by {ba(\Omega)}. Equipped with the variational norm {|\mu| = |\mu|(\Omega)}, {ba(\Omega)} becomes a Banach space. The space {ba(\Omega)} is a fairly large space of measures: Note that we did neither assume that the member are countably additive nor that the values {\mu(E)} shall be non-negative. However, we assumed the all values {|\mu(E)|} are finite which excludes, e.g. the Lebesgue-measure on {\Omega=\mathbb{R}} (while weighted Lebesgue measures can be allowed if the weight is integrable). The space {ba(\Omega)} is slightly too large to be the dual of {C_b(\Omega)} and we need another restriction. We call an element {\mu\in ba(\Omega)} regular, if for any {E \in B(\Omega)} and any {\epsilon>0} there exists a closed set {F\subset E} and an open set {G\supset E} such that for every {C\subset G\setminus F} it holds that {|\mu(C)|<\epsilon}. The space of regular bounded finitely additive measures is denoted by {rba(\Omega)} and is closed subspace of {ba(\Omega)}

2.2. Intgration with respect to {\mu\in rba(\Omega)}

Now we can define the integral of a bounded continuous function {f\in C_b(\Omega)} with respect to a regular bounded finitely additive measure {\mu\in rba(\Omega)} as follows: Since {f(\Omega)} is a bounded set in {\mathbb{K}} we can cover is with open sets {G_1,\dots, G_n} with diameter less than a given {\epsilon>0}. Define {A_1=G_1} and {A_j = G_j\setminus \bigcup_{i=1}^{j-1} G_i}. If {A_i} is not empty, choose {\alpha_j\in A_j} (otherwise choose {\alpha_j=0}). Since {f} is continuous, the sets {B_j = f^{-1}(A_j)\subset\Omega} are in {B(\Omega)} and we can define

\displaystyle f_\epsilon = \sum_{j=1}^n \alpha_j \chi_{B_j}

which is an {\epsilon}-approximation to {f} and indeed, {f} is the uniform limit of the {f_\epsilon}. For each {f_\epsilon}, the integral is naturally defined as

\displaystyle \int f_\epsilon d\mu = \sum \alpha_j \mu(B_j)

and the limit of this expression is used as integral for {f}.

2.3. Duality

Since it holds that

\displaystyle \Big| \int f d\mu\Big| \leq \|f\|_\infty |\mu|

every {\mu\in rba(\Omega)} defines a continuous linear functional an {C_b(\Omega)}. This shows that the dual space of {C_b(\Omega)} contains {rba(\Omega)}. Indeed both spaces are equal:

Theorem 1 For a normal topological space {\Omega} it holds that

\displaystyle C_b(\Omega)^* = rba(\Omega).

The proof is lengthy and technical and the prime reference is “Linear Operators” by Dunford and Schwartz (Theorem IV.6.2).

3. The dual of {C(\Omega)} for compact Hausdorff {\Omega}

Now we assume that {\Omega} is a compact Hausdorff space (think of closed and bounded subsets of {\mathbb{R}^d}). Due to compactness we can omit the boundedness assumption on our continuous functions (they are bounded anyway).

3.1. Regular bounded countably additive measures

We move from finitely additive measures to countably additive measure that is, (1) holds for countably many disjoint sets {E_i}. Together with finiteness and regularity we arrive at the space {rca(\Omega)} of regular bounded countably additive measures. In our case, where {\Omega} is also a topological space and the {\sigma}-algebra is the Borel algebra this space is also called the space of regular Borel measures} or Radon measure} and often denoted by {\mathfrak{M}(\Omega)}.

3.2. Duality

Here we have the following representation theorem:

Theorem 2 If {\Omega} is compact and Hausdorff it holds that

\displaystyle C(\Omega)^* = \mathfrak{M}(\Omega)\ (=rca(\Omega)).


\displaystyle f\mapsto \int f d\mu

every {\mu\in\mathfrak{M}(\Omega)} determines an element in {C(\Omega)^*}. Moreover, it is clear that {rba(\Omega)\supset C(\Omega)^*}. It remains to show that every {\lambda\in rba(\Omega)} determines a {\mu\in rca(\Omega) = \mathfrak{M}(\Omega)} such that for all {f\in C(\Omega)} it holds that {\int fd\lambda = \int fd\mu}.

4. The dual of {C_0(\Omega)} for locally compact Hausdorff {\Omega}

This case is basically the same as for {C(\Omega)} with compact Hausdorff {\Omega}. The absence of compactness is compensated by the fact that the function “vanish on the boundary”. Well, “vanishing on the boundary” really means, that the function can be approximated by continuous function with compact support (in the {\infty}-norm) and hence, the result do not differ from the previous one:

Theorem 3 If {\Omega} is locally compact and Hausdorff it holds that

\displaystyle C_0(\Omega)^* = \mathfrak{M}(\Omega).

Final remarks:

  • It turned out that the situation is easier than I expected. Basically there are two cases: Bounded continuous functions on a normal set {\Omega}. Here we get the regular bounded and finitely additive measures as the dual space. The case are continuous functions on a compact space or continuous function which “vanish at the boundary” on a locally compact space. In both we arrive at cases regular bounded countably additive measure aka Radon measures.


  • For {\Omega\subset{\mathbb R}^d} the distinction is:
    1. {\Omega} arbitrary and bounded continuous functions give {rba(\Omega)}.
    2. {\Omega} compact and continuous functions give {\mathfrak{M}(\Omega)}.
    3. {\Omega} arbitrary and continuous functions vanishing on {\partial\Omega} also give {\mathfrak{M}(\Omega)}.


  • There are a lot of excellent references for the Riesz representation theorem on the web and you’ll easily find several proofs. However, the proof is always technical and lengthy. One attempt of a short proof in the AMM-article “The Riesz Representation Theorem Revisited”was successful, however uses several heavy tools: Stone-Cech compactification, Caratheodory extension and a result about clopen sets in extremally disconnected spaces. 


How many samples are needed to reconstruct a sparse signal?

Well, there are many, many results around some of which you probably know (at least if you are following this blog or this one). Today I write about a neat result which I found quite some time ago on reconstruction of nonnegative sparse signals from a semi-continuous perspective.

1. From discrete sparse reconstruction/compressed sensing to semi-continuous

The basic sparse reconstruction problem asks the following: Say we have a vector {x\in{\mathbb R}^m} which only has {s<m} non-zero entries and a fat matrix {A\in{\mathbb R}^{n\times m}} (i.e. {n>m}) and consider that we are given measurements {b=Ax}. Of course, the system {Ax=b} is underdetermined. However, we may add a little more prior knowledge on the solution and ask: Is is possible to reconstruct {x} from {b} if we know that the vector {x} is sparse? If yes: How? Under what conditions on {m}, {s}, {n} and {A}? This question created the expanding universe of compressed sensing recently (and this universe is expanding so fast that for sure there has to be some dark energy in it). As a matter of fact, a powerful method to obtain sparse solutions to underdetermined systems is {\ell^1}-minimization a.k.a. Basis Pursuit on which I blogged recently: Solve

\displaystyle  \min_x \|x\|_1\ \text{s.t.}\ Ax=b

and the important ingredient here is the {\ell^1}-norm of the vector in the objective function.

In this post I’ll formulate semi-continuous sparse reconstruction. We move from an {m}-vector {x} to a finite signed measure {\mu} on a closed interval (which we assume to be {I=[-1,1]} for simplicty). We may embed the {m}-vectors into the space of finite signed measures by choosing {m} points {t_i}, {i=1,\dots, m} from the interval {I} and build {\mu = \sum_{i=1}^m x_i \delta_{t_i}} with the point-masses (or Dirac measures) {\delta_{t_i}}. To a be a bit more precise, we speak about the space {\mathfrak{M}} of Radon measures on {I}, which are defined on the Borel {\sigma}-algebra of {I} and are finite. Radon measures are not very scary objects and an intuitive way to think of them is to use Riesz representation: Every Radon measure arises as a continuous linear functional on a space of continuous functions, namely the space {C_0(I)} which is the closure of the continuous functions with compact support in {{]{-1,1}[}} with respect to the supremum norm. Hence, Radon measures work on these functions as {\int_I fd\mu}. It is also natural to speak of the support {\text{supp}(\mu)} of a Radon measure {\mu} and it holds for any continuous function {f} that

\displaystyle  \int_I f d\mu = \int_{\text{supp}(\mu)}f d\mu.

An important tool for Radon measures is the Hahn-Jordan decomposition which decomposes {\mu} into a positive part {\mu^+} and a negative part {\mu^-}, i.e. {\mu^+} and {\mu^-} are non-negative and {\mu = \mu^+-\mu^-}. Finally the variation of a measure, which is

\displaystyle  \|\mu\| = \mu^+(I) + \mu^-(I)

provides a norm on the space of Radon measures.

Example 1 For the measure {\mu = \sum_{i=1}^m x_i \delta_{t_i}} one readily calculates that

\displaystyle  \mu^+ = \sum_i \max(0,x_i)\delta_{t_i},\quad \mu^- = \sum_i \max(0,-x_i)\delta_{t_i}

and hence

\displaystyle  \|\mu\| = \sum_i |x_i| = \|x\|_1.

In this sense, the space of Radon measures provides a generalization of {\ell^1}.

We may sample a Radon measure {\mu} with {n+1} linear functionals and these can be encoded by {n+1} continuous functions {u_0,\dots,u_n} as

\displaystyle  b_k = \int_I u_k d\mu.

This sampling gives a bounded linear operator {K:\mathfrak{M}\rightarrow {\mathbb R}^{n+1}}. The generalization of Basis Pursuit is then given by

\displaystyle  \min_{\mu\in\mathfrak{M}} \|\mu\|\ \text{s.t.}\ K\mu = b.

This was introduced and called “Support Pursuit” in the preprint Exact Reconstruction using Support Pursuit by Yohann de Castro and Frabrice Gamboa.

More on the motivation and the use of Radon measures for sparsity can be found in Inverse problems in spaces of measures by Kristian Bredies and Hanna Pikkarainen.

2. Exact reconstruction of sparse nonnegative Radon measures

Before I talk about the results we may count the degrees of freedom a sparse Radon measure has: If {\mu = \sum_{i=1}^s x_i \delta_{t_i}} with some {s} than {\mu} is defined by the {s} weights {x_i} and the {s} positions {t_i}. Hence, we expect that at least {2s} linear measurements should be necessary to reconstruct {\mu}. Surprisingly, this is almost enough if we know that the measure is nonnegative! We only need one more measurement, that is {2s+1} and moreover, we can take fairly simple measurements, namely the monomials: {u_i(t) = t^i} {i=0,\dots, n} (with the convention that {u_0(t)\equiv 1}). This is shown in the following theorem by de Castro and Gamboa.

Theorem 1 Let {\mu = \sum_{i=1}^s x_i\delta_{t_i}} with {x_i\geq 0}, {n=2s} and let {u_i}, {i=0,\dots n} be the monomials as above. Define {b_i = \int_I u_i(t)d\mu}. Then {\mu} is the unique solution of the support pursuit problem, that is of

\displaystyle  \min \|\nu\|\ \text{s.t.}\ K\nu = b.\qquad \textup{(SP)}

Proof: The following polynomial will be of importance: For a constant {c>0} define

\displaystyle  P(t) = 1 - c \prod_{i=1}^s (t-t_i)^2.

The following properties of {P} will be used:

  1. {P(t_i) = 1} for {i=1,\dots,s}
  2. {P} has degree {n=2s} and hence, is a linear combination of the {u_i}, {i=0,\dots,n}, i.e. {P = \sum_{k=0}^n a_k u_k}.
  3. For {c} small enough it holds for {t\neq t_i} that {|P(t)|<1}.

Now let {\sigma} be a solution of (SP). We have to show that {\|\mu\|\leq \|\sigma\|}. Due to property 2 we know that

\displaystyle  \int_I u_k d\sigma = (K\sigma)k = b_k = \int_I u_k d\mu.

Due to property 1 and non-negativity of {\mu} we conclude that

\displaystyle  \begin{array}{rcl}  \|\mu\| & = & \sum_{i=1}^s x_i = \int_I P d\mu\\ & = & \int_I \sum_{k=0}^n a_k u_k d\mu\\ & = & \sum_{k=0}^n a_k \int_I u_k d\mu\\ & = & \sum_{k=0}^n a_k \int_I u_k d\sigma\\ & = & \int_I P d\sigma. \end{array}

Moreover, by Lebesgue’s decomposition we can decompose {\sigma} with respect to {\mu} such that

\displaystyle  \sigma = \underbrace{\sum_{i=1}^s y_i\delta_{t_i}}_{=\sigma_1} + \sigma_2

and {\sigma_2} is singular with respect to {\mu}. We get

\displaystyle  \begin{array}{rcl}  \int_I P d\sigma = \sum_{i=1}^s y_i + \int P d\sigma_2 \leq \|\sigma_1\| + \|\sigma_2\|=\|\sigma\| \end{array}

and we conclude that {\|\sigma\| = \|\mu\|} and especially {\int_I P d\sigma_2 = \|\sigma_2\|}. This shows that {\mu} is a solution to {(SP)}. It remains to show uniqueness. We show the following: If there is a {\nu\in\mathfrak{M}} with support in {I\setminus\{x_1,\dots,x_s\}} such that {\int_I Pd\nu = \|\nu\|}, then {\nu=0}. To see this, we build, for any {r>0}, the sets

\displaystyle  \Omega_r = [-1,1]\setminus \bigcup_{i=1}^s ]x_i-r,x_i+r[.

and assume that there exists {r>0} such that {\|\nu|_{\Omega_r}\|\neq 0} ({\nu|_{\Omega_r}} denoting the restriction of {\nu} to {\Omega_r}). However, it holds by property 3 of {P} that

\displaystyle  \int_{\Omega_r} P d\nu < \|\nu|_{\Omega_r}\|

and consequently

\displaystyle  \begin{array}{rcl}  \|\nu\| &=& \int Pd\nu = \int_{\Omega_r} Pd\nu + \int_{\Omega_r^C} P d\nu\\ &<& \|\nu|_{\Omega_r}\| + \|\nu|_{\Omega_r^C}\| = \|\nu\| \end{array}

which is a contradiction. Hence, {\nu|_{\Omega_r}=0} for all {r} and this implies {\nu=0}. Since {\sigma_2} has its support in {I\setminus\{x_1,\dots,x_s\}} we conclude that {\sigma_2=0}. Hence the support of {\sigma} is exactly {\{x_1,\dots,x_s\}}. and since {K\sigma = b = K\mu} and hence {K(\sigma-\mu) = 0}. This can be written as a Vandermonde system

\displaystyle  \begin{pmatrix} u_0(t_1)& \dots &u_0(t_s)\\ \vdots & & \vdots\\ u_n(t_1)& \dots & u(t_s) \end{pmatrix} \begin{pmatrix} y_1 - x_1\\ \vdots\\ y_s - x_s \end{pmatrix} = 0

which only has the zero solution, giving {y_i=x_i}. \Box

3. Generalization to other measurements

The measurement by monomials may sound a bit unusual. However, de Castro and Gamboa show more. What really matters here is that the monomials for a so-called Chebyshev-System (or Tchebyscheff-system or T-system – by the way, have you ever tried to google for a T-system?). This is explained, for example in the book “Tchebycheff Systems: With Applications in Analysis and Statistics” by Karlin and Studden. A T-system on {I} is simply a set of {n+1} functions {\{u_0,\dots, u_n\}} such that any linear combination of these functions has at most {n} zeros. These systems are called after Tchebyscheff since they obey many of the helpful properties of the Tchebyscheff-polynomials.

What is helpful in our context is the following theorem of Krein:

Theorem 2 (Krein) If {\{u_0,\dots,u_n\}} is a T-system for {I}, {k\leq n/2} and {t_1,\dots,t_k} are in the interior of {I}, then there exists a linear combination {\sum_{k=0}^n a_k u_k} which is non-negative and vanishes exactly the the point {t_i}.

Now consider that we replace the monomials in Theorem~1 by a T-system. You recognize that Krein’s Theorem allows to construct a “generalized polynomial” which fulfills the same requirements than the polynomial {P} is the proof of Theorem~1 as soon as the constant function 1 lies in the span of the T-system and indeed the result of Theorem~1 is also valid in that case.

4. Exact reconstruction of {s}-sparse nonnegative vectors from {2s+1} measurements

From the above one can deduce a reconstruction result for {s}-sparse vectors and I quote Theorem 2.4 from Exact Reconstruction using Support Pursuit:

Theorem 3 Let {n}, {m}, {s} be integers such that {s\leq \min(n/2,m)} and let {\{1,u_1,\dots,u_n\}} be a complete T-system on {I} (that is, {\{1,u_1,\dots,u_r\}} is a T-system on {I} for all {r<n}). Then it holds: For any distinct reals {t_1,\dots,t_m} and {A} defined as

\displaystyle  A=\begin{pmatrix} 1 & \dots & 1\\ u_1(t_1)& \dots &u_1(t_m)\\ \vdots & & \vdots\\ u_n(t_1)& \dots & u(t_m) \end{pmatrix}

Basis Pursuit recovers all nonnegative {s}-sparse vectors {x\in{\mathbb R}^m}.

5. Concluding remarks

Note that Theorem~3 gives a deterministic construction of a measurement matrix.

Also note, that nonnegativity is crucial in what we did here. This allowed (in the monomial case) to work with squares and obtain the polynomial {P} in the proof of Theorem~1 (which is also called “dual certificate” in this context). This raises the question how this method can be adapted to all sparse signals. One needs (in the monomial case) a polynomial which is bounded by 1 but matches the signs of the measure on its support. While this can be done (I think) for polynomials it seems difficult to obtain a generalization of Krein’s Theorem to this case…