### September 2011

This last post on uncertainty principles will be probably the hardest one for me. As said in my first post, I supervised a Master’s thesis and posed the very vague question

“Why are the uncertainty principles for the windowed Fourier transform and the wavelet transform so different?”

I had different things in mind:

• The windowed Fourier transform can be generalized to arbitrary dimensions easily. Especially, the underlying Weyl-Heisenberg group can be generalized to arbitrary dimensions. Interestingly, the uncertainty principle carries over almost exactly: For the windowed Fourier transform in ${d}$ dimensions, the uncertainty principle reads as

$\displaystyle \text{Var}_g\cdot\text{Var}_{\hat g}\geq \tfrac{d^2}{4}$

and again, this inequality is sharp for the multivariate Gaussians. A generalization of the wavelet transform is by no means canonical. The sprit in one dimension was to use translation and scaling. However, in higher dimensions there are a lot more geometric transformations you can apply: rotations, anisotropic scalings and shearing. Here one has to identify a suitable group of actions and try to carry all things over. The most naive way, which uses isotropic scaling and rotation does lead to uncertainty relations but no function will make these inequalities sharp…

• The lower bound in the Heisenberg uncertainty principle is fixed (for normed ${g}$). However, the lower bound in the affine uncertainty (equation (1) in my previous post) is not fixed (for normed ${f}$). Indeed ${|\langle f',f\rangle|}$ can be arbitrarily small. Hence, a function which makes the inequality sharp may not lead to the minimum product of the corresponding operator variances. For other wavelet-like transformations (i.e. they include some kind of scaling) this is the same.
• The Heisenberg uncertainty principle has a clear and crisp interpretation involving the product of the variances for a function and its Fourier transform. There is no such thing available for the affine uncertainty principle. (In fact, this question was not addressed in the thesis but in the paper “Do uncertainty minimizers attain minimal uncertainty” and the Diploma thesis by Bastian Kanning).

The outcome was the (german) thesis Unschärferelationen für unitäre Gruppendarstellungen (Uncertainty relations for unitary group representations) by Melanie Rehders. As the question is so vague, there could not be one simple answer, but as a result of the thesis, one could say in a nutshell:

“The uncertainty principles are so different because the groups underlying wavelet-like transforms are semidirect products of a matrix group and ${\mathbb{R}^d}$ and hence, the identity can not be an infinitesimal generator and hence, not be a commutator”.

In this post I’ll face the challenge to give some meaning to this sentence.

1. The abstract structure behind

Let me introduce the players in a diagram which I redraw from the thesis:

As you see, we need several algebraic structures (as well as analytical ones).

2. From group representations to integral transforms

First, we need a locally compact group ${G}$, and naturally, this comes with a left invariant measure ${\mu}$, which is called Haar measure. With these tool we can intergrate complex valued functions defined of the group: ${\int_G f(x)d\mu}$ and we may also form the spaces ${L^p(G)}$.

Having the space ${L^2(G)}$, we can define a special representation of the group (remember that a group representation is a description of the group in terms of linear transformations of a vector space, in other words, a group homomorphism from ${G}$ to the space ${GL(V)}$ of linear mappings on some vector space ${V}$). The special representation we use the the so called left regular representation on the space of unitary operators on the space ${L^2(G)}$ (denoted by ${\mathcal{U}(L^2(G))}$. This representation is the mapping ${\pi:G\rightarrow \mathcal{U}(L^2(G))}$ defined by

$\displaystyle \pi(a) f(x) = f(a^{-1}x).$

One easily checks, that this is a homomorphism and the unitarity follows from the left invariance of the Haar measure. One could say, that the group ${G}$ acts on the functions ${f}$ in ${L^2(G)}$ in a unitary way. We now may define an integral transform as follows: For ${\psi\in L^2(G)}$ define

$\displaystyle V_\psi f(a) = \langle f,\pi(a)\psi\rangle_{L^2(G)}. \ \ \ \ \ (1)$

You may compare with the previous two posts, that this gives precisely the windowed fourier transform (for the Weyl-Heisenberg group) and the wavelet transform (for the affine group).

To have convenient properties for the integral transform one need some more conditions

1. Irreducibility, i.e. that the only subspaces of ${L^2(G)}$ which are invariant under every ${\pi(a)}$ are ${\{0\}}$ and ${L^2(G)}$.
2. Square integrability, i.e. that there exists a non-zero function ${\psi\in L^2(G)}$ such that

$\displaystyle \int_G |\langle \psi,\pi(a)\psi\rangle|^2 d\mu < \infty;$

these functions ${\psi}$ are called admissible.

We have the following theorem: \href{Grossmann, Morlet, Paul}} Let ${\pi}$ be a unitary, irreducible, and square integrable representation of a locally compact group ${G}$ on ${L^2(G)}$ and let ${\psi}$ be admissible. Then it holds that the mapping ${V_\psi}$ defined in (1) is a multiple of an isometry. Especially, ${V_\psi}$ has a left-inverse which is (up to a constant) given by its adjoint.

This somehow clarifies the arrow from “group representation” to “integral transform”.

3. From group representations to Lie algebra representations

For a closed linear group ${G}$, i.e. a closed subgroup of ${GL(d,\mathbb{R})}$, one has the associated Lie-Algebra ${\mathfrak{g}}$ defined with the help of the matrix exponential by

$\displaystyle \mathfrak{g} = \{ X\ |\ \text{for all}\ t:\ \exp(tX)\in G\}.$

The corresponding Lie-bracket is the commutator:

$\displaystyle [X,Y] = XY-YX.$

If we now have a representation of our group ${G}$ on some Hilbert space ${H}$ (you may think of ${H = L^2(G)}$ but here we may have any Hilbert space), we may ask if there is an associated representation of the Lie-Algebra ${\mathfrak{g}}$. Indeed there is one which is called the derived representation. To formulate this representation we need the following subspace of ${H}$:

$\displaystyle H_\pi^\infty = \{f\in H\ |\ a\mapsto \pi(a)f\ \text{is a}\ C^\infty\ \text{mapping}\}.$

Theorem 1 Let ${\pi}$ be a representation of a closed linear group ${G}$ in a Hilbert space ${H}$. The mapping ${d\pi}$ defined by

$\displaystyle d\pi(X)f = \lim_{t\rightarrow 0}\frac{\pi(\exp(tX))f - f}{t}$

is a representation of the Lie-Algebra ${\mathfrak{g}}$ on the space ${H_\pi^\infty}$.

This clarifies the arrow from “group representations” to Lie algebra representations.

4. Lie-algebra representations and uncertainty relations

We are now ready to illustrate the abstract path from Lie-algebra representations to uncertainty relations. This path uses the so called infinitesimal generators:

Definition 2 Let ${G}$ be a closed linear group with Lie algebra ${\mathfrak{g}}$ and let ${{X_1,\dots,X_m}}$ be a basis of ${\mathfrak{g}}$. Let ${\pi}$ be a representation of ${G}$ on a complex Hilbert space ${H}$ and let the derived representation ${d\pi}$ be injective. Then, the operators ${T_j= \mathrm{i} d\pi(X_j)}$ are called the infinitesimal generators of ${G}$ with respect to the representation ${\pi}$.

These infinitesimal generators are always self-adjoint. Hence, we may apply Robertson’s uncertainty principle for every two infinitesimal generators for which the commutator does not vanish.

The abstract way, described in the Sections 2, 3 and 4 is precisely how we have derived the Heisenberg uncertainty principle and the affine uncertainty principle in the two previous posts. But now the question remains: Why are they so different?

The so-called commutator tables of the Lie-algebras shed some light on this:

Example 1 (The Heisenberg algebra) The associated Lie algebra to the Weyl-Heisenberg group is the real vector space ${{\mathbb R}^2 \times \mathrm{i} {\mathbb R}}$ with the Lie bracket

$\displaystyle [(\omega,b,i\phi),(\omega',b',\mathrm{i}\phi')] = (0,0,\mathrm{i}(\omega b' - b'\omega)).$

A basis of this Lie algebra is ${(1,0,0)}$, ${(0,1,0)}$, ${(0,0,\mathrm{i})}$ and the three commutators are

$\displaystyle [(1,0,0),(0,1,0)] = (0,0,2\mathrm{i})$

$\displaystyle [(1,0,0),(0,0,\mathrm{i})]=(0,0,0)$

$\displaystyle [(0,1,0),(0,0,\mathrm{i})] = (0,0,0).$

Two facts are important: There is an element which commutes with every other element. In other words: The center of the algebra is one-dimensional and spanned by one of the basis elements. If we remember the three infinitesimal generators ${\mathrm{i} T_\omega}$, ${\mathrm{i} T_b}$ and ${\mathrm{i} T_\tau}$ for the windowed Fourier transform, we observe that they obey the same commutator relations (which is not a surprise…).

Example 2 (The “affine Lie algebra”) The Lie algebra of the affine group ${({\mathbb R}\setminus\{0\})\times {\mathbb R}}$ (with composition ${(a,b)(a',b') = (aa',ab'+b)}$) is ${{\mathbb R}\times {\mathbb R}}$ with Lie bracket

$\displaystyle [(x,y),(x',y')] = (0,xy'-x'y).$

A basis of the Lie algebra is ${(1,0)}$, ${(0,1)}$ and the commutator is

$\displaystyle [(1,0),(0,1)] = (0,1).$

Here, there is no element which commutes with everything, i.e. the center of the Lie algebra is trivial. Of course, the commutator relation resembles the one for the infinitesimal generators ${\mathrm{i} T_a}$ and ${\mathrm{i} T_b}$ for the wavelet transform.

5. Higher dimensional wavelets

Wavelets in higher dimensions are a bit tricky. If one thinks of groups acting on ${{\mathbb R}^d}$ which consist of translation and some thing as dilation one observes that one basically deals with semidirect products of a subgroup ${D}$ of ${GL(d,{\mathbb R})}$ and ${{\mathbb R}^d}$: For ${A\in D}$ and ${b\in{\mathbb R}^d}$ one may transform a function ${f:{\mathbb R}^d\rightarrow{\mathbb C}}$ as

$\displaystyle \pi(A,b)f(x) = |\det(A)|^{-1/2}f(A^{-1}x-b). \ \ \ \ \ (2)$

Indeed this the so called quasiregular representation of the semidirect product of ${D}$ and ${{\mathbb R}^d}$. Two important examples of 2-dimensional wavelet-like transformations are:

Example 3 The “standard” 2-dimensional wavelet transform. One takes the group

$\displaystyle D = \left\{ \begin{bmatrix} a_1 & -a_2\\ a_2 & a_1 \end{bmatrix}\ :\ a_1,a_2\neq 0 \right\}$

which is a combination of rotation and isotropic scaling. Another parametrization is:

$\displaystyle \left\{ a \begin{bmatrix} \cos(\phi) &-\sin(\phi)\\ \sin(\phi) & \cos(\phi) \end{bmatrix}\ :\ a> 0,\ \phi\in [0,2\pi{[}\right\}$

where ${a}$ is the scaling factor and ${\phi}$ is the rotation angle.

Example 4 The shearlet transform bases of the group

$\displaystyle D = \left\{ \begin{bmatrix} a & \sqrt{a}\,s\\ 0 & \sqrt{a} \end{bmatrix}\ : s\in {\mathbb R},\ a>0 \right\}$

which consists of anisotropic scaling by ${a}$ and “shear” by ${s}$.

Doing some more algebra, one observes that the center of the associated Lie algebra of the semidirect product of the form (2) is always trivial and hence, the identity never appears as a commutator. This neat observation shows, that no wavlet-like transformation which bases on a group structure can ever have any uncertainty relation which behaves like

$\displaystyle c\|f\|\leq \text{some product of variances of operators}$

as in the Heiseberg case.

Although this may not be a groundbreaking discovery, this observation and the whole underlying algebra somehow cleared my view on this issue.

1. The affine group behind the wavelet transform

Continuing my previous post on the uncertainty principle for the windowed Fourier transform, we now come to another integral transform: The wavelet transform.

In contrast to the windowed Fourier transform (which analyzes a function with respect to position and frequency) the wavelet transform analyzes a function with respect to position and scale. For a given analyzing function ${\psi}$ and a signal ${f}$, the wavelet transform is (for ${a\neq 0}$, ${b\in{\mathbb R}}$):

$\displaystyle W_\psi f(a,b) = \int_{\mathbb R} f(x) \tfrac{1}{\sqrt{|a|}}\psi(\tfrac{x-b}{a})dx.$

In the same way, the windowed Fourier transform could be written as inner products of ${f}$ with a translated and modulated window function, the wavelet transform can be written as inner products of ${f}$ with translated and scaled functions ${\psi}$. And again, these modifications which happen to the analyzing function come from a group.

Definition 1 The affine group ${G_{\text{aff}}}$ is the set ${({\mathbb R}\setminus\{0\})\times {\mathbb R}}$ endowed with the operation

$\displaystyle (a,b)(a',b') = (a\,a',a\,b' + b).$

Indeed this is group (with identity ${(1,0)}$ and inverse ${(a,b)^{-1} = (a^{-1},a^{-1}b)}$). The name affine group stems from the fact that the group operation behaves like the composition of one dimensional affine linear functions: For ${f(x) = ax+b}$ and ${g(x) = a'\,x+b'}$ we have ${f(g(x)) = (a\,a')\, x + a\,b' + b}$.

The affine group admits a representation on the space of unitary operators on ${L^2({\mathbb R})}$:

$\displaystyle \Pi(a,b)f(x) = \tfrac{1}{\sqrt{|a|}}\psi(\tfrac{x-b}{a})$

(note the normalizing factor ${1/\sqrt{|a|}}$).

2. The affine uncertainty principle

I am not sure who has to credited for the group theoretical background behind wavelets however, the two-part paper “Transforms associated to square integrable group representations” by Grossmann, Morlet and Paul has been influential (and can be found, e.g. in the compilation “Fundamental papers in wavelet theory” by Heil and Walnut.

As done by Stephan Dahlke and Peter Maass in “The Affine Uncertainty Principle in One and Two Dimensions” and can proceed in analogy to the windowed Fourier transform and the corresponding Weyl-Heisenberg group and compute the infinitesimal operators: Take the derivative of the representation with the respect to the group parameters and evaluate at the identity:

$\displaystyle T_a f(x) := \frac{d}{da}[\Pi(a,b)f(x)|_{(a,b) = (1,0)} = -\tfrac{1}{2}f(x) - xf'(x)$

and

$\displaystyle T_b f(x) := \frac{d}{db}[\Pi(a,b)f(x)|_{(a,b) = (1,0)} = -f'(x).$

Again, these operators are skew adjoint and hence, multiplying by ${\mathrm{i}}$ gives self-adjoint operators.

These operators ${\mathrm{i} T_b}$ and ${\mathrm{i} T_a}$ do not commute and hence, applying Robertson’s uncertainty principle gives an inequality. The commutator of ${\mathrm{i} T_a}$ and ${\mathrm{i} T_b}$ is

$\displaystyle [\mathrm{i} T_a,\mathrm{i} T_b] f(x) = f'(x).$

$\displaystyle \tfrac{1}{2}|\langle f',f\rangle| \leq \|(\mathrm{i} T_a - \mu_1 I)f\|\,\| (\mathrm{i} T_b - \mu_2 I)f\| \ \ \ \ \ (1)$

and with some manipulation this turn to (for ${\|f\|=1}$)

$\displaystyle \tfrac{1}{4}|\langle f',f\rangle|\leq (\|f'\|^2 - \mu_2^2)(\|xf'\|^2 -\tfrac{1}{4} - \mu_1^2). \ \ \ \ \ (2)$

Again, one can derive the functions for which equality in attained and these are the functions of the form

$\displaystyle f(x) = c(x-\mathrm{i} \lambda) ^{-1/2 + \mathrm{i}\lambda\mu_2+\mathrm{i}\mu_1}$

for real ${\lambda}$. (By the way, these functions are indeed wavelets and sometimes called Cauchy-wavelets because of their analogy with the Cauchy kernel from complex analysis.)

By the way: These functions are necessarily complex valued. If one restricts oneself to real valued functions there is a simpler inequality, which one may call “real valued affine uncertainty”. First, observe that ${\langle f',f\rangle = 0}$ for real valued ${f}$, and hence, the left hand side in (1) is zero (which make the inequality a bit pointless). Using that for real valued ${f}$ we have ${\langle T_a f,f\rangle = 0}$, and that ${\|T_a f\|^2 = \|xf'\| - \tfrac{1}{4}\|f\|^2}$ together with ${\|(\mathrm{i} T_b -\mu_2)f\|^2\neq 0}$ for ${f\neq 0}$ we obtain (with ${\mu_1=0}$) from (1)

$\displaystyle \tfrac{1}{2} \|f\| \leq \|xf'\|.$

Since we know that equality is only attained for the Cauchy wavelets (which are not real valued we can state:

Corollary 2 (Real valued affine uncertainty) For any real valued function which is in the domain of ${[T_a,T_b]}$ it holds that

$\displaystyle \tfrac{1}{2} \|f\| < \|xf'\|.$

As some strange curiosity, one can derive this “real valued affine uncertainty principle” by formal integration by parts and Cauchy-Schwarz inequality totally similar to the Heisenberg uncertainty principle (as I’ve done in my previous post):

$\displaystyle \begin{array}{rcl} \|g\|_2^2 & =& \int_{\mathbb R} 1\cdot|g(x)|^2dx\\ & = &-\int_{\mathbb R} x\tfrac{d}{dx}|g(x)|^2dx\\ & = &-2\int_{\mathbb R} xg(x)g'(x)dx\\ & \leq &2\int_{\mathbb R} |xg'(x)|\,|g(x)|dx\\ & \leq &2\Big(\int_{\mathbb R} |xg'(x)|^2dx\Big)^{1/2}\Big(\int_{\mathbb R} |g(x)|^2dx\Big)^{1/2}. \end{array}$

Dividing by ${2\|g\|}$ gives the “real valued affine uncertainty” (but only in the non-strict way).

Some years ago I became fascinated by uncertainty principles. I got to know them via signal processing and not via physics, although, from a mathematical point of view they are the same.

I recently supervised a Master’s thesis on this topic and the results clarified a few things for me which I used to find obscure and I’d like to illustrate this here on my blog. However, it takes some space to introduce notation and to explain what it’s all about and hence, I decided to write a short series of posts, I try to explain, what new insights I got from the thesis. Here comes the first post:

1. The Fourier transform and the windowed Fourier transform

Let’s start with an important tool from signal processing you all know: The Fourier transform. For ${f:{\mathbb R}\rightarrow{\mathbb C}}$ the Fourier transform is

$\displaystyle \hat f(\omega) = (2\pi)^{-1/2}\int_{\mathbb R} f(x) \mathrm{e}^{-\mathrm{i} x\omega} dx.$

(I was tempted to say “whenever the integral is defined”. However, the details here are a little bit more involved, but I will not go into detail here; ${hat f}$ is defined for ${L^2}$-functions, for ${L^1}$ functions and even for tempered distributions…) Roughly speaking, the Fourier transform decomposes a signal into its frequency components, which can be seen from the Fourier inversion formula:

$\displaystyle f(x) = (2\pi)^{-1/2}\int_{\mathbb R} \hat f(\omega) \mathrm{e}^{\mathrm{i} x\omega} d\omega,$

i.e. the (complex) number ${f(\omega)}$ says “how much the frequency ${\omega}$ (i.e. the function ${x\mapsto \mathrm{e}^{\mathrm{i} x\omega}}$) contributes to ${f}$”. In the context of signal processing one often speaks of the “time representation” ${f}$ and the “frequency representation” ${\hat f}$.

One drawback of the Fourier transform, when used to analyze signals, is its “global” nature in that the value ${\hat f(\omega)}$ depends on every value of ${f}$, i.e. a change of ${f}$ in a small interval results a change of all of ${\hat f}$. A natural idea (which is usually attributed to Gabor) is, to introduce a window function ${g}$ which is supposed to be a bump function, centered at zero, then translate this function and “localize” ${f}$ by multiplying it with ${g(\cdot-t)}$. The resulting transform

$\displaystyle G_gf(\omega,t) = (2\pi)^{-1/2}\int_{\mathbb R} f(x)g(x-t)\mathrm{e}^{-ix\omega}dx$

is called windowed Fourier transform, short-time Fourier transform or (in the case of ${g(x) = \mathrm{e}^{-x^2/2}}$) Gabor transform.

Of course we can write the windowed Fourier transform in term of the usual Fourier transform as

$\displaystyle G_gf(\omega,t) = \widehat{(f\,g(\cdot-t))}(\omega). \ \ \ \ \ (1)$

In other words: The localization in time is precisely determined by the “locality” of ${g}$, that is, how well ${g}$ is concentrated around zero. The better ${g}$ is concentrated around ${0}$, the more “localized around ${t}$” is the information of ${f}$, the windowed Fourier transform ${G_gf(\omega,t)}$ uses.

For the localization in frequency one obtains (by Plancherel’s formula and integral substitution) that

$\displaystyle G_gf(\omega,t) = \mathrm{e}^{-\mathrm{i} x\omega}\widehat{(\hat f\,\hat g(\cdot-\omega)}(-x).$

In other words: The localization in frequency is precisely determined by the “locality” of ${\hat g}$, that is, how well ${\hat g}$ is concentrated around zero. The better ${\hat g}$ is concentrated around ${0}$, the more “localized around ${\omega}$” is the information of ${\hat f}$, the windowed Fourier transform ${G_gf(\omega,t)}$ uses.

Hence, it seems clear that a function ${g}$ is well suited as a window function, if it both well localized in time and frequency. If one measures the localization of a function around zero by its variance

$\displaystyle \text{Var}_g = \int_{\mathbb R} x^2|g(x)|^2 dx$,

then there is the fundamental lower bound on the product of the variance of a function and the variance of its Fourier transform, know under the name “Heisenberg uncertainty principle” (or, as I learned from Wikipedia, “Gabor limit”): For ${\|g\|_{L^2}=1}$ it holds that

$\displaystyle \text{Var}_g\cdot\text{Var}_{\hat g}\geq \tfrac14.$

Proof: A simple (not totally rigorous) proof goes like this: We use partial integration, the Cauchy-Schwarz inequality and the Plancherel formula:

$\displaystyle \begin{array}{rcl} \|g\|_2^2 & =& \int_{\mathbb R} 1\cdot|g(x)|^2dx\\ & = &-\int_{\mathbb R} x\tfrac{d}{dx}|g(x)|^2dx\\ & = &-2\int_{\mathbb R} xg(x)g'(x)dx\\ & \leq &2\int_{\mathbb R} |xg(x)|\,|g'(x)|dx\\ & \leq &2\Big(\int_{\mathbb R} |xg(x)|^2dx\Big)^{1/2}\Big(\int_{\mathbb R} |g'(x)|^2dx\Big)^{1/2}\\ & = &2\Big(\int_{\mathbb R} |xg(x)|^2dx\Big)^{1/2}\Big(\int_{\mathbb R} |\omega \hat g(\omega)|^2d\omega\Big)^{1/2}\\ & = &2(\text{Var}_g)^{1/2}(\text{Var}_{\hat g})^{1/2}. \end{array}$

$\Box$

Moreover, the inequality is sharp for the functions ${g(x) = C\mathrm{e}^{-\lambda x^2}}$ for ${\lambda>0}$. In this sense, these Gaussians are best suited for the windowed Fourier transform.

While this presentation was geared towards usability, there is a quite different approach to uncertainty principles related to integral transforms which uses the underlying group structure.

2. The group behind the windowed Fourier transform

The windowed Fourier transform (1) can also be seen as taking inner products of ${f}$ with the family of functions ${g(\cdot-t)\mathrm{e}^{-\mathrm{i} x\omega}}$. This family is obtained from the single function ${g}$ by letting the so-called Weyl-Heisenberg group act on it:

Definition 1 The Weyl-Heisenberg group ${G_{WH}}$ is the set ${{\mathbb R}\times{\mathbb R}\times S^1}$ endowed with the operation

$\displaystyle (\omega,b,\tau)(\omega',b'\tau') = (\omega+\omega',b+b',\tau\tau'\mathrm{e}^{\mathrm{i} (\omega b'-\omega' b)/2)}).$

The Weyl-Heisenberg group admits a representation of the space of unitary operators on ${L^2({\mathbb R})}$, that is a map ${\Pi:G_{WH}\rightarrow U(L^2({\mathbb R}))}$

$\displaystyle \Pi(\omega,b,\tau)f(x) = \tau\mathrm{e}^{-\mathrm{i} \omega b/2}\mathrm{e}^{\mathrm{i}\omega x}f(x-b).$

It indeed the operators ${\Pi(\omega,b,\tau)}$ are unitary and it holds that

$\displaystyle \Pi(\omega,b,\tau)\Pi(\omega',b',\tau')f(x) = \Pi((\omega,b,\tau)(\omega',b',\tau'))f(x).$

Moreover, the mapping ${(\omega,b,\tau)\mapsto \Pi(\omega,b,\tau)f}$ is continuous for all ${f}$, a property of the representation which is called strong continuity.

In this light, the windowed Fourier transform can be written as

$\displaystyle G_g f(\omega,t) = (2\pi)^{-1/2} \langle f,\Pi(\omega,b,\mathrm{e}^{\mathrm{i}\omega b/2})g\rangle.$

Now there is a motivation for the uncertainty principle as follows: Associated to the Weyl-Heisenberg group there is the Weyl-Heisenberg algebra, a basis of which is given by the so called infinitesimal generators. These are, roughly speaking, the derivatives of the representation with respect to the group parameters, evaluated at the identity. In the Weyl-Heisenberg case:

$\displaystyle T_\omega f(x) := \frac{d}{d\omega}[\Pi(\omega,b,\tau)f(x)|_{(\omega,b,\tau) = (0,0,1)} = \mathrm{i} xf(x)$

and

$\displaystyle T_b f(x) := \frac{d}{db}[\Pi(\omega,b,\tau)f(x)|_{(\omega,b,\tau) = (0,0,1)} = -f'(x)$

and

$\displaystyle T_\tau f(x) := \frac{d}{d\tau}[\Pi(\omega,b,\tau)f(x)|_{(\omega,b,\tau) = (0,0,1)} = \mathrm{i} f(x).$

(In the last case, my notation was not too good: Note that ${\tau = \mathrm{e}^{\mathrm{i} t}}$ with ${t\in[0,2\pi[}$ and the derivative has to be taken with respect to ${t}$.)

All these operators are skew adjoint on ${L^2({\mathbb R})}$ and hence the operators

$\displaystyle \mathrm{i} T_\omega f(x) = -xf(x),\quad \mathrm{i} T_b f(x) = -\mathrm{i} f'(x),\quad \mathrm{i} T_\tau f(x) = -f(x)$

For any two (possibly unbounded) operators on a Hilbert space there is a kind of abstract uncertainty principle (apparently sometimes known as Robertson’s uncertainty principle. It uses the commutator ${[A,B] = AB-BA}$ of two operators:

Theorem 2 For any two self adjoint operators ${A}$ and ${B}$ on a Hilbert space it holds that for any ${f}$ in the domain of definition of ${[A,B]}$ and any real numbers ${\mu_1}$ and ${\mu_2}$ it holds that

$\displaystyle \tfrac12|\langle [A,B] f,f\rangle|\leq \|(A-\mu_1 I)f\|\,\|(B-\mu_2)f\|.$

Proof: The proof simply consists of noting that

$\displaystyle \begin{array}{rcl} |\langle [A,B] f,f\rangle| &=& |\langle B f,Af\rangle -\langle Af,BS f\rangle|\\ & =& |\langle (B-\mu_2 I) f,(A-\mu_1 I)f\rangle -\langle (A-\mu_1 I)f,(B-\mu_2 I)S f\rangle|\\ & =& |2\text{Im} \langle (B-\mu_2 I) f,(A-\mu_1 I)f\rangle|\\ & \leq &2|\langle (B-\mu_2 I) f,(A-\mu_1 I)f\rangle|. \end{array}$

Now use Cauchy-Schwarz to obtain the result. $\Box$

Looking closer at the inequalities in the proof, one infers in which cases Robertson’s uncertainty principle is sharp: Precisely if there is a real ${\lambda}$ such that

$\displaystyle (A-\mu_1 I)f = -\mathrm{i}\lambda (B-\mu_2 I)f. \ \ \ \ \ (2)$

Now the three self-adjoint operators ${\mathrm{i} T_\omega}$, ${\mathrm{i} T_b}$ and ${\mathrm{i} T_\tau}$ have three commutators but since ${\mathrm{i} T_\tau}$ is a multiple of the identity and hence commutes with the others. I.e. there is only one commutator relation:

$\displaystyle [\mathrm{i} T_b,\mathrm{i} T_\omega] f = \mathrm{i} f.$

Hence, using ${A=\mathrm{i} T_b}$, ${B = \mathrm{i} T_\omega}$ and ${\mu_1=\mu_2=0}$ in Robertson’s uncertainty principle gives (in sloppy notation)

$\displaystyle \tfrac12 \|f\|^2\leq \|f'\|_2\|xf\|_2$

which is exactly the Heisenberg uncertainty principle.

Moreover, by (2), equality happens if ${f}$ fulfills the differential equation

$\displaystyle -\mathrm{i} f'(x) = \mathrm{i}\lambda x f(x)$

the solution of which are exactly the functions ${f(x) = C\mathrm{e}^{-\lambda x^2/2}}$.

Since the Heisenberg uncertainty principle is such an impressing thing with a broad range of implications (a colleague said that its interpretation, consequences and motivation somehow form a kind of “holy grail” in some communities), one may try to find other cool uncertainty principles be generalizing the approach of the previous sections to other transformations.

In the next post I am going to write about other group related integral transforms and its “uncertainty principles”.

A few days ago I’ve been to IFIP TC7 and participated in the minisymposium “Optimization in Banach spaces with sparsity constraints” organized by Kristian Bredies and Christian Clason. My session was very interesting, consisting of talks of Masha Zhariy (Optimization algorithms for sparse reconstruction in inverse problems with adaptive stopping rules), Caroline Vehoeven (On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty) and Kristian (Inverse problems in spaces of measures). Unfortunately, I could not take part in the second half of the minisymposium.

Although the minisymposium was interesting, I’d like to talk about the plenary Talk “Signal and systems representation; signal spaces, sampling, and quantization effects” by Holger Boche. The title sounded as I should be able to follow, but I’ve never heard the name before. And indeed, Holger Boche gave a fresh view onto the sampling and reconstruction problem, i.e. the digital-to-analog conversion and its inverse.

It was somehow refreshing to have a talk about sampling which was totally free of the words “compressive” and “compressed”. Holger Boche modelled continuous time signals in the well known Bernstein and Paley-Wiener spaces and started with fairly old but not too well known results on exact reconstruction from non-uniform sampling.

Definition 1 For ${1\leq p\leq \infty}$ and ${\sigma>0}$, the Paley-Wiener space ${\mathcal{PW}_\sigma^p}$ is

$\displaystyle \mathcal{PW}_\sigma^p = \{\hat g\ :\ g\in L^p([-\sigma,\sigma])\}.$

Definition 2 A function ${\phi}$ is of sine-type if it is of exponential type ${\pi}$ and

1. its zeros are separated, and
2. there exist real ${A,B,H}$ such that for all ${|y|\geq H}$ it holds that

$\displaystyle Ae^{\pi|y|}\leq |f(x+iy)| \leq Be^{\pi|y|}.$

Then, one theorem of reconstruction for non-uniform sampling points goes as follows:

Theorem 3 Let ${\phi}$ be a function of sine-type, whose zeros ${(t_k)}$ are all real and define

$\displaystyle \phi_k(t) = \frac{\phi(t)}{\phi'(t_k)(t-t_k)}.$

Then, for every ${f\in\mathcal{PW_\pi^1}}$ and all ${T>0}$ it holds that

$\displaystyle \lim_{N\rightarrow\infty}\|f - \sum_{k=-N}^N f(t_k)\phi_k\|_{L^\infty([-T,T])} \rightarrow 0.$

This theorem is a generalization of a result of J.L. Brown on local uniform convergence of the Shannon-sampling series.

These results are related to a sampling theorem due to Logan which I learned some years ago. First we recall that the Hilbert transform ${\mathcal{H}f}$ of a function ${f}$ can be defined via the Fourier transform ${\mathcal{F}}$ as

$\displaystyle \mathcal{F}(\mathcal{H}(u))(\omega) = (-i\text{sign}(\omega))\mathcal{F}(u)(\omega).$

Then, Logan’s Theorem is as follows:

Theorem 4 (Logan’s Sampling Theorem) Let ${f_1,f_2:\mathbb{R}\rightarrow\mathbb{R}}$ such that the support of both ${\hat f_1}$ and ${\hat f_2}$ is contained in ${[-b,-a]\cup[a,b]}$ for some number ${a}$ and ${b}$ with ${0. Then, if ${f_1}$ and ${f_2}$ do not have a root in common with their Hilbert transform, it holds that ${\text{sign} f_1 = \text{sign} f_2}$ implies that ${f_2}$ is equal to ${f_1}$ up to a constant factor.

Proof: (Informal) Since modulation shifts the Fourier transform, we can write a function ${f}$ with ${\text{supp}\hat f\subset -b,-a]\cup[a,b]}$ as a modulated sum of functions ${h_1}$ and ${h_2}$ with bandwidth ${(b-a)/2}$, using ${\mu=(a+b)/2}$ as follows

$\displaystyle f(t) = -i h_1(t)e^{i\mu t} + if_2(t)e^{-i\mu t}.$

Expressing ${\exp}$ in terms of ${\sin}$ and ${\cos}$ gives

$\displaystyle \begin{array}{rcl} f(t) &=& (-if_1(t) + if_2(t))\cos(\mu t) + (f_1(t)+f_2(t))\sin(\mu t)\\ &=& p(t)\cos(\mu t) + q(t)\sin(\mu t). \end{array}$

Note that ${p}$ and ${q}$ are real valued since ${f}$ is assumed to be real valued. Using ${\mathcal{H}(\sin) = \cos}$ and ${\mathcal{H}(\cos) = -\sin}$ we conclude that

$\displaystyle \mathcal{H}(f)(t) = p(t)\sin(\mu t) - q(t)\cos(\mu t).$

With the obvious notation we obtain that

$\displaystyle \mathcal{H}(f_1)(t)f_2(t) - \mathcal{H}(f_2)(t)f_1(t) = p_1(t)q_2(t) - p_2(t)q_1(t) = g(t).$

Now, the zeros of ${g}$ are at least the common zeros of ${f_1}$ and ${f_2}$. Moreover, the bandwidth of ${g}$ is at most ${b-a}$ (since the bandwidth of ${p_j}$ and ${q_j}$ is at most ${(b-a)/2}$). We can conclude that ${g}$ is identically zero by an argument which uses that upper density of the zeros of ${f_1}$ (resp. ${f_2}$) is high enough to force ${g}$ to zero. Hence, we know that

$\displaystyle \frac{f_1(t)}{\mathcal{H}(f_1)(t)} = \frac{f_2(t)}{\mathcal{H}(f_2)(t)}$

To finish the proof one needs some argument from complex analysis (using that both sides of the above equations are extendable to meromorphic function of exponetial type) to conclude that ${f_1}$ equals ${f_2}$ up to a constant factor. $\Box$

When I asked about the relation between Logan’s theorem and his results, he replied that Logan’s theorem was the starting point since he was curious why there was no practical implementation of this sampling procedure available. More precisely, he was interested in designing a digital implementation, based on sampling points, which allows to digitize all linear time invariant filters. One of his negative results was the following.

Definition 5 A sequence ${(t_k)}$ is called a complete interpolation sequence for ${\mathcal{PW}_\pi^2}$ and ${\ell^2}$ if for every ${c\in\ell^2}$ there is exactly one ${f\in \mathcal{PW}_\pi^2}$ such that ${f(t_k) = c_k}$.

Theorem 6 For a complete interpolation sequence ${(t_k)}$ for ${\mathcal{PW}_\pi^2}$ and ${\ell^2}$, the corresponding reconstruction functions ${\phi_k}$ and a given ${t\in\mathbb{R}}$ there always exists a stable linear time invariant filter ${T}$ and a signal ${f\in \mathcal{PW}_\pi^1}$ such that

$\displaystyle \lim\sup_{N\rightarrow\infty} | (Tf)(t) - \sum_{k=-N}^N f(t_k)(T\phi_k)(t)| = \infty.$

He conjectured (Conjecture 2.1 in his talk), that this phenomenon of point-wise divergence of the approximation of a linear time invariant filter remains, even if oversampling is applied, that there always is a function ${f\in\mathcal{PW}_{\beta\pi}^1}$ (${0<\beta<1}$) such that point-wise divergence happens. Moreover, he conjectured that (Conjecture 2.2) digitizing stable linear time invariant filter should be possible if one replaced point-sampling by appropriate linear functionals and posed the open problem to both prove this conjecture and to find such linear functionals.

Although I am at ENUMATH since four days, I did not post any news yet.

Most talks here dealt with numerical methods for PDEs and since this is not my primary topic I sometimes had a hard time to grab what the problems and goals were. One exception was the talk by Franco Brezzi. Ha gave an entertaining talk entitled “To reconstruct or not to reconstruct?”. In a nutshell  he talked about discretization methods for PDEs by either finite differences and finite elements. He distinguished the two methods by the fact that finite difference methods only use and produce “nodal values”,  i.e. values of the solution at specific points. On the other hand, finite element methods  also work with a set of values describing the solution. However, these numbers are coefficients to some basis functions and hence, one can “reconstruct” a true function from these values. His first point was, that methods with “reconstruction” usually have much simples proofs for convergence and so on. However, finite difference methods are usually much more simple to implement since the discretization itself explicitly dictates the linear system one has to solve. He then introduced “mimetic finite differences” which, in my incomplete understanding, are a kind of finite difference methods that incorporate more geometric information. During his talk I thought about phrasing his concepts of “reconstruction” and “evaluating” in the language of signal processing as “interpolation” and “sampling” and if this would give another perspective which could be helpful.

On my way to ENUMATH 11 in Leicester I stumbled upon the preprint Multi-parameter Tikhonov Regularisation in Topological Spaces by Markus Grasmair. The paper deals with fairly general Tikhonov functionals and its regularizing properties. Markus considers (nonlinear) operators ${F:X\rightarrow Y}$ between two set ${X}$ and ${Y}$ and analyzes minimizers of the functional

$\displaystyle T(x) = S(F(x),y) + \sum_k \alpha_k R_k(x).$

The functionals ${S}$ and ${R_k}$ play the roles of a similarity measure and regularization terms, respectively. While he also treats the issue of noise in the operator and the multiple regularization terms, I was mostly interested in his approach to the general similarity measure. The category in which he works in that of topological spaces and he writes:

“Because anyway no trace of an original Hilbert space or Banach space structure is left in the formulation of the Tikhonov functional ${T}$ [...], we will completely discard all assumption of a linear structure and instead consider the situation, where both the domain ${X}$ and the co-domain ${Y}$ of the operator ${F}$ are mere topological spaces, with the topology of ${Y}$ defined by the distance measure ${S}$.”

The last part of the sentence is important since previous papers often worked the other way round: Assume some topology in ${Y}$ and then state conditions on ${S}$. Nadja Worliczek observed in her talk “Sparse Regularization with Bregman Discrepancy” at GAMM 2011 that it seems more natural to deduce the topology from the similarity measure and Markus took the same approach. While Nadja used the notion of “initial topology” (that is, take the coarsest topology that makes the functionals ${y\mapsto S(z,y)}$ continuous), Markus uses the following family of pseudo-metrics: For ${z\in Y}$ define

$\displaystyle d^{(z)}(y,\tilde y) = |S(z,y)-S(z,\tilde y)|.$

Unfortunately, the preprint is a little bit too brief for me at this point and I did not totally get what he means with “the topology ${\sigma}$ induced by the uniformity induced by the pseudo-metric”. Also, I am not totally sure if “pseudo-metric” is unambiguous.. However, the topology he has in mind seems to be well suited in the sense that ${y^n\rightarrow y}$ if ${S(z,y^n)\rightarrow S(z,y)}$ for all ${z}$. Moreover, the condition that ${S(z,y)=0}$ iff ${z=y}$ implies that ${\sigma}$ is Hausdorff. It would be good to have a better understanding on how the properties of the similarity measure are related to the properties of the induced topology. Are there examples in which the induced topology is both different from usual norm and weak topologies and also interesting?

Moreover, I would be interested, in the relations of the two approaches: via “uniformities” and the initial topology…