Some years ago I became fascinated by uncertainty principles. I got to know them via signal processing and not via physics, although, from a mathematical point of view they are the same.

I recently supervised a Master’s thesis on this topic and the results clarified a few things for me which I used to find obscure and I’d like to illustrate this here on my blog. However, it takes some space to introduce notation and to explain what it’s all about and hence, I decided to write a short series of posts, I try to explain, what new insights I got from the thesis. Here comes the first post:

1. The Fourier transform and the windowed Fourier transform

Let’s start with an important tool from signal processing you all know: The Fourier transform. For {f:{\mathbb R}\rightarrow{\mathbb C}} the Fourier transform is

\displaystyle  \hat f(\omega) = (2\pi)^{-1/2}\int_{\mathbb R} f(x) \mathrm{e}^{-\mathrm{i} x\omega} dx.

(I was tempted to say “whenever the integral is defined”. However, the details here are a little bit more involved, but I will not go into detail here; {hat f} is defined for {L^2}-functions, for {L^1} functions and even for tempered distributions…) Roughly speaking, the Fourier transform decomposes a signal into its frequency components, which can be seen from the Fourier inversion formula:

\displaystyle  f(x) = (2\pi)^{-1/2}\int_{\mathbb R} \hat f(\omega) \mathrm{e}^{\mathrm{i} x\omega} d\omega,

i.e. the (complex) number {f(\omega)} says “how much the frequency {\omega} (i.e. the function {x\mapsto \mathrm{e}^{\mathrm{i} x\omega}}) contributes to {f}”. In the context of signal processing one often speaks of the “time representation” {f} and the “frequency representation” {\hat f}.

One drawback of the Fourier transform, when used to analyze signals, is its “global” nature in that the value {\hat f(\omega)} depends on every value of {f}, i.e. a change of {f} in a small interval results a change of all of {\hat f}. A natural idea (which is usually attributed to Gabor) is, to introduce a window function {g} which is supposed to be a bump function, centered at zero, then translate this function and “localize” {f} by multiplying it with {g(\cdot-t)}. The resulting transform

\displaystyle  G_gf(\omega,t) = (2\pi)^{-1/2}\int_{\mathbb R} f(x)g(x-t)\mathrm{e}^{-ix\omega}dx

is called windowed Fourier transform, short-time Fourier transform or (in the case of {g(x) = \mathrm{e}^{-x^2/2}}) Gabor transform.

Of course we can write the windowed Fourier transform in term of the usual Fourier transform as

\displaystyle   G_gf(\omega,t) = \widehat{(f\,g(\cdot-t))}(\omega). \ \ \ \ \ (1)

In other words: The localization in time is precisely determined by the “locality” of {g}, that is, how well {g} is concentrated around zero. The better {g} is concentrated around {0}, the more “localized around {t}” is the information of {f}, the windowed Fourier transform {G_gf(\omega,t)} uses.

For the localization in frequency one obtains (by Plancherel’s formula and integral substitution) that

\displaystyle  G_gf(\omega,t) = \mathrm{e}^{-\mathrm{i} x\omega}\widehat{(\hat f\,\hat g(\cdot-\omega)}(-x).

In other words: The localization in frequency is precisely determined by the “locality” of {\hat g}, that is, how well {\hat g} is concentrated around zero. The better {\hat g} is concentrated around {0}, the more “localized around {\omega}” is the information of {\hat f}, the windowed Fourier transform {G_gf(\omega,t)} uses.

Hence, it seems clear that a function {g} is well suited as a window function, if it both well localized in time and frequency. If one measures the localization of a function around zero by its variance

\displaystyle  \text{Var}_g = \int_{\mathbb R} x^2|g(x)|^2 dx ,

then there is the fundamental lower bound on the product of the variance of a function and the variance of its Fourier transform, know under the name “Heisenberg uncertainty principle” (or, as I learned from Wikipedia, “Gabor limit”): For {\|g\|_{L^2}=1} it holds that

\displaystyle  \text{Var}_g\cdot\text{Var}_{\hat g}\geq \tfrac14.

Proof: A simple (not totally rigorous) proof goes like this: We use partial integration, the Cauchy-Schwarz inequality and the Plancherel formula:

\displaystyle  \begin{array}{rcl}  \|g\|_2^2 & =& \int_{\mathbb R} 1\cdot|g(x)|^2dx\\ & = &-\int_{\mathbb R} x\tfrac{d}{dx}|g(x)|^2dx\\ & = &-2\int_{\mathbb R} xg(x)g'(x)dx\\ & \leq &2\int_{\mathbb R} |xg(x)|\,|g'(x)|dx\\ & \leq &2\Big(\int_{\mathbb R} |xg(x)|^2dx\Big)^{1/2}\Big(\int_{\mathbb R} |g'(x)|^2dx\Big)^{1/2}\\ & = &2\Big(\int_{\mathbb R} |xg(x)|^2dx\Big)^{1/2}\Big(\int_{\mathbb R} |\omega \hat g(\omega)|^2d\omega\Big)^{1/2}\\ & = &2(\text{Var}_g)^{1/2}(\text{Var}_{\hat g})^{1/2}. \end{array}

\Box

Moreover, the inequality is sharp for the functions {g(x) = C\mathrm{e}^{-\lambda x^2}} for {\lambda>0}. In this sense, these Gaussians are best suited for the windowed Fourier transform.

While this presentation was geared towards usability, there is a quite different approach to uncertainty principles related to integral transforms which uses the underlying group structure.

2. The group behind the windowed Fourier transform

The windowed Fourier transform (1) can also be seen as taking inner products of {f} with the family of functions {g(\cdot-t)\mathrm{e}^{-\mathrm{i} x\omega}}. This family is obtained from the single function {g} by letting the so-called Weyl-Heisenberg group act on it:

Definition 1 The Weyl-Heisenberg group {G_{WH}} is the set {{\mathbb R}\times{\mathbb R}\times S^1} endowed with the operation

\displaystyle  (\omega,b,\tau)(\omega',b'\tau') = (\omega+\omega',b+b',\tau\tau'\mathrm{e}^{\mathrm{i} (\omega b'-\omega' b)/2)}).

The Weyl-Heisenberg group admits a representation of the space of unitary operators on {L^2({\mathbb R})}, that is a map {\Pi:G_{WH}\rightarrow U(L^2({\mathbb R}))}

\displaystyle  \Pi(\omega,b,\tau)f(x) = \tau\mathrm{e}^{-\mathrm{i} \omega b/2}\mathrm{e}^{\mathrm{i}\omega x}f(x-b).

It indeed the operators {\Pi(\omega,b,\tau)} are unitary and it holds that

\displaystyle  \Pi(\omega,b,\tau)\Pi(\omega',b',\tau')f(x) = \Pi((\omega,b,\tau)(\omega',b',\tau'))f(x).

Moreover, the mapping {(\omega,b,\tau)\mapsto \Pi(\omega,b,\tau)f} is continuous for all {f}, a property of the representation which is called strong continuity.

In this light, the windowed Fourier transform can be written as

\displaystyle  G_g f(\omega,t) = (2\pi)^{-1/2} \langle f,\Pi(\omega,b,\mathrm{e}^{\mathrm{i}\omega b/2})g\rangle.

Now there is a motivation for the uncertainty principle as follows: Associated to the Weyl-Heisenberg group there is the Weyl-Heisenberg algebra, a basis of which is given by the so called infinitesimal generators. These are, roughly speaking, the derivatives of the representation with respect to the group parameters, evaluated at the identity. In the Weyl-Heisenberg case:

\displaystyle  T_\omega f(x) := \frac{d}{d\omega}[\Pi(\omega,b,\tau)f(x)|_{(\omega,b,\tau) = (0,0,1)} = \mathrm{i} xf(x)

and

\displaystyle  T_b f(x) := \frac{d}{db}[\Pi(\omega,b,\tau)f(x)|_{(\omega,b,\tau) = (0,0,1)} = -f'(x)

and

\displaystyle  T_\tau f(x) := \frac{d}{d\tau}[\Pi(\omega,b,\tau)f(x)|_{(\omega,b,\tau) = (0,0,1)} = \mathrm{i} f(x).

(In the last case, my notation was not too good: Note that {\tau = \mathrm{e}^{\mathrm{i} t}} with {t\in[0,2\pi[} and the derivative has to be taken with respect to {t}.)

All these operators are skew adjoint on {L^2({\mathbb R})} and hence the operators

\displaystyle  \mathrm{i} T_\omega f(x) = -xf(x),\quad \mathrm{i} T_b f(x) = -\mathrm{i} f'(x),\quad \mathrm{i} T_\tau f(x) = -f(x)

are self adjoint.

For any two (possibly unbounded) operators on a Hilbert space there is a kind of abstract uncertainty principle (apparently sometimes known as Robertson’s uncertainty principle. It uses the commutator {[A,B] = AB-BA} of two operators:

Theorem 2 For any two self adjoint operators {A} and {B} on a Hilbert space it holds that for any {f} in the domain of definition of {[A,B]} and any real numbers {\mu_1} and {\mu_2} it holds that

\displaystyle  \tfrac12|\langle [A,B] f,f\rangle|\leq \|(A-\mu_1 I)f\|\,\|(B-\mu_2)f\|.

Proof: The proof simply consists of noting that

\displaystyle  \begin{array}{rcl}  |\langle [A,B] f,f\rangle| &=& |\langle B f,Af\rangle -\langle Af,BS f\rangle|\\ & =& |\langle (B-\mu_2 I) f,(A-\mu_1 I)f\rangle -\langle (A-\mu_1 I)f,(B-\mu_2 I)S f\rangle|\\ & =& |2\text{Im} \langle (B-\mu_2 I) f,(A-\mu_1 I)f\rangle|\\ & \leq &2|\langle (B-\mu_2 I) f,(A-\mu_1 I)f\rangle|. \end{array}

Now use Cauchy-Schwarz to obtain the result. \Box

Looking closer at the inequalities in the proof, one infers in which cases Robertson’s uncertainty principle is sharp: Precisely if there is a real {\lambda} such that

\displaystyle   (A-\mu_1 I)f = -\mathrm{i}\lambda (B-\mu_2 I)f. \ \ \ \ \ (2)

Now the three self-adjoint operators {\mathrm{i} T_\omega}, {\mathrm{i} T_b} and {\mathrm{i} T_\tau} have three commutators but since {\mathrm{i} T_\tau} is a multiple of the identity and hence commutes with the others. I.e. there is only one commutator relation:

\displaystyle  [\mathrm{i} T_b,\mathrm{i} T_\omega] f = \mathrm{i} f.

Hence, using {A=\mathrm{i} T_b}, {B = \mathrm{i} T_\omega} and {\mu_1=\mu_2=0} in Robertson’s uncertainty principle gives (in sloppy notation)

\displaystyle  \tfrac12 \|f\|^2\leq \|f'\|_2\|xf\|_2

which is exactly the Heisenberg uncertainty principle.

Moreover, by (2), equality happens if {f} fulfills the differential equation

\displaystyle  -\mathrm{i} f'(x) = \mathrm{i}\lambda x f(x)

the solution of which are exactly the functions {f(x) = C\mathrm{e}^{-\lambda x^2/2}}.

Since the Heisenberg uncertainty principle is such an impressing thing with a broad range of implications (a colleague said that its interpretation, consequences and motivation somehow form a kind of “holy grail” in some communities), one may try to find other cool uncertainty principles be generalizing the approach of the previous sections to other transformations.

In the next post I am going to write about other group related integral transforms and its “uncertainty principles”.