In a recent post I wrote about time varying channels. These were operators, described by the spreading function {s:{\mathbb R}\times{\mathbb R}\rightarrow{\mathbb C}}, mapping an input signal {x} to an output signal {y} via

\displaystyle y(t) = H_s x(t) = \int_{{\mathbb R}}\int_{{\mathbb R}} s(\tau,\nu)x(t-\tau)\mathrm{e}^{\mathrm{i}\nu t}d\nu d\tau.

1. The problem statement

In this post I write about the problem of identifying the channels from input-output measurements. More precisely, the question is as follows:

Is it possible to choose a single input signal such that the corresponding output signal allows the computation of the spreading function?

At first sight this seems hopeless: The degrees of freedom we have is a single function {x:{\mathbb R}\rightarrow{\mathbb C}} and we look for a function {s:{\mathbb R}\times{\mathbb R}\rightarrow{\mathbb C}}. However, additional assumptions on {s} may help and indeed there is the following identification theorem due to Kailath:

Theorem 1 (Kailath) The channel {H_s} can be identified if the support of the spreading function {s} is contained in a rectangle with area smaller than {2\pi}.

Probably this looks a bit weird: The size of the support shall allow for identification? However, after turning the problem a bit around one sees that there is an intuitive explanation for this theorem which also explains the constant {2\pi}.

2. Translation to integral operators

The first step is, to translate the problem into one with integral operators of the form

\displaystyle F_k x(t) = \int_{\mathbb R} k(t,\tau)x(\tau)d\tau.

We observe that {k} is linked to {s} via

\displaystyle k(t,\tau) = \int s(t-\tau,\nu)e^{i\nu t}d\nu = (2\pi)^{1/2}\mathcal{F}_2^{-1}(s(t-\tau,\cdot))(t). \ \ \ \ \ (1)


and hence, identifying {s} is equivalent to identifying {k} (by the way: I use the same normalization of the Fourier transform as in this post).

From my point of view, the formulation of the identification problem is a bit clearer in the form of the integral operator {F_k}. It very much resembles “matrix-vector multiplication”: First you multiply {k(t,\tau)} with you input signal {x} in {\tau} (forming “pointwise products of the rows and the input vector” as in matrix-vector multiplication) and then integrate with respect to {t} (“summing up” the result).

3. Sending Diracs through the channel

Now, let’s see, if we can identify {k} from a single input-output measurement. We start, and send a single Dirac impulse through the channel: {x = \delta_0}. This formally gives

\displaystyle y(t) = \int_{\mathbb R} k(t,\tau)\delta_0(\tau)d\tau = k(t,0).

Ok, this is not too much. From the knowledge of {y} we can directly infer the values {k(t,0)} but that’s it – nothing more.

But we could also send a Dirac at a different position {\tau_0}, {x = \delta_{\tau_0}} and obtain

\displaystyle y(t) = \int_{\mathbb R} k(t,\tau)\delta_{\tau_0}(\tau)d\tau = k(t,\tau_0).

Ok, different Diracs give us information about {k} for all {t} but only for a single {\tau_0}. Let’s send a whole spike train (or Dirac comb) of “width” {c>0}: {x = \Delta_c = \sum_{n\in{\mathbb Z}} \delta_{cn}}. We obtain

\displaystyle y(t) = \int_{\mathbb R} k(t,\tau)\sum_{n\in{\mathbb Z}}\delta_{nc}(\tau)d\tau = \sum_{n\in{\mathbb Z}}k(t,cn).

Oh – this does not reveal a single value of {k} but “aggregated” values…

4. Choosing the right spike train

Now let’s translate the one condition on {s} we have to a condition on {k}: The support of {s} is contained in a rectangle with area less then {2\pi}. Let’s assume that this rectangle in centered around zero (which is no loss of generality) and denote it by {[-a/2,a/2]\times [-b/2,b/2]}, i.e. {s(\tau,\nu) = 0} if {|\tau|>a/2} or {|\nu|>b/2} (and of course {ab<2\pi}). From (1) we conclude that

\displaystyle k(t,\tau) = 0\ \text{ for }\ |t-\tau|>a/2.

In the {(t,\tau)}-plane, the support looks like this:

If we now send this spike train in the channel, we can visualize the output as follows:

We observe that in this case we can infer the values of {k(t,nc)} at some points {t} but at other points we have an overlap and only observe a sum of {k(t,nc)} and {k(t,(n+1)c)}. But if we make {c} large enough, namely larger than {a}, we identify the values of {k(t,nc)} exactly in the output signal!

5. Is that enough?

Ok, now we know something about {k} but how can we infer the rest of the values? Don’t forget, that we know that {ab < 2\pi} and that we know that {s(\tau,\nu)} is supported in {[-a/2,a/2]\times[-b/2,b/2]}. Up to now we only used the value of {a}. How does the support limitation if {\nu} translates to {k}? We look again at (1) and see: The function {\tilde k_t:\tau\mapsto k(t,t-\tau)} is bandlimited with bandwidth {b/2} for every {t}. And what do we know about these functions {\tilde k_t}? We know the samples {k(t,nc) = \tilde k_t(t-nc)}. In other words, we have samples of {\tilde k_t} with the sampling rate {c}. But there is the famous Nyquist-Shannon sampling theorem:

Theorem 2 (Nyquist-Shannon) A function {f:{\mathbb R}\rightarrow{\mathbb C}} which is bandlimited with bandwidth {B} is totally determined by its discrete time samples {f(n\pi/B)}, {n\in{\mathbb Z}} and it holds that

\displaystyle f(t) = \sum_{n\in{\mathbb Z}} f(n\pi/N)\textup{sinc}\Big(\tfrac{B}{\pi}(x - \tfrac{n\pi}{B})\Big).

We are happy with the first assertion: The samples totally determine the function if they are dense enough. In our case the bandwidth of {\tilde k_t} is {b/2} and hence, we need the sampling rate {2\pi/b}. What we have, are samples with rate {c}.

We collect our conditions on {c}: We need {c}

  • larger than {a} to separate the values of {k(t,nc)} in the output.
  • smaller than {2\pi/b} to determine the full functions {\tilde k_t(\tau) = k(t,t-\tau)}.

Both together say

\displaystyle a < c < \frac{2\pi}{b}.

Such a {c} exists, if {ab < 2\pi} and we have proven Theorem 1.

6. Concluding remarks

The proof of Kailath’s theorem reveal the role of the constraint {ab <2\pi}: We want {a} not to be too large to be able to somehow separate the values of {k(t,nc)} in the output measurement and we need {b} not too large to ensure that we can interpolate the values {k(t,nc)} exactly.

However, a severe drawback of this results is that one needs to know {a} to construct the “sensing signal” which was the spike train {\Delta_c}. Moreover, this sensing signal has itself an infinite bandwidth which is practically not desirable. It seems to be an open question whether there are more practical sensing signals.

There is more known about sensing of time varying channels: The support of {s} does not have to be in a rectangle, it is enough if the measure of the support is smaller than {2\pi} (a result usually attributed to Bello). Moreover, there are converse results which say that linear and stable identification is only possible under this restriction on the support size (see the work of Götz Pfander and coauthors).


I stumbled upon the notion of “time varying channels” in signal processing and after reading a bit I realized that these are really interesting objects and appear in many different parts of mathematics. In this post I collect of few of their realizations and relations.

In signal processing a “time varying channel” is a mapping which maps a signal {x:{\mathbb R}^d\rightarrow {\mathbb C}} to an output {y:{\mathbb R}^d\rightarrow{\mathbb C}} via

\displaystyle y(t) = \int_{{\mathbb R}^d}\int_{{\mathbb R}^d} s(\tau,\nu)x(t-\tau)\mathrm{e}^{\mathrm{i}\nu t}d\nu d\tau.

Before we explain the name “time varying channel” we fix the notation for the Fourier transform I am going to use in this post:

Definition 1 For {f:{\mathbb R}^d\rightarrow{\mathbb C}} we define the Fourier transform by

\displaystyle \mathcal{F}(f)(\omega) = \hat f(\omega) = (2\pi)^{-d/2}\int f(t)\mathrm{e}^{-\mathrm{i}\omega t}dt

and denote the inverse Fourier transform by {\mathcal{F}^{-1}f} or {\check f}. For a function {f:{\mathbb R}^d\times{\mathbb R}^d\rightarrow{\mathbb C}} we denote with {\mathcal{F}_1} and {\mathcal{F}_2} the Fourier transform with respect to the first and second {{\mathbb R}^d}-component, respectively.

Remark 1 In all what follows we do formal calculations with integral not caring about integrability. All calculation are justified in the case of Schwartz-functions and often hold in a much broader context of tempered distributions (this for example happens if the integrals represent Fourier transforms of functions).

The name “time varying channel” can be explained as follows: Consider a pure frequency as input: {x(t) = \mathrm{e}^{-\mathrm{i}\omega t}}. A usual linear channel gives as output a damped signal and the damping depends on the frequency {\omega}: {y(t) = h(\omega) \mathrm{e}^{-\mathrm{i}\omega t}}. If we send the pure frequency in our time varying channel we get

\displaystyle \begin{array}{rcl} y(t) & = &\int\int s(\tau,\nu) \mathrm{e}^{-\mathrm{i}\omega(t-\tau)}e^{\mathrm{i}\nu t}d\nu dt\\ & =& \int\int s(\tau,\nu) \mathrm{e}^{\mathrm{i}(\omega\tau +\nu t)}d\nu dt\, \mathrm{e}^{-\mathrm{i}\omega t}\\ & =& (2\pi)^d \hat s(-\omega,-t) \mathrm{e}^{-\mathrm{i}\omega t}. \end{array}

Hence, the time varying channel also damps the pure frequencies but with a time dependent factor.

Let’s start quite far away from signal processing:

1. Pseudo-differential operators

A general linear differential operator of order {N} on functions on {{\mathbb R}^d} is defined with multiindex notation as

\displaystyle Af(t) = \sum_{|\alpha|\leq N} \sigma_\alpha(t) D^\alpha f(t)

with coefficient functions {\sigma_\alpha}. Using Fourier inversion we get

\displaystyle D^\alpha f(t) - (2\pi)^{-d/2} \int \hat f(\omega) (\mathrm{i} \omega)\mathrm{e}^{\mathrm{i}\omega t}d\omega

and hence

\displaystyle \begin{array}{rcl} Af(t) & =& \int \Big((2\pi)^{-d/2} \sum_{|\alpha|\leq N} \sigma_\alpha(t)(\mathrm{i} \omega)^\alpha) \hat f(\omega)\mathrm{e}^{\mathrm{i}\omega t}d\omega\\ &=& \int \sigma(\omega,t) \hat f(\omega)\mathrm{e}^{\mathrm{i}\omega t}d\omega \\ & = & K_\sigma f(t). \end{array}

For a general function (usually obeying some restrictions) {\sigma:{\mathbb R}^d\times{\mathbb R}^d\rightarrow{\mathbb C}} we call the corresponding {K_\sigma} a pseudo-differential operator with symbol {\sigma}.

2. As integral operators

By integral operator I mean something like

\displaystyle F_k f(t) = \int k(t,\tau)f(\tau)d\tau.

We plug the definition of the Fourier transform into {K_\sigma} and obtain

\displaystyle \begin{array}{rcl} K_\sigma f(t) &=& \int \sigma(\omega,t) (2\pi)^{-d/2} \int f(\tau)\mathrm{e}^{-\mathrm{i} \omega \tau}d\tau \mathrm{e}^{\mathrm{i} \omega t}d\omega\\ & = & \int \underbrace{(2\pi)^{-d/2} \int \sigma(\omega,t)\mathrm{e}^{-\mathrm{i}\omega(\tau-t)}d\omega}_{=k(t,\tau)} f(\tau)d\tau\\ &=& \int k(t,\tau)f(\tau)d\tau. \end{array}

Using the Fourier transform we can express the relation between {\sigma} and {k} as

\displaystyle k(t,\tau) = (2\pi)^{-d/2}\int \sigma(\omega,\tau)\mathrm{e}^{-\mathrm{i}\omega(\tau-t)}d\omega = \mathcal{F}_1(\sigma(\cdot,t))(\tau-t). \ \ \ \ \ (1)

3. As “time varying convolution”

The convolution of two functions {f} and {g} is defined as

\displaystyle (f*g)(t) = \int f(\tau) g(t-\tau)d\tau

and we write “the convolution with {g}” as an operator {C_g f = f * g}.


\displaystyle h_t(\tau) = (2\pi)^{-d/2}\int \sigma(\omega,t)\mathrm{e}^{\mathrm{i}\omega \tau}d\omega

we deduce from (1)

\displaystyle K_\sigma f(t) = \int f(\tau) h_t(t-\tau)d\tau= (f * h_t)(t) = C_{h_t}f(t).

4. As superposition of time-frequency shifts

Using that iterated Fourier transforms with respect to components give the Fourier transform, i.e. {\mathcal{F}_2\mathcal{F}_1 = \mathcal{F}}, we obtain

\displaystyle \mathcal{F}_1\sigma = \mathcal{F}_2^{-1}\mathcal{F}_2\mathcal{F}_1\sigma = \mathcal{F}_2^{-1}\hat\sigma.

From (1) we get

\displaystyle \begin{array}{rcl} k(t,\tau) &=& \mathcal{F}_2^{-1}\hat\sigma(\tau-t,t)\\ &=& (2\pi)^{-d/2} \int\hat\sigma(\tau-t,\nu)\mathrm{e}^{\mathrm{i} t\nu}d\nu. \end{array}

Before plugging this into {K_\sigma} we define time shifts {T_u} and frequency shifts (or modulations) {M_\nu} as

\displaystyle T_u f(t) = f(t-u),\quad M_\nu f(t) = \mathrm{e}^{\mathrm{i} t\nu}.

With this we get

\displaystyle \begin{array}{rcl} K_\sigma f(t) &=& (2\pi)^{-d/2}\int\int \hat\sigma(\tau-t,\nu)\mathrm{e}^{\mathrm{i} t\nu}f(\tau)d\nu d\tau\\ &=& (2\pi)^{-d/2}\int\int \hat\sigma(u,\nu)\mathrm{e}^{\mathrm{i} t\nu}f(t+u)d\nu du\\ &=& (2\pi)^{-d/2}\int\int\hat\sigma(u,\nu) M_\nu T_{-u} f(t) d\nu du\\ &=& \int\int w(u,\nu) M_\nu T_{-u} f(t)d\nu du = S_wf(t). \end{array}

Hence, {K_\sigma} is also a weighted superposition of {M_\nu T_{-u} f} (time-frequency shifts) with weight {(2\pi)^{-d/2} \hat\sigma(u,\nu)}.

5. Back to time varying channels

Simple substitution brings us back the situation of a time varying channel

\displaystyle \begin{array}{rcl} K_\sigma f(t) &=& (2\pi)^{-d/2} \int\int \hat\sigma(u,\nu)\mathrm{e}^{\mathrm{i} t\nu}f(t+u)d\nu du\\ &=& \int \int s(\tau,\nu) f(t-\tau) \mathrm{e}^{\mathrm{i} t\nu} d\nu d\tau\\ &=& H_s f(t) \end{array}

with {s(\tau,\nu) = (2\pi)^{-d/2} \hat\sigma(-\tau,\nu)}.

6. As superposition of product-convolution operators

Finally, I’d like to illustrate that this kind of operators can be seen as superposition of product convolution operators.

Introducing product operators {P_g f(t) = g(t)f(t)} we define a product-convolution operator as a convolution with a function {h} followed by a multiplication with another function {g}: {P_g C_h f}.

To express {K_\sigma} with product-convolution operators we choose an orthonormal basis which consists of tensor-products of functions {(\phi_n\otimes\psi_k)} and develop {\sigma} into this basis as

\displaystyle \sigma(\omega,t) = \sum_{n,k} a_{n,k} \phi_n(\omega)\psi_k(t).


\displaystyle \begin{array}{rcl} K_\sigma f(t) &=& \sum_{n,k} a_{n,k}\psi_k(t) \int \phi_n(\omega) \hat f(\omega) \mathrm{e}^{\mathrm{i} \omega t}d\omega\\ & = & \sum_{n,k} a_{n,k}\psi_k(t) (2\pi)^{d/2}\mathcal{F}^{-1}(\phi_n\, \hat f)(t)\\ &=& \sum_{n,k} a_{n,k}\psi_k(t) (\check \phi_n * f)(t) \end{array}

and we obtain

\displaystyle K_\sigma f(t) = \sum_{n,k} a_{n,k}P_{\psi_k} C_{\check\phi_n} f (t).

Remark 2 The integral operators of the form {F_k} are indeed general objects as can be seen from the Schwartz kernel theorem. Every reasonable operator mapping Schwartz functions linearly onto the tempered distributions is itself a generalized integral operator.

I tried to capture the various relation between the first five representations time varying channels in a diagram (where I went out of motivation before filling in all fields…):