November 29, 2011
November 28, 2011
I stumbled upon the notion of “time varying channels” in signal processing and after reading a bit I realized that these are really interesting objects and appear in many different parts of mathematics. In this post I collect of few of their realizations and relations.
In signal processing a “time varying channel” is a mapping which maps a signal to an output via
Before we explain the name “time varying channel” we fix the notation for the Fourier transform I am going to use in this post:
Definition 1 For we define the Fourier transform by
and denote the inverse Fourier transform by or . For a function we denote with and the Fourier transform with respect to the first and second -component, respectively.
Remark 1 In all what follows we do formal calculations with integral not caring about integrability. All calculation are justified in the case of Schwartz-functions and often hold in a much broader context of tempered distributions (this for example happens if the integrals represent Fourier transforms of functions).
The name “time varying channel” can be explained as follows: Consider a pure frequency as input: . A usual linear channel gives as output a damped signal and the damping depends on the frequency : . If we send the pure frequency in our time varying channel we get
Hence, the time varying channel also damps the pure frequencies but with a time dependent factor.
Let’s start quite far away from signal processing:
1. Pseudo-differential operators
A general linear differential operator of order on functions on is defined with multiindex notation as
with coefficient functions . Using Fourier inversion we get
For a general function (usually obeying some restrictions) we call the corresponding a pseudo-differential operator with symbol .
2. As integral operators
By integral operator I mean something like
We plug the definition of the Fourier transform into and obtain
Using the Fourier transform we can express the relation between and as
3. As “time varying convolution”
The convolution of two functions and is defined as
and we write “the convolution with ” as an operator .
we deduce from (1)
4. As superposition of time-frequency shifts
Using that iterated Fourier transforms with respect to components give the Fourier transform, i.e. , we obtain
From (1) we get
Before plugging this into we define time shifts and frequency shifts (or modulations) as
With this we get
Hence, is also a weighted superposition of (time-frequency shifts) with weight .
5. Back to time varying channels
Simple substitution brings us back the situation of a time varying channel
6. As superposition of product-convolution operators
Finally, I’d like to illustrate that this kind of operators can be seen as superposition of product convolution operators.
Introducing product operators we define a product-convolution operator as a convolution with a function followed by a multiplication with another function : .
To express with product-convolution operators we choose an orthonormal basis which consists of tensor-products of functions and develop into this basis as
and we obtain
Remark 2 The integral operators of the form are indeed general objects as can be seen from the Schwartz kernel theorem. Every reasonable operator mapping Schwartz functions linearly onto the tempered distributions is itself a generalized integral operator.
I tried to capture the various relation between the first five representations time varying channels in a diagram (where I went out of motivation before filling in all fields…):
November 24, 2011
If you want to have a sparse solution to a linear system of equation and have heard of compressed sensing or sparse reconstruction than you probably know what to do: Get one of the many solvers for Basis Pursuit and be happy.
Basis Pursuit was designed as a convex approximation of the generally intractable problem of finding the sparsest solution (that is, the solution with the smallest number of non-zero entries). By abuse of notation, we define for
(Because of some people prefer the, probably more correct but also more confusing, notation …).
Then, the sparsest solution of is the solution of
and Basis Pursuit replaces with “the closest convex proxy”, i.e.
The good thing about Basis Pursuit suit is, that is really gives the sparsest solution under appropriate conditions as is widely known nowadays. Here I’d like to present two simple examples in which the Basis Pursuit solution is
- not even close to the sparsest solution (by norm).
- not sparse at all.
1. A small bad matrix
We can build a bad matrix for Basis Pursuit, even in the case : For a small define
Of course, the sparsest solution is
while the solution of Basis Pursuit is
The summarize: For
(There is also a least squares solution that has three non-zero entries and a one-norm slightly larger than 2.)
Granted, this matrix is stupid. Especially, its first column has a very small norm compared to the others. Ok, let’s construct a matrix with normalized columns.
2. A small bad matrix with normalized columns
Fix an integer and a small . We define a -matrix
Ok, the first two columns do not have norm 1 yet, so we normalize them by multiplying with the right constant
(which is close to ) to get
Now we take the right hand side
and see what solutions to are there.
First, there is the least squares solution . This has only non-zero entries, the last entries are slightly smaller than and the first two are between and , hence, (in fact, slightly larger).
Second, there is a very sparse solution
This has two non-zero entries and a pretty large one-norm: .
Third there is a solution with small one-norm:
We have non-zero entries and . You can check that this is also the unique Basis Pursuit solution (e.g. by observing that is an element of and that the first two entries in are strictly smaller than 1 and positive – put differently, the vector is dual certificate for ).
To summarize, for it holds that
The geometric idea behind this matrix is as follows: We take simple normalized columns (the identity part in ) which sum up to the right hand side . Then we take two normalized vectors which are almost orthogonal to but have in their span (but one needs huge factors here to obtain ).
Well, this matrix looks very artificial and indeed it’s constructed for one special purpose: To show that minimal -norm solution are not always sparse (even when a sparse solution exists). It’s some kind of a hobby for me to construct instances for sparse reconstruction with extreme properties and I am thinking about a kind of “gallery” of these instances (probably extending the “gallery” command in Matlab).
By the way: if you want to play around with this matrix, here is the code
n = 100;
epsilon = sqrt(8/(n^2-n))+0.1;
c = 1/sqrt(2+n*epsilon^2/4);
A = zeros(n,n+2);
A(1:2,1:2) = ([1 -1;-1,1]+epsilon/2)*c;
A(3:n,1:2) = epsilon/2*c;
A(1:n,3:n+2) = eye(n);
b = ones(n,1);
November 22, 2011
Posted by Dirk under Math
| Tags: google scholar
| Leave a Comment
On 0xDE I learned that Google has a new serivce, called Google scholar citations. You can make a profile and collect all your publications that Google scholar can find. It also calculates several of these statistics about your citations like the h-index.
Using this one realizes a few things quickly:
- Google scholar is not even close to be a reliable data base for mathematical publications. I find talks, poster, preprints multiple times,…
- Google scholar is not even close to have a proper database of the “citation network”. Some preprint series (in which I have some preprints) include a list of all paper in that series at the end of each paper. Google scholar takes this as the list of reference and hence, produces wrong results here.
- Its hard to tell different people with the same name apart.
It seems that one purpose of Google scholar citation is to improve on all three parts by letting the people doing the work (what is not a bad idea). Note that similar services exist for several disciplines. For mathematics there is MathSciNet which is maintained by the Amarican Mathematical Society. They really put effort in their database with respect to all three point above. Moreover, there is the Zentralblatt.
While it is really nice to have a search engine for science which also finds preprints and not only published material, this service makes it even easier to produce stupid things like “impact statistics”, “productivity indices”. On the positive side it is really helpful to see what paper do cite another one (this makes it easier to find the source of some idea or original proofs). On the downside, this service may lead to the impression that research output (if there is such thing) can be measured by numbers in any way. I do not think that this is true. In the end: What does it say that one paper has been cited 50 times and another one five times? Think about it. It does not mean that the first paper has been read more, had spread more interest, had more impact or is even better in any way. It plainly says that the first paper has been cited ten times more. That’s it. There are papers which introduced great ideas in an awkward way and then a follow-up explained the same idea better. Hence, the second one gets cited although the first is equally important… There are weird paper which get cited a lot because everyone does. There are paper which many people did not read but cite because the have heard or read that this paper is the reference for something.
By the way: My profile is here.
November 11, 2011
If mathematicians study an object they sometimes study the properties of the object. E.g. you can study the object “2” (that is the natural number 2) by saying that it denotes a certain cardinality, name the cardinality of two things. Well, that does not give you anything new. However, mathematicians often study objects by studying what either the object can do to other things or what other things can do the objects. In our (quite silly) example of the number two you see that you can use 2 to add it to other numbers (or, conversely, you can add other numbers to 2). One may feel that this does not add any insight but for objects which are more complicated (e.g. like sets of other objects or sets of these) this view can be particularly useful: If you consider a normed vector space , you can ask either for properties of the space itself (e.g. completeness) or you ask how one can map the objects of the space to other sets. Using the underlying ingredients to give more structure on the mappings under consideration this gives new constructions: In the case of a normed vector space you have the underlying field . This leads to mappings from to . You also have a linear structure on . This leads to linear mappings from to . Finally, there is a norm on and if there is an absolute value on this leads you to bounded linear mappings from to . Any guess what: These objects form the so-called dual space.
Dual spaces of (say) Banach spaces are telling you new things about the space itself and it seems to me that if you want to understand the nature of a Banach space you need to know the dual space (or a space of which your space is the dual, i.e. a pre-dual space).
This post shall not be about duality in general but on dual spaces of a particular type of spaces, namely of spaces which consist of continuous functions. While these spaces usually form nice Banach spaces, they are not as simple as Lebesgue spaces or Sobolev spaces in that there are no Hilbert spaces among them and even no reflexive spaces.
1. What spaces of continuous functions?
For sets and one can study continuous functions as soon as both and are topological spaces. For these functions to form a vector space we also want that is a vector space over a field . To get a normed space of continuous functions we endow with a norm and try to define a norm of as
But wait: The supremum may not exist. We better add the assumption that is bounded. Now this gives us a normed space and we denote:
Together with the norm this gives a Banach space as soon as is a Banach space. In the case that is compact, we can omit the condition of boundedness and obtain
and again get a Banach space as soon as is a Banach space.
Instead of assuming compactness of the whole space we could shift this property to all functions in this space. This leads to
This does not lead to a Banach space in the case that is not compact. Indeed, there is a last possibility by considering the closure of with respect to leading to the space
Informally one says that this are the continuous function “that vanish on the boundary of ”.
All these spaces admit dual spaces and all these spaces are described by some kind of “Riesz representation theorem”. These dual spaces consists of some kind of measures and the dual pairing is then defined as integrating the continuous functions with respect to these measures. Here I want to clarify (at least for myself) what these dual spaces are and how they are related.
The case in which (i.e. the real or complex numbers) is a little easier and we omit the argument in the spaces in this case: , ,…
2. The dual of for normal
We deal with real or complex valued functions on a set and assume that in a normal topological space (also called -space), that is the points in are all closed and two disjoint closed sets can be separated by open neighborhoods. Of course a metric space is a normal topological space.
2.1. Regular bounded finitely additive measures
To see what the dual space of could be, we have to think of integrating a bounded continuous functions against a measure . Here the topological structure of comes into play because it naturally leads to a -algebra on , namely the one which is generated by the closed sets in (or equivalently be the open sets) which is also called the Borel algebra on . On a -algebra we can define a set function (which shall be a measure) by mapping the elements in to real (or complex) numbers. Such a set function is called bounded if for all and finitely additive if for any finite number of disjoint set it holds that
The set of all bounded and finitely additive set functions (nowadays often called bounded finitely additive measures) is denoted by . Equipped with the variational norm , becomes a Banach space. The space is a fairly large space of measures: Note that we did neither assume that the member are countably additive nor that the values shall be non-negative. However, we assumed the all values are finite which excludes, e.g. the Lebesgue-measure on (while weighted Lebesgue measures can be allowed if the weight is integrable). The space is slightly too large to be the dual of and we need another restriction. We call an element regular, if for any and any there exists a closed set and an open set such that for every it holds that . The space of regular bounded finitely additive measures is denoted by and is closed subspace of
2.2. Intgration with respect to
Now we can define the integral of a bounded continuous function with respect to a regular bounded finitely additive measure as follows: Since is a bounded set in we can cover is with open sets with diameter less than a given . Define and . If is not empty, choose (otherwise choose ). Since is continuous, the sets are in and we can define
which is an -approximation to and indeed, is the uniform limit of the . For each , the integral is naturally defined as
and the limit of this expression is used as integral for .
Since it holds that
every defines a continuous linear functional an . This shows that the dual space of contains . Indeed both spaces are equal:
Theorem 1 For a normal topological space it holds that
The proof is lengthy and technical and the prime reference is “Linear Operators” by Dunford and Schwartz (Theorem IV.6.2).
3. The dual of for compact Hausdorff
Now we assume that is a compact Hausdorff space (think of closed and bounded subsets of ). Due to compactness we can omit the boundedness assumption on our continuous functions (they are bounded anyway).
3.1. Regular bounded countably additive measures
We move from finitely additive measures to countably additive measure that is, (1) holds for countably many disjoint sets . Together with finiteness and regularity we arrive at the space of regular bounded countably additive measures. In our case, where is also a topological space and the -algebra is the Borel algebra this space is also called the space of regular Borel measures} or Radon measure} and often denoted by .
Here we have the following representation theorem:
Theorem 2 If is compact and Hausdorff it holds that
every determines an element in . Moreover, it is clear that . It remains to show that every determines a such that for all it holds that .
4. The dual of for locally compact Hausdorff
This case is basically the same as for with compact Hausdorff . The absence of compactness is compensated by the fact that the function “vanish on the boundary”. Well, “vanishing on the boundary” really means, that the function can be approximated by continuous function with compact support (in the -norm) and hence, the result do not differ from the previous one:
Theorem 3 If is locally compact and Hausdorff it holds that
- It turned out that the situation is easier than I expected. Basically there are two cases: Bounded continuous functions on a normal set . Here we get the regular bounded and finitely additive measures as the dual space. The case are continuous functions on a compact space or continuous function which “vanish at the boundary” on a locally compact space. In both we arrive at cases regular bounded countably additive measure aka Radon measures.
November 2, 2011
How many samples are needed to reconstruct a sparse signal?
Well, there are many, many results around some of which you probably know (at least if you are following this blog or this one). Today I write about a neat result which I found quite some time ago on reconstruction of nonnegative sparse signals from a semi-continuous perspective.
1. From discrete sparse reconstruction/compressed sensing to semi-continuous
The basic sparse reconstruction problem asks the following: Say we have a vector which only has non-zero entries and a fat matrix (i.e. ) and consider that we are given measurements . Of course, the system is underdetermined. However, we may add a little more prior knowledge on the solution and ask: Is is possible to reconstruct from if we know that the vector is sparse? If yes: How? Under what conditions on , , and ? This question created the expanding universe of compressed sensing recently (and this universe is expanding so fast that for sure there has to be some dark energy in it). As a matter of fact, a powerful method to obtain sparse solutions to underdetermined systems is -minimization a.k.a. Basis Pursuit on which I blogged recently: Solve
and the important ingredient here is the -norm of the vector in the objective function.
In this post I’ll formulate semi-continuous sparse reconstruction. We move from an -vector to a finite signed measure on a closed interval (which we assume to be for simplicty). We may embed the -vectors into the space of finite signed measures by choosing points , from the interval and build with the point-masses (or Dirac measures) . To a be a bit more precise, we speak about the space of Radon measures on , which are defined on the Borel -algebra of and are finite. Radon measures are not very scary objects and an intuitive way to think of them is to use Riesz representation: Every Radon measure arises as a continuous linear functional on a space of continuous functions, namely the space which is the closure of the continuous functions with compact support in with respect to the supremum norm. Hence, Radon measures work on these functions as . It is also natural to speak of the support of a Radon measure and it holds for any continuous function that
An important tool for Radon measures is the Hahn-Jordan decomposition which decomposes into a positive part and a negative part , i.e. and are non-negative and . Finally the variation of a measure, which is
provides a norm on the space of Radon measures.
Example 1 For the measure one readily calculates that
In this sense, the space of Radon measures provides a generalization of .
We may sample a Radon measure with linear functionals and these can be encoded by continuous functions as
This sampling gives a bounded linear operator . The generalization of Basis Pursuit is then given by
This was introduced and called “Support Pursuit” in the preprint Exact Reconstruction using Support Pursuit by Yohann de Castro and Frabrice Gamboa.
More on the motivation and the use of Radon measures for sparsity can be found in Inverse problems in spaces of measures by Kristian Bredies and Hanna Pikkarainen.
2. Exact reconstruction of sparse nonnegative Radon measures
Before I talk about the results we may count the degrees of freedom a sparse Radon measure has: If with some than is defined by the weights and the positions . Hence, we expect that at least linear measurements should be necessary to reconstruct . Surprisingly, this is almost enough if we know that the measure is nonnegative! We only need one more measurement, that is and moreover, we can take fairly simple measurements, namely the monomials: (with the convention that ). This is shown in the following theorem by de Castro and Gamboa.
Theorem 1 Let with , and let , be the monomials as above. Define . Then is the unique solution of the support pursuit problem, that is of
Proof: The following polynomial will be of importance: For a constant define
The following properties of will be used:
- has degree and hence, is a linear combination of the , , i.e. .
- For small enough it holds for that .
Now let be a solution of (SP). We have to show that . Due to property 2 we know that
Due to property 1 and non-negativity of we conclude that
Moreover, by Lebesgue’s decomposition we can decompose with respect to such that
and is singular with respect to . We get
and we conclude that and especially . This shows that is a solution to . It remains to show uniqueness. We show the following: If there is a with support in such that , then . To see this, we build, for any , the sets
and assume that there exists such that ( denoting the restriction of to ). However, it holds by property 3 of that
which is a contradiction. Hence, for all and this implies . Since has its support in we conclude that . Hence the support of is exactly . and since and hence . This can be written as a Vandermonde system
which only has the zero solution, giving .
3. Generalization to other measurements
The measurement by monomials may sound a bit unusual. However, de Castro and Gamboa show more. What really matters here is that the monomials for a so-called Chebyshev-System (or Tchebyscheff-system or T-system – by the way, have you ever tried to google for a T-system?). This is explained, for example in the book “Tchebycheff Systems: With Applications in Analysis and Statistics” by Karlin and Studden. A T-system on is simply a set of functions such that any linear combination of these functions has at most zeros. These systems are called after Tchebyscheff since they obey many of the helpful properties of the Tchebyscheff-polynomials.
What is helpful in our context is the following theorem of Krein:
Theorem 2 (Krein) If is a T-system for , and are in the interior of , then there exists a linear combination which is non-negative and vanishes exactly the the point .
Now consider that we replace the monomials in Theorem~1 by a T-system. You recognize that Krein’s Theorem allows to construct a “generalized polynomial” which fulfills the same requirements than the polynomial is the proof of Theorem~1 as soon as the constant function 1 lies in the span of the T-system and indeed the result of Theorem~1 is also valid in that case.
4. Exact reconstruction of -sparse nonnegative vectors from measurements
From the above one can deduce a reconstruction result for -sparse vectors and I quote Theorem 2.4 from Exact Reconstruction using Support Pursuit:
Theorem 3 Let , , be integers such that and let be a complete T-system on (that is, is a T-system on for all ). Then it holds: For any distinct reals and defined as
Basis Pursuit recovers all nonnegative -sparse vectors .
5. Concluding remarks
Note that Theorem~3 gives a deterministic construction of a measurement matrix.
Also note, that nonnegativity is crucial in what we did here. This allowed (in the monomial case) to work with squares and obtain the polynomial in the proof of Theorem~1 (which is also called “dual certificate” in this context). This raises the question how this method can be adapted to all sparse signals. One needs (in the monomial case) a polynomial which is bounded by 1 but matches the signs of the measure on its support. While this can be done (I think) for polynomials it seems difficult to obtain a generalization of Krein’s Theorem to this case…