Math


Image

I have an open position for a Scientific Assistant/PhD student available. The salary is according to TV-L EG 13. (Don’t know what that means? Have a look here.). The position starts at 01.09.2013 (earlier start is possible) and is initially limited to two years; further extension is possible and for a PhD student, at least three years are planned.

Candidates should

  • have a degree (Masters or Diploma) in mathematics above average,
  • have very good knowledge in numerical mathematics and functional analysis,
  • have good knowledge in scientific computing or optimization,
  • know German well and
  • have strong interest in applied mathematics. Also, bring a high commitment for scientific research.

The responsibilities include

  • participation in teaching and
  • independent but supervised research in field of applied mathematics (especially mathematical imaging and inverse problems).

Please send applications including CV, copies of certificates and letters of recommendation (if any) in electronic form directly to me. Deadline is the 30.06.2013.

If you would like to post the job advertisement at you bulletin board, here’s the pdf file.

This term I am particularly busy as I am writing a book about variational methods in imaging. The book will be a textbook and I am writing it parallel to a lecture I am currently teaching. (And it is planned that Kristian Bredies, the co-author, teaches the same stuff next term – then there will be another few month of editing, so it will be at least a year until publishing.)

In the book we will treat variational approaches to a variety of basic imaging problems. Of course we treat denoising and deblurring but there will also be sections about image interpolation, segmentation and optical flow. In the first part of the book, we present the variational problem and model them properly in Lebesgue and Sobolev spaces and of course in the space {BV}. Some effort goes into the analysis of the models and the first step is usually to establish existence of solutions, i.e. minimizers of the respective minimization problems. The work horse is the direct method in the calculus of variations and we mainly use the method for convex functionals in Banach spaces.

When I started the section on optical flow I noticed that I hadn’t thought about existence of minimizers before and moreover, most papers and books do not treat this issue. Let’s recall the method of Horn and Schunck to calculate the optical flow:

For two images {u_0}, {u_1} defined on a domain {\Omega\subset{\mathbb R}^2} one seeks a flow field {V:\Omega\rightarrow{\mathbb R}^2} such that {V} the describes the apparent motion that has happened between both images. Assuming that the points keep their gray value during motion (an assumption known as the brightness constancy constraint) and linearizing this assumption one arrives at the condition

\displaystyle  \frac{u_1-u_0}{dt} + V\cdot\nabla u_0 = 0

(where {dt} is the time between the images {u_0} and {u_1}). First, this does not give enough equations to determine {V} and secondly, points with {\nabla u_0=0} are problematic.

Horn and Schunck proposed to loose the constraint and to enforce some smoothness of the flow field {V}: Their model was to minimize

\displaystyle  F(u) = \int_\Omega\big(\frac{u_1-u_0}{dt} + V\cdot\nabla u_0\big)^2{\mathrm d}{x} + \lambda\int_\Omega|\nabla V|^2{\mathrm d}{x}

for some parameter {\lambda} weighting smoothness of {V} (large {\lambda}) against the brightness constancy constraint (small {\lambda}). A little bit more general one could choose exponents {p} and {q} and minimize

\displaystyle  F(u) = \int_\Omega\big|\frac{u_1-u_0}{dt} + V\cdot\nabla u_0\big|^q{\mathrm d}{x} + \lambda\int_\Omega|\nabla V|^p{\mathrm d}{x}.

To apply the direct method to obtain existence of minimizers of {F} one ensures

  1. properness, i.e. there is some {V} such that {F} is finite,
  2. convexity of {F},
  3. lower semi-continuiuty of {F} and
  4. coercivity of {F}.

To check these things one has to choose an appropriate space to work in. It seems reasonable to choose {V\in L^{q}(\Omega,{\mathbb R}^2)}. Then properness of {F} is easy (consider {V=0}, of course assuming that {u_1-u_0\in L^q(\Omega)}). Convexity is also clear and for lower semi-continuity one has to work a little more, but that is possible if, e.g., {\nabla u_0} is bounded. Coercivity is not that clear and in fact {F} is not coercive in general.

Example 1 (Non-coercivity of the Horn-and-Schunck-model) Simply consider {u_0(x,y) = ax + by} for some {a,b\in{\mathbb R}}. Then {\nabla u(x,y) \equiv [a\ b]^T}. Set {V_n(x,y) \equiv [-nb\ na]^T} and note that {\|V^n\|_q\rightarrow\infty} while {F(V^n)} stays bounded (in fact constant).

I just checked the book “Mathematical problems in Imaging” by Gilles Aubert and Pierre Kornprobst and in Section 5.3.2 they mention that the Horn and Schunck model is not coercive. They add another term to {F} which is roughly a weighted norm of {V} which ensures coercivity. However, it turns out that coercivity of {F} is true under a mild assumption of {u_0}. The idea can be found in a pretty old paper by Christoph Schnörr which is called “ Determining Optical Flow for Irregular Domains by Minimizing Quadratic Functionals of a Certain Class” (Int. J. of Comp. Vision, 6(1):25–38, 1991). His argument works for {q=2}:

Theorem 1 Let {\Omega\subset{\mathbb R}^2} be a bounded Lipschitz domain, {u_0,u_1\in L^2(\Omega)} with {\nabla u_0\in L^\infty(\Omega)} such that {\partial_x u_0} and {\partial_y u_0} are linearly independent in {L^2(\Omega)} and let {1<p<\infty}. Then it holds that {F:L^2(\Omega)\rightarrow {\mathbb R}\cup\{\infty\}} defined by

\displaystyle  F(u) = \int_\Omega\big(\frac{u_1-u_0}{dt} + V\cdot\nabla u_0\big)^2{\mathrm d}{x} + \lambda\int_\Omega|\nabla V|^2{\mathrm d}{x}

is coercive.

Proof: Now consider {V^n} such that {\|V^n\|_2\rightarrow\infty}. Now we decompose the components of {V} into the constant parts {QV^n_x} and {QV^n_y} and the “zero-mean”-part {PV^n_x = V^n_x - QV^n_x} and {PV^n_y = V^n_y - QV^n_y}. First consider that {PV^n} is unbounded, i.e. there is subsequence (also denoted by {V^n}) such that {\|PV^n\|_2\rightarrow\infty}. By Sobolev embedding and the \href{http://en.wikipedia.org/wiki/Poincar inequality}, we get that {\int_\Omega|\nabla V^n|^p{\mathrm d}{x}\rightarrow\infty}.

Now consider bounded {PV^n} and hence, unbounded mean values {QV^n}. Using a subsequence, we assume that {QV^n\rightarrow\infty}. Now we use

\displaystyle   \Big\|\frac{u_1 - u_0}{\Delta t} + V\cdot \nabla u_0\Big\|_2 \geq \Big\|QV\cdot\nabla u_0\Big\|_2 - \Big\|\frac{u_1 - u_0}{\Delta t} + PV\cdot \nabla u_0\Big\|_2 \ \ \ \ \ (1)

and estimate the first term from below, noticing that {QV_x} and {QV_y} are constants, by

\displaystyle  \begin{array}{rcl}  \|QV\cdot\nabla u_0\|_2^2 & = &\|QV_x\,\partial_x u_0 + QV_y\,\partial_y u_0\|_2^2\\ & = & \|QV_x\,\partial_x u_0\|_2^2 + \|QV_y\,\partial_y u_0\|_2^2 + 2\langle QV_x\,\partial_x u_0,QV_y\,\partial_y u_0\rangle\\ & \geq &|QV_x|^2\|\partial_x u_0\|_2^2 + |QV_y|^2\|\partial_y u_0\|_2^2\\ &&\qquad - \|QV_x\,\partial_xu_0\|_2\|QV_y\,\partial_yu_0\|_2\,2\frac{|\langle \partial_x u_0,\partial_y u_0\rangle|}{\|\partial_xu_0\|_2\|\partial_y u_0\|_2}\\ & \geq &(|QV_x|^2\|\partial_x u_0\|_2^2 + |QV_y|^2\|\partial_y u_0\|_2^2) \Big(1 - \frac{|\langle \partial_x u_0,\partial_y u_0\rangle|}{\|\partial_xu_0\|_2\|\partial_y u_0\|_2}\Big). \end{array}

Since {\partial_x u_0} and {\partial_y u_0} are linearly independent, it holds that {1 - \frac{|\langle \partial_x u_0,\partial_y u_0\rangle|}{\|\partial_xu_0\|_2\|\partial_y u_0\|_2}>0} and we conclude that {\|QV^{n_k}\|_2\rightarrow\infty} implies that {\|QV^{n_k}\cdot\nabla u_0\|_2^2\rightarrow\infty}. Together with~(1) and boundedness of {PV^{n_k}} we obtain that {F(V^{n_k})\rightarrow\infty}. Since for every subsequence of {V^n} we get another subsequence {V^{n_k}} such that {F(V^{n_k})\rightarrow\infty}, the same conclusion holds for the whole sequence, showing coercivity of {F}. \Box

Basically the same arguments works for {TV} optical flow, i.e. coercivity of

\displaystyle  F(u) = \int_\Omega\big(\frac{u_1-u_0}{dt} + V\cdot\nabla u_0\big)^2{\mathrm d}{x} + \lambda TV(V).

However, I do not know yet what happens for {q\neq 2} and if the result on coercivity is “sharp” in the sense that linear independence of {\partial_x u_0} and {\partial_y u_0} is necessary. Also, I don’t know yet what is true in dimensions higher than {2}.

If you are working on optimization with partial differential equations as constraints, you may be interested in the website

“OPTPDE – A Collection of Problems in PDE-Constrained Optimization”, http://www.optpde.net.

If you have developed an algorithm which can handle a certain class of optimization problems you need to do evaluations and tests on how well the method performs. To do so, you need well constructed test problems. This could be either problems where the optimal solution is known analytically our problems where the solution is known with a rigorous error bound obtained with a bullet-proof solver. Both things are not always easy to obtain and OPTPDE shall serve as a resource for such problems. It has been designed by Roland Herzog, Arnd Rösch, Stefan Ulbrich and Winnifried Wollner.

The generation of test instance for optimization problems seems quite important to me and indeed, several things can go wrong if this is not done right. Frequently, one sees tests for optimization routines on problems where the optimal solution is not known. Since there are usually different ways to express optimality conditions it is not always clear how to check for optimality; even more so, if you only check for “approximate optimality”, e.g. up to machine precision. A frequently observed effect is a kind of “trusted method bias”. By this I mean that an optimal solution is calculated by some trusted method and comparing the outcome of the tested routine with this solution. However, the trusted method uses some stopping criterion usually based on some specific set of formulations of optimality conditions and these can be different from what the new method has been tuned to. And most often, the stopping criteria do not give a rigorous error bound for the solution or the optimal objective value.

For sparse reconstruction problems, I dealt with this issue in “Constructing test instance for Basis Pursuit Denoising” (preprint available here) but I think this methodology could be used for other settings as well.

Here I continue my previous post on methods to compare shapes. In that post we started with different metrics between two probability measures {\mu_1} and {\mu_2} defined on the same set {X}, namely the Prokhorov metric and the Wasserstein metric. Then we also had a look on the Hausdorff metric which measures the distance between two compact subsets {A} and {B} of a metric space {(X,d)} and finally considered the Gromov-Hausdorff metric between two metric spaces {(X,d_X)} and {(Y,d_Y)}. Our tools have been

  • set couplings of {X} and {Y}, i.e. subsets {R\subset X\times Y} such that for all {x\in X} there {y\in Y} such that {(x,y)\in R} and for all {y\in Y} there is {x\in X} such that {(x,y)\in R},
  • metric couplings of {d_X} and {d_Y}, i.e. metrics {d} on the disjoint union of {X} and {Y} such that {d(x,x') = d_X(x,x')} if {x} and {x'} are in {X} and {d(y,y') = d_Y(y,y')} if {y} and {y'} are in {Y}. In fact, one could also work with semi-metrics {d}, i.e. they do not need to be positive definite, and
  • measure couplings of {\mu_X} and {\mu_Y}, i.e. measures {\nu} on {X\times Y} such that {\nu(A\times Y) = \mu_X(A)} and {\nu(X\times B) = \mu_Y(B)} for all {\mu_X}-/{\mu_Y}-measurable set {A} and {B}, respectively.

Now we make the next (and final) step and compare metric spaces {(X,d_X)} and {(Y,d_Y)} which are both equipped with a measure. These objects are known as metric measure spaces or mm-spaces and are formally defined as follows:

Definition 1 (mm-space) A metric measure space is a tripel {(X,d_X,\mu_X)} consisting a compact metric space {(X,d_X)} and a Borel probability measure {\mu_X} on {X}.

Note that sometimes it is included that {\mu_X} has full support (i.e., equal to {X}) in the definition of an mm-space, but it seems that not everybody does it like that.

1. Comparing mm-spaces: Gromov-Wasserstein

Our question is: How to we compare two mm-spaces {(X,d_X,\mu_X)} and {(Y,d_Y,\mu_Y)}? The plan is simply to augment the previous versions of the Gromov-Hausdorff distances defined in my previos post here and here by something which takes the measures on the respective metric spaces into account. We recall both formulations of the Gromov-Hausdorff distance: The first is

\displaystyle   d_{GH}(X,Y) = \inf_{R,d} \sup_{(x,y)\in R} d(x,y) \ \ \ \ \ (1)

where the infimum is taken over all set couplings {R} of {X} and {Y} and metric couplings {d} of {d_X} and {d_Y}, and the second is

\displaystyle   d_{GH}(X,Y) = \tfrac12\inf_R \sup_{\overset{\overset{x_{1/2}\in X}{y_{1/2}\in Y}}{(x_i,y_i)\in R}}\big| d_X(x_1,x_2) - d_Y(y_1,y_2)\big| \ \ \ \ \ (2)

where the infimum is also taken over all set couplings {R} of {X} and {Y}.

Basically, we have already seen what we should do to build a metric between mm-spaces. The idea is: if there were some “natural measure” {\mu_R} for any set coupling {R} of {X} and {Y}, then we would simply define the distance between {(X,d_X,\mu_X)} and {(Y,d_Y,\mu_Y)} as

\displaystyle  d_{GH}(X,Y) = \inf_{R,d} \Big(\int d(x,y)^pd\mu_R\Big)^{1/p}

(as a generalization of (1)) and

\displaystyle  d_{GH}(X,Y) = \tfrac12\inf_R \Big(\int_{R\times R}\big| d_X(x_1,x_2) - d_Y(y_1,y_2)\big|^p d\mu_R(x_1,y_1)d\mu_R(x_2,y_2)\Big)^{1/p}

(as a generalization of (2)). In both cases we can have {1\leq p<\infty}. Note that the obvious modification for {p=\infty} leads to something very similar to the Gromov-Hausdorff metrics.

But indeed there are “natural measures” on the set couplings of {X} and {Y}: At least for the full coupling {R= X\times Y} there are the measure couplings {\nu} of {\mu_X} and {\mu_Y}! (One does not need to consider smaller set couplings {R\subsetneq X\times Y} since this can be taken into account by the measure couplings; they do not need to have full support anyway.)

Applying this idea to the version (1) of the Gromov-Hausdorff metric we arrive at the following expression, which can be called Gromov-Wasserstein metric,

\displaystyle   d_{Gw}^1(X,Y) = \inf_{\nu,d} \Big(\int_{X\times Y} d(x,y)^pd\nu(x,y)\Big)^{1/p} \ \ \ \ \ (3)

where the infimum is taken over all measure couplings {\nu} of {\mu_X} and {\mu_Y} and all metric couplings {d} of {d_X} and {d_Y}.

Starting from the version (2) of the Gromov-Hausdorff metric we arrive at another formulation:

\displaystyle   d_{GW}^2(X,Y) = \tfrac12\inf_\nu \Big(\int_{X\times Y}\int_{X\times Y}\big| d_X(x_1,x_2) - d_Y(y_1,y_2)\big|^p d\nu(x_1,y_1)d\nu(x_2,y_2)\Big)^{1/p} \ \ \ \ \ (4)

where the infimum is taken over all measure couplings {\nu} of {\mu_X} and {\mu_Y}.

While both versions of the Gromov-Hausdorff metric for compact metric spaces where equal, the same is not true for both generalizations to mm-spaces: In his paper Memoli proves that {d_{GW}^2\leq d_{GW}^1} and gives an example (right after Remark 5.14) where strict inequality holds.

2. Comparing mm-spaces: Gromov-Prokhorov

Instead of starting from the Gromov-Hausdorff distance between metric spaces and augmenting their definition with something that takes the measures into account, we could also start from the “>Prokhorov metric between probability measures and augment the definition with something that takes the metric into account. In fact there are also two possibilities to do so: In the appendix of his paper, Memoli quotes this version (from this paper by Greven, Pfaffelhuber and Winter) of a metric between mm-spaces which we call Gromov-Prokhorov metric

\displaystyle   d_{GP}^1(X,Y) = \inf_{\nu,d}\Big\{\epsilon>0\ :\ \nu\{(x,y)\ :\ d(x,y)\geq\epsilon\}\leq\epsilon\Big\} \ \ \ \ \ (5)

where the infimum is taken over all measure couplings {\nu} of {\mu_X} and {\mu_Y} and all metric couplings {d} of {d_X} and {d_Y}.

The next version (also from the same paper by Greven, Pfaffelhuber and Winter where it was called Eurandom metric) is

\displaystyle   d_{GP}^2(X,Y) = \inf_\nu\Big\{\epsilon>0\ :\ \nu\otimes\nu\{(x_1,y_1,x_2,y_2)\ :\ |d_X(x_1,x_2) - d_Y(y_1,y_2)|\geq\epsilon\}\leq\epsilon\Big\} \ \ \ \ \ (6)

where the infimum is taken over all measure couplings {\nu} only.

3. A very simple example

The calculation of the proposed metrics by hand can be quite cumbersome. Let’s look at the simplest example.

We consider metric spaces {X,Y\subset{\mathbb R}^d} (with the euclidean metric) accompanied with the measures {\mu_X=\delta_{x_0}} and {\mu_Y = \delta_{y_0}} for some points {x_0\in X} and {y_0\in Y}. In this case the is only one measure coupling of {\mu_X} and {\mu_Y}, namely

\displaystyle  \nu = \delta_{(x_0,y_0)}.

Now it is easy to calculate the variant (4) of the Gromov-Wasserstein metric:

\displaystyle  \begin{array}{rcl}  d_{GW}^2(X,Y) &=& \tfrac12\Big(\int_{X\times Y}\int_{X\times Y}\big| |x_1-x_2| - |y_1 -y_2|\big|^p d\delta_{(x_0,y_0)}(x_1,y_1)d\delta_{(x_0,y_0)}\Big)^{1/p} \\ &=& \tfrac12\Big(\int_{X\times Y}\big| |x_0-x_2| - |y_0 -y_2|\big|^pd\delta_{(x_0,y_0)}\Big)^{1/p} \\ &=& 0. \end{array}

Let’s have a look at the variant (3): Since there is only one measure coupling, the metric is

\displaystyle  \begin{array}{rcl}  d_{GW}^1(X,Y) & = &\inf_d\Big(\int_{X\times Y} d(x,y)^pd\delta_{(x_0,y_0)}(x,y)\Big)^{1/p} \\ & = &\inf_d d(x_0,y_0). \end{array}

As we have learned in Example 4 in the previous post, we can find a metric coupling of {X} and {Y} that brings the points {x_0} in {X} and {y_0} in {Y} arbitrarily close together (by embedding both {X} and {Y} into some {{\mathbb R}^n} such that these points are only {\epsilon}-far away from each other). Hence, we see that we have

\displaystyle  d_{GW}^1(X,Y) = 0

similarly to {d_{GW}^2}.

Now let’s look at the Gromov-Prokhorov metric from (5). Again we only have one measure coupling and we get

\displaystyle  d_{GP}^1(X,Y) = \inf\Big\{\epsilon>0\ :\ \exists d\ \text{s.t}\ \delta_{(x_0,y_0)}\{(x,y)\ :\ d(x,y)\geq\epsilon\}\leq\epsilon\Big\}.

Since the measure coupling is a Dirac, we can evaluate

\displaystyle  \delta_{(x_0,y_0)}\{(x,y)\ :\ d(x,y)\geq\epsilon\} = \begin{cases} 1 & d(x_0,y_0)\geq \epsilon\\ 0 & d(x_0,y_0)<\epsilon. \end{cases}

As observed previously, there are metric couplings which bring the points {x_0} and {y_0} arbitrarily close together, and hence for any {\epsilon>0} there is a metric coupling such that {\delta_{(x_0,y_0)}\{(x,y)\ :\ d(x,y)\geq\epsilon\} = 0} which shows that

\displaystyle  d_{GP}^1(X,Y) = 0.

Finally, consider the second variant of the Gromov-Prokhorov metric from (6). Since we only have the one measure coupling {\nu = \delta_{(x_0,y_0)}} we have

\displaystyle  d_{GP}^2(X,Y) = \inf\Big\{\epsilon>0\ :\ \delta_{(x_0,y_0)}\otimes\delta_{(x_0,y_0)}\{(x_1,y_1,x_2,y_2)\ :\ \big| |x_1-x_2| - |y_1-y_2|\big|\geq\epsilon\}<\epsilon\Big\}.

Evaluating the tensored Dirac delta is easy: We have that {\delta_{(x_0,y_0)}\otimes\delta_{(x_0,y_0)}\{(x_1,y_1,x_2,y_2)\ :\ \big| |x_1-x_2| - |y_1-y_2|\big|\geq\epsilon\}} is either one or zero and it is one if and only if the point {(x_0,y_0,x_0,y_0)} is in the set that is measured. However, we have that {\big| |x_0-x_0| - |y_0-y_0|\big|} is, of course, always zero (and never larger than any {\epsilon >0}). Hence, the measure is always zero and hence, this version of the Gromov-Prokhorov distance also gives

\displaystyle  d_{GP}^2(X,Y) = 0.

Note that all four metrics can not see that the two Diracs are at different points. The reason seems to be, that one can “deform the metric space outside of the support of the measures” arbitrarily, in some sense.

It seems, that the computation of {d_{GW}^2} and {d_{GP}^2} were easier, since one needs to know all measure couplings and no metric couplings has been involved.

4. Next difficult example: Compare some mm-space to a point

Ok, we have seen that mm-spaces which carry their measure in just a single point all look alike in both versions of the Gromov-Wasserstein metric and also in both versions of the Gromov-Prokhorov metric. Let’s look at a slightly more difficult example: We consider some mm-space {(X,d_X,\mu)} and want to calculate its distance to a single point, i.e. the mm-space {Y=\{0\}} with the only possible metric and measure. This should somehow measure the “size” or “spread” of the mm-space {X}.

First, we need to know all measure couplings and all metric couplings between these spaces. The measure couplings are very easy: There is just one, namely

\displaystyle  \nu = \mu\otimes \delta

(i.e. all subsets of {X\times \{0\}} are treated as if they were subsets of {X}). Concerning metric couplings, there are a few more. We allow semi-metrics {d} on the disjoint union {X\sqcup\{0\}}: Since {d} should respect the metric {d_X} on {X} we see that all metric couplings are parametrized by the points {x_0} in {X} by identifying {0} (the element in {Y}) with this point {x_0}, i.e. all metric couplings are of the form {d_{x_0}} defined by

\displaystyle  d_{x_0}(x,y) = d_X(x,y)\ (\text{for}\ x,y\in X),\qquad d_{x_0}(x,0) = d_X(x,x_0).

(This only gives semi-metrics since we have {d_{x_0}(x_0,0)=0} although, formally {x_0\not 0}.)

Let’s calculate the first Gromov-Wasserstein metric: There is only one measure coupling and we use the parametrization of the metric couplings to deduce

\displaystyle  \begin{array}{rcl}  d_{GW}^1(X,\{0\}) &=& \inf_d\Big(\int_{X\times\{0\}} d(x,0)^pd(\mu\otimes \delta)(x,y)\Big)^{1/p}\\ &=& \inf_{x_0\in X}\Big(\int_{X} d_X(x,x_0)^pd\mu\Big)^{1/p}. \end{array}

This quantity seems to be known as “minimal {p}-th central moment” of {(X,d_X,\mu)}.

The second variant of the Gromov-Wasserstein metric is (remember, there is only one measure coupling)

\displaystyle  \begin{array}{rcl}  d_{GW}^2(X,\{0\}) &=& \tfrac12\Big(\int_{X\times\{0\}}\int_{X\times\{0\}} |d_X(x_1,x_2) - d_Y(0,0)|^p d(\mu\times\delta)(x_1,y_1) d(\mu\times\delta)(x_2,y_2) \Big)^{1/p}\\ &=& \tfrac12\Big(\int_X\int_X d_X(x_1,x_2)^p d\mu d\mu\Big)^{1/p}. \end{array}

This quantity (without the factor {1/2}) is called the “{p}-diameter” of {(X,d_X)}.

Let’s turn to the Gromov-Prokhorov metrics. The first one is (remember, that the metric couplings {d} are parametrized by the points in {X})

\displaystyle  \begin{array}{rcl}  d_{GP}^1(X,\{0\}) &=& \inf_{d}\Big\{\epsilon>0\ :\ (\mu\otimes\delta)\{(x,0)\ :\ d(x,0)\geq\epsilon\}\leq\epsilon\Big\}\\ &=& \inf_{x_0\in X}\Big\{\epsilon>0\ :\ \mu\{x\ :\ d_X(x,x_0)\geq\epsilon\}\leq\epsilon\Big\}. \end{array}

If this looks familiar, then you may have encountered the Ky-Fan metric already? The Ky-Fan metric is a metric between random variables {\xi_1} and {\xi_2} defined of the same probability space {(X,\mu)} with values in a metric space with metric {d}. It reads as

\displaystyle  d_{KF}(\xi_1,\xi_2) = \inf\Big\{\epsilon>0\ :\ \mu\{\omega\ :\ d(\xi_1(\omega),\xi_2(\omega))\geq\epsilon\}\leq\epsilon\Big\}.

Hence, the first version of the Gromov-Prokhorov metric is

\displaystyle  d_{GP}^1(X,\{0\}) = \inf_{x_0\in X}\ d_{KF}(\mathrm{id},x_0),

i.e., the minimal Ky-Fan metric between the identity mapping and the constant mappings. (In other words, it measures how far the identity is from the constant mappings in the sense of Ky Fan.)

The second variant of the Gromov-Prokhorov metric is (remember, the only measure coupling is {\nu = \mu\times \delta})

\displaystyle  \begin{array}{rcl}  d_{GP}^2(X,\{0\}) &=& \inf \Big\{\epsilon>0\ :\ \nu\otimes\nu\{(x_1,0,x_2,0)\ :\ |d_X(x_1,x_2) -0|\geq\epsilon\}\leq\epsilon\Big\}\\ &=& \inf\Big\{\epsilon>0\ :\ \mu\otimes\mu\{(x_1,x_2)\ :\ d_X(x_1,x_2)\geq\epsilon\}\leq\epsilon\Big\}. \end{array}

I do not have a neat name or a good intuition for this metric yet (although it also looks like it measures “size” or “non-localization” of {X} in some sense). If you have one, let me know!

A while ago (about 2009 I think), when I was a freshly baked Juniorprofessor, I was joking with a colleague that it felt a bit awkward to still be part of studiVZ (a social network for students that was very populuar in Germany at that time – I think actually a Facebook clone). Being a bit smug about our new positions we thought that there should be something like profVZ for guys like us.

Not precisely in the same direction but somehow related: I heard in this post be Peter that the AMS is working on some online community for mathematicians. Go and read the post in which Peter argues why this is a great idea.

By the way: The German pendant of the AMS, the DMV, already has a default social network for all its members (I blogged on this in an earlier post). This is pretty hard to find on the DMV homepage and it seems that almost nobody is using it for anything (except the few people who contribute to the forum). If I see correctly, the DMV social network is using the JomSocial software, a community tool for Joomla! sites. It looks like JomSocial has quite some flexibility and a lot of tools available. However, I have no idea if this could be adapted in a reasonable way to the needs of a mathematical social network (whatever they are, precisely). The AMS has produced great tools already (just think of looking up papers and references in MathSciNet) and I expect that a AMS social network can be great too. However, I would prefer if such an initiative would be run by a worldwide organization, i.e. the IMU, basically just to emphasize that the social network is intended for the whole mathematics community. Anyway: I think that the mathematical community can greatly benefit from a professional social network and that this could be much more useful than the other general ones (just as MathSciNet is in general much more useful for a mathematician than Google Scholar is).

With this post I delve into a topic which is somehow new to me, although I planned to look deeper into this for quite some time already. I stumbled upon the paper Gromov-Wasserstein distances and the metric approach to object matching by Facundo Mémoli which was a pleasure to read and motivated this post.

1. Comparing measures with norms and metrics

There are different notions in mathematics to compare two objects, think of the size of real numbers, the cardinality of sets or the length of the difference of two vectors. Here we will deal with not only comparison of objects but with “measures of similarity”. Two fundamental notions for this are norms in vector spaces and metrics. The norm is the stronger concept in that it uses more structure than a metric and also, every norm induces a metric but not the other way round. There are occasions in which both a norm and a metric are available but lead to different concepts of similarity. One of these instances occurs in sparse recovery, especially in the continuous formulation, e.g. as described in a previous post. Consider the unit interval {I = [0,1]} and two Radon measures {\mu_1} and {\mu_2} on {I} ({I} could also be an aritrary metric space). On the space of Radon measures {\mathfrak{M}(I)} there is the variation norm

\displaystyle \|\mu\|_{\mathfrak{M}}= \sup_\Pi\sum_{A\in\Pi}|\mu(A)|

where the supremum is taken over all partitions {\Pi} of {I} into a finite number of measurable sets. Moreover, there are different metrics one can put on the space of Radon measures, e.g. the Prokhorov metric which is defined for two probability measures (e.g. non-negative ones with unit total mass)

\displaystyle  \begin{array}{rcl}  d_P(\mu_1,\mu_2) & = & \inf\{\epsilon>0\ :\ \mu_1(A)\leq \mu_2(A^\epsilon) + \epsilon,\nonumber\\ & & \qquad \mu_2(A)\leq \mu_1(A^\epsilon) + \epsilon\ \text{for all measurable}\ A\} \end{array}

where {A^\epsilon} denotes the {\epsilon}-neighborhood of {A}. Another familiy of metrics are the Wasserstein metrics: For {p\geq 1} define

\displaystyle d_{W,p}(\mu_1,\mu_2) = \Big(\inf_\nu\int_{I\times I} |x-y|^p d\nu(x,y)\Big)^{1/p} \ \ \ \ \ (1)

where the infimum is taken over all measure couplings of {\mu_1} and {\mu_2}, that is, all measures {\nu} on {I\times I} such that for measurable {A} it holds that

\displaystyle \nu(A\times I) = \mu_1(A)\ \text{and}\ \nu(I\times A) = \mu_2(A).

Example 1 We compare two Dirac measures {\mu_1 = \delta_{x_1}} and {\mu_2 = \delta_{x_2}} located at distinct points {x_1\neq x_2} in {I} as seen here:

058_two_diracsThe variation norm measures their distance as

\displaystyle \|\mu_1-\mu_2\|_{\mathfrak{M}} = \sup_\Pi\sum_{A\in\Pi}|\delta_{x_1}(A) - \delta_{x_2}(A)| = 2

(choose {\Pi} such that it contains {A_1} and {A_2} small enough that {x_1\in A_1}, {x_2\in A_2} but {x_1\notin A_2} and {x_2\notin A_1}). The calculate the Prokhorov metric note that you only need to consider {A}‘s which contain only one of the points {x_{1/2}} and hence, it evaluates to

\displaystyle d_P(\mu_1,\mu_2) = |x_1-x_2|.

For the Wasserstein metric we observe that there is only one possible measure coupling of {\delta_{x_1}} and {\delta_{x_2}}, namely the measure {\nu = \delta_{(x_1,x_2)}}. Hence, we have

\displaystyle d_{W,p}(\mu_1,\mu_2) = \Big(\int_{I\times I}|x-y|^pd\delta_{(x_1,x_2)}(x,y)\Big)^{1/p} = |x_1-x_2|.

The variation norm distinguishes the two Diracs but is not able to grasp the distance of their supports. On the other hand, both metrics return the geometric distance of the supports in the underlying space {I} as distance of the Diracs. Put in pictures: The variation norm of the difference measures the size ob this object

058_two_diracs_difference

while both metrics capture the distance of the measures like here

Two diracs

It should not stay unnoted that convergence in both the Prokhorov metric and the Wasserstein metrics is exactly the weak convergence of probability measures.

The above example provides a motivation to study metric structures on spaces, even if they are also equipped with a norm. Another reason to shift attention from normed spaces to metric spaces is the fact that there has emerged a body of work to build a theory of analysis in metric spaces (see, e.g. this answer on mathoverflow or the book Gradient Flows: In Metric Spaces And In The Space Of Probability Measures by Ambrosio, Gigli and Savaré (which puts special emphasis on the space of probability measures)). Yet another motivation for the study of metrics in this way is the problem of comparing shapes (without being precisely defined yet): Which of these shapes look most alike?

058_shapes

(Note that shapes need not to be two dimensional figures, you may also think of more complex objects like surfaces in three dimensions or Riemannian manifolds.)

One may also ask the question how two compare different images defined on different shapes, i.e. different “distributions of colour” on two different shapes.

2. Comparing shapes: Metric spaces

Up to now we tried to compare different measures, defined on the same set. At least to me it seems that both the Prokhorov and the Wasserstein metrics are suited to measure the similarity of measures and in fact, they do so somehow finer than the usual norm does.

Let’s try to go one step further and ask ourselves, how we could compare two measures {\mu_1} and {\mu_2} which are defined on two different sets? While thinking about an answer one need to balance several things:

  • The setup should be general enough to allow for the comparison of a wide range of objects.
  • It should include enough structure to allow meaningful statements.
  • It should lead to a measure which is easy enough to handle both analytically and computationally.

For the first and second bullet: We are going to work with measures not on arbitrary sets but on metric spaces. This will allow to measure distances between points in the sets and, as you probably know, does not pose a severe restriction. Although metric spaces are much more specific than topological spaces, we still aim at quantitative measures which are not provided by topologies. With respect to the last bullet: Note that both the Prokhorov and the Wasserstein metric are defined as infimums over fairly large and not too well structured sets (for the Prokhorov metric and need to consider all measurable sets and their {\epsilon}-neighborhoods, for the Wasserstein metric, one need to consider all measure couplings). While they can be handled quite well theoretically, their computational realization can be cumbersome.

In a similar spirit than Facundo Memoli’s paper we work our way up from comparing subsets of metric spaces up to comparing two different metric spaces with two measures defined on them.

2.1. Comparing compact subsets of a metric space: Hausdorff

Let {(X,d)} be a compact metric space. Almost hundred years ago Hausdorff introduced a metric on the family of all non-empty compact subsets of a metric space as follows: The Hausdorff metric of two compact subsets {A} and {B} of {X} is defined as

\displaystyle d_H(A,B) = \inf\{\epsilon>0 \ :\ A\subset B_\epsilon,\ B \subset A_\epsilon\}

(again, using the notion of {\epsilon}-neighborhood). This definition seems to be much in the spirit of the Prokhorov metric.

Proposition 2.1 in Facundo Memolis paper shows that the Hausdorff metric has an equivalent description as

\displaystyle d_H(A,B) = \inf_R \sup_{(a,b) \in R} d(a,b)

where the infimum is taken over all correspondences {R} of {A} and {B}, i.e., all subset {R\subset A\times B} such that for all {a\in A} there is {b\in B} such that {(a,b) \in R} and for all {b\in B} there {a\in A} such that {(a,b)\in R}. One may also say set coupling of {A} and {B} instead of correspondence.

Example 2 There is always the full coupling {R = A\times B}. Three different set couplings of two subsets {A} and {B} of the unit interval are shown here:

058_set_coupling

the “full one” {A\times B} in green and two “slim” ones in red and orange. Other “slim” couplings can be obtained from surjective mappings {f:A\rightarrow B} by {R = \{(a,f(a))\ :\ a\in A\}} (or with the roles of {A} and {B} swapped): If you couple a set {A} with itself, there is also the trivial coupling

\displaystyle R = \{(a,a)\ : \ a\in A\}

which is just the diagonal of {A\times A}

Note that the alternative definition of the Hausdorff metric is more in the spirit of the Wasserstein metric: It does not use enlarged objects (by {\epsilon}-neighborhoods) but couplings.

The Hausdorff metric is indeed a metric on the set {\mathfrak{C}(X)} of all non-empty compact subsets of a metric space {X} and if {X} itself is compact it even holds that {(\mathfrak{C}(X),d_H)} is a compact metric space (a result, known as Blaschke Selection Theorem).

One may say that we went up an abstraction ladder one step by moving from {(X,d)} to {(\mathfrak{C}(X),d_H)}.

2.2. Comparing compact metric spaces: Gromov-Hausdorff

In the previous subsection we worked within one metric space {X}. In the book “Metric Structures for Riemannian and Non-Riemannian Spaces” Misha Gromov introduced a notion to compare two different metric spaces. For compact metric space {X} and {Y} the Gromov-Hausdorff metric is defined as

\displaystyle d_{GH}(X,Y) = \inf_{Z,f,g} d_H(f(X),g(Y)) \ \ \ \ \ (2)

where the infimum is taken over

  • all metric spaces {Z} and
  • all isometric embeddings {f} and {g} which embed {X} and {Y} into {Z} respectively.

In words: To compute the Gromov-Hausdorff metric, you try embed both {X} and {Y} into a common larger space isometrically such that they are as close as possible according to the Hausdorff metric in that space.

Strictly speaking, the above definition is not well stated as one can not form an infimum over all metric spaces since this collection does not form a set according to the rules of set theory. More precisely one should write that {d_{GH}(X,Y)} is the infimum over all {r>0} such that there exists a metric space {Z} and isometric embeddings {f} and {g} of {X} and {Y}, respectively, such that {d_H(f(X),g(Y))<r}.

As the Hausdorff metric could be reformulated with set couplings there is a reformulation of the Gromov-Hausdorff metric based on metric couplings: A metric coupling of two metric spaces {(X,d_X)} and {(Y,d_Y)} is a metric {d} on the disjoint union {X\sqcup Y} of {X} and {Y} such that for all {x,x'\in X} and {y,y'\in Y} it holds that {d(x,x') = d_X(x,x')} and {d(y,y') = d_Y(y,y')}.

Example 3 We couple a metric space {(X,d)} with itself. We denote with {(X',d')} an identical copy of {(X,d)} and look for a metric {D} on {X\times X'} that respects the metrics {d} and {d'} in the way a metric coupling has to.

To distinguish elements from {X} and {X'} we put a {'} on all quantities from {X'}. Moreover, for {x\in X} we denote by {x'} its identical copy in {X'} (and similarly for {x'\in X'}, {x} is its identical twin). Then, for any {\epsilon>0} we can define {D_\epsilon(x,x') = D_\epsilon(x',x) = \epsilon} (i.e. the distance between any two identical twins is {\epsilon}. By the triangle inequality we get for {x\in X} and {y'\in X'} that {D_\epsilon(x,y')} should fulfill

\displaystyle D_\epsilon(x',y') - D_\epsilon(x',x) \leq D_\epsilon(x,y') \leq D_\epsilon(x,y) + D_\epsilon(y,y')

and hence

\displaystyle d(x,y) - \epsilon \leq D_\epsilon(x,y') \leq d(x,y) + \epsilon.

Indeed we can choose {D_\epsilon(x,y') = d(x,y)} if {x\in X} and {y'\in Y} leading to one specific metric coupling for any {\epsilon}. This couplings allow to distinguish identical twins and behave as a metric on the whole disjoint union. In the limiting case {\epsilon\rightarrow 0} we do not obtain a metric but a semi-metric or pseudo-metric which is just the same as a metric but without the assumption that {d(x,y) = 0} implies that {x=y}.

Example 4 The above example of a metric coupling of a metric space with itself was somehow “reproducing” the given metric as accurate as possible. There are also other couplings that put very different distances to points {D(x,y')} and there is also a way to visualize metric couplings: When building the disjoint union of two metric spaces {X} and {Y}, you can imagine this as isometrically embedding both in a larger metric space {Z} in a non-overlapping way and obtain the metric coupling {D} as the restriction of the metric on {Z} to {X\sqcup Y}. For {X=Y=[0,1]} you can embed both into {Z = {\mathbb R}^2}. A metric coupling which is similar (but not equal) to the coupling of the previous example is obtained by putting {X} and {Y} side by side at distance {\epsilon} as here (one space in green, the other in blue).

058_metric_coupling_embedding1

A quite different coupling is obtained by putting {X} and {Y} side by side, but in a reversed way as here:

058_metric_coupling_embedding2

You may even embed them in a more weired way as here:

058_metric_coupling_embedding3

but remember that the embeddings has to be isometric, hence, distortions like here are not allowed.

058_metric_coupling_embedding4This example illustrate that the idea of metric coupling is in similar spirit as of “embedding two spaces in a common larger one”.

With the notion of metric coupling, the Gromov-Hausdorff metric can be written as

\displaystyle d_{GH}(X,Y) = \inf_{R,d} \sup_{(x,y)\in R} d(x,y) \ \ \ \ \ (3)

where the infimum is taken over all set couplings {R} of {X} and {Y} and all metric couplings {d} of {(X,d_X)} and {(Y,d_Y)}.

In words: To compute the Gromov-Hausdorff metric this way, you look for a set coupling of the base sets {X} and {Y} and a metric coupling {d} of the metrics {d_X} and {d_Y} such that the maximal distance of two coupled points {x} and {y} is as small as possible. While this may look more complicated than the original definition from~(2), note that the original definition uses all metric spaces {Z} in which you can embed {X} and {Y} isometrically, which seems barely impossible to realize. Granted, the new definition also considers a lot of quantities.

Also note that this definition is in spirit of the Wasserstein metric from~(1): If there were natural measures {\mu_R} on the set couplings {R} we could write \begin{equation*} d_{GH}(X,Y) = \inf_{R,d} \Big(\int d(x,y)^pd\mu_R\Big)^{1/p} \end{equation*} and in the limit {p\rightarrow\infty} we would recover definition~(3).

Example 5 The Gromov-Hausdorff distance of a metric space {(X,d_X)} to itself is easily seen to be zero: Consider the trivial coupling {R = \{(x,x)\ :\ x\in X\}} from Example~2 and the family {D_\epsilon} of metric couplings from Example~3. Then we have {d_{GH}(X,X) \leq \epsilon} for any {\epsilon >0} showing {d_{GH}(X,X) = 0}. Let’s take one of the next-complicated examples and compute the distance of {X = [0,1]} and {Y=[0,2]}, both equipped with the euclidean metric. We couple the sets {X} and {Y} by {R = \{(x,2x)\ : \ x\in X\}} and the respective metrics by embedding {X} and {Y} into {{\mathbb R}^2} as follows: Put {Y} at the line from {(0,0)} to {(2,0)} and {X} at the line from {(\tfrac12,\epsilon)} to {(1\tfrac12,\epsilon)}:

058_gromov_hausdorff_12

This shows that {d_{GH}(X,Y) \leq \tfrac12} and actually, we have equality here.

There is another reformulation of the Gromov-Hausdorff metric, the equivalence of which is shown in Theorem 7.3.25 in the book “A Course in Metric Geometry” by Dmitri Burago, Yuri Burago and Sergei Ivanov:

\displaystyle d_{GH}(X,Y) = \tfrac12\inf_R \sup_{\overset{\overset{x_{1/2}\in X}{y_{1/2}\in Y}}{(x_i,y_i)\in R}}\big| d_X(x_1,x_2) - d_Y(y_1,y_2)\big| \ \ \ \ \ (4)

where the infimum is taken over all set couplings {R} of {X} and {Y}.

In words: Look for a set coupling such that any two coupled pairs {(x_1,y_1)} and {(x_2,y_2)} have the “most equal” distance.

This reformulation may have the advantage over the form (3) in that is only considers the set couplings and the given metrics {d_X} and {d_Y} and no metric coupling is needed.

Note that, as the previous reformulation~(3), it is also in the spirit of the Wasserstein metric: If there were natural measures {\mu_R} in the set couplings {R}, we could write

\displaystyle d_{GH}(X,Y) = \tfrac12\inf_R \Big(\int_{R\times R}\big| d_X(x_1,x_2) - d_Y(y_1,y_2)\big|^p d\mu_R(x_1,y_1)d\mu_R(x_2,y_2)\Big)^{1/p}.

and recover the formulation~(4) in the limit {p\rightarrow\infty}.

One may say that we went up an abstraction ladder one step further by moving from {(X,d)} to {(\mathfrak{C}(X),d_H)} to {(\text{All compact metric spaces},d_{GH})}.

Since this post has been grown pretty long already, I decided to do the next step (which is the already announced metric on metric spaces which additionally carry some measure on them – so-called metric measure spaces) in a later post.

While this blog is intended to be totally job-related, this post will be unusual in that it will be a little bit personal. And indeed, this post is about the mix of work and non-work life.

A little more than half a year ago I started my life as a part-time mathematician. To be more precise, I am in Elternzeit, which is the German framework for parental leave. According to German law, parents have several rights related to their work when they get kids. One of them is to reduce their weekly hours of work when they have small children and this is what I did: For one year I reduced my weekly hours of work to 50%. The reason for the reduction is that my wife has a very busy year of work (for those who know: she is in her “Referendariat” to become a teacher – but I am not going to rant about the German Referendariat for teachers here…). There is also Elterngeld is Germany: That is, that you get some fraction of your salary if you stay at home to be with your kids, but this only applies in the first 14 months but my kids are older and hence, I just get about half the salary during the Elternzeit. But one of the first things I learned was that this is not totally true. In fact, my “parental-leave”-salary is higher than half of my usual salary. This has two reasons: First, there is “income-tax-progression” here, that is, for more salary, you pay a higher tax rate. Second, some parts of my salary are bonuses which are not halved.

Besides the financial issues there are other things about parental leave as a mathematician at a university which I find interesting and worth blogging about. I divided these issues into two parts: work-related and family related.

1. Work life

In this year of parental leave I am in my office for about four hours a day. I go there, when the kids are in school and at day-care and I leave early enough to prepare lunch for our schoolgirl. Four hours a day is not much and indeed, I get much less things done, than I used to do. But that was totally expected. To counteract a bit, I do a little less reviewing papers. Moreover, my teaching duties are halved. And also, blogging is a little bit reduced. But anyway, I do not make it to get all the things done, that I would like to. However, this is not much different from how it used to be. And with even less time, I get better in prioritizing.

But more important than the mere reduction of time for work is the fact that I am not able to be in the office in the afternoon, and I am not able to make appointments or to attend meetings in the afternoon. This does influence my daily work to some extend. While all my colleagues know that I am on part-time (and, as far as I experience, have no objections to that), they usually do not have it in mind when they make dates. If I am lucky, dates are found with doodle, and then it’s easy for me, but I have to remind people that I do not have time in the afternoon frequently. Moreover, some regular meetings are scheduled for the afternoon by default and I just miss them. Finally, there are exceptional dates (e.g. the ones related with job interviews and negotiations on which I have written here and here). In these cases, I also mentioned my limited possibilities for meetings and the severe restrictions. For the job interviews I even asked for a video-interview, but that did not work out. On one occasion, they tried to schedule my job interview for the weekend (which would be easier to arrange for me) but some members of the committee objected due to family reasons – which is totally OK in my opinion. For the negotiations, I tried to have them during school holidays but this did not work out either. Hence, we had to arrange additional child care (our children have four grand parents and always some of them could step in – what I didn’t try was, to get reimbursed for the train tickets for the grandmas; I think this would be reasonable but would cause serious irritation in the administration).

Another severe change in my work-life is that I can not attend conferences and meetings with the only exception when they are during school holidays. Here we arranged holidays of the kids at their grandparents and this way I could attend some nice meetings this year.

2. Family life

In my parental leave my wife and I changed roles. It is not that I did not care about family life, household and such when I was working full-time but still, I was working full-time and my wife was working half-time or less. Since she has to work a LOT, I am in charge of everything here at home now. But, as before, it is not that she is not doing anything in that respect, but the important thing is that I am in charge. And that makes more of a difference than I anticipated. The pure amount of work doing shopping, cooking, cleaning and so on is OK because I have the time to do so (although – it is work nonetheless). But to be in charge is a bit more. Especially, if there is something to be done, I can not say “Let’s wait for one day, maybe it will be done by the time…”. If I will not do it, nobody will (again: Not totally true, but this is the way of thinking which I employ). Moreover, there are some things which come with being “house husband” which did not come to my mind when I planned my part time year: organizing children’s birthday parties, arranging new afternoon activities and weekend activities or trips, taking care of seasonal decoration, thinking of Christmas presents early enough or arranging routine medical check-ups on time are just a few of them. All this is also not hard, but harder than I thought, when you are in charge.

In total, all this role change and house-keeping is a very refreshing experience. Being a house husband is not something be scared of. It is rewarding, it’s mostly fun, it’s a lot of work and it is quite some responsibility.

As a final remark, I think, that this year of my parental leave does also mean something for our children. Seeing that dad is at home and being the one which is always there, and that both parents can do this job and that, the other way round, mom is at work most of the day, comes home late and needs rest and such, is an experience for them which shows them that there are different ways to organize a family. It probably also influences them in the way they see gender roles.

In conclusion: If you think about staying at home for kids and the laws of your country give you the opportunity to do so, then I can recommend it. Even being fully in charge and with a full-time working wife – probably even more so.

There are several answers to the following question:

1. What is a convex set?

For a convex set you probably know these definitions:

Definition 1 A subset {C} of a real vector space {V} is convex if for any {x,y\in C} and {\lambda\in[0,1]} it holds that {\lambda x + (1-\lambda)y\in C}.

In other words: If two points lie in the set, then every convex combination also lies in the set.

While this is a “definition from the inside”, convex sets can also be characterized “from the outside”. We add closedness as an assumption and get:

Definition 2 A closed subset {C} of a real locally convex topological vector space {V} is convex if it is the intersection of closed halfspaces (i.e. sets of the form {\{x\in V\ :\ \langle a,x\rangle\geq c\}} for some {a} in the dual space {V^*} and {c\in {\mathbb R}}).

Moreover, we could define convex sets via convex functions:

Definition 3 A set {C\subset V} is convex if there is a convex function {f:V\rightarrow {\mathbb R}} such that {C = \{x\ :\ f(x)\leq 0\}}.

Of course, this only makes sense once we have defined convex functions. Hence, we could also ask the question:

2. What is a convex function?

We can define a convex function by means of convex sets as follows:

Definition 4 A function {f:V\rightarrow{\mathbb R}} from a real vector space into the real numbers is convex, if its epigraph {\text{epi} f = \{(x,\mu)\ :\ f(x)\leq \mu\}\subset V\times {\mathbb R}} is convex (as a subset of the vector space {V\times{\mathbb R}}).

The epigraph consists of the points {(x,\mu)} which lie above the graph of the function and carries the same information as the function.

(Let me note that one can replace the real numbers here and in the following with the extended real numbers {\bar {\mathbb R} = {\mathbb R}\cup\{-\infty,\infty\}} if one uses the right extension of the arithmetic and the obvious ordering, but we do not consider this in this post.)

Because epigraphs are not arbitrary convex sets but have a special form (if a point {(x,\mu)} is in an epigraph, then every {(x,\lambda)} with {\lambda\geq \mu} is also in the epigraph), and because the underlying vector space {V\times {\mathbb R}} comes with an order in the second component, some of the definitions for convex sets from above have a specialized form:

From the definition “convex combinations stay in the set” we get:

Definition 5 A function {f:V\rightarrow {\mathbb R}} is convex, if for all {x,y\in V} and {\lambda\in [0,1]} it holds that

\displaystyle f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda)f(y).

In other words: The secant of any two points on the graph lies above the graph.

From the definition by “intersection of half spaces” we get another definition. Since we added closedness in this case we add this assumption also here. However, closedness of the epigraph is equivalent to lower-semicontinuity (lsc) of the function and since lsc functions are very convenient we use this notion:

Definition 6 A function {f:V\rightarrow{\mathbb R}} is convex and lsc if it is the pointwise supremum of affine functions, i.e., for some set {S} of tuples {(a,c)\in V^*\times {\mathbb R}} it holds that

\displaystyle f(x) = \sup_{(a,c) \in S} \langle a,x\rangle + c.

A special consequence if this definition is that tangents to the graph of a convex function lie below the function. Another important consequence of this fact is that the local behavior of the function, i.e. its tangent plane at some point, carries some information about the global behavior. Especially, the property that the function lies above its tangent planes allows one to conclude that local minima of the function are also global. Probably the last properties are the ones which give convex functions a distinguished role, especially in the field of optimization.

Some of the previous definitions allow for generalizations into several direction and the quest for the abstract core of the notion of convexity has lead to the field of abstract convexity.

3. Abstract convexity

When searching for abstractions of the notion of convexity one may get confused by the various different approaches. For example there are generalized notions of convexity, e.g. for function of spaces of matrices (e.g. rank-one convexity, quasi-convexity or polyconvexity) and there are also further different notions like pseudo-convexity, invexity or another form ofquasi-convexity. Here we do not have generalization in mind but abstraction. Although both things are somehow related, the way of thinking is a bit different. Our aim is not to find a useful notion which is more general than the notion of convexity but to find a formulation which contains the notion of convexity and abstracts away some ingredients which probably not carry the essence of the notion.

In the literature one also finds several approaches in this direction and I mention only my favorite one (and I restrict myself to abstractly convex functions and not write about abstractly convex sets).

To me, the most appealing notion of an abstract convex function is an abstraction of the definition as “pointwise supremum of affine functions”. Let’s look again at the definition:

A function {f:V\rightarrow {\mathbb R}} is convex and lsc if there is a subset {S} of {V^*\times {\mathbb R}} such that

\displaystyle f(x) = \sup_{(a,c)\in S} \langle a,x\rangle + c.

We abstract away the vector space structure and hence, also the duality, but keep the real-valuedness (together with its order) and define:

Definition 7 Let {X} be a set and let {W} be a set of real valued function on {X}. Then a function {f:X\rightarrow{\mathbb R}} is said to be {W}-convex if there is a subset {S} of {W\times {\mathbb R}} such that

\displaystyle f(x) = \sup_{(w,c)\in S} w(x) + c.

What we did in this definition was simply to replace continuous affine functions {x\mapsto \langle a,x\rangle} on a vector space by an arbitrary collection of real valued functions {x\mapsto w(x)} on a set. On sees immediately that for every function {w\in W} and any real number {c} the function {w+c} is {W} convex (similarly to the fact that every affine linear function is convex).

Another nice thing about this approach is, that it allows for some notion of duality/conjugation. For {f:V\rightarrow{\mathbb R}} we define the {W}-conjugate by

\displaystyle f^{W*}(w) = \sup_{w\in W} \Big(w(x)- f(x) \Big)

and we can even formulate a biconjugate

\displaystyle f^{W**}(x) = \sup_x \Big(w(x) - f^{W*}(w)\Big).

We naturally have a Fenchel inequality

\displaystyle w(x) \leq f(x) + f^{W*}(w)

and we may even define subgradients as usual. Note that a conventional subgradient is an element of the dual space which defines a tangent plane at the point where the subgradient is taken, that is, {a\in V^*} is a subgradient of {f} at {x}, if for all {y\in V} it holds that {f(y) \geq f(x) + \langle a,y-x\rangle} or

\displaystyle f(y) - \langle a,y\rangle\geq f(x) - \langle a,x\rangle.

A {W}-subgradient is an element of {W}, namely we define: {w\in\partial^{W}f(x)} if

\displaystyle \text{for all}\ y:\quad f(y) -w(y) \geq f(x) - w(x).

Then we also have a Fenchel equality:

\displaystyle w\in\partial^W f(x)\iff w(x) = f(x) + f^{W*}(w).

One may also take dualization as the starting point for an abstraction.

4. Abstract conjugation

We could formulate the {W}-conjugate as follows: For {\Phi(w,x) = w(x)} we have

\displaystyle f^{W*}(x) = \sup_w \Big(\Phi(w,x) - f(x)\Big).

This opens the door to another abstraction: For some sets {X,W} (without any additional structure) define a coupling function {\Phi: X\times W \rightarrow {\mathbb R}} and define the {\Phi}-conjugate as

\displaystyle f^{\Phi*}(w) = \sup_x \Big(\Phi(w,x) - f(x)\Big)

and the {\Phi}-biconjugate as

\displaystyle f^{\Phi**}(x) = \sup_x \Big(\Phi(w,x) - f^{W*}(w)\Big)

Let {\Omega} be a compact subset of {{\mathbb R}^d} and consider the space {C(\Omega)} of continuous functions {f:\Omega\rightarrow {\mathbb R}} with the usual supremum norm. The Riesz Representation Theorem states that the dual space of {C(\Omega)} is in this case the set of all Radon measures, denoted by {\mathfrak{M}(\Omega)} and the canonical duality pairing is given by

\displaystyle  \langle\mu,f\rangle = \mu(f) = \int_\Omega fd\mu.

We can equip {\mathfrak{M}(\Omega)} with the usual notion of weak* convergence which read as

\displaystyle  \mu_n\rightharpoonup^* \mu\ \iff\ \text{for every}\ f:\ \mu_n(f)\rightarrow\mu(f).

We call a measure {\mu} positive if {f\geq 0} implies that {\mu(f)\geq 0}. If a positive measure satisfies {\mu(1)=1} (i.e. it integrates the constant function with unit value to one), we call it a probability measure and we denote with {\Delta\subset \mathfrak{M}(\Omega)} the set of all probability measures.

Example 1 Every non-negative integrable function {\phi:\Omega\rightarrow{\mathbb R}} with {\int_\Omega \phi(x)dx} induces a probability measure via

\displaystyle  f\mapsto \int_\Omega f(x)\phi(x)dx.

Quite different probability measures are the {\delta}-measures: For every {x\in\Omega} there is the {\delta}-measure at this point, defined by

\displaystyle  \delta_x(f) = f(x).

In some sense, the set {\Delta} of probability measure is the generalization of the standard simplex in {{\mathbb R}^n} to infinite dimensions (in fact uncountably many dimensions): The {\delta}-measures are the extreme points of {\Delta} and since the set {\Delta} is compact in the weak* topology, the Krein-Milman Theorem states that {\Delta} is the weak*-closure of the set of convex combinations of the {\delta}-measures – similarly as the standard simplex in {{\mathbb R}^n} is the convex combination of the canonical basis vectors of {{\mathbb R}^n}.

Remark 1 If we drop the positivity assumption and form the set

\displaystyle  O = \{\mu\in\mathfrak{M}(\Omega)\ :\ |f|\leq 1\implies |\mu(f)|\leq 1\}

we have the {O} is the set of convex combinations of the measures {\pm\delta_x} ({x\in\Omega}). Hence, {O} resembles the hyper-octahedron (aka cross polytope or {\ell^1}-ball).

I’ve taken the above (with almost similar notation) from the book “ A Course in Convexity” by Alexander Barvinok. I was curious to find (in Chapter III, Section 9) something which reads as a nice glimpse on semi-continuous compressed sensing: Proposition 9.4 reads as follows

Proposition 1 Let {g,f_1,\dots,f_m\in C(\Omega)}, {b\in{\mathbb R}^m} and suppose that the subset {B} of {\Delta} consisting of the probability measures {\mu} such that for {i=1,\dots,m}

\displaystyle  \int f_id\mu = b_i

is not empty. Then there exists {\mu^+,\mu^-\in B} such that

  1. {\mu^+} and {\mu^-} are convex combinations of at most {m+1} {\delta}-measures, and
  2. it holds that for all {\mu\in B} we have

    \displaystyle  \mu^-(g)\leq \mu(g)\leq \mu^+(g).

In terms of compressed sensing this says: Among all probability measures which comply with the data {b} measured by {m} linear measurements, there are two extremal ones which consists of {m+1} {\delta}-measures.

Note that something similar to “support-pursuit” does not work here: The minimization problem {\min_{\mu\in B, \mu(f_i)=b_i}\|\mu\|_{\mathfrak{M}}} does not make much sense, since {\|\mu\|_{\mathfrak{M}}=1} for all {\mu\in B}.

In my previous post “Yes we can change. Why the DMV should change its name” I tried to make some advertisement for the petition that the “Deutsche Mathematiker-Vereinigung” should change its name to “Deutsche Mathematische Vereinigung”.

Yesterday I received the new issue of the Mitteilungen der DMV and there is the result:

  • Votes cast:

1007

  • Valid votes:

1000

  • Invalid votes:

7

  • Votes for the name change:

463

  • Votes against the name change:

530

  • Abstentions:

7

So, 53% votes against the change and hence, the DMV will stick to the meaning of “Union of Mathematicians” rather than “Union for Mathematics”.

On the one hand, this sound like a fairly close-run. But one the other hand I was surprised that so many people actually actively voted against the change. This seems to show that there is really a non negligible fraction of people who really have something against the proposed name and is not only  in favor of keeping things as they are and, in consequence, make the effort to submit a vote against. Moreover, Martin Skutella writes in his editorial of the current issue of the Mitteilungen (in my own translation):

Reassuring, that the DMV braves, firm as a rock, the short-lived zeitgeist!

Hmm, I am not sure what he is intended to say (keeping in mind that his editorials are usually somehow satirically). In case there is no irony involved: Is he really trying to say that the change the DMV made in the last decade from a society which serves the mathematicians only (which basically meant professors at German universities) to an organization which sees its central mission to promote mathematics as a whole and on a broad scale ranging from pupils over student, researchers to business people and companies is just short-lived zeitgeist and in a few years we are back to the times in which the members of the DMV were only research mathematicians (at least professors) and when the Mitteilungen der DMV only contained latest news on the daily grind of a math professor at a university?

I am not sure, but I hope the name “Deutsche Mathematiker Vereinigung” will not hold back any of the “new” target audience (like pupils, students, teachers, math people in companies,…) to join the DMV.

Next Page »

Follow

Get every new post delivered to your Inbox.

Join 30 other followers