March 2013

If you are working on optimization with partial differential equations as constraints, you may be interested in the website

“OPTPDE – A Collection of Problems in PDE-Constrained Optimization”,

If you have developed an algorithm which can handle a certain class of optimization problems you need to do evaluations and tests on how well the method performs. To do so, you need well constructed test problems. This could be either problems where the optimal solution is known analytically our problems where the solution is known with a rigorous error bound obtained with a bullet-proof solver. Both things are not always easy to obtain and OPTPDE shall serve as a resource for such problems. It has been designed by Roland Herzog, Arnd Rösch, Stefan Ulbrich and Winnifried Wollner.

The generation of test instance for optimization problems seems quite important to me and indeed, several things can go wrong if this is not done right. Frequently, one sees tests for optimization routines on problems where the optimal solution is not known. Since there are usually different ways to express optimality conditions it is not always clear how to check for optimality; even more so, if you only check for “approximate optimality”, e.g. up to machine precision. A frequently observed effect is a kind of “trusted method bias”. By this I mean that an optimal solution is calculated by some trusted method and comparing the outcome of the tested routine with this solution. However, the trusted method uses some stopping criterion usually based on some specific set of formulations of optimality conditions and these can be different from what the new method has been tuned to. And most often, the stopping criteria do not give a rigorous error bound for the solution or the optimal objective value.

For sparse reconstruction problems, I dealt with this issue in “Constructing test instance for Basis Pursuit Denoising” (preprint available here) but I think this methodology could be used for other settings as well.

Here I continue my previous post on methods to compare shapes. In that post we started with different metrics between two probability measures {\mu_1} and {\mu_2} defined on the same set {X}, namely the Prokhorov metric and the Wasserstein metric. Then we also had a look on the Hausdorff metric which measures the distance between two compact subsets {A} and {B} of a metric space {(X,d)} and finally considered the Gromov-Hausdorff metric between two metric spaces {(X,d_X)} and {(Y,d_Y)}. Our tools have been

  • set couplings of {X} and {Y}, i.e. subsets {R\subset X\times Y} such that for all {x\in X} there {y\in Y} such that {(x,y)\in R} and for all {y\in Y} there is {x\in X} such that {(x,y)\in R},
  • metric couplings of {d_X} and {d_Y}, i.e. metrics {d} on the disjoint union of {X} and {Y} such that {d(x,x') = d_X(x,x')} if {x} and {x'} are in {X} and {d(y,y') = d_Y(y,y')} if {y} and {y'} are in {Y}. In fact, one could also work with semi-metrics {d}, i.e. they do not need to be positive definite, and
  • measure couplings of {\mu_X} and {\mu_Y}, i.e. measures {\nu} on {X\times Y} such that {\nu(A\times Y) = \mu_X(A)} and {\nu(X\times B) = \mu_Y(B)} for all {\mu_X}-/{\mu_Y}-measurable set {A} and {B}, respectively.

Now we make the next (and final) step and compare metric spaces {(X,d_X)} and {(Y,d_Y)} which are both equipped with a measure. These objects are known as metric measure spaces or mm-spaces and are formally defined as follows:

Definition 1 (mm-space) A metric measure space is a tripel {(X,d_X,\mu_X)} consisting a compact metric space {(X,d_X)} and a Borel probability measure {\mu_X} on {X}.

Note that sometimes it is included that {\mu_X} has full support (i.e., equal to {X}) in the definition of an mm-space, but it seems that not everybody does it like that.

1. Comparing mm-spaces: Gromov-Wasserstein

Our question is: How to we compare two mm-spaces {(X,d_X,\mu_X)} and {(Y,d_Y,\mu_Y)}? The plan is simply to augment the previous versions of the Gromov-Hausdorff distances defined in my previos post here and here by something which takes the measures on the respective metric spaces into account. We recall both formulations of the Gromov-Hausdorff distance: The first is

\displaystyle   d_{GH}(X,Y) = \inf_{R,d} \sup_{(x,y)\in R} d(x,y) \ \ \ \ \ (1)

where the infimum is taken over all set couplings {R} of {X} and {Y} and metric couplings {d} of {d_X} and {d_Y}, and the second is

\displaystyle   d_{GH}(X,Y) = \tfrac12\inf_R \sup_{\overset{\overset{x_{1/2}\in X}{y_{1/2}\in Y}}{(x_i,y_i)\in R}}\big| d_X(x_1,x_2) - d_Y(y_1,y_2)\big| \ \ \ \ \ (2)

where the infimum is also taken over all set couplings {R} of {X} and {Y}.

Basically, we have already seen what we should do to build a metric between mm-spaces. The idea is: if there were some “natural measure” {\mu_R} for any set coupling {R} of {X} and {Y}, then we would simply define the distance between {(X,d_X,\mu_X)} and {(Y,d_Y,\mu_Y)} as

\displaystyle  d_{GH}(X,Y) = \inf_{R,d} \Big(\int d(x,y)^pd\mu_R\Big)^{1/p}

(as a generalization of (1)) and

\displaystyle  d_{GH}(X,Y) = \tfrac12\inf_R \Big(\int_{R\times R}\big| d_X(x_1,x_2) - d_Y(y_1,y_2)\big|^p d\mu_R(x_1,y_1)d\mu_R(x_2,y_2)\Big)^{1/p}

(as a generalization of (2)). In both cases we can have {1\leq p<\infty}. Note that the obvious modification for {p=\infty} leads to something very similar to the Gromov-Hausdorff metrics.

But indeed there are “natural measures” on the set couplings of {X} and {Y}: At least for the full coupling {R= X\times Y} there are the measure couplings {\nu} of {\mu_X} and {\mu_Y}! (One does not need to consider smaller set couplings {R\subsetneq X\times Y} since this can be taken into account by the measure couplings; they do not need to have full support anyway.)

Applying this idea to the version (1) of the Gromov-Hausdorff metric we arrive at the following expression, which can be called Gromov-Wasserstein metric,

\displaystyle   d_{Gw}^1(X,Y) = \inf_{\nu,d} \Big(\int_{X\times Y} d(x,y)^pd\nu(x,y)\Big)^{1/p} \ \ \ \ \ (3)

where the infimum is taken over all measure couplings {\nu} of {\mu_X} and {\mu_Y} and all metric couplings {d} of {d_X} and {d_Y}.

Starting from the version (2) of the Gromov-Hausdorff metric we arrive at another formulation:

\displaystyle   d_{GW}^2(X,Y) = \tfrac12\inf_\nu \Big(\int_{X\times Y}\int_{X\times Y}\big| d_X(x_1,x_2) - d_Y(y_1,y_2)\big|^p d\nu(x_1,y_1)d\nu(x_2,y_2)\Big)^{1/p} \ \ \ \ \ (4)

where the infimum is taken over all measure couplings {\nu} of {\mu_X} and {\mu_Y}.

While both versions of the Gromov-Hausdorff metric for compact metric spaces where equal, the same is not true for both generalizations to mm-spaces: In his paper Memoli proves that {d_{GW}^2\leq d_{GW}^1} and gives an example (right after Remark 5.14) where strict inequality holds.

2. Comparing mm-spaces: Gromov-Prokhorov

Instead of starting from the Gromov-Hausdorff distance between metric spaces and augmenting their definition with something that takes the measures into account, we could also start from the “>Prokhorov metric between probability measures and augment the definition with something that takes the metric into account. In fact there are also two possibilities to do so: In the appendix of his paper, Memoli quotes this version (from this paper by Greven, Pfaffelhuber and Winter) of a metric between mm-spaces which we call Gromov-Prokhorov metric

\displaystyle   d_{GP}^1(X,Y) = \inf_{\nu,d}\Big\{\epsilon>0\ :\ \nu\{(x,y)\ :\ d(x,y)\geq\epsilon\}\leq\epsilon\Big\} \ \ \ \ \ (5)

where the infimum is taken over all measure couplings {\nu} of {\mu_X} and {\mu_Y} and all metric couplings {d} of {d_X} and {d_Y}.

The next version (also from the same paper by Greven, Pfaffelhuber and Winter where it was called Eurandom metric) is

\displaystyle   d_{GP}^2(X,Y) = \inf_\nu\Big\{\epsilon>0\ :\ \nu\otimes\nu\{(x_1,y_1,x_2,y_2)\ :\ |d_X(x_1,x_2) - d_Y(y_1,y_2)|\geq\epsilon\}\leq\epsilon\Big\} \ \ \ \ \ (6)

where the infimum is taken over all measure couplings {\nu} only.

3. A very simple example

The calculation of the proposed metrics by hand can be quite cumbersome. Let’s look at the simplest example.

We consider metric spaces {X,Y\subset{\mathbb R}^d} (with the euclidean metric) accompanied with the measures {\mu_X=\delta_{x_0}} and {\mu_Y = \delta_{y_0}} for some points {x_0\in X} and {y_0\in Y}. In this case the is only one measure coupling of {\mu_X} and {\mu_Y}, namely

\displaystyle  \nu = \delta_{(x_0,y_0)}.

Now it is easy to calculate the variant (4) of the Gromov-Wasserstein metric:

\displaystyle  \begin{array}{rcl}  d_{GW}^2(X,Y) &=& \tfrac12\Big(\int_{X\times Y}\int_{X\times Y}\big| |x_1-x_2| - |y_1 -y_2|\big|^p d\delta_{(x_0,y_0)}(x_1,y_1)d\delta_{(x_0,y_0)}\Big)^{1/p} \\ &=& \tfrac12\Big(\int_{X\times Y}\big| |x_0-x_2| - |y_0 -y_2|\big|^pd\delta_{(x_0,y_0)}\Big)^{1/p} \\ &=& 0. \end{array}

Let’s have a look at the variant (3): Since there is only one measure coupling, the metric is

\displaystyle  \begin{array}{rcl}  d_{GW}^1(X,Y) & = &\inf_d\Big(\int_{X\times Y} d(x,y)^pd\delta_{(x_0,y_0)}(x,y)\Big)^{1/p} \\ & = &\inf_d d(x_0,y_0). \end{array}

As we have learned in Example 4 in the previous post, we can find a metric coupling of {X} and {Y} that brings the points {x_0} in {X} and {y_0} in {Y} arbitrarily close together (by embedding both {X} and {Y} into some {{\mathbb R}^n} such that these points are only {\epsilon}-far away from each other). Hence, we see that we have

\displaystyle  d_{GW}^1(X,Y) = 0

similarly to {d_{GW}^2}.

Now let’s look at the Gromov-Prokhorov metric from (5). Again we only have one measure coupling and we get

\displaystyle  d_{GP}^1(X,Y) = \inf\Big\{\epsilon>0\ :\ \exists d\ \text{s.t}\ \delta_{(x_0,y_0)}\{(x,y)\ :\ d(x,y)\geq\epsilon\}\leq\epsilon\Big\}.

Since the measure coupling is a Dirac, we can evaluate

\displaystyle  \delta_{(x_0,y_0)}\{(x,y)\ :\ d(x,y)\geq\epsilon\} = \begin{cases} 1 & d(x_0,y_0)\geq \epsilon\\ 0 & d(x_0,y_0)<\epsilon. \end{cases}

As observed previously, there are metric couplings which bring the points {x_0} and {y_0} arbitrarily close together, and hence for any {\epsilon>0} there is a metric coupling such that {\delta_{(x_0,y_0)}\{(x,y)\ :\ d(x,y)\geq\epsilon\} = 0} which shows that

\displaystyle  d_{GP}^1(X,Y) = 0.

Finally, consider the second variant of the Gromov-Prokhorov metric from (6). Since we only have the one measure coupling {\nu = \delta_{(x_0,y_0)}} we have

\displaystyle  d_{GP}^2(X,Y) = \inf\Big\{\epsilon>0\ :\ \delta_{(x_0,y_0)}\otimes\delta_{(x_0,y_0)}\{(x_1,y_1,x_2,y_2)\ :\ \big| |x_1-x_2| - |y_1-y_2|\big|\geq\epsilon\}<\epsilon\Big\}.

Evaluating the tensored Dirac delta is easy: We have that {\delta_{(x_0,y_0)}\otimes\delta_{(x_0,y_0)}\{(x_1,y_1,x_2,y_2)\ :\ \big| |x_1-x_2| - |y_1-y_2|\big|\geq\epsilon\}} is either one or zero and it is one if and only if the point {(x_0,y_0,x_0,y_0)} is in the set that is measured. However, we have that {\big| |x_0-x_0| - |y_0-y_0|\big|} is, of course, always zero (and never larger than any {\epsilon >0}). Hence, the measure is always zero and hence, this version of the Gromov-Prokhorov distance also gives

\displaystyle  d_{GP}^2(X,Y) = 0.

Note that all four metrics can not see that the two Diracs are at different points. The reason seems to be, that one can “deform the metric space outside of the support of the measures” arbitrarily, in some sense.

It seems, that the computation of {d_{GW}^2} and {d_{GP}^2} were easier, since one needs to know all measure couplings and no metric couplings has been involved.

4. Next difficult example: Compare some mm-space to a point

Ok, we have seen that mm-spaces which carry their measure in just a single point all look alike in both versions of the Gromov-Wasserstein metric and also in both versions of the Gromov-Prokhorov metric. Let’s look at a slightly more difficult example: We consider some mm-space {(X,d_X,\mu)} and want to calculate its distance to a single point, i.e. the mm-space {Y=\{0\}} with the only possible metric and measure. This should somehow measure the “size” or “spread” of the mm-space {X}.

First, we need to know all measure couplings and all metric couplings between these spaces. The measure couplings are very easy: There is just one, namely

\displaystyle  \nu = \mu\otimes \delta

(i.e. all subsets of {X\times \{0\}} are treated as if they were subsets of {X}). Concerning metric couplings, there are a few more. We allow semi-metrics {d} on the disjoint union {X\sqcup\{0\}}: Since {d} should respect the metric {d_X} on {X} we see that all metric couplings are parametrized by the points {x_0} in {X} by identifying {0} (the element in {Y}) with this point {x_0}, i.e. all metric couplings are of the form {d_{x_0}} defined by

\displaystyle  d_{x_0}(x,y) = d_X(x,y)\ (\text{for}\ x,y\in X),\qquad d_{x_0}(x,0) = d_X(x,x_0).

(This only gives semi-metrics since we have {d_{x_0}(x_0,0)=0} although, formally {x_0\not 0}.)

Let’s calculate the first Gromov-Wasserstein metric: There is only one measure coupling and we use the parametrization of the metric couplings to deduce

\displaystyle  \begin{array}{rcl}  d_{GW}^1(X,\{0\}) &=& \inf_d\Big(\int_{X\times\{0\}} d(x,0)^pd(\mu\otimes \delta)(x,y)\Big)^{1/p}\\ &=& \inf_{x_0\in X}\Big(\int_{X} d_X(x,x_0)^pd\mu\Big)^{1/p}. \end{array}

This quantity seems to be known as “minimal {p}-th central moment” of {(X,d_X,\mu)}.

The second variant of the Gromov-Wasserstein metric is (remember, there is only one measure coupling)

\displaystyle  \begin{array}{rcl}  d_{GW}^2(X,\{0\}) &=& \tfrac12\Big(\int_{X\times\{0\}}\int_{X\times\{0\}} |d_X(x_1,x_2) - d_Y(0,0)|^p d(\mu\times\delta)(x_1,y_1) d(\mu\times\delta)(x_2,y_2) \Big)^{1/p}\\ &=& \tfrac12\Big(\int_X\int_X d_X(x_1,x_2)^p d\mu d\mu\Big)^{1/p}. \end{array}

This quantity (without the factor {1/2}) is called the “{p}-diameter” of {(X,d_X)}.

Let’s turn to the Gromov-Prokhorov metrics. The first one is (remember, that the metric couplings {d} are parametrized by the points in {X})

\displaystyle  \begin{array}{rcl}  d_{GP}^1(X,\{0\}) &=& \inf_{d}\Big\{\epsilon>0\ :\ (\mu\otimes\delta)\{(x,0)\ :\ d(x,0)\geq\epsilon\}\leq\epsilon\Big\}\\ &=& \inf_{x_0\in X}\Big\{\epsilon>0\ :\ \mu\{x\ :\ d_X(x,x_0)\geq\epsilon\}\leq\epsilon\Big\}. \end{array}

If this looks familiar, then you may have encountered the Ky-Fan metric already? The Ky-Fan metric is a metric between random variables {\xi_1} and {\xi_2} defined of the same probability space {(X,\mu)} with values in a metric space with metric {d}. It reads as

\displaystyle  d_{KF}(\xi_1,\xi_2) = \inf\Big\{\epsilon>0\ :\ \mu\{\omega\ :\ d(\xi_1(\omega),\xi_2(\omega))\geq\epsilon\}\leq\epsilon\Big\}.

Hence, the first version of the Gromov-Prokhorov metric is

\displaystyle  d_{GP}^1(X,\{0\}) = \inf_{x_0\in X}\ d_{KF}(\mathrm{id},x_0),

i.e., the minimal Ky-Fan metric between the identity mapping and the constant mappings. (In other words, it measures how far the identity is from the constant mappings in the sense of Ky Fan.)

The second variant of the Gromov-Prokhorov metric is (remember, the only measure coupling is {\nu = \mu\times \delta})

\displaystyle  \begin{array}{rcl}  d_{GP}^2(X,\{0\}) &=& \inf \Big\{\epsilon>0\ :\ \nu\otimes\nu\{(x_1,0,x_2,0)\ :\ |d_X(x_1,x_2) -0|\geq\epsilon\}\leq\epsilon\Big\}\\ &=& \inf\Big\{\epsilon>0\ :\ \mu\otimes\mu\{(x_1,x_2)\ :\ d_X(x_1,x_2)\geq\epsilon\}\leq\epsilon\Big\}. \end{array}

I do not have a neat name or a good intuition for this metric yet (although it also looks like it measures “size” or “non-localization” of {X} in some sense). If you have one, let me know!