The problem of optimal transport of mass from one distribution to another can be stated in many forms. Here is the formulation going back to Kantorovich: We have two measurable sets and , coming with two measures and . We also have a function which assigns a transport cost, i.e. is the cost that it takes to carry one unit of mass from point to . What we want is a plan that says where the mass in should be placed in (or vice versa). There are different ways to formulate this mathematically.

A simple way is to look for a map which says that thet mass in should be moved to . While natural, there is a serious problem with this: What if not all mass at should go to the same point in ? This happens in simple situations where all mass in sits in just one point, but there are at least two different points in where mass should end up. This is not going to work with a map as above. So, the map is not flexible enough to model all kinds of transport we may need.

What we want is a way to distribute mass from one point in to the whole set . This looks like we want maps which map points in to functions on , i.e. something like where stands for some set of functions on . We can de-curry this function to some by . That’s good in principle, be we still run into problems when the target mass distribution is singular in the sense that is a “continuous” set and there are single points in that carry some mass according to . Since we are in the world of measure theory already, the way out suggests itself: Instead of a function on we look for a measure on as a transport plan.

The demand that we should carry *all* of the mass in to reach *all* of is formulated by marginals. For simplicity we just write these constraints as

(with the understanding that the first equation really means that for all continuous function it holds that ).

This leads us to the full transport problem

There is the following theorem which characterizes optimality of a plan and which is the topic of this post:

Theorem 1 (Fundamental theorem of optimal transport)Under some technicalities we can say that a plan which fulfills the marginal constraints is optimal if and only if one of the following equivalent conditions is satisfied:

- The support of is -cyclically monotone.
- There exists a -concave function such that its -superdifferential contains the support of , i.e. .

A few clarifications: The technicalities involve continuity, integrability, and boundedness conditions of and integrability conditions on the marginals. The full theorem can be found as Theorem 1.13 in A user’s guide to optimal transport by Ambrosio and Gigli. Also the notions -cyclically monotone, -concave and -superdifferential probably need explanation. We start with a simpler notion: -monotonicity:

Definition 2A set is-monotone, if for all it holds that

If you find it unclear what this has to do with monotonicity, look at this example:

Example 1Let and let be the usual scalar product. Then -monotonicity is the condition that for all it holds that

which may look more familiar. Indeed, when and are subset of the real line, the above conditions means that the set somehow “moves up in ” if we “move right in ”. So -monotonicity for is something like “monotonically increasing”. Similarly, for , -monotonicity means “monotonically decreasing”.

You may say that both and are strange cost functions and I can’t argue with that. But here comes: ( being the euclidean norm) seems more natural, right? But if we have a transport plan for this for some marginals and we also have

i.e., the transport cost for differs from the one for only by a constant independent of , so may well use the latter.

The fundamental theorem of optimal transport uses the notion of -cyclical monotonicity which is stronger that just -monotonicity:

Definition 3A set is -cyclically monotone, if for all , and all permutations of it holds that

For we get back the notion of -monotonicity.

Definition 4A function is-concaveif there exists some function such that

This definition of -concavity resembles the notion of convex conjugate:

Example 2Again using we get that a function is -concave if

and, as an infimum over linear functions, is clearly concave in the usual way.

Definition 5The -superdifferential of a-concavefunction is

where is the

-conjugateof defined by

Again one may look at and observe that the -superdifferential is the usual superdifferential related to the supergradient of concave functions (there is a Wikipedia page for subgradient only, but the concept is the same with reversed signs in some sense).

Now let us sketch the proof of the fundamental theorem of optimal transport: \medskip

*Proof (of the fundamental theorem of optimal transport).* Let be an optimal transport plan. We aim to show that is -cyclically monotone and assume the contrary. That is, we assume that there are points and a permutation such that

We aim to construct a such that is still feasible but has a smaller transport cost. To do so, we note that continuity of implies that there are neighborhoods of and of such that for all and it holds that

We use this to construct a better plan : Take the mass of in the sets and shift it around. The full construction is a little messy to write down: Define a probability measure on the product as the product of the measures . Now let and be the projections of onto and , respectively, and set

where denotes the pushforward of measures. Note that the new measure is signed and that fulfills

- is a non-negative measure
- is feasible, i.e. has the correct marginals

which, all together, gives a contradiction to optimality of . The implication of item 1 to item 2 of the theorem is not really related to optimal transport but a general fact about -concavity and -cyclical monotonicity (c.f.~this previous blog post of mine where I wrote a similar statement for convexity). So let us just prove the implication from item 2 to optimality of : Let fulfill item 2, i.e. is feasible and is contained in the -superdifferential of some -concave function . Moreover let be any feasible transport plan. We aim to show that . By definition of the -superdifferential and the -conjugate we have

Since by assumption, this gives

which shows the claim.

Corollary 6If is a measure on which is supported on a -superdifferential of a -concave function, then is an optimal transport plan for its marginals with respect to the transport cost .