The fundamental theorem of calculus relates the derivative of a function to the function itself via an integral. A little bit more precise, it says that one can recover a function from its derivative up to an additive constant (on a simply connected domain).

In one space dimension, one can fix some {x_{0}} in the domain (which has to be an interval) and then set

\displaystyle F(x) = \int_{x_{0}}^{x}f'(t) dt.

Then {F(x) = f(x) + c} with {c= - f(x_{0})}.

Actually, a similar claim is true in higher space dimensions: If {f} is defined on a simply connected domain in {{\mathbb R}^{d}} we can recover {f} from its gradient up to an additive constant as follows: Select some {x_{0}} and set

\displaystyle F(x) = \int_{\gamma}\nabla f(x)\cdot dx \ \ \ \ \ (1)

for any path {\gamma} from {x_{0}} to {x}. Then it holds under suitable conditions that

\displaystyle F(x) = f(x) - f(x_{0}).

And now for something completely different: Convex functions and subgradients.

A function {f} on {{\mathbb R}^{n}} is convex if for every {x} there exists a subgradient {x^{*}\in{\mathbb R}^{n}} such that for all {y} one has the subgradient inequality

\displaystyle f(y) \geq f(x) + \langle x^{*}, y-x\rangle.

Writing this down for {x} and {y} with interchanged roles (and {y^{*}} as corresponding subgradient to {y}), we see that

\displaystyle \langle x-y,x^{*}-y^{*}\rangle \geq 0.

In other words: For a convex function {f} it holds that the subgradient {\partial f} is a (multivalued) monotone mapping. Recall that a multivalued map {A} is monotone, if for every {y \in A(x)} and {y^{*}\in A(y)} it holds that {\langle x-y,x^{*}-y^{*}\rangle \geq 0}. It is not hard to see that not every monotone map is actually a subgradient of a convex function (not even, if we go to “maximally monotone maps”, a notion that we sweep under the rug in this post). A simple counterexample is a (singlevalued) linear monotone map represented by

\displaystyle \begin{bmatrix} 0 & 1\\ -1 & 0 \end{bmatrix}

(which can not be a subgradient of some {f}, since it is not symmetric).

Another hint that monotonicity of a map does not imply that the map is a subgradient is that subgradients have some stronger properties than monotone maps. Let us write down the subgradient inequalities for a number of points {x_{0},\dots x_{n}}:

\displaystyle \begin{array}{rcl} f(x_{1}) & \geq & f(x_{0}) + \langle x_{0}^{*},x_{1}-x_{0}\rangle\\ f(x_{2}) & \geq & f(x_{1}) + \langle x_{1}^{*},x_{2}-x_{1}\rangle\\ \vdots & & \vdots\\ f(x_{n}) & \geq & f(x_{n-1}) + \langle x_{n-1}^{*},x_{n}-x_{n-1}\rangle\\ f(x_{0}) & \geq & f(x_{n}) + \langle x_{n}^{*},x_{0}-x_{n}\rangle. \end{array}

If we sum all these up, we obtain

\displaystyle 0 \geq \langle x_{0}^{*},x_{1}-x_{0}\rangle + \langle x_{1}^{*},x_{2}-x_{1}\rangle + \cdots + \langle x_{n-1}^{*},x_{n}-x_{n-1}\rangle + \langle x_{n}^{*},x_{0}-x_{n}\rangle.

This property is called {n}-cyclical monotonicity. In these terms we can say that a subgradient of a convex function is cyclical monotone, which means that it is {n}-cyclically monotone for every integer {n}.

By a remarkable result by Rockafellar, the converse is also true:

Theorem 1 (Rockafellar, 1966) Let {A} by a cyclically monotone map. Then there exists a convex function {f} such that {A = \partial f}.

Even more remarkable, the proof is somehow “just an application of the fundamental theorem of calculus” where we recover a function by its subgradient (up to an additive constant that depends on the basepoint).

Proof: we aim to “reconstruct” {f} from {A = \partial f}. The trick is to choose some base point {x_{0}} and corresponding {x_{0}^{*}\in A(x_{0})} and set

\displaystyle \begin{array}{rcl} f(x) &=& \sup\left\{ \langle x_{0}^{*}, x_{1}-x_{0}\rangle + \langle x_{1}^{*},x_{2}-x_{1}\rangle + \cdots + \langle x_{n-1}^{*},x_{n}-x_{n-1}\rangle\right.\\&& \qquad\left. + \langle x_{n}^{*},x-x_{n}\rangle\right\}\qquad\qquad (0)\end{array}

where the supremum is taken over all integers {m} and all pairs {x_{i}^{*}\in A(x_{i})} {i=1,\dots, m}. As a supremum of affine functions {f} is clearly convex (even lower semicontinuous) and {f(x_{0}) = 0} since {A} is cyclically monotone (this shows that {f} is proper, i.e. not equal to {\infty} everywhere). Finally, for {\bar x^{*}\in A(\bar x)} we have

\displaystyle \begin{array}{rcl} f(x) &\geq& \sup\left\{ \langle x_{0}^{*}, x_{1}-x_{0}\rangle + \langle x_{1}^{*},x_{2}-x_{1}\rangle + \cdots + \langle x_{n-1}^{*},x_{n}-x_{n-1}\rangle\right.\\ & & \qquad \left.+ \langle x_{n}^{*},\bar x-x_{n}\rangle + \langle \bar x^{*}, x-\bar x\rangle\right\} \end{array}

with the supremum taken over all integers {m} and all pairs {x_{i}^{*}\in A(x_{i})} {i=1,\dots, m}. The right hand side is equal to {f(x) + \langle \bar x^{*},x-\bar x\rangle} and this shows that {f} is indeed convex. \Box

Where did we use the fundamental theorem of calculus? Let us have another look at equation~(0). Just for simplicity, let us denote {x_{i}^{*} =\nabla f(x_{i})}. Now consider a path {\gamma} from {x_{0}} to {x} and points {0=t_{0}< t_{1}<\cdots < t_{n}< t_{n+1} = 1} with {\gamma(t_{i}) = x_{i}}. Then the term inside the supremum of~(0) equals

\displaystyle \langle \nabla f(\gamma(t_{0})),\gamma(t_{1})-\gamma(t_{0})\rangle + \dots + \langle \nabla f(\gamma(t_{n})),\gamma(t_{n+1})-\gamma(t_{n})\rangle.

This is Riemannian sum for an integral of the form {\int_{\gamma}\nabla f(x)\cdot dx}. By monotonicity of \nabla {f}, we increase this sum, if we add another point {\bar t} (e.g. {t_{i}<\bar t<t_{i+1}}, and hence, the supremum does converge to the integral, i.e.~(0) is equal to

\displaystyle f(x) = \int_{\gamma}\nabla f(x)\cdot dx

where {\gamma} is any path from {x_{0}} to {x}.

In another blogpost I wrote about convexity from an abstract point of view. Recall, that convex functions {f:X\rightarrow Y} can be defined as soon as we have a real linear structure on {X} and an order on {Y} as this allows to formulate the basic requirement for a convex function, namely that for all {x,y\in X } and {0\leq\lambda\leq 1} it holds that

\displaystyle f(\lambda x + (1-\lambda)y)\leq \lambda f(x) + (1-\lambda)f(y).

One amazing thing about convexity is, that it implies some regularity for the function. Indeed, you’ll find something on the net if you search for “convexity implies continuity”. But wait. How can that be? We have a mapping {f} from a vector space {X} to some ordered space {Y} (which I will always assume to be {{\mathbb R}\cup\{\infty\}} here, i.e. the extended real line) and we did not specify any topology on {X} (while the extended real line carries its usual order topology). Indeed, one can equip a vector space with a lot of different topologies so how can it be that some property like convexity, which is expressed in purely algebraical terms, implies something like continuity, which is topological property? The answer is, that it is not really true that “convexity implies continuity”. The correct statement is a bit more subtle:

A convex function is Lipschitz continuous at any point where it is locally bounded.

Ok, here we have something more: We need boundedness of {f}, but this is still related to {Y} and not related to {X}. But there is this little word “locally” and this is the point where some topology on {X} comes into play. Let’s assume that we have even a metric on {X} so that we can talk about balls. Then, the statement reads as:

A convex function {f} is Lipschitz continuous at a point {x} if there exists a {C>0} and {r>0} such that {|f(y)|\leq C} for {y\in B_r(x)}.

Put differently: The continuity of a convex function {f} depends on the boundedness of {f} on neighborhoods. Consequently, if we change the topology, we change the set of neighborhoods and hence, a fixed convex function may have different continuity behavior in different topologies. This does indeed happen. Consider the following extreme example: Let {x_0\in X} and

\displaystyle f(x) = \begin{cases} 0 & x=x_0\\ \infty & \text{else.} \end{cases}

This function is convex but, for the norm-topology, not continuous at any point. Also, it is not locally bounded at any point. However, if we change the topology such that each point is its own neighborhood (that is, we take the discrete metric), than we get local boundedness and also continuity of {f}.

There are several answers to the following question:

1. What is a convex set?

For a convex set you probably know these definitions:

Definition 1 A subset {C} of a real vector space {V} is convex if for any {x,y\in C} and {\lambda\in[0,1]} it holds that {\lambda x + (1-\lambda)y\in C}.

In other words: If two points lie in the set, then every convex combination also lies in the set.

While this is a “definition from the inside”, convex sets can also be characterized “from the outside”. We add closedness as an assumption and get:

Definition 2 A closed subset {C} of a real locally convex topological vector space {V} is convex if it is the intersection of closed halfspaces (i.e. sets of the form {\{x\in V\ :\ \langle a,x\rangle\geq c\}} for some {a} in the dual space {V^*} and {c\in {\mathbb R}}).

Moreover, we could define convex sets via convex functions:

Definition 3 A set {C\subset V} is convex if there is a convex function {f:V\rightarrow {\mathbb R}} such that {C = \{x\ :\ f(x)\leq 0\}}.

Of course, this only makes sense once we have defined convex functions. Hence, we could also ask the question:

2. What is a convex function?

We can define a convex function by means of convex sets as follows:

Definition 4 A function {f:V\rightarrow{\mathbb R}} from a real vector space into the real numbers is convex, if its epigraph {\text{epi} f = \{(x,\mu)\ :\ f(x)\leq \mu\}\subset V\times {\mathbb R}} is convex (as a subset of the vector space {V\times{\mathbb R}}).

The epigraph consists of the points {(x,\mu)} which lie above the graph of the function and carries the same information as the function.

(Let me note that one can replace the real numbers here and in the following with the extended real numbers {\bar {\mathbb R} = {\mathbb R}\cup\{-\infty,\infty\}} if one uses the right extension of the arithmetic and the obvious ordering, but we do not consider this in this post.)

Because epigraphs are not arbitrary convex sets but have a special form (if a point {(x,\mu)} is in an epigraph, then every {(x,\lambda)} with {\lambda\geq \mu} is also in the epigraph), and because the underlying vector space {V\times {\mathbb R}} comes with an order in the second component, some of the definitions for convex sets from above have a specialized form:

From the definition “convex combinations stay in the set” we get:

Definition 5 A function {f:V\rightarrow {\mathbb R}} is convex, if for all {x,y\in V} and {\lambda\in [0,1]} it holds that

\displaystyle f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda)f(y).

In other words: The secant of any two points on the graph lies above the graph.

From the definition by “intersection of half spaces” we get another definition. Since we added closedness in this case we add this assumption also here. However, closedness of the epigraph is equivalent to lower-semicontinuity (lsc) of the function and since lsc functions are very convenient we use this notion:

Definition 6 A function {f:V\rightarrow{\mathbb R}} is convex and lsc if it is the pointwise supremum of affine functions, i.e., for some set {S} of tuples {(a,c)\in V^*\times {\mathbb R}} it holds that

\displaystyle f(x) = \sup_{(a,c) \in S} \langle a,x\rangle + c.

A special consequence if this definition is that tangents to the graph of a convex function lie below the function. Another important consequence of this fact is that the local behavior of the function, i.e. its tangent plane at some point, carries some information about the global behavior. Especially, the property that the function lies above its tangent planes allows one to conclude that local minima of the function are also global. Probably the last properties are the ones which give convex functions a distinguished role, especially in the field of optimization.

Some of the previous definitions allow for generalizations into several direction and the quest for the abstract core of the notion of convexity has lead to the field of abstract convexity.

3. Abstract convexity

When searching for abstractions of the notion of convexity one may get confused by the various different approaches. For example there are generalized notions of convexity, e.g. for function of spaces of matrices (e.g. rank-one convexity, quasi-convexity or polyconvexity) and there are also further different notions like pseudo-convexity, invexity or another form ofquasi-convexity. Here we do not have generalization in mind but abstraction. Although both things are somehow related, the way of thinking is a bit different. Our aim is not to find a useful notion which is more general than the notion of convexity but to find a formulation which contains the notion of convexity and abstracts away some ingredients which probably not carry the essence of the notion.

In the literature one also finds several approaches in this direction and I mention only my favorite one (and I restrict myself to abstractly convex functions and not write about abstractly convex sets).

To me, the most appealing notion of an abstract convex function is an abstraction of the definition as “pointwise supremum of affine functions”. Let’s look again at the definition:

A function {f:V\rightarrow {\mathbb R}} is convex and lsc if there is a subset {S} of {V^*\times {\mathbb R}} such that

\displaystyle f(x) = \sup_{(a,c)\in S} \langle a,x\rangle + c.

We abstract away the vector space structure and hence, also the duality, but keep the real-valuedness (together with its order) and define:

Definition 7 Let {X} be a set and let {W} be a set of real valued function on {X}. Then a function {f:X\rightarrow{\mathbb R}} is said to be {W}-convex if there is a subset {S} of {W\times {\mathbb R}} such that

\displaystyle f(x) = \sup_{(w,c)\in S} w(x) + c.

What we did in this definition was simply to replace continuous affine functions {x\mapsto \langle a,x\rangle} on a vector space by an arbitrary collection of real valued functions {x\mapsto w(x)} on a set. On sees immediately that for every function {w\in W} and any real number {c} the function {w+c} is {W} convex (similarly to the fact that every affine linear function is convex).

Another nice thing about this approach is, that it allows for some notion of duality/conjugation. For {f:V\rightarrow{\mathbb R}} we define the {W}-conjugate by

\displaystyle f^{W*}(w) = \sup_{w\in W} \Big(w(x)- f(x) \Big)

and we can even formulate a biconjugate

\displaystyle f^{W**}(x) = \sup_x \Big(w(x) - f^{W*}(w)\Big).

We naturally have a Fenchel inequality

\displaystyle w(x) \leq f(x) + f^{W*}(w)

and we may even define subgradients as usual. Note that a conventional subgradient is an element of the dual space which defines a tangent plane at the point where the subgradient is taken, that is, {a\in V^*} is a subgradient of {f} at {x}, if for all {y\in V} it holds that {f(y) \geq f(x) + \langle a,y-x\rangle} or

\displaystyle f(y) - \langle a,y\rangle\geq f(x) - \langle a,x\rangle.

A {W}-subgradient is an element of {W}, namely we define: {w\in\partial^{W}f(x)} if

\displaystyle \text{for all}\ y:\quad f(y) -w(y) \geq f(x) - w(x).

Then we also have a Fenchel equality:

\displaystyle w\in\partial^W f(x)\iff w(x) = f(x) + f^{W*}(w).

One may also take dualization as the starting point for an abstraction.

4. Abstract conjugation

We could formulate the {W}-conjugate as follows: For {\Phi(w,x) = w(x)} we have

\displaystyle f^{W*}(x) = \sup_w \Big(\Phi(w,x) - f(x)\Big).

This opens the door to another abstraction: For some sets {X,W} (without any additional structure) define a coupling function {\Phi: X\times W \rightarrow {\mathbb R}} and define the {\Phi}-conjugate as

\displaystyle f^{\Phi*}(w) = \sup_x \Big(\Phi(w,x) - f(x)\Big)

and the {\Phi}-biconjugate as

\displaystyle f^{\Phi**}(x) = \sup_x \Big(\Phi(w,x) - f^{W*}(w)\Big)