Consider a convex optimization problem of the form

$$\min_x\ F(Ax) + G(x)$$

with convex $F$ and $G$ and a matrix $A$. (We formulate everything quite loosely, skipping over details like continuity and such, as they are irrelevant for the subject matter.) Optimization problems of this type have a specific type of dual problem, namely the Fenchel-Rockafellar dual, which is

$$\max_y\ -F^*(y) - G^*(-A^Ty)$$

where $F^*$ and $G^*$ denote the convex conjugates of $F$ and $G$,
and under certain regularity conditions it holds that the optimal value of the dual equals the optimal value of the primal and, moreover, that a pair $(x,y)$ is both primal and dual optimal if and only if the primal-dual gap is zero, i.e. if and only if

$$F(Ax) + G(x) + F^*(y) + G^*(-A^Ty) = 0.$$
Hence, it is quite handy to use the primal-dual gap as a stopping criterion for iterative methods that solve these problems. So, if one runs an algorithm which produces primal iterates $x^k$ and dual iterates $y^k$, one can monitor

$$\mathfrak{G}(x^k,y^k) = F(Ax^k) + G(x^k) + F^*(y^k) + G^*(-A^Ty^k)$$
and stop if the value falls below a desired tolerance.
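As a concrete illustration, here is a minimal sketch of such a monitoring routine (the function names are my own; as a test case I use the quadratic problem $\min_x \tfrac12\|Ax-b\|^2 + \tfrac12\|x\|^2$, for which all four terms are finite and the conjugates are known in closed form):

```python
import numpy as np

def primal_dual_gap(F, G, Fstar, Gstar, A, x, y):
    # Evaluates F(Ax) + G(x) + F*(y) + G*(-A^T y); any term may be np.inf.
    return F(A @ x) + G(x) + Fstar(y) + Gstar(-A.T @ y)

# Example problem: min_x 0.5*||Ax - b||^2 + 0.5*||x||^2, i.e.
# F(z) = 0.5*||z - b||^2 and G(x) = 0.5*||x||^2, with conjugates
# F*(y) = 0.5*||y||^2 + <y, b> and G*(p) = 0.5*||p||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

F = lambda z: 0.5 * np.sum((z - b) ** 2)
G = lambda x: 0.5 * np.sum(x ** 2)
Fstar = lambda y: 0.5 * np.sum(y ** 2) + y @ b
Gstar = lambda p: 0.5 * np.sum(p ** 2)

# At a primal/dual optimal pair the gap vanishes:
x_opt = np.linalg.solve(A.T @ A + np.eye(3), A.T @ b)
y_opt = A @ x_opt - b            # y = grad F(Ax) at the optimum
print(primal_dual_gap(F, G, Fstar, Gstar, A, x_opt, y_opt))  # approx 0
```

By Fenchel-Young, the gap is nonnegative for every pair $(x,y)$, so a small value really certifies near-optimality.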
There is a problem with this approach which appears if the method produces infeasible iterates in the sense that one of the four terms in the primal-dual gap is actually $+\infty$. This may be the case if $F$ or $G$ are not everywhere finite or, loosely speaking, have linear growth in some directions (since then the respective conjugate will not be finite everywhere). In the rest of the post, I'll sketch a general method that can often solve this particular problem.
For the sake of simplicity, consider the following primal-dual algorithm

$$\begin{aligned} y^{k+1} &= \mathrm{prox}_{\sigma F^*}(y^k + \sigma A\bar x^k)\\ x^{k+1} &= \mathrm{prox}_{\tau G}(x^k - \tau A^Ty^{k+1})\\ \bar x^{k+1} &= 2x^{k+1} - x^k \end{aligned}$$

(also known as the primal-dual hybrid gradient method or Chambolle-Pock's algorithm). It converges as soon as $\sigma\tau\|A\|^2 < 1$.
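In code, the iteration can be sketched as follows (the prox operators are supplied by the user as callables; as a sanity check I apply it to the quadratic problem $\min_x \tfrac12\|Ax-b\|^2 + \tfrac12\|x\|^2$, whose prox operators are simple rescalings):

```python
import numpy as np

def chambolle_pock(prox_tauG, prox_sigmaFstar, A, x0, y0, tau, sigma, n_iter=500):
    # Primal-dual hybrid gradient with extrapolation parameter theta = 1;
    # convergence requires sigma * tau * ||A||^2 < 1.
    x, x_bar, y = x0.copy(), x0.copy(), y0.copy()
    for _ in range(n_iter):
        y = prox_sigmaFstar(y + sigma * (A @ x_bar))
        x_new = prox_tauG(x - tau * (A.T @ y))
        x_bar = 2.0 * x_new - x       # extrapolation step
        x = x_new
    return x, y

# Sanity check on min_x 0.5*||Ax - b||^2 + 0.5*||x||^2, where
# prox_{tau G}(v) = v / (1 + tau) and prox_{sigma F*}(v) = (v - sigma*b) / (1 + sigma).
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
tau = sigma = 0.9 / np.linalg.norm(A, 2)   # then sigma*tau*||A||^2 = 0.81 < 1

x, y = chambolle_pock(lambda v: v / (1 + tau),
                      lambda v: (v - sigma * b) / (1 + sigma),
                      A, np.zeros(3), np.zeros(5), tau, sigma, n_iter=5000)
x_exact = np.linalg.solve(A.T @ A + np.eye(3), A.T @ b)
```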
While the structure of the algorithm ensures that $G(x^k)$ and $F^*(y^k)$ are always finite (since always $\mathrm{prox}_{\tau G}(x)\in\mathrm{dom}(G)$ and $\mathrm{prox}_{\sigma F^*}(y)\in\mathrm{dom}(F^*)$), it may be that $F(Ax^k)$ or $G^*(-A^Ty^k)$ are indeed infinite, rendering the primal-dual gap useless.
Let us assume that the problematic term is $G^*(-A^Ty^k)$. Here is a way out in the case where one can deduce some a-priori bounds on the solution $\hat x$, i.e. a bounded and convex set $B$ with $\hat x\in B$. In fact, this is often the case (e.g. one may know a-priori that there exist lower bounds $l$ and upper bounds $u$, i.e. it holds that $l\leq \hat x\leq u$ componentwise). Then, adding these constraints to the problem will not change the solution.
Let us see how this changes the primal-dual gap: We set $\tilde G = G + I_B$ where $I_B$ is the indicator function of the set $B$ which models the bound constraints. Since $B$ is a bounded convex set and $G$ is finite on $B$, it is clear that

$$\tilde G^*(\xi) = \sup_{x\in B}\ \langle \xi, x\rangle - G(x)$$

is finite for every $\xi$. This leads to a finite duality gap. However, one should also adapt the prox operator. But this is also simple in the case where the constraint and the function $G$ are separable, i.e. $B$ encodes bound constraints as above (in other words $B = \{x\,:\, l\leq x\leq u\}$) and

$$G(x) = \sum_i g_i(x_i)$$

with one-dimensional convex functions $g_i$.
Here it holds that

$$\tilde G(x) = \sum_i \big(g_i(x_i) + I_{[l_i,u_i]}(x_i)\big)$$

and it is simple to see that

$$\mathrm{prox}_{\tau\tilde G}(x)_i = \mathrm{proj}_{[l_i,u_i]}\big(\mathrm{prox}_{\tau g_i}(x_i)\big),$$

i.e., one only uses the proximal operator of $G$ and projects onto the constraints. For general $G$ and $B$, this step may be more complicated.
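As an illustration (with function names of my own choosing), take the separable choice $g_i(t) = |t - u^0_i|$ with the box $[0,1]$: the prox of $\tilde G$ is then soft thresholding followed by clipping, and a brute-force 1D minimization confirms the prox-then-project formula:

```python
import numpy as np

def prox_shifted_l1(v, u0, tau):
    # prox of tau * |. - u0|: soft thresholding centered at u0.
    return u0 + np.sign(v - u0) * np.maximum(np.abs(v - u0) - tau, 0.0)

def prox_G_tilde(v, u0, tau, lo=0.0, hi=1.0):
    # prox of tau*G + indicator of [lo, hi]: prox first, then project.
    return np.clip(prox_shifted_l1(v, u0, tau), lo, hi)

# Brute-force check in 1D: minimize 0.5*(t - v)^2 + tau*|t - u0| over [0, 1].
v, u0, tau = 1.7, 0.4, 0.3
t = np.linspace(0.0, 1.0, 100001)
t_best = t[np.argmin(0.5 * (t - v) ** 2 + tau * np.abs(t - u0))]
print(prox_G_tilde(np.array([v]), np.array([u0]), tau)[0], t_best)  # both 1.0
```

Note that the order matters in the proof but not in the recipe's simplicity: the prox of the one-dimensional $g_i$ is monotone, which is why projecting afterwards gives the exact prox of the sum.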
One example where this makes sense is $L^1$-TV denoising, which can be written as

$$\min_u\ \|u - u^0\|_1 + \lambda\|\nabla u\|_1.$$

Here we have $A = \nabla$, $F(\varphi) = \lambda\|\varphi\|_1$ and $G(u) = \|u - u^0\|_1$, and hence

$$F^*(\varphi) = I_{\{\|\cdot\|_\infty\leq\lambda\}}(\varphi),\qquad G^*(p) = \langle p, u^0\rangle + I_{\{\|\cdot\|_\infty\leq 1\}}(p).$$
The guy that causes problems here is $G^*$, which is (up to the linear term) an indicator functional, and indeed $-A^Ty^k$ will usually be dual infeasible. But since $u$ is an image with a known range of gray values one can simply add the constraints $0\leq u\leq 1$ to the problem and obtains a finite dual while still keeping a simple proximal operator. It is quite instructive to compute $\tilde G^*$ in this case.
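As a hedged sketch of that computation: for $g(t) = |t - u^0|$ with the box $[0,1]$, the supremum defining $\tilde G^*$ is, componentwise, the sup over $t\in[0,1]$ of the piecewise linear concave function $p\,t - |t - u^0|$, which is attained at one of $t\in\{0, u^0, 1\}$. The closed form below is my own derivation and is checked against a brute-force maximization on a grid:

```python
import numpy as np

def G_tilde_star(p, u0):
    # sup over 0 <= t <= 1 of p*t - |t - u0|, summed over components; the
    # piecewise linear objective attains its sup at t in {0, u0, 1}.
    return np.sum(np.maximum.reduce([-u0, p * u0, p - (1.0 - u0)]))

# Brute-force check against a dense grid of t-values:
rng = np.random.default_rng(2)
p = rng.uniform(-3, 3, size=4)
u0 = rng.uniform(0, 1, size=4)
t = np.linspace(0.0, 1.0, 100001)[:, None]
brute = np.sum(np.max(p * t - np.abs(t - u0), axis=0))
print(G_tilde_star(p, u0), brute)  # approximately equal
```

In particular, $\tilde G^*$ is finite for every argument, so the monitored gap is always a finite number that can be driven below the tolerance.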