Consider a convex optimization problem of the form

\displaystyle \begin{array}{rcl} \min_{x}F(x) + G(Ax) \end{array}

with convex {F} and {G} and matrix {A}. (We formulate everything quite loosely, skipping over details like continuity and such, as they are irrelevant for the subject matter). Optimization problems of this type have a specific type of dual problem, namely the Fenchel-Rockafellar dual, which is

\displaystyle \begin{array}{rcl} \max_{y}-F^{*}(-A^{T}y) - G^{*}(y) \end{array}

and under certain regularity conditions it holds that the optimal value of the dual equals the the objective value of the primal and, moreover, that a pair {(x^{*},y^{*})} is both primal and dual optimal if and only if the primal dual gap is zero, i.e. if and only if

\displaystyle \begin{array}{rcl} F(x^{*})+G(Ax^{*}) + F^{*}(-A^{T}y^{*})+G^{*}(y^{*}) = 0. \end{array}

Hence, it is quite handy to use the primal dual gap as a stopping criteria for iterative methods to solve these problems. So, if one runs an algorithm which produces primal iterates {x^{k}} and dual iterates {y^{k}} one can monitor

\displaystyle \begin{array}{rcl} \mathcal{G}(x^{k},y^{k}) = F(x^{k})+G(Ax^{k}) + F^{*}(-A^{T}y^{k})+G^{*}(y^{k}). \end{array}

and stop if the value falls below a desired tolerance.

There is some problem with this approach which appears if the method produces infeasible iterates in the sense that one of the four terms in {\mathcal{G}} is actually {+\infty}. This may be the case if {F} or {G} are not everywhere finite or, loosely speaking, have linear growth in some directions (since then the respective conjugate will not be finite everywhere). In the rest of the post, I’ll sketch a general method that can often solve this particular problem.

For the sake of simplicity, consider the following primal dual algorithm

\displaystyle \begin{array}{rcl} x^{k+1} & = &\mathrm{prox}_{\tau F}(x^{k}-A^{T}y^{k})\\ y^{k+1} & = &\mathrm{prox}_{\sigma G^{*}}(y^{k}+\sigma A(2x^{k+1}-x^{k})) \end{array}

(also know as primal dual hybrid gradient method or Chambolle-Pock’s algorithm). It converges as soon as {\sigma\tau\leq \|A\|^{-2}}.

While the structure of the algorithm ensures that {F(x^{k})} and {G^{*}(y^{k})} are always finite (since always {\mathrm{prox}_{F}(x)\in\mathrm{dom}(F)}), is may be that {F^{*}(-A^{T}y^{k})} or {G(Ax^{k})} are indeed infinite, rendering the primal dual gap useless.

Let us assume that the problematic term is {F^{*}(-A^{T}y^{k})}. Here is a way out in the case where one can deduce some a-priori bounds on {x^{*}}, i.e. a bounded and convex set {C} with {x^{*}\in C}. In fact, this is often the case (e.g. one may know a-priori that there exist lower bounds {l_{i}} and upper bounds {u_{i}}, i.e. it holds that {l_{i}\leq x^{*}_{i}\leq u_{i}}). Then, adding these constraints to the problem will not change the solution.

Let us see, how this changes the primal dual gap: We set {\tilde F(x) = F(x) + I_{C}(x)} where {C} is the set which models the bound constraints. Since {C} is a bounded convex set and {F} is finite on {C}, it is clear that

\displaystyle \begin{array}{rcl} \tilde F^{*}(\xi) = \sup_{x\in C}\,\langle \xi,x\rangle - F(x) \end{array}

is finite for every {\xi}. This leads to a finite duality gap. However, one should also adapt the prox operator. But this is also simple in the case where the constraint {C} and the function {F} are separable, i.e. {C} encodes bound constraints as above (in other words {C = [l_{1},u_{1}]\times\cdots\times [l_{n},u_{n}]}) and

\displaystyle \begin{array}{rcl} F(x) = \sum_{i} f_{i}(x_{i}). \end{array}

Here it holds that

\displaystyle \begin{array}{rcl} \mathrm{prox}_{\sigma \tilde F}(x)_{i} = \mathrm{prox}_{\sigma f_{i} + I_{[l_{i},u_{i}]}}(x_{i}) \end{array}

and it is simple to see that

\displaystyle \begin{array}{rcl} \mathrm{prox}_{\sigma f_{i} + I_{[l_{i},u_{i}]}}(x_{i}) = \mathrm{proj}_{[l_{i},u_{i}]}\mathrm{prox}_{\tau f_{i}}(x_{i}), \end{array}

i.e., only uses the proximal operator of {F} and project onto the constraints. For general {C}, this step may be more complicated.

One example, where this makes sense is {L^{1}-TV} denoising which can be written as

\displaystyle \begin{array}{rcl} \min_{u}\|u-u^{0}\|_{1} + \lambda TV(u). \end{array}

Here we have

\displaystyle \begin{array}{rcl} F(u) = \|u-u^{0}\|_{1},\quad A = \nabla,\quad G(\phi) = I_{|\phi_{ij}|\leq 1}(\phi). \end{array}

The guy that causes problems here is {F^{*}} which is an indicator functional and indeed {A^{T}\phi^{k}} will usually be dual infeasible. But since {u} is an image with a know range of gray values one can simple add the constraints {0\leq u\leq 1} to the problem and obtains a finite dual while still keeping a simple proximal operator. It is quite instructive to compute {\tilde F} in this case.