Today I’d like to collect some comments on a few papers I stumbled upon recently on the arXiv.

1. TGV minimizers in 1D

First, about a month ago two very similar papers appeared in the same week:

Both papers treat the recently proposed “total generalized variation” model (which is a somehow-but-not-really-higher-order generalization of total variation). The total variation of a function {u\in L^1(\Omega)} ({\Omega\subset{\mathbb R}^d}) is defined by duality

\displaystyle  TV(u) = \sup\Big\{\int_\Omega \mathrm{div} \phi\, u\,dx\ :\ \phi\in C^\infty_c(\Omega,{\mathbb R}^d), |\phi|\leq 1\Big\}.

(Note that the demanded high regularity of the test functions {\phi} is not essential here, as we take a supremum over all these functions under the only, but important, requirement that the functions are bounded. Test functions from {C^1_c(\Omega,{\mathbb R}^d)} would also do.)
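To make the duality concrete, here is a minimal discrete sketch in 1D. It is only an illustration under my own assumptions: the signal is sampled on a grid, the derivative is replaced by forward differences, and the names `tv_1d` and `dual_value` are made up for this example. The supremum over bounded test functions is attained by a {\phi} that takes the values {\pm 1} against the jumps of {u}.

```python
import numpy as np

def tv_1d(u):
    """Discrete total variation: sum of absolute jumps between neighbors."""
    return float(np.sum(np.abs(np.diff(u))))

def dual_value(u, phi):
    """Discrete analogue of int phi' u dx = -int phi u' dx (one phi value per jump)."""
    return float(-np.dot(phi, np.diff(u)))

# A piecewise constant signal with jumps of height 1 and 2.
u = np.array([0.0, 0.0, 1.0, 1.0, 3.0])

# A maximizing test function: bounded by 1 and aligned against the jumps.
phi_opt = -np.sign(np.diff(u))

print(tv_1d(u), dual_value(u, phi_opt))
```

Any other `phi` with entries in {[-1,1]} gives a dual value that is at most `tv_1d(u)`, which is the discrete version of the supremum in the definition.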

Several possibilities for extensions and generalizations of the total variation exist by somehow including higher order derivatives. The “total generalized variation” is a particularly successful approach which reads as (now using two non-negative parameters {\alpha,\beta} which act as weights):

\displaystyle  TGV_{\beta,\alpha}^2(u) = \sup\Big\{\int_\Omega \mathrm{div}^2 \phi\, u\,dx\ :\ \phi\in C^\infty_c(\Omega,S^{d\times d}),\ |\phi|\leq \beta,\ |\mathrm{div}\phi|\leq \alpha\Big\}.

To clarify some notation: {S^{d\times d}} are the symmetric {d\times d} matrices, and {\mathrm{div}^n} is the adjoint of {\nabla^n} up to the sign {(-1)^n}, where {\nabla^n} is the differential operator that collects all partial derivatives of order {n} in a {d\times\cdots\times d}-tensor. Moreover {|\phi|} is some matrix norm (e.g. the Frobenius norm) and {|\mathrm{div}\phi|} is some vector norm (e.g. the 2-norm).

Both papers investigate so-called denoising problems with TGV penalty and {L^2} discrepancy, i.e. minimization problems

\displaystyle  \min_u \frac12\int_\Omega(u-u^0)^2\, dx + TGV_{\beta,\alpha}^2(u)

for a given {u^0}. Moreover, both papers treat the one-dimensional case and investigate very special cases in which they calculate minimizers analytically. In one dimension the definition of {TGV^2} becomes a little more familiar:

\displaystyle  TGV_{\beta,\alpha}^2(u) = \sup\Big\{\int_\Omega \phi''\, u\,dx\ :\ \phi\in C^\infty_c(\Omega,{\mathbb R}),\ |\phi|\leq \beta,\ |\phi'|\leq \alpha\Big\}.
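For orientation, it may help to put the equivalent “minimum” form next to this definition. It is well known in the TGV literature; I state it here without proof and under the assumption that {u} is of bounded variation, so that {u'} and {w'} are measures and the integrals below are read as total variations of measures. The auxiliary function {w} is not part of the definition above:

\displaystyle  TGV_{\beta,\alpha}^2(u) = \min_{w}\ \alpha\int_\Omega |u'-w| + \beta\int_\Omega |w'|.

In this form the bound {\alpha} on {\phi'} weights the first-order term {u'-w} and the bound {\beta} on {\phi} weights the second-order term {w'}: choosing {w=0} recovers {\alpha\,TV(u)} as an upper bound, and piecewise affine {u} are cheap because {w} can follow the slope.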

Some images in both papers are really similar: this one from Papafitsoros and Bredies


and this one from Pöschl and Scherzer

Although both papers have a very similar scope, both are worth reading. The calculations are tedious, but both papers try to make them accessible and try hard (and do a good job) to provide helpful illustrations. Curiously, the earlier paper cites the later one but not conversely…

2. Generalized conditional gradient methods

Another paper I found very interesting was

This paper shows a nice duality which I had not been aware of, namely the one between subgradient descent methods and conditional gradient methods. In fact, the conditional gradient method treated there is a generalization of the conditional gradient method which Kristian and I also proposed a while ago in the context of {\ell^1}-minimization in the paper Iterated hard shrinkage for minimization problems with sparsity constraints: To minimize the sum

\displaystyle  F(u) + \Phi(u)

with a differentiable {F} and a convex {\Phi} for which the subgradient of {\Phi} is easily invertible (or, put differently, for which you can minimize {\langle u,a\rangle + \Phi(u)} easily), perform the following iteration:

  1. At iterate {u^n} linearize {F} but not {\Phi} and calculate a new point {v^n} by

    \displaystyle  v^n = \mathrm{argmin}_v \langle F'(u^n),v\rangle + \Phi(v)

  2. Choose a stepsize {s_n\in [0,1]} and set the next iterate as a convex combination of {u^n} and {v^n}

    \displaystyle  u^{n+1} = u^n + s_n(v^n - u^n).
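The two steps above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own, not the exact setting of either paper: I take {F(u) = \tfrac12\|Ku-f\|^2} and {\Phi(u) = \lambda\|u\|_1} plus the indicator of the box {|u_i|\leq R} (so that the argmin in step 1 exists), and I use the open-loop stepsize {s_n = 2/(n+2)}. The names `generalized_conditional_gradient`, `objective`, `lam` and `R` are hypothetical.

```python
import numpy as np

def objective(K, f, lam, u):
    """F(u) + Phi(u) for F(u) = 0.5*||K u - f||^2 and Phi(u) = lam*||u||_1."""
    return 0.5 * np.sum((K @ u - f) ** 2) + lam * np.sum(np.abs(u))

def generalized_conditional_gradient(K, f, lam, R, n_iter=500):
    """Sketch: minimize 0.5*||K u - f||^2 + lam*||u||_1 subject to |u_i| <= R."""
    u = np.zeros(K.shape[1])
    for n in range(n_iter):
        grad = K.T @ (K @ u - f)  # F'(u^n)
        # Step 1: v = argmin_v <grad, v> + lam*||v||_1 over the box |v_i| <= R.
        # The problem is separable: v_i = -R*sign(grad_i) if |grad_i| > lam, else 0.
        v = np.where(np.abs(grad) > lam, -R * np.sign(grad), 0.0)
        # Step 2: convex combination with the open-loop stepsize s_n = 2/(n+2).
        s = 2.0 / (n + 2.0)
        u = u + s * (v - u)
    return u

# Tiny test problem (assumed data): K = identity, so the exact minimizer
# is the soft thresholding of f as long as R is large enough.
K = np.eye(3)
f = np.array([2.0, 0.1, -3.0])
u = generalized_conditional_gradient(K, f, lam=0.5, R=5.0, n_iter=2000)
print(u, objective(K, f, 0.5, u))
```

Note that step 1 is a thresholding of the gradient: entries with {|F'(u^n)_i|\leq\lambda} are set to zero and all others jump to the boundary of the box. Since every iterate is a convex combination of points in the box, the constraint {|u_i|\leq R} is never violated. For the toy problem the exact minimizer is {(1.5, 0, -2.5)}, which gives a reference to compare the output against.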

Note that for an indicator function

\displaystyle  \Phi(u) = \begin{cases} 0 & u\in C\\ \infty & \text{else} \end{cases}

you obtain the conditional gradient method (also known as the Frank-Wolfe method). While Kristian and I derived convergence with an asymptotic rate for the case of {F(u) = \tfrac12\|Ku-f\|^2} and {\Phi} strongly coercive, Francis uses the formulation {F(u) = f(Au)} and the assumption that the dual {f^*} of {f} has a bounded effective domain (which says that {f} has at most linear growth in all directions). With this assumption he obtains explicit constants and rates also for the primal-dual gap. It was great to see that eventually somebody really took up the idea from the paper Iterated hard shrinkage for minimization problems with sparsity constraints (and does not think that we do heuristics for {\ell^0} minimization…).
